File Descriptors

As discussed on page 270, before a process can read from or write to a file it must open that file. When a process opens a file, Linux associates a number (called a file descriptor) with the file. Each process has its own set of open files and its own file descriptors. After opening a file, a process reads from and writes to that file by referring to its file descriptor. When it no longer needs the file, the process closes the file, freeing the file descriptor.
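Here is a minimal sketch of the per-process point. It assumes bash and the mktemp utility, and the function name fd_demo is hypothetical. A subshell runs as a child process, so it inherits a descriptor the parent opened (3), but a descriptor the child opens itself (8) never appears in the parent. The exec syntax is explained under "Opening a file descriptor."

```shell
#!/bin/bash
# Sketch: each process has its own descriptor table. A child inherits the
# descriptors its parent has open, but a descriptor the child opens stays
# private to the child. fd_demo is a hypothetical name.
fd_demo() {
    local tmp
    tmp=$(mktemp) || return 1        # scratch file (assumes mktemp exists)
    exec 3> "$tmp"                   # parent opens fd 3 for output

    (                                # subshell: runs as a child process
        echo "child wrote via inherited fd 3" >&3
        exec 8> /dev/null            # fd 8 is opened in the child only
    )

    # fd 8 was never opened in the parent, so redirecting to it fails
    # with "Bad file descriptor" (the message is discarded here).
    if { : >&8; } 2> /dev/null; then
        echo "parent: fd 8 is open"
    else
        echo "parent: fd 8 is not open"
    fi

    exec 3>&-                        # parent closes fd 3
    cat "$tmp"                       # the child's line is in the file
    rm -f "$tmp"
}

fd_demo
```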

A typical Linux process starts with three open files: standard input (file descriptor 0), standard output (file descriptor 1), and standard error (file descriptor 2). Often those are the only files the process needs. Recall that you redirect standard output with the symbol > or the symbol 1> and that you redirect standard error with the symbol 2>. You can redirect other file descriptors as well, but because file descriptors other than 0, 1, and 2 have no conventional meaning, doing so is rarely useful. The exception is a program you write yourself: there you decide what each file descriptor means, so you can take advantage of redirection.
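The following is a sketch of that exception, assuming bash. The hypothetical function report treats file descriptor 3 as a trace channel, a meaning that exists only because this code defines it. The caller uses 1>, 2>, and 3> to send standard output, standard error, and file descriptor 3 to three separate files.

```shell
#!/bin/bash
# Sketch: fds 0, 1, and 2 have conventional meanings; fd 3 means
# "trace output" here only because this code defines it that way.
# report and its messages are hypothetical.
report() {
    echo "result: 42"               # standard output (fd 1)
    echo "warning: sample" >&2      # standard error (fd 2)
    echo "trace: report ran" >&3    # fd 3, meaning chosen by this script
}

dir=$(mktemp -d)
# 1> is equivalent to >; each descriptor gets its own file.
report 1> "$dir/out" 2> "$dir/err" 3> "$dir/trace"
cat "$dir/out"      # result: 42
cat "$dir/err"      # warning: sample
cat "$dir/trace"    # trace: report ran
rm -r "$dir"
```

Suppose you call report without redirecting file descriptor 3, and nothing else has it open. In that case bash reports a "Bad file descriptor" error for the trace line. This is one reason extra descriptors are mostly useful inside programs whose callers know the convention.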

Opening a file descriptor

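The Bourne Again Shell opens files using the exec builtin. When you give exec only redirections and no command, it applies them to the current shell and keeps the files open, as follows: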

exec n> outfile
exec m< infile

The first line opens outfile for output and holds it open, associating it with file descriptor n. The second line opens infile for input and holds it open, associating it with file descriptor m.
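Here is a short sketch of both forms, assuming bash and mktemp. The function open_demo and the file names are hypothetical. It also shows why holding a file open matters. Successive writes to file descriptor 3 follow one another rather than overwriting the file, and successive reads from file descriptor 5 continue where the previous read stopped.

```shell
#!/bin/bash
# Sketch: open one file for output on fd 3 and another for input on fd 5,
# keep both open across several commands, then close them.
# open_demo and the file names are illustrative only.
open_demo() {
    local dir line
    dir=$(mktemp -d) || return 1
    printf 'alpha\nbeta\n' > "$dir/infile"

    exec 3> "$dir/outfile"   # fd 3 refers to outfile, opened for writing
    exec 5< "$dir/infile"    # fd 5 refers to infile, opened for reading

    echo "first" >&3         # both writes go to the same open file,
    echo "second" >&3        # so "second" lands after "first"
    read -r line <&5         # reads "alpha"; the file offset advances
    echo "read: $line"
    read -r line <&5         # so the next read gets "beta"
    echo "read: $line"

    exec 3>&- 5<&-           # close both descriptors
    cat "$dir/outfile"       # shows: first, second
    rm -r "$dir"
}

open_demo
```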

Duplicating a file descriptor

Duplicating a file descriptor makes it refer to the same file as another open file descriptor, such as standard input or standard output. The <& token duplicates an input file descriptor; use >& to duplicate an output file descriptor. Use the following format to open or redirect file descriptor n as a duplicate of input file descriptor m:

exec n<&m
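The following sketch shows a common use of duplication, assuming bash. The name dup_demo is hypothetical. It saves standard output on file descriptor 6 with 6>&1, the output counterpart of n<&m. It then sends standard output to a file for a while and restores it from the saved copy.

```shell
#!/bin/bash
# Sketch: save standard output, redirect it to a file temporarily,
# then restore it. dup_demo is a hypothetical name.
dup_demo() {
    local log
    log=$(mktemp) || return 1

    exec 6>&1            # fd 6 duplicates fd 1 (saves the current stdout)
    exec 1> "$log"       # fd 1 now refers to the log file
    echo "goes to the log file"
    exec 1>&6 6>&-       # fd 1 duplicates fd 6 again, then fd 6 is closed

    echo "back on the original stdout"
    cat "$log"
    rm -f "$log"
}

dup_demo
```

The redirections on one exec line are processed left to right. As a result, 1>&6 restores standard output before 6>&- closes the saved copy.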

Once you have opened a file, you can use it for input and output in two ways. First, you can use I/O redirection on any command line, redirecting standard output to a file descriptor with >&n or redirecting standard input from a file descriptor with <&n. Second, you can use the read (page 927) and echo builtins; read can take its input directly from file descriptor n when you give it the -u n option. If you invoke other commands, including functions (page 321), they inherit these open files and file descriptors. When you have finished using a file, you can close it with

exec n<&-
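The following sketch exercises each of these, assuming bash and mktemp. The names show_rest and use_demo are hypothetical. It works as follows:

* read -u reads straight from file descriptor 4.
* echo and a function send their standard output to file descriptor 5 with >&5.
* The cat command inside the function inherits file descriptor 4.
* exec closes both descriptors at the end. An output descriptor can also be closed with n>&-.

```shell
#!/bin/bash
# Sketch: using descriptors that are already open. Command lines can
# redirect to or from them (>&n, <&n); read -u n reads one directly.
# A function or command started while they are open inherits them.
# show_rest and use_demo are hypothetical names.
show_rest() {
    cat <&4                  # cat reads from fd 4, inherited from the caller
}

use_demo() {
    local dir first
    dir=$(mktemp -d) || return 1
    printf 'one\ntwo\nthree\n' > "$dir/data"

    exec 4< "$dir/data"      # open data for reading on fd 4
    exec 5> "$dir/copy"      # open copy for writing on fd 5

    read -r -u 4 first       # read builtin reads "one" straight from fd 4
    echo "first: $first" >&5 # echo's standard output redirected to fd 5
    show_rest >&5            # the function copies "two" and "three" to fd 5

    exec 4<&- 5>&-           # close both descriptors
    cat "$dir/copy"
    rm -r "$dir"
}

use_demo
```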

The shell function in the next example, named mycp, copies the file named by its first argument to the file named by its second argument when you invoke it with two arguments. If you supply only one argument, the function copies the file named by that argument to standard output. If you invoke mycp with no arguments, it copies standard input to standard output.

Tip: A function is not a shell script

The mycp example is a shell function; it will not work as you expect if you execute it as a shell script. (The script will run without error, but it defines the function in a child shell that exits as soon as the script finishes, so the function is not available afterward.) You can enter this function from the keyboard. If you put the function in a file, you can make it available by giving the filename as an argument to the . (dot) builtin (page 269). You can also put the function in a startup file if you want it to be always available (page 323).

function mycp ()
{
case $# in
    0)
        # zero arguments
        # file descriptor 3 duplicates standard input
        # file descriptor 4 duplicates standard output
        exec 3<&0 4>&1
        ;;
    1)
        # one argument
        # open the file named by the argument for input
        # and associate it with file descriptor 3
        # file descriptor 4 duplicates standard output
        exec 3< "$1" 4>&1 || return 1
        ;;
    2)
        # two arguments
        # open the file named by the first argument for input
        # and associate it with file descriptor 3
        # open the file named by the second argument for output
        # and associate it with file descriptor 4
        # if either file cannot be opened, close fd 3 and give up
        exec 3< "$1" 4> "$2" || { exec 3<&-; return 1; }
        ;;
    *)
        # usage message goes to standard error
        echo "Usage: mycp [source [dest]]" >&2
        return 1
        ;;
esac
# call cat with input coming from file descriptor 3
# and output going to file descriptor 4
cat <&3 >&4
# close file descriptors 3 and 4
exec 3<&- 4>&-
}

The real work of this function is done in the line that begins with cat. The rest of the function arranges for file descriptors 3 and 4, which are the input and output of the cat command, to be associated with the appropriate files.
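A function runs in the current shell, and exec with only redirections changes that shell's descriptors. A descriptor that a function opens therefore stays open after the function returns, which is why mycp closes file descriptors 3 and 4 before it finishes. Here is a minimal sketch, assuming bash. The names leaky, tidy, and is_open are hypothetical, and is_open tests a descriptor by trying to redirect to it.

```shell
#!/bin/bash
# Sketch: a descriptor opened by exec inside a function outlives the
# call unless the function closes it. leaky, tidy, and is_open are
# hypothetical names.

# Succeeds if file descriptor $1 is open: redirecting to a closed
# descriptor fails with "Bad file descriptor" (silenced here).
is_open() { { : >&"$1"; } 2> /dev/null; }

leaky() { exec 3< /dev/null; }               # opens fd 3, never closes it
tidy()  { exec 3< /dev/null; exec 3<&-; }    # opens fd 3, closes it before returning

tidy
if is_open 3; then echo "after tidy: fd 3 open"; else echo "after tidy: fd 3 closed"; fi
leaky
if is_open 3; then echo "after leaky: fd 3 open"; else echo "after leaky: fd 3 closed"; fi
exec 3<&-                                    # clean up what leaky left open
```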

Optional

The next program takes two filenames on the command line, sorts both files, and sends the sorted output to temporary files. The program then merges the sorted files to standard output, preceding each line with a number (1 or 2) that indicates which file it came from.

$ cat sortmerg
#!/bin/bash
usage ()
{
if [ $# -ne 2 ]; then
    echo "Usage: $0 file1 file2" 1>&2
    exit 1
fi
}
# Default temporary directory
: "${TEMPDIR:=/tmp}"
# Check argument count
usage "$@"
# Set up temporary files for sorting
file1=$TEMPDIR/$$.file1
file2=$TEMPDIR/$$.file2
# Sort
sort "$1" > "$file1"
sort "$2" > "$file2"
# Open $file1 and $file2 for reading. Use file descriptors 3 and 4.
exec 3< "$file1"
exec 4< "$file2"
# Read the first line from each file to figure out how to start.
# IFS= and -r make read keep leading blanks and backslashes intact.
IFS= read -r Line1 <&3
status1=$?
IFS= read -r Line2 <&4
status2=$?
# Strategy: while there is still input left in both files:
#   Output the line that should come first.
#   Read a new line from the file that line came from.
# printf prints each line literally; echo -e would also
# interpret any backslash sequences in the data.
while [ $status1 -eq 0 ] && [ $status2 -eq 0 ]
do
    if [[ "$Line2" > "$Line1" ]]; then
        printf '1.\t%s\n' "$Line1"
        IFS= read -r -u3 Line1
        status1=$?
    else
        printf '2.\t%s\n' "$Line2"
        IFS= read -r -u4 Line2
        status2=$?
    fi
done
# Now one of the files is at end-of-file.
# Read from each file until the end.
# First file1:
while [ $status1 -eq 0 ]
do
    printf '1.\t%s\n' "$Line1"
    IFS= read -r Line1 <&3
    status1=$?
done
# Next file2:
while [ $status2 -eq 0 ]
do
    printf '2.\t%s\n' "$Line2"
    IFS= read -r Line2 <&4
    status2=$?
done
# Close and remove both input files
exec 3<&- 4<&-
rm -f "$file1" "$file2"
exit 0
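The following is a variant sketch, not the program above. It uses bash process substitution, <(command), to give each sort's output a name that exec can open on file descriptors 3 and 4, so no temporary files are needed. It applies the same merge strategy, reads with IFS= read -r so leading blanks and backslashes survive, and prints with printf. The function name merge_sorted and the sample data are hypothetical. Process substitution is a bash feature that requires a system supporting named pipes or /dev/fd.

```shell
#!/bin/bash
# Sketch: sortmerg-style merge without temporary files, using process
# substitution. merge_sorted is a hypothetical name.
merge_sorted() {
    local line1 line2 s1 s2
    # Each <(sort ...) is opened for reading on its own descriptor.
    exec 3< <(sort "$1") 4< <(sort "$2")

    IFS= read -r line1 <&3; s1=$?
    IFS= read -r line2 <&4; s2=$?
    # While both inputs have lines, print the one that sorts first.
    while (( s1 == 0 && s2 == 0 )); do
        if [[ $line2 > $line1 ]]; then
            printf '1.\t%s\n' "$line1"
            IFS= read -r line1 <&3; s1=$?
        else
            printf '2.\t%s\n' "$line2"
            IFS= read -r line2 <&4; s2=$?
        fi
    done
    # Drain whichever input still has lines.
    while (( s1 == 0 )); do
        printf '1.\t%s\n' "$line1"
        IFS= read -r line1 <&3; s1=$?
    done
    while (( s2 == 0 )); do
        printf '2.\t%s\n' "$line2"
        IFS= read -r line2 <&4; s2=$?
    done
    exec 3<&- 4<&-
}

# Usage sketch with two small scratch files:
d=$(mktemp -d)
printf 'cherry\napple\n' > "$d/a"
printf 'date\nbanana\n' > "$d/b"
merge_sorted "$d/a" "$d/b"
rm -r "$d"
```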

Control Structures

Parameters and Variables