Working with Files

This section describes utilities that copy, move, print, search through, display, sort, and compare files.

Tip: Filename completion

After you enter one or more letters of a filename (following a command) on a command line, press TAB and the Bourne Again Shell will complete as much of the filename as it can. When only one filename starts with the characters you entered, the shell completes the filename and places a SPACE after it. You can keep typing or you can press RETURN to execute the command at this point. When the characters you entered do not uniquely identify a filename, the shell completes what it can and waits for more input. When pressing TAB does not change the display, press TAB again to display a list of possible completions.

cp: Copies a File

The cp (copy) utility (Figure 5-2) makes a copy of a file. This utility can copy any file, including text and executable program (binary) files. You can use cp to make a backup copy of a file or a copy to experiment with.

Figure 5-2. cp copies a file
$ ls
memo
$ cp memo memo.copy
$ ls
memo memo.copy

The cp command line uses the following syntax to specify source and destination files:


cp source-file destination-file

The source-file is the name of the file that cp will copy. The destination-file is the name that cp assigns to the resulting (new) copy of the file.

The cp command line in Figure 5-2 copies the file named memo to memo.copy. The period is part of the filename, just another character. The initial ls command shows that memo is the only file in the directory. After the cp command, a second ls shows two files in the directory, memo and memo.copy.

Sometimes it is useful to incorporate the date in the name of a copy of a file. The following example includes the date January 30 (0130) in the copied file:

$ cp memo memo.0130

Although it has no significance to Linux, the date can help you find a version of a file that you created on a certain date. Including the date can also help you avoid overwriting existing files by providing a unique filename each day.
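
Rather than typing the date by hand, you can let the date utility supply it. The following is a minimal sketch, not the only way to do it: it assumes a scratch directory created with mktemp and a stand-in memo file, and the variable names workdir and stamp are chosen only for illustration.

```shell
#!/bin/sh
# Sketch: build a date-stamped copy name such as memo.0130 automatically.
set -e
workdir=$(mktemp -d)               # scratch directory (assumption) so no real files are touched
trap 'rm -rf "$workdir"' EXIT      # remove the scratch directory when the script ends
cd "$workdir"
echo "draft text" > memo           # stand-in for the memo file used in the text

stamp=$(date +%m%d)                # two-digit month followed by two-digit day
cp memo "memo.$stamp"              # on January 30 this is the same as: cp memo memo.0130
ls
```

Because the suffix changes each day, running the same command tomorrow produces a new file instead of overwriting today's copy.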

Use scp or ftp when you need to copy a file from one system to another on a common network.

Caution: cp can destroy a file

If the destination-file exists before you give a cp command, cp overwrites it. Because cp overwrites (and destroys the contents of) an existing destination-file without warning, you must take care not to cause cp to overwrite a file that you still need. The cp -i (interactive) option prompts you before it overwrites a file.

The following example assumes that the file named orange.2 exists before you give the cp command. The user answers y to overwrite the file:

$ cp -i orange orange.2
cp: overwrite 'orange.2'? y
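
The -i option needs a person at the keyboard to answer the prompt. In a script, one possible alternative is to test for the destination file before copying. The sketch below assumes a scratch directory and made-up file contents; the variables src, dest, and result are illustrative names, not part of cp.

```shell
#!/bin/sh
# Sketch: refuse to overwrite an existing destination file without prompting.
set -e
workdir=$(mktemp -d)               # scratch directory (assumption)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
echo "new contents" > orange       # made-up sample files
echo "old contents" > orange.2

src=orange
dest=orange.2
if [ -e "$dest" ]; then
    # The destination already exists, so leave it alone.
    echo "$dest exists; not overwriting"
    result=skipped
else
    cp "$src" "$dest"
    result=copied
fi
```

Here orange.2 already exists, so the script reports that fact and leaves its contents untouched.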


mv: Changes the Name of a File

The mv (move) utility can rename a file without making a copy of it. The mv command line specifies an existing file and a new filename using the same syntax as cp:


mv existing-filename new-filename

The command line in Figure 5-3 changes the name of the file memo to memo.0130. The initial ls command shows that memo is the only file in the directory. After you give the mv command, memo.0130 is the only file in the directory. Compare this result to that of the earlier cp example.

Figure 5-3. mv renames a file
$ ls
memo
$ mv memo memo.0130
$ ls
memo.0130
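
The difference between cp and mv can be seen by counting files after each command. This is a small sketch under stated assumptions: it runs in a scratch directory, and the names after_cp and after_mv are illustrative.

```shell
#!/bin/sh
# Sketch: cp adds a file to the directory, mv only changes a name.
set -e
workdir=$(mktemp -d)                       # scratch directory (assumption)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
echo "memo text" > memo

cp memo memo.copy                          # now two files: memo and memo.copy
after_cp=$(ls | wc -l | tr -d ' ')

mv memo memo.0130                          # still two files: memo is gone, memo.0130 replaces it
after_mv=$(ls | wc -l | tr -d ' ')

echo "files after cp: $after_cp, after mv: $after_mv"
ls
```

The count does not change after mv because the original name no longer exists; only the name changed.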

The mv utility can be used for more than changing the name of a file. See the mv info page for more information.

Caution: mv can destroy a file

Just as cp can destroy a file, so can mv. Also like cp, mv has a -i (interactive) option. See the caution box labeled "cp can destroy a file."

lpr: Prints a File

The lpr (line printer) utility places one or more files in a print queue for printing. Linux provides print queues so that only one job is printed on a given printer at a time. A queue allows several people or jobs to send output simultaneously to a single printer with the expected results. On systems that have access to more than one printer, you can use lpstat -p to display a list of available printers. Use the -P option to instruct lpr to place the file in the queue for a specific printer, even one that is connected to another system on the network. The following command prints the file named report:

$ lpr report

Because this command does not specify a printer, the output goes to the default printer. When the system has only one printer, that printer is the default.

The next command line prints the same file on the printer named mailroom:

$ lpr -P mailroom report

You can see which jobs are in the print queue by giving an lpstat -o command or by using the lpq utility:

$ lpq
lp is ready and printing
Rank  Owner   Job Files                 Total Size
active alex    86 (standard input)        954061 bytes

In this example, Alex has one job that is being printed; no other jobs are in the queue. You can use the job number (86 in this case) with the lprm utility to remove the job from the print queue and stop it from printing:

$ lprm 86

You can send more than one file to the printer with a single command. The following command line prints three files on the printer named laser1:

$ lpr -P laser1 05.txt 108.txt 12.txt

grep: Searches for a String

The grep utility searches through one or more files to see whether any contain a specified string of characters. This utility does not change the file it searches but simply displays each line that contains the string.

Originally the name grep was a play on an ed (an original UNIX editor, available on CentOS Linux) command: g/re/p. In this command g stands for global, re is a regular expression delimited by slashes, and p means print.

The grep command in Figure 5-4 searches through the file memo for lines that contain the string credit and displays a single line that meets this criterion. If memo contained such words as discredit, creditor, or accreditation, grep would have displayed those lines as well because they contain the string it was searching for. The -w option causes grep to match only whole words. Although you do not need to enclose the string you are searching for in single quotation marks, doing so allows you to put SPACEs and special characters in the search string.

Figure 5-4. grep searches for a string
$ cat memo
Helen:
In our meeting on June 6 we
discussed the issue of credit.
Have you had any further thoughts
about it?
                       Alex
$ grep 'credit' memo
discussed the issue of credit.
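
The effect of the -w option is easiest to see on a file that contains credit both as a whole word and as part of longer words. The sketch below assumes a scratch directory and a hypothetical file named notes; it uses the -c option, which makes grep print a count of matching lines instead of the lines themselves.

```shell
#!/bin/sh
# Sketch: compare a plain string search with a whole-word (-w) search.
set -e
workdir=$(mktemp -d)                 # scratch directory (assumption)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
cat > notes <<'EOF'
The creditor called today.
We discussed the issue of credit.
That would discredit the report.
EOF

plain=$(grep -c 'credit' notes)      # lines containing the string anywhere
whole=$(grep -cw 'credit' notes)     # lines containing credit as a whole word
echo "lines matching the string: $plain"
echo "lines matching the whole word: $whole"
grep -w 'credit' notes               # display the whole-word match
```

Only the second line survives the -w search, because creditor and discredit contain the string but not the separate word.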

The grep utility can do much more than search for a simple string in a single file. Refer to the grep info page and "Regular Expressions" for more information.

head: Displays the Beginning of a File

By default the head utility displays the first ten lines of a file. You can use head to help you remember what a particular file contains. For example, if you have a file named months that lists the 12 months of the year in calendar order, one to a line, then head displays Jan through Oct (Figure 5-5).

Figure 5-5. head displays the first ten lines of a file
$ cat months
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
$ head months
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct

This utility can display any number of lines, so you can use it to look at only the first line of a file, at a full screen, or even more. To specify the number of lines displayed, include a hyphen followed by the number of lines in the head command. For example, the following command displays only the first line of months:

$ head -1 months
Jan

The head utility can also display parts of a file based on a count of blocks or characters rather than lines. Refer to the head info page for more information.
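
As a brief sketch of both kinds of count, the following script rebuilds the months file in a scratch directory (an assumption made for the example) and asks head for three lines and then for seven bytes. The -n form is an equivalent, more explicit spelling of the hyphen-and-number form shown above; the -c option is available in GNU head and counts bytes.

```shell
#!/bin/sh
# Sketch: head by line count (-n) and by byte count (-c).
set -e
workdir=$(mktemp -d)                 # scratch directory (assumption)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf '%s\n' Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec > months

first_three=$(head -n 3 months)      # same request as: head -3 months
first_bytes=$(head -c 7 months)      # 7 bytes: "Jan", a newline, and "Feb"

echo "$first_three"
echo "---"
echo "$first_bytes"
```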

tail: Displays the End of a File

The tail utility is similar to head but by default displays the last ten lines of a file. Depending on how you invoke it, this utility can display fewer or more than ten lines, use a count of blocks or characters rather than lines to display parts of a file, and display lines being added to a file that is changing. The following command causes tail to display the last five lines, Aug through Dec, of the months file shown in Figure 5-5:

$ tail -5 months
Aug
Sep
Oct
Nov
Dec

You can monitor lines as they are added to the end of the growing file named logfile with the following command:

$ tail -f logfile

Press the interrupt key (usually CONTROL-C) to stop tail and display the shell prompt. Refer to the tail info page for more information.
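
The following sketch shows two ways of telling tail where to begin, again using a months file rebuilt in a scratch directory (an assumption for the example). A plain number after -n counts lines back from the end of the file, whereas a number preceded by a plus sign names the line at which the output starts.

```shell
#!/bin/sh
# Sketch: tail counting from the end of the file and from a given line.
set -e
workdir=$(mktemp -d)                 # scratch directory (assumption)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf '%s\n' Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec > months

last_five=$(tail -n 5 months)        # same request as: tail -5 months
from_line_11=$(tail -n +11 months)   # everything from line 11 to the end

echo "$last_five"
echo "---"
echo "$from_line_11"
```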

sort: Displays a File in Order

The sort utility displays the contents of a file in order by lines but does not change the original file. For example, if a file named days contains the name of each day of the week in calendar order, each on a separate line, then sort displays the file in alphabetical order (Figure 5-6).

Figure 5-6. sort displays the lines of a file in order
$ cat days
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday
$ sort days
Friday
Monday
Saturday
Sunday
Thursday
Tuesday
Wednesday

The sort utility is useful for putting lists in order. The -u option generates a sorted list in which each line is unique (no duplicates). The -n option puts a list of numbers in order. Refer to the sort info page for more information.
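
A short sketch shows why the -n option matters for numbers. It assumes a scratch directory and a hypothetical file named numbers; without -n, sort compares the lines character by character, so 100 lands before 2.

```shell
#!/bin/sh
# Sketch: default (text) order versus numeric (-n) and unique (-u) order.
set -e
workdir=$(mktemp -d)                 # scratch directory (assumption)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf '%s\n' 10 9 100 9 2 > numbers

as_text=$(sort numbers)              # character-by-character comparison
as_numbers=$(sort -n numbers)        # numeric comparison
unique_numbers=$(sort -nu numbers)   # numeric comparison, duplicates dropped

echo "$as_text"
echo "---"
echo "$as_numbers"
echo "---"
echo "$unique_numbers"
```

The numbers file itself is unchanged by all three commands.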

uniq: Removes Duplicate Lines from a File

The uniq (unique) utility displays a file, skipping adjacent duplicate lines, but does not change the original file. If a file contains a list of names and has two successive entries for the same person, uniq skips the extra line (Figure 5-7).

Figure 5-7. uniq removes duplicate lines
$ cat dups
Cathy
Fred
Joe
John
Mary
Mary
Paula
$ uniq dups
Cathy
Fred
Joe
John
Mary
Paula

If a file is sorted before it is processed by uniq, this utility ensures that no two lines in its output are the same. (Of course, sort can do that all by itself with the -u option.) Refer to the uniq info page for more information.
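
Because uniq compares only adjacent lines, the order of the input matters. The sketch below assumes a scratch directory and a hypothetical unsorted file named names; it uses a pipe (|) to send the output of sort to uniq.

```shell
#!/bin/sh
# Sketch: uniq on unsorted input versus sorted input.
set -e
workdir=$(mktemp -d)                 # scratch directory (assumption)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf '%s\n' Mary Fred Mary Mary Joe Fred > names

adjacent_only=$(uniq names)          # drops only the second of the two adjacent Mary lines
sorted_first=$(sort names | uniq)    # sorting first makes all duplicates adjacent
with_sort_u=$(sort -u names)         # sort alone gives the same result

echo "$adjacent_only"
echo "---"
echo "$sorted_first"
```

On the unsorted file, Mary and Fred each still appear twice in the output of uniq; after sorting, each name appears once.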

diff: Compares Two Files

The diff (difference) utility compares two files and displays a list of the differences between them. This utility does not change either file, so it is useful when you want to compare two versions of a letter or a report or two versions of the source code for a program.

The diff utility with the -u (unified output format) option first displays two lines indicating which of the files you are comparing will be denoted by a plus sign (+) and which by a minus sign (-). In Figure 5-8, a minus sign indicates the colors.1 file; a plus sign indicates the colors.2 file.

Figure 5-8. diff displaying the unified output format
$ diff -u colors.1 colors.2
--- colors.1      Fri Nov 25 15:45:32 2005
+++ colors.2      Fri Nov 25 15:24:46 2005
@@ -1,6 +1,5 @@
 red
+blue
 green
 yellow
-pink
-purple
 orange

The diff -u command breaks long, multiline text into hunks. Each hunk is preceded by a line starting and ending with two at signs (@@). This hunk identifier indicates the starting line number and the number of lines from each file for this hunk. In Figure 5-8, the -1,6 indicates that the hunk covers the section of the colors.1 file (denoted by a minus sign) from the first line through the sixth line. The +1,5 then indicates that the hunk covers colors.2 from the first line through the fifth line.

Following these header lines, diff -u displays each line of text with a leading minus sign, a leading plus sign, or nothing. A leading minus sign indicates that the line occurs only in the file denoted by the minus sign. A leading plus sign indicates that the line comes from the file denoted by the plus sign. A line that begins with neither a plus sign nor a minus sign occurs in both files in the same location. Refer to the diff info page for more information.
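
To experiment with the unified format, you can rebuild the two files from the figure and run diff yourself. The sketch below assumes a scratch directory and saves the output in a hypothetical file named colors.diff. Note that diff returns an exit status of 1 when the files differ, so the script records the status instead of treating it as an error.

```shell
#!/bin/sh
# Sketch: recreate colors.1 and colors.2 and compare them with diff -u.
set -e
workdir=$(mktemp -d)                 # scratch directory (assumption)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf '%s\n' red green yellow pink purple orange > colors.1
printf '%s\n' red blue green yellow orange > colors.2

# diff exits with 0 when the files match and 1 when they differ.
status=0
diff -u colors.1 colors.2 > colors.diff || status=$?

cat colors.diff
echo "exit status: $status"
```

The dates on the two header lines will differ from those in the figure because they reflect when the files were last modified.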

file: Tests the Contents of a File

You can use the file utility to learn about the contents of any file on a Linux system without having to open and examine the file yourself. In the following example, file reports that letter_e.bz2 contains data that was compressed by the bzip2 utility:

$ file letter_e.bz2
letter_e.bz2: bzip2 compressed data, block size = 900k

Next, file reports on two more files:

$ file memo zach.jpg
memo:     ASCII text
zach.jpg: JPEG image data, ... resolution (DPI), 72 x 72

Refer to the file man page for more information.