Miscellaneous sort Hints

Here is a grab bag of useful, if not exactly interesting, sort features. The utility will actually do quite a bit, if you let it.

Dealing with Repeated Lines

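sort -u sorts the file and eliminates duplicate lines. It's more powerful than uniq because it sorts the file for you (uniq only collapses adjacent duplicate lines, so it expects input that is already sorted), and because it compares only the sort fields you specify, so lines whose keys match count as duplicates even if the rest of the line differs.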

In return, there are a few things that uniq does that sort won't do - like print only those lines that aren't repeated, or count the number of times each line is repeated. But on the whole, I find sort -u more useful.
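As a rough illustration of that trade-off, the sketch below runs the same input through sort -u, sort | uniq -c, and sort | uniq -u. The error messages are made-up sample data, and LC_ALL=C is set only to keep the ordering predictable across systems.

```shell
# Made-up sample data: a few error messages, some repeated.
messages=$(printf '%s\n' 'disk full' 'no such file' 'disk full' \
    'bad magic number' 'no such file')

# sort -u: sorted output, one copy of each line.
unique=$(printf '%s\n' "$messages" | LC_ALL=C sort -u)
printf '%s\n' "$unique"

# uniq -c: count how many times each line occurs (input must be sorted first).
printf '%s\n' "$messages" | LC_ALL=C sort | uniq -c

# uniq -u: print only the lines that are never repeated.
singles=$(printf '%s\n' "$messages" | LC_ALL=C sort | uniq -u)
printf '%s\n' "$singles"
```

The last two pipelines are the things sort -u cannot do on its own.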

Here's one idea for using sort -u. When I was writing a manual, I often needed to make tables of error messages. The easiest way to do this was to grep the source code for printf statements; write some Emacs macros to eliminate junk that I didn't care about; use sort -u to put the messages in order and get rid of duplicates; and write some more Emacs macros to format the error messages into a table. All I had to do was write the descriptions.

Ignoring Blanks

One important option (that I've mentioned a number of times) is -b; this tells sort to ignore extra white space at the beginning of each field. This is absolutely essential; otherwise, your sorts will have rather strange results. In my opinion, -b should be the default. But they didn't ask me.

Another thing to remember about -b: it only works if you explicitly specify which fields you want to sort. By itself, sort -b is the same as sort: white space characters are counted. I call this a bug, don't you?
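A small sketch of the difference -b makes when a field is named explicitly. The two-line input is hypothetical, the -k 2 key syntax is the POSIX spelling for "start at field 2", and LC_ALL=C is an assumption made so that blanks compare as ordinary characters.

```shell
# Hypothetical data: the second column is padded with a varying number of blanks.
data=$(printf '%s\n' 'x   b' 'x a')

# Without -b, the blanks in front of field 2 are part of the key,
# so the line with more padding sorts first.
plain=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2)
printf '%s\n' "$plain"

# With -b, leading blanks in the key are skipped and "a" sorts before "b".
blanks_ignored=$(printf '%s\n' "$data" | LC_ALL=C sort -b -k 2)
printf '%s\n' "$blanks_ignored"
```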

Case-Insensitive Sorts

If you don't care about the difference between uppercase and lowercase letters, invoke sort with the -f (case-fold) option. This folds lowercase letters into uppercase. In other words, it treats all letters as uppercase.
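For instance, with some made-up input (and LC_ALL=C assumed, so that the default order is plain ASCII with uppercase before lowercase), the effect looks roughly like this:

```shell
# Made-up sample words with mixed case.
words=$(printf '%s\n' 'apple' 'Banana' 'cherry')

# Plain ASCII order: every uppercase letter sorts before any lowercase letter.
ascii_order=$(printf '%s\n' "$words" | LC_ALL=C sort)
printf '%s\n' "$ascii_order"

# -f folds lowercase into uppercase before comparing.
folded_order=$(printf '%s\n' "$words" | LC_ALL=C sort -f)
printf '%s\n' "$folded_order"
```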

Dictionary Order

The -d option tells sort to ignore all characters except for letters, digits, and white space. In particular, sort -d ignores punctuation.
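A minimal sketch of what ignoring punctuation means in practice. The two strings are invented for the example, and LC_ALL=C is assumed so the hyphen compares by its ASCII value.

```shell
# Invented sample strings; one contains a hyphen.
entries=$(printf '%s\n' 'a-c' 'ab')

# Default order: the hyphen counts, and it sorts before the letter "b".
with_punct=$(printf '%s\n' "$entries" | LC_ALL=C sort)
printf '%s\n' "$with_punct"

# -d: the hyphen is skipped, so "a-c" is compared as "ac" and lands after "ab".
dict_order=$(printf '%s\n' "$entries" | LC_ALL=C sort -d)
printf '%s\n' "$dict_order"
```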

Month Order

The -M option tells sort to treat the first three non-blank characters of a field as a three-letter month abbreviation, and to sort accordingly. That is, JAN comes before FEB, which comes before MAR. This option isn't available on all versions of UNIX.
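On a sort that supports -M (GNU and BSD versions do, as far as I know), a quick check might look like the following. The input lines are invented, and LC_ALL=C is assumed so that the English month abbreviations are the ones recognized.

```shell
# Invented sample lines that start with a month abbreviation.
months=$(printf '%s\n' 'MAR third' 'JAN first' 'FEB second')

# -M orders by calendar month rather than alphabetically.
by_month=$(printf '%s\n' "$months" | LC_ALL=C sort -M)
printf '%s\n' "$by_month"
```

An alphabetic sort of the same input would put FEB first.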

Reverse Sort

The -r option tells sort to "reverse" the order of the sort; i.e., Z comes before A, 9 comes before 1, and so on. You'll find that this option is really useful. For example, imagine you have a program running in the background that records the number of free blocks in the filesystem at midnight each night. Your log file might look like this:

    Jan 1 1992: 108 free blocks
    Jan 2 1992: 308 free blocks
    Jan 3 1992: 1232 free blocks
    Jan 4 1992: 76 free blocks
    ...

The script below finds the smallest and largest number of free blocks in your log file:

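    #!/bin/sh
    # The block count is field 2 (everything after the colon).
    # Older versions of sort spell the key -k 2nb as +1nb.
    echo "Minimum free blocks"
    sort -t: -k 2nb logfile | head -n 1
    echo "Maximum free blocks"
    sort -t: -k 2nbr logfile | head -n 1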

It's not profound, but it's an example of what you can do.
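To try the idea without a real log file, here is a small sketch that feeds the four sample lines through the same two sorts. It assumes the POSIX -k key syntax and head -n, and it keeps the log in a shell variable instead of a file.

```shell
# The four sample log lines from above, held in a variable instead of a file.
log=$(printf '%s\n' \
    'Jan 1 1992: 108 free blocks' \
    'Jan 2 1992: 308 free blocks' \
    'Jan 3 1992: 1232 free blocks' \
    'Jan 4 1992: 76 free blocks')

# Numeric sort on the field after the colon; the first line is the minimum.
min_line=$(printf '%s\n' "$log" | sort -t: -k 2nb | head -n 1)

# Same key with r added to reverse the order; the first line is the maximum.
max_line=$(printf '%s\n' "$log" | sort -t: -k 2nbr | head -n 1)

echo "Minimum free blocks"
echo "$min_line"
echo "Maximum free blocks"
echo "$max_line"
```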

- ML