Straightening Jagged Columns
As we were writing this tutorial, I decided to make a list of all the articles with the numbers of lines and characters in each, then combine that with the description, a status code, and the article's title. After a few minutes with wc -l -c, cut, sort, and join, I had a file that looked like this:
    % cat messfile
    2850 2095 51441 ~BB A sed tutorial
    3120 868 21259 +BB mail - lots of basics
    6480 732 31034 + How to find sources - JIK's periodic posting
    lines...
    5630 14 453 +JP Running Commands on Directory Stacks
    1600 12 420 !JP With find, Don't Forget -print
    0495 9 399 + Make 'xargs -i' use more than one filename
Yuck. It was tough to read. The columns needed to be straightened. A little awk script turned the mess into this:
    % cat cleanfile
    2850 2095  51441 ~BB  A sed tutorial
    3120  868  21259 +BB  mail - lots of basics
    6480  732  31034 +    How to find sources - JIK's periodic posting
    lines...
    5630   14    453 +JP  Running Commands on Directory Stacks
    1600   12    420 !JP  With find, Don't Forget -print
    0495    9    399 +    Make 'xargs -i' use more than one filename
Here's the simple script I used and the command I typed to run it:
    % cat neatcols
    {
        printf "%4s %4s %6s %-4s %s\n", \
            $1, $2, $3, $4, substr($0, index($0,$5))
    }
    % awk -f neatcols messfile > cleanfile
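If you'd like to try the script without building a messfile first, here's a small self-contained sketch. It wraps the same awk program in a shell function (the name neatcols is borrowed from the script file and is just for illustration) and feeds it a few rows copied from messfile through a here-document, so no files are needed.

```shell
#!/bin/sh
# Sketch: the neatcols awk program wrapped in a shell function
# (hypothetical name) so it reads its data from standard input.
neatcols() {
    awk '{
        # Fixed-width fields first, then the rest of the line from $5 on.
        printf "%4s %4s %6s %-4s %s\n", \
            $1, $2, $3, $4, substr($0, index($0,$5))
    }'
}

# A few rows copied from messfile, fed in with a quoted here-document
# so the shell doesn't touch the single quotes in the last title.
neatcols <<'EOF'
2850 2095 51441 ~BB A sed tutorial
3120 868 21259 +BB mail - lots of basics
5630 14 453 +JP Running Commands on Directory Stacks
0495 9 399 + Make 'xargs -i' use more than one filename
EOF
```

The rows come out lined up the same way as the matching rows of cleanfile. Because awk reads standard input when it has no file operand, you can also pipe any other command's output into the function.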
You can adapt that script for whatever kinds of columns you need to clean up. In case you don't know awk, here's a quick summary:
- The first line of the printf, the format string between the double quotes, sets the field widths and alignments. For example, the first column should be right-aligned in 4 characters (%4s). The fourth column should be 4 characters wide, left-aligned (%-4s). The fifth column has no width, so it is exactly as wide as its text (%s). I used string (%s) instead of decimal (%d) so awk wouldn't strip off the leading zeros in the columns.
- The second line arranges the input data fields onto the output line. Here, input and output are in the same order, but I could have reordered them. The first four columns get the first four fields ($1, $2, $3, $4). The fifth column is a catch-all; it gets everything else. substr($0, index($0,$5)) means "find the fifth input column; print it and everything after it." More precisely, index() returns the position where the text of the fifth field first appears in the whole line ($0), and substr() returns the rest of the line starting at that position. (If that text also appeared earlier in the line, the catch-all column would start too soon; that doesn't happen with this data.)
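Both points in that summary are easy to check at the command line. This sketch uses the last row of messfile: the first function prints the first field with %d and with %s to show the dropped zero, and the second keeps the same substr()/index() catch-all but reorders the output so the title comes first. The function names zeros and title_first and the 45-character title width are made up for the illustration; they aren't part of the original script.

```shell
#!/bin/sh
# Sample row: the last line of messfile.
row="0495 9 399 + Make 'xargs -i' use more than one filename"

# %d converts the field to a number, so the leading zero is lost;
# %s prints the field exactly as it appeared in the input.
zeros() {
    awk '{ printf "[%4d] [%4s]\n", $1, $1 }'
}

# Same catch-all expression, with the output reordered:
# title first (left-aligned in 45 characters), then the three numbers.
title_first() {
    awk '{ printf "%-45s %4s %4s %6s\n", substr($0, index($0,$5)), $1, $2, $3 }'
}

printf '%s\n' "$row" | zeros        # prints: [ 495] [0495]
printf '%s\n' "$row" | title_first
```

A printf width is only a minimum, so %-45s won't truncate a longer title; it just pushes the numbers after it further to the right.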
- JP