Variable-Length (Text) Databases

Many UNIX system databases (and quite a few user-created databases) are a series of human-readable text lines, with one record per line. For example, the password file consists of one line per user on the system, and the hosts file contains one line per hostname.

Most often, these databases are updated with simple text editors. Updating such a database consists of reading it all into a temporary area (either memory or another disk file), making the necessary changes, and then either writing the result back to the original file or creating a new file with the same name after deleting or renaming the old version. You can think of this as a copy pass: the data is copied from the original database to a new version of the database, making changes during the copy.
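
A copy pass done by hand might look like the following minimal sketch. The scratch file colors.db, its name:value record layout, the .new and .old suffixes, and the copy_pass helper are all assumptions made for illustration, not part of any system database.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch database: one "name:value" record per line.
my $dir = tempdir(CLEANUP => 1);
my $db  = "$dir/colors.db";
open(my $seed, '>', $db) or die "cannot create $db: $!";
print $seed "fred:red\n", "barney:blue\n";
close($seed) or die "cannot close $db: $!";

# Copy pass: read the old file, write (possibly changed) records to a
# new file, then rename so the new file takes over the original name.
sub copy_pass {
    my ($file, $change) = @_;
    my $new = "$file.new";                  # temporary name (an assumption)
    open(my $in,  '<', $file) or die "cannot read $file: $!";
    open(my $out, '>', $new)  or die "cannot write $new: $!";
    while (my $line = <$in>) {
        print $out $change->($line);        # change each record during the copy
    }
    close($in);
    close($out) or die "cannot close $new: $!";
    rename($file, "$file.old") or die "cannot keep old version: $!";
    rename($new, $file)        or die "cannot install new version: $!";
}

# Change fred's value from red to green; other records pass through as is.
copy_pass($db, sub { my $l = shift; $l =~ s/:red$/:green/; return $l });
```

Writing to a separate file and renaming at the end means the original stays readable until the new version is complete.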

Perl supports a copy-pass-style edit on line-oriented databases using inplace editing. Inplace editing is a modification of the way the diamond operator (<>) reads data from the list of files specified on the command line. Most often, this editing mode is accessed by setting the -i command-line argument, but we can also trigger the inplace editing mode from within a program, as shown in the examples that follow.

To trigger the inplace editing mode, set a value into the $^I scalar variable. The value of this variable is important and will be discussed in a moment.

When the <> construct is used and $^I has a value other than undef, the steps marked ## INPLACE ## in the following code are added to the list of implicit actions the diamond operator takes:

    $ARGV = shift @ARGV;
    open(ARGV,"<$ARGV");
    rename($ARGV,"$ARGV$^I");    ## INPLACE ##
    unlink($ARGV);               ## INPLACE ##
    open(ARGVOUT,">$ARGV");      ## INPLACE ##
    select(ARGVOUT);             ## INPLACE ##

The effect is that reads from the diamond operator come from the old file, and writes to the default filehandle go to a new copy of the file. The old file remains in a backup file, which is the filename with a suffix equal to the value of the $^I variable. (There's also a bit of magic to copy the permission bits from the old file to the new file.) These steps are repeated each time a new file is taken from the @ARGV array.
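
The effect described above can be observed on a scratch file. This is a minimal sketch; the temporary directory, the file name hosts.txt, and the suffix .orig are assumptions chosen for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/hosts.txt";            # hypothetical scratch file
open(my $seed, '>', $file) or die "cannot create $file: $!";
print $seed "alpha\n", "beta\n";
close($seed) or die "cannot close $file: $!";

{
    local @ARGV = ($file);              # files for the diamond operator
    local $^I   = ".orig";              # enable inplace editing, suffix .orig
    while (<>) {
        print uc($_);                   # goes to ARGVOUT, the new copy of $file
    }
}                                       # leaving the block restores @ARGV and $^I
```

Afterward, hosts.txt should hold the uppercased lines and hosts.txt.orig the original ones. Using local keeps the inplace mode from leaking into any later use of <> in the same program.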

Typical values for $^I are things like .bak or ~, which create backup files much like those a text editor creates. A strange and useful value for $^I is the empty string, "", which causes the old file to be neatly eliminated after the edit is complete. Unfortunately, if the system or program crashes while your program is running, you lose all of your old data, so this is recommended only for brave, foolish, or trusting souls.
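
To see the empty-string case without risking real data, here is a small sketch on a throwaway file. The file name notes.txt and its contents are assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/notes.txt";            # hypothetical scratch file
open(my $seed, '>', $file) or die "cannot create $file: $!";
print $seed "one\n", "two\n";
close($seed) or die "cannot close $file: $!";

{
    local @ARGV = ($file);
    local $^I   = "";                   # defined but empty: edit, keep no backup
    while (<>) {
        s/one/1/;
        print;
    }
}

# List what is left in the scratch directory once the edit is complete.
opendir(my $dh, $dir) or die "cannot read $dir: $!";
my @left = grep { !/^\.\.?$/ } readdir($dh);
closedir($dh);
print "files left: @left\n";
```

Only the edited file should remain in the directory, since no backup copy is kept.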

Here's a way to change everyone's login shell to /bin/sh by editing the password file:

    @ARGV = ("/etc/passwd");      # prime the diamond operator
    $^I = ".bak";                 # write /etc/passwd.bak for safety
    while (<>) {                  # main loop, once for each line of /etc/passwd
        s#:[^:\n]*$#:/bin/sh#;    # change the shell to /bin/sh, keeping the newline
        print;                    # send output to ARGVOUT: the new /etc/passwd
    }

As you can see, this program is pretty simple. In fact, the same program can be generated entirely with a few command-line arguments, as in:

    perl -p -i.bak -e 's#:[^:\n]*$#:/bin/sh#' /etc/passwd

The -p switch brackets your program with a while loop that includes a print statement. The -i switch sets a value into the $^I variable. The -e switch defines the following argument as a piece of Perl code for the loop body, and the final argument gives an initial value to @ARGV.
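
Before pointing these switches at the real password file, it may be worth trying them on a scratch copy. The sketch below is one way to do that: the two sample users and the file location are made up, and $^X (the path of the running perl) is used so that no shell quoting is involved. The character class here also excludes the newline, so each record keeps its line ending.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/passwd";               # scratch copy, not the real /etc/passwd
open(my $seed, '>', $file) or die "cannot create $file: $!";
print $seed "fred:x:1001:1001:Fred:/home/fred:/bin/csh\n",
            "wilma:x:1002:1002:Wilma:/home/wilma:/bin/ksh\n";
close($seed) or die "cannot close $file: $!";

# Single quotes keep \n and $ literal, so the child perl sees the regex as typed.
my $code = 's#:[^:\n]*$#:/bin/sh#';

# List form of system: each switch is passed as its own argument.
system($^X, '-p', '-i.bak', '-e', $code, $file) == 0
    or die "perl one-liner failed: $?";

# Show the edited file; the untouched original is in "$file.bak".
open(my $result, '<', $file) or die "cannot read $file: $!";
print while <$result>;
close($result);
```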

Command-line arguments are discussed in greater detail in Programming Perl and the perlrun manpage.