Fluent Perl
We've touched on a few idioms in the preceding sections (not to mention the preceding chapters), but there are many other idioms you'll commonly see if you read programs by accomplished Perl developers. When we speak of idiomatic Perl in this context, we don't just mean a set of arbitrary Perl expressions with fossilized meanings. Rather, we mean Perl code that shows an understanding of the flow of the language, what you can get away with when, and what that buys you. And when to buy it.
We can't hope to list all the idioms you might see--that would take a tutorial as big as this one. Maybe two. (See the Perl tutorial, for instance.) But here are some of the important idioms, where "important" might be defined as "that which induces hissy fits in people who think they already know just how computer languages ought to work".
- Use => in place of a comma anywhere you think it improves readability:
return bless $mess => $class;
This reads, "Bless this mess into the specified class." Just be careful not to use it after a word that you don't want autoquoted:
sub foo () { "FOO" }
sub bar () { "BAR" }
print foo => bar;   # prints fooBAR, not FOOBAR
Another good place to use => is near a literal comma that might get confused visually:
join(", " => @array);
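The autoquoting behavior described above can be checked with a small sketch. It reuses the foo and bar subroutines from the example; the variable names ($quoted, $called, $joined) are ours and purely illustrative.

```perl
use strict;
use warnings;

sub foo () { "FOO" }
sub bar () { "BAR" }

# The bare word to the left of => is autoquoted, so foo is never called here.
my $quoted = join "", foo => bar;

# With an ordinary comma, both subroutines are called.
my $called = join "", foo, bar;

# When the left side is not a bare word, => behaves like a plain comma.
my $joined = join(", " => qw(a b c));

print "$quoted $called $joined\n";
```

The first join sees the string "foo" rather than the return value of foo, which is exactly the trap the warning above is about.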
Perl provides you with more than one way to do things so that you can exercise your ability to be creative. Exercise it!
- Use the singular pronoun to increase readability:
for (@lines) { $_ .= "\n"; }
The $_ variable is Perl's version of a pronoun, and it essentially means "it". So the code above means "for each line, append a newline to it." Nowadays you might even spell that:
$_ .= "\n" for @lines;
The $_ pronoun is so important to Perl that its use is mandatory in grep and map. Here is one way to set up a cache of common results of an expensive function:
%cache = map { $_ => expensive($_) } @common_args;
$xval = $cache{$x} || expensive($x);
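Here is a minimal, self-contained sketch of that cache idiom. The body of expensive() and the $calls counter are hypothetical stand-ins, added only so the effect of the cache can be observed.

```perl
use strict;
use warnings;

my $calls = 0;                  # counts how often expensive() really runs

# Hypothetical stand-in for a slow function.
sub expensive {
    my ($n) = @_;
    $calls++;
    return $n * $n + 1;         # never false for the inputs used below
}

my @common_args = (1, 2, 3);

# Precompute results for the common arguments; $_ is each argument in turn.
my %cache = map { $_ => expensive($_) } @common_args;

my $hit  = $cache{2} || expensive(2);   # found in the cache, no new call
my $miss = $cache{9} || expensive(9);   # not cached, so expensive() runs

print "hit=$hit miss=$miss calls=$calls\n";
```

Only the uncached argument triggers an extra call, so $calls ends up at four: three from building the cache and one for the miss.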
- Omit the pronoun to increase readability even further.[1]
[1] In this section, multiple bullet items in a row all refer to the subsequent example, since some of our examples illustrate more than one idiom.
- Use loop controls with statement modifiers.
while (<>) {
    next if /^=for\s+(index|later)/;
    $chars += length;
    $words += split;
    $lines += y/\n//;
}
This is a fragment of code we used to do page counts for this tutorial. When you're going to be doing a lot of work with the same variable, it's often more readable to leave out the pronouns entirely, contrary to common belief.
The fragment also demonstrates the idiomatic use of next with a statement modifier to short-circuit a loop.
The $_ variable is always the loop control variable in grep and map, but the program's reference to it is often implicit:
@haslen = grep { length } @random;
Here we take a list of random scalars and only pick the ones that have a length greater than 0.
- Use for to set the antecedent for a pronoun:
for ($episode) { s/fred/barney/g; s/wilma/betty/g; s/pebbles/bambam/g; }
So what if there's only one element in the loop? It's a convenient way to set up "it", that is, $_. Linguistically, this is known as topicalization. It's not cheating, it's communicating.
- Implicitly reference the plural pronoun, @_.
- Use control flow operators to set defaults:
sub bark {
    my Dog $spot = shift;
    my $quality  = shift || "yapping";
    my $quantity = shift || "nonstop";
    ...
}
Here we're implicitly using the other Perl pronoun, @_, which means "them". The arguments to a function always come in as "them". The shift operator knows to operate on @_ if you omit it, just as the ride operator at Disneyland might call out "Next!" without specifying which queue is supposed to shift. (There's no point in specifying, because there's only one queue that matters.)
The || can be used to set defaults despite its origins as a Boolean operator, since Perl returns the first true value. Perl developers often manifest a cavalier attitude toward the truth; the line above would break if, for instance, you tried to specify a quantity of 0. But as long as you never want to set either $quality or $quantity to a false value, the idiom works great. There's no point in getting all superstitious and throwing in calls to defined and exists all over the place. You just have to understand what it's doing. As long as it won't accidentally be false, you're fine.
- Use assignment forms of operators, including control flow operators:
$xval = $cache{$x} ||= expensive($x);
Here we don't initialize our cache at all. We just rely on the ||= operator to call expensive($x) and assign it to $cache{$x} only if $cache{$x} is false. The result of that is whatever the new value of $cache{$x} is. Again, we take the cavalier approach towards truth, in that if we cache a false value, expensive($x) will get called again. Maybe the developer knows that's okay, because expensive($x) isn't expensive when it returns false. Or maybe the developer knows that expensive($x) never returns a false value at all. Or maybe the developer is just being sloppy. Sloppiness can be construed as a form of creativity.
- Use loop controls as operators, not just as statements. And...
- Use commas like small semicolons:
while (<>) {
    $comments++, next if /^#/;
    $blank++, next    if /^\s*$/;
    last              if /^__END__/;
    $code++;
}
print "comment = $comments\nblank = $blank\ncode = $code\n";
This shows an understanding that statement modifiers modify statements, while next is a mere operator. It also shows the comma being idiomatically used to separate expressions much like you'd ordinarily use a semicolon. (The difference being that the comma keeps the two expressions as part of the same statement, under the control of the single statement modifier.)
- Use flow control to your advantage:
while (<>) {
    /^#/       and $comments++, next;
    /^\s*$/    and $blank++, next;
    /^__END__/ and last;
    $code++;
}
print "comment = $comments\nblank = $blank\ncode = $code\n";
Here's the exact same loop again, only this time with the patterns out in front. The perspicacious Perl developer understands that it compiles down to exactly the same internal codes as the previous example. The if modifier is just a backward and (or &&) conjunction, and the unless modifier is just a backward or (or ||) conjunction.
- Use the implicit loops provided by the -n and -p switches.
- Don't put a semicolon at the end of a one-line block:
#!/usr/bin/perl -n
$comments++, next LINE if /^#/;
$blank++, next LINE if /^\s*$/;
last LINE if /^__END__/;
$code++;
END { print "comment = $comments\nblank = $blank\ncode = $code\n" }
This is essentially the same program as before. We put an explicit LINE label on the loop control operators because we felt like it, but we didn't really need to, since the implicit LINE loop supplied by -n is the innermost enclosing loop. We used an END to get the final print statement outside the implicit main loop, just as in awk.
- Use here docs when the printing gets ferocious.
- Use a meaningful delimiter on the here doc:
END { print <<"COUNTS" }
comment = $comments
blank = $blank
code = $code
COUNTS
Rather than using multiple prints, the fluent Perl developer uses a multiline string with interpolation. And despite our calling it a Common Goof earlier, we've brazenly left off the trailing semicolon because it's not necessary at the end of the END block. (If we ever turn it into a multiline block, we'll put the semicolon back in.)
- Do substitutions and translations en passant on a scalar:
($new = $old) =~ s/bad/good/g;
Since lvalues are lvaluable, so to speak, you'll often see people changing a value "in passing" while it's being assigned. This could actually save a string copy internally (if we ever get around to implementing the optimization):
chomp($answer = <STDIN>);
Any function that modifies an argument in place can do the en passant trick. But wait, there's more!
- Don't limit yourself to changing scalars en passant:
for (@new = @old) { s/bad/good/g }
Here we copy @old into @new, changing everything in passing (not all at once, of course--the block is executed repeatedly, one "it" at a time).
- Pass named parameters using the fancy => comma operator.
- Rely on assignment to a hash to do even/odd argument processing:
sub bark {
    my Dog $spot = shift;
    my %parm = @_;
    my $quality  = $parm{QUALITY}  || "yapping";
    my $quantity = $parm{QUANTITY} || "nonstop";
    ...
}
$fido->bark( QUANTITY => "once", QUALITY => "woof" );
Named parameters are often an affordable luxury. And with Perl, you get them for free, if you don't count the cost of the hash assignment.
- Repeat Boolean expressions until false.
- Use minimal matching when appropriate.
- Use the /e modifier to evaluate a replacement expression:
#!/usr/bin/perl -p
1 while s/^(.*?)(\t+)/$1 . ' ' x (length($2) * 4 - length($1) % 4)/e;
This program fixes any file you receive from someone who mistakenly thinks they can redefine hardware tabs to occupy 4 spaces instead of 8. It makes use of several important idioms. First, the 1 while idiom is handy when all the work you want to do in the loop is actually done by the conditional. (Perl is smart enough not to warn you that you're using 1 in a void context.) We have to repeat this substitution because each time we substitute some number of spaces in for tabs, we have to recalculate the column position of the next tab from the beginning.
The (.*?) matches the smallest string it can up until the first tab, using the minimal matching modifier (the question mark). In this case, we could have used an ordinary greedy * like this: ([^\t]*). But that only works because a tab is a single character, so we can use a negated character class to avoid running past the first tab. In general, the minimal matcher is much more elegant, and doesn't break if the next thing that must match happens to be longer than one character.
The /e modifier does a substitution using an expression rather than a mere string. This lets us do the calculations we need right when we need them.
- Use creative formatting and comments on complex substitutions:
#!/usr/bin/perl -p
1 while s{
    ^               # anchor to beginning
    (               # start first subgroup
        .*?         # match minimal number of characters
    )               # end first subgroup
    (               # start second subgroup
        \t+         # match one or more tabs
    )               # end second subgroup
}
{
    my $spacelen = length($2) * 4;  # account for full tabs
    $spacelen -= length($1) % 4;    # account for the uneven tab
    $1 . ' ' x $spacelen;           # make correct number of spaces
}ex;
This is probably overkill, but some people find it more impressive than the previous one-liner. Go figure.
- Go ahead and use $` if you feel like it:
1 while s/(\t+)/' ' x (length($1) * 4 - length($`) % 4)/e;
Here's the shorter version, which uses $`, which is known to impact performance. Except that we're only using the length of it, so it doesn't really count as bad.
- Use the offsets directly from the @- (@LAST_MATCH_START) and @+ (@LAST_MATCH_END) arrays:
1 while s/\t+/' ' x (($+[0] - $-[0]) * 4 - $-[0] % 4)/e;
This one's even shorter. (If you don't see any arrays there, try looking for array elements instead.) See @- and @+ in "Special Names".
- Use eval with a constant return value:
sub is_valid_pattern {
    my $pat = shift;
    return eval { "" =~ /$pat/; 1 } || 0;
}
You don't have to use the eval {} operator to return a real value. Here we always return 1 if it gets to the end. However, if the pattern contained in $pat blows up, the eval catches it and returns undef to the Boolean conditional of the || operator, which turns it into a defined 0 (just to be polite, since undef is also false but might lead someone to believe that the is_valid_pattern subroutine is misbehaving, and we wouldn't want that, now would we?).
- Use modules to do all the dirty work.
- Use object factories.
- Use callbacks.
- Use stacks to keep track of context.
- Use negative subscripts to access the end of an array or string:
use XML::Parser;
$p = new XML::Parser Style => 'subs';
setHandlers $p Char => sub { $out[-1] .= $_[1] };
push @out, "";
sub literal {
    $out[-1] .= "C<";
    push @out, "";
}
sub literal_ {
    my $text = pop @out;
    $out[-1] .= $text . ">";
}
...
This is a snippet from the 250-line program we used to translate the XML version of the old Camel tutorial back into pod format so we could edit it for this version with a Real Text Editor.
The first thing you'll notice is that we rely on the XML::Parser module (from CPAN) to parse our XML correctly, so we don't have to figure out how. That cuts a few thousand lines out of our program right there (presuming we're reimplementing in Perl everything XML::Parser does for us,[2] including translation from almost any character set into UTF-8).
[2] Actually, XML::Parser is just a fancy wrapper around James Clark's expat XML parser.
XML::Parser uses a high-level idiom called an object factory. In this case, it's a parser factory. When we create an XML::Parser object, we tell it which style of parser interface we want, and it creates one for us. This is an excellent way to build a testbed application when you're not sure which kind of interface will turn out to be the best in the long run. The subs style is just one of XML::Parser's interfaces. In fact, it's one of the oldest interfaces, and probably not even the most popular one these days.
The setHandlers line shows a method call on the parser, not in arrow notation, but in "indirect object" notation, which lets you omit the parens on the arguments, among other things. The line also uses the named parameter idiom we saw earlier.
The line also shows another powerful concept, the notion of a callback. Instead of us calling the parser to get the next item, we tell it to call us. For named XML tags like <literal>, this interface style will automatically call a subroutine of that name (or the name with an underline on the end for the corresponding end tag). But the data between tags doesn't have a name, so we set up a Char callback with the setHandlers method.
Next we initialize the @out array, which is a stack of outputs. We put a null string into it to represent that we haven't collected any text at the current tag embedding level (0 initially).
Now is when that callback comes back in. Whenever we see text, it automatically gets appended to the final element of the array, via the $out[-1] idiom in the callback. At the outer tag level, $out[-1] is the same as $out[0], so $out[0] ends up with our whole output. (Eventually. But first we have to deal with tags.)
Suppose we see a <literal> tag. Then the literal subroutine gets called, appends some text to the current output, then pushes a new context onto the @out stack. Now any text up until the closing tag gets appended to that new end of the stack. When we hit the closing tag, we pop the $text we've collected back off the @out stack, and append the rest of the transmogrified data to the new (that is, the old) end of stack, the result of which is to translate the XML string, <literal>text</literal>, into the corresponding pod string, C<text>.
The subroutines for the other tags are just the same, only different.
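To watch the output stack at work without installing a parser, here is a rough sketch that feeds the same two tag handlers from a hand-written list of events. The on_text subroutine and the @events list are hypothetical stand-ins for the callbacks a real parser would make; they are not part of XML::Parser.

```perl
use strict;
use warnings;

my @out = ("");     # stack of output buffers, one per open tag level

# Stand-in for the Char callback: append text to the buffer on top of the stack.
sub on_text { $out[-1] .= $_[0] }

# Start-tag handler: emit the pod opener, then push a fresh buffer.
sub literal  { $out[-1] .= "C<"; push @out, "" }

# End-tag handler: pop the finished buffer and close the pod sequence.
sub literal_ { my $text = pop @out; $out[-1] .= $text . ">" }

# Hypothetical event stream for the input: Use <literal>shift</literal> here.
my @events = (
    [ \&on_text,  "Use "   ],
    [ \&literal            ],
    [ \&on_text,  "shift"  ],
    [ \&literal_           ],
    [ \&on_text,  " here." ],
);

for my $event (@events) {
    my ($handler, @args) = @$event;
    $handler->(@args);
}

my $pod = $out[0];  # back at the outer level, $out[0] holds everything
print "$pod\n";
```

Because every handler only ever touches $out[-1], nesting works without any handler needing to know how deep it is.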
- Use my without assignment to create an empty array or hash.
- Split the default string on whitespace.
- Assign to lists of variables to collect however many you want.
- Use autovivification of undefined references to create them.
- Autoincrement undefined array and hash elements to create them.
- Use autoincrement of a %seen hash to determine uniqueness.
- Assign to a handy my temporary in the conditional.
- Use the autoquoting behavior of braces.
- Use an alternate quoting mechanism to interpolate double quotes.
- Use the ?: operator to switch between two arguments to a printf.
- Line up printf args with their % field:
my %seen;
while (<>) {
    my ($a, $b, $c, $d) = split;
    print unless $seen{$a}{$b}{$c}{$d}++;
}
if (my $tmp = $seen{fee}{fie}{foe}{foo}) {
    printf qq(Saw "fee fie foe foo" [sic] %d time%s.\n),
                                          $tmp,  $tmp == 1 ? "" : "s";
}
These nine lines are just chock full of idioms. The first line makes an empty hash because we don't assign anything to it. We iterate over input lines setting "it", that is, $_, implicitly, then using an argumentless split which splits "it" on whitespace. Then we pick off the first four words with a list assignment, throwing any subsequent words away. Then we remember the first four words in a four-dimensional hash, which automatically creates (if necessary) the first three reference elements and final count element for the autoincrement to increment. (Under use warnings, the autoincrement will never warn that you're using undefined values, because autoincrement is an accepted way to define undefined values.) We then print out the line if we've never seen a line starting with these four words before, because the autoincrement is a postincrement, which, in addition to incrementing the hash value, will return the old true value if there was one.
After the loop, we test %seen again to see if a particular combination of four words was seen. We make use of the fact that we can put a literal identifier into braces and it will be autoquoted. Otherwise, we'd have to say $seen{"fee"}{"fie"}{"foe"}{"foo"}, which is a drag even when you're not running from a giant.
We assign the result of $seen{fee}{fie}{foe}{foo} to a temporary variable even before testing it in the Boolean context provided by the if. Because assignment returns its left value, we can still test the value to see if it was true. The my tells your eye that it's a new variable, and we're not testing for equality but doing an assignment. It would also work fine without the my, and an expert Perl developer would still immediately notice that we used one = instead of two ==. (A semiskilled Perl developer might be fooled, however. Pascal developers of any skill level will foam at the mouth.)
Moving on to the printf statement, you can see the qq() form of double quotes we used so that we could interpolate ordinary double quotes as well as a newline. We could've directly interpolated $tmp there as well, since it's effectively a double-quoted string, but we chose to do further interpolation via printf. Our temporary $tmp variable is now quite handy, particularly since we don't just want to interpolate it, but also test it in the conditional of a ?: operator to see whether we should pluralize the word "time". Finally, note that we lined up the two fields with their corresponding % markers in the printf format. If an argument is too long to fit, you can always go to the next line for the next argument, though we didn't have to in this case.
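The same idioms can be tried without piping in a file. The sketch below swaps the <> loop for a hypothetical in-memory @lines array and collects results in @unique and $report (both names are ours) instead of printing as it goes.

```perl
use strict;
use warnings;

# Hypothetical input lines, standing in for what <> would read.
my @lines = (
    "fee fie foe foo and more",
    "fee fie foe foo again",
    "one two three four",
);

my %seen;       # empty hash; nested levels spring into being on demand
my @unique;     # lines whose first four words have not been seen before

for (@lines) {
    my ($w1, $w2, $w3, $w4) = split;    # any words past the fourth are discarded
    # Postincrement returns the old count, which is false the first time.
    push @unique, $_ unless $seen{$w1}{$w2}{$w3}{$w4}++;
}

my $report = "";
if (my $tmp = $seen{fee}{fie}{foe}{foo}) {
    # qq() lets the format contain ordinary double quotes.
    $report = sprintf qq(Saw "fee fie foe foo" %d time%s.),
                      $tmp, $tmp == 1 ? "" : "s";
}
print "$report\n";
```

Two of the three lines start with the same four words, so only two lines land in @unique and the report uses the plural.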
Whew! Had enough? There are many more idioms we could discuss, but this tutorial is already sufficiently heavy. But we'd like to talk about one more idiomatic use of Perl, the writing of program generators.