Common Practices
Contents:
Common Goofs for Novices
Efficiency
Developing with Style
Fluent Perl
Program Generation
Ask almost any Perl developer, and they'll be glad to give you reams of advice on how to program. We're no different (in case you hadn't noticed). Now, rather than trying to tell you about specific features of Perl, we'll go at it from the other direction and use a more scattergun approach to describe idiomatic Perl. Our hope is that, by putting together various bits of things that seemingly aren't related, you can soak up some of the feeling of what it's like to actually "think Perl". After all, when you're developing, you don't write a bunch of expressions, then a bunch of subroutines, then a bunch of objects. You have to go at everything all at once, more or less. So this chapter is a bit like that.
There is, however, a rudimentary organization to the chapter, in that we'll start with the negative advice and work our way towards the positive advice. We don't know if that will make you feel any better, but it makes us feel better.
Common Goofs for Novices
The biggest goof of all is forgetting to use warnings, which identifies many errors. The second biggest goof is forgetting to use strict when it's appropriate. These two pragmas can save you hours of head-banging when your program starts getting bigger. (And it will.) Yet another faux pas is to forget to consult the online FAQ. Suppose you want to find out if Perl has a round function. You might try searching the FAQ first:
    % perlfaq round
Apart from those "metagoofs", there are several kinds of programming traps. Some traps almost everyone falls into, and other traps you'll fall into only if you come from a particular culture that does things differently. We've separated these out in the following sections.
Universal Blunders
- Putting a comma after the filehandle in a print statement. Although it looks extremely regular and pretty to say:
      print STDOUT, "goodbye", $adj, "world!\n";    # WRONG
  this is nonetheless incorrect, because of that first comma. What you want instead is the indirect object syntax:
      print STDOUT "goodbye", $adj, "world!\n";     # ok
  The syntax works this way so that you can say:
      print $filehandle "goodbye", $adj, "world!\n";
  where $filehandle is a scalar holding the name of a filehandle at run time. This is distinct from:
      print $notafilehandle, "goodbye", $adj, "world!\n";
  where $notafilehandle is simply a string that is part of the list of things to be printed. See "indirect object" in the Glossary.
- Using == instead of eq and != instead of ne. The == and != operators are numeric tests. The other two are string tests. The strings "123" and "123.00" are equal as numbers, but not equal as strings. Also, any nonnumeric string is numerically equal to zero. Unless you are dealing with numbers, you almost always want the string comparison operators instead.
- Forgetting the trailing semicolon. Every statement in Perl is terminated by a semicolon or the end of a block. Newlines aren't statement terminators as they are in awk, Python, or FORTRAN. Remember that Perl is like C.
  A statement containing a here document is particularly prone to losing its semicolon. It ought to look like this:
      print <<'FINIS';
      A foolish consistency is the hobgoblin of little minds,
      adored by little statesmen and philosophers and divines.
                  --Ralph Waldo Emerson
      FINIS
- Forgetting that a BLOCK requires braces. Naked statements are not BLOCKs. If you are creating a control structure such as a while or an if that requires one or more BLOCKs, you must use braces around each BLOCK. Remember that Perl is not like C.
- Not saving $1, $2, and so on, across regular expressions. Remember that every new m/atch/ or s/ubsti/tution/ will set (or clear, or mangle) your $1, $2... variables, as well as $`, $&, and $'. One way to save them right away is to evaluate the match within a list context, as in:
      my ($one, $two) = /(\w+) (\w+)/;
- Not realizing that a local also changes the variable's value as seen by other subroutines called within the scope of the local. It's easy to forget that local is a run-time statement that does dynamic scoping, because there's no equivalent in languages like C. See the section "Scoped Declarations" in "Statements and Declarations". Usually you want a my anyway.
- Losing track of brace pairings. A good text editor will help you find the pairs. Get one. (Or two.)
- Using loop control statements in do {} while. Although the braces in this control structure look suspiciously like part of a loop BLOCK, they aren't.
- Saying @foo[1] when you mean $foo[1]. The @foo[1] reference is an array slice, meaning an array consisting of the single element $foo[1]. Sometimes this doesn't make any difference, as in:
      print "the answer is @foo[1]\n";
  but it makes a big difference for things like:
      @foo[1] = <STDIN>;
  which will slurp up all the rest of STDIN, assign the first line to $foo[1], and discard everything else. This is probably not what you intended. Get into the habit of thinking that $ means a single value, while @ means a list of values, and you'll do okay.
- Forgetting the parentheses of a list operator like my:
      my $x, $y = (4, 8);       # WRONG
      my ($x, $y) = (4, 8);     # ok
- Forgetting to select the right filehandle before setting $^, $~, or $|. These variables depend on the currently selected filehandle, as determined by select(FILEHANDLE). The initial filehandle so selected is STDOUT. You should really be using the filehandle methods from the FileHandle module instead. See "Special Names".
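Three of the blunders above (numeric versus string comparison, losing $1 and $2, and the reach of local) are easier to see in running code. The following is only a rough sketch: the helper names compare_both, first_two_words, show_level, with_local, and with_my are invented for illustration and do not come from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two values numerically and as strings.
sub compare_both {
    my ($left, $right) = @_;
    return ($left == $right ? 1 : 0, $left eq $right ? 1 : 0);
}

# Hypothetical helper: save the captures at once by matching in list context.
sub first_two_words {
    my ($text) = @_;
    my ($one, $two) = $text =~ /(\w+) (\w+)/;
    return ($one, $two);
}

# local changes what *called* subroutines see; my does not.
our $level = "outer";
sub show_level { return $level }
sub with_local { local $level = "inner"; return show_level() }
sub with_my    { my    $level = "inner"; return show_level() }

my ($num_equal, $str_equal) = compare_both("123", "123.00");
print "numeric: $num_equal, string: $str_equal\n";   # numeric: 1, string: 0

my ($one, $two) = first_two_words("goodbye cruel world");
print "$one, $two\n";                                 # goodbye, cruel

print with_local(), " ", with_my(), "\n";             # inner outer
```

The with_my case is the one you usually want: the lexical $level is invisible to show_level, so nothing outside the subroutine is disturbed.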
Frequently Ignored Advice
Practicing Perl Developers should take note of the following:
- Remember that many operations behave differently in a list context than they do in a scalar one. For instance:
      ($x) = (4, 5, 6);     # List context; $x is set to 4
      $x   = (4, 5, 6);     # Scalar context; $x is set to 6

      @a = (4, 5, 6);
      $x = @a;              # Scalar context; $x is set to 3 (the number of elements in the array)
- Avoid barewords if you can, especially all lowercase ones. You can't tell just by looking at it whether a word is a function or a bareword string. By using quotes on strings and parentheses around function call arguments, you won't ever get them confused. In fact, the pragma use strict at the beginning of your program makes barewords a compile-time error--probably a good thing.
- You can't tell just by looking which built-in functions are unary operators (like chop and chdir), which are list operators (like print and unlink), and which are argumentless (like time). You'll want to learn them by reading "Functions". As always, use parentheses if you aren't sure--or even if you aren't sure you're sure. Note also that user-defined subroutines are by default list operators, but they can be declared as unary operators with a prototype of ($) or argumentless with a prototype of ().
- People have a hard time remembering that some functions default to $_, or @ARGV, or whatever, while others do not. Take the time to learn which are which, or avoid default arguments.
- <FH> is not the name of a filehandle, but an angle operator that does a line-input operation on the handle. This confusion usually manifests itself when people try to print to the angle operator:
      print <FH> "hi";    # WRONG, omit angles
- Remember also that data read by the angle operator is assigned to $_ only when the file read is the sole condition in a while loop:
      while (<FH>) { }    # Data assigned to $_.
      <FH>;               # Data read and discarded!
- Don't use = when you need =~; the two constructs are quite different:
      $x =  /foo/;        # Searches $_ for "foo", puts result in $x
      $x =~ /foo/;        # Searches $x for "foo", discards result
- Use my for local variables whenever you can get away with it. Using local merely gives a temporary value to a global variable, which leaves you open to unforeseen side effects of dynamic scoping.
- Don't use local on a module's exported variables. If you localize an exported variable, its exported value will not change. The local name becomes an alias to a new value but the external name is still an alias for the original.
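To make the context and binding-operator advice above concrete, here is a small sketch. The subroutine names context_demo and topic_vs_binding are made up for this example; everything else is core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: the same array read in list and in scalar context.
sub context_demo {
    my @a = (4, 5, 6);
    my ($first) = @a;    # list context: the first element
    my $count   = @a;    # scalar context: the number of elements
    return ($first, $count);
}

# Hypothetical helper: a bare pattern searches $_, while =~ searches
# the variable on its left.
sub topic_vs_binding {
    my ($topic, $x) = @_;
    local $_ = $topic;
    my $from_topic = /foo/ ? 1 : 0;         # searches $_
    my $from_x     = $x =~ /foo/ ? 1 : 0;   # searches $x
    return ($from_topic, $from_x);
}

my ($first, $count) = context_demo();
print "first=$first count=$count\n";        # first=4 count=3

my ($from_topic, $from_x) = topic_vs_binding("seafood", "bar");
print "topic=$from_topic x=$from_x\n";      # topic=1 x=0
```

Note that the parentheses around $first are what supply list context on the left side of the assignment.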
C Traps
Cerebral C developers should take note of the following:
- Curlies are required for if and while blocks.
- You must use elsif rather than "else if" or "elif". Syntax like this:
      if (expression) {
          block;
      }
      else if (another_expression) {        # WRONG
          another_block;
      }
  is illegal. The else part is always a block, and a naked if is not a block. You mustn't expect Perl to be exactly the same as C. What you want instead is:
      if (expression) {
          block;
      }
      elsif (another_expression) {
          another_block;
      }
  Note also that "elif" is "file" spelled backward. Only Algol-ers would want a keyword that was the same as another word spelled backward.
- The break and continue keywords from C become in Perl last and next, respectively. Unlike in C, these do not work within a do {} while construct.
- There's no switch statement. (But it's easy to build one on the fly; see "Bare Blocks" and "Case Structures" in "Statements and Declarations".)
- Variables begin with $, @, or % in Perl.
- Comments begin with #, not /*.
- You can't take the address of anything, although a similar operator in Perl is the backslash, which creates a reference.
- ARGV must be capitalized. $ARGV[0] is C's argv[1], and C's argv[0] ends up in $0.
- Syscalls such as link, unlink, and rename return true for success, not 0.
- The signal handlers in %SIG deal with signal names, not numbers.
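For readers arriving from C, the following sketch gathers a few of the points above: elsif, next and last standing in for continue and break, and the backslash operator producing a reference. The subroutine names classify, sum_odds_below, and increment_via_ref are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: every branch is a braced block, chained with elsif.
sub classify {
    my ($n) = @_;
    if    ($n < 0)  { return "negative" }
    elsif ($n == 0) { return "zero" }
    else            { return "positive" }
}

# Hypothetical helper: next and last play the roles of continue and break.
sub sum_odds_below {
    my ($limit, @numbers) = @_;
    my $sum = 0;
    for my $n (@numbers) {
        next if $n % 2 == 0;     # skip even numbers, like C's continue
        last if $n >= $limit;    # leave the loop, like C's break
        $sum += $n;
    }
    return $sum;
}

# Hypothetical helper: a reference stands in for a C pointer argument.
sub increment_via_ref {
    my ($ref) = @_;
    $$ref++;
    return;
}

my $counter = 41;
increment_via_ref(\$counter);    # backslash creates the reference
print classify(-3), " ", sum_odds_below(7, 1, 2, 3, 9, 5), " $counter\n";
# prints: negative 4 42
```

The loop in sum_odds_below is an ordinary for loop; as the list above notes, next and last would not work inside a do {} while.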
Shell Traps
Sharp shell developers should take note of the following:
- Variables are prefixed with $, @, or % on the left side of the assignment as well as the right. A shellish assignment like:
      camel="dromedary";      # WRONG
  won't be parsed the way you expect. You need:
      $camel="dromedary";     # ok
- The loop variable of a foreach also requires a $. Although csh likes:
      foreach hump (one two)
          stuff_it $hump
      end
  in Perl, this is written as:
      foreach $hump ("one", "two") {
          stuff_it($hump);
      }
- The backtick operator does variable interpolation without regard to the presence of single quotes in the command.
- The backtick operator does no translation of the return value. In Perl, you have to trim the newline explicitly, like this:
      chomp($thishost = `hostname`);
- Shells (especially csh) do several levels of substitution on each command line. Perl does interpolation only within certain constructs such as double quotes, backticks, angle brackets, and search patterns.
- Shells tend to interpret scripts a little bit at a time. Perl compiles the entire program before executing it (except for BEGIN blocks, which execute before the compilation is done).
- Program arguments are available via @ARGV, not $1, $2, and so on.
- The environment is not automatically made available as individual scalar variables. Use the standard Env module if you want that to happen.
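The backtick, @ARGV, and environment points above can be tried out with a short script. This is a sketch under stated assumptions: it assumes a POSIX-style shell with an echo command on the path, and the variable names ($animal, $echoed, and so on) are chosen only for the example.

```perl
use strict;
use warnings;

# Backticks interpolate $animal even inside the command's single quotes,
# and the trailing newline stays until chomp removes it.
# Assumes a POSIX-style shell that provides echo.
my $animal = "dromedary";
chomp(my $echoed = `echo '$animal'`);
print "[$echoed]\n";

# The foreach loop variable carries its $ sigil.
my @humps;
foreach my $hump ("one", "two") {
    push @humps, "hump $hump";
}
print join(", ", @humps), "\n";

# Program arguments live in @ARGV and the environment in %ENV,
# not in $1, $2 or in automatically created scalars.
my $first_arg = @ARGV ? $ARGV[0] : "(none)";
my $home      = defined $ENV{HOME} ? $ENV{HOME} : "(unset)";
print "first argument: $first_arg\n";
print "HOME: $home\n";
```

Reading %ENV directly is often enough; the Env module mentioned above is only needed if you want the shell-like convenience of plain scalar variables.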
Previous Perl Traps
Penitent Perl 4 (and Prior) Developers should take note of the following changes between release 4 and release 5 that might affect old scripts:
- @ now always interpolates an array in double-quotish strings. Some programs may now need to use backslashes to protect any @ that shouldn't interpolate.
- Barewords that used to look like strings to Perl will now look like subroutine calls if a subroutine by that name is defined before the compiler sees them. For example:
      sub SeeYa { die "Hasta la vista, baby!" }
      $SIG{'QUIT'} = SeeYa;
  In prior versions of Perl, that code would set the signal handler. Now, it actually calls the function! You may use the -w switch to find such risky usage or use strict to outlaw it.
- Identifiers starting with "_" are no longer forced into package main, except for the bare underscore itself (as in $_, @_, and so on).
- A double colon is now a valid package separator in an identifier. Thus, the statement:
      print "$a::$b::$c\n";
  now parses $a:: as the variable reference, where in prior versions only the $a was considered to be the variable reference. Similarly:
      print "$var::abc::xyz\n";
  is now interpreted as a single variable $var::abc::xyz, whereas in prior versions, the variable $var would have been followed by the constant text ::abc::xyz.
- s'$pattern'replacement' now performs no interpolation on $pattern. (The $ would be interpreted as an end-of-line assertion.) This behavior occurs only when using single quotes as the substitution delimiter; in other substitutions, $pattern is always interpolated.
- The second and third arguments of splice are now evaluated in scalar context rather than in list context.
- These are now semantic errors because of precedence:
      shift @list + 20;      # Now parses like shift(@list + 20), illegal!
      $n = keys %map + 20;   # Now parses like keys(%map + 20), illegal!
  Because if those were to work, then this couldn't:
      sleep $dormancy + 20;
- The precedence of assignment operators is now the same as the precedence of assignment. Previous versions of Perl mistakenly gave them the precedence of the associated operator. So you now must parenthesize them in expressions like:
      /foo/ ? ($a += 2) : ($a -= 2);
  Otherwise:
      /foo/ ? $a += 2 : $a -= 2;
  would be erroneously parsed as:
      (/foo/ ? $a += 2 : $a) -= 2;
  On the other hand:
      $a += /foo/ ? 1 : 2;
  now works as a C developer would expect.
- open FOO || die is incorrect. You need parentheses around the filehandle, because open has the precedence of a list operator.
- The elements of argument lists for formats are now evaluated in list context. This means you can interpolate list values now.
- You can't do a goto into a block that is optimized away. Darn.
- It is no longer legal to use whitespace as the name of a variable or as a delimiter for any kind of quote construct. Double darn.
- The caller function now returns a false value in scalar context if there is no caller. This lets modules determine whether they're being required or run directly.
- m//g now attaches its state to the searched string rather than the regular expression. See "Pattern Matching" for further details.
- reverse is no longer allowed as the name of a sort subroutine.
- taintperl is no longer a separate executable. There is now a -T switch to turn on tainting when it isn't turned on automatically.
- Double-quoted strings may no longer end with an unescaped $ or @.
- The archaic if BLOCK BLOCK syntax is no longer supported.
- Negative array subscripts now count from the end of the array.
- The comma operator in a scalar context is now guaranteed to give a scalar context to its arguments.
- The ** operator now binds more tightly than unary minus.
- Setting $#array lower now discards array elements immediately.
- delete is not guaranteed to return the deleted value for tied arrays, since this capability may be onerous for some modules to implement.
- The construct "this is $$x", which used to interpolate the process ID at that point, now tries to dereference $x. $$ by itself still works fine, however.
- The behavior of foreach when it iterates over a list that is not an array has changed slightly. It used to assign the list to a temporary array but now, for efficiency, no longer does so. This means that you'll now be iterating over the actual values, not copies of the values. Modifications to the loop variable can change the original values, even after the grep! For instance:
      % perl4 -e '@a = (1,2,3); for (grep(/./, @a)) { $_++ }; print "@a\n"'
      1 2 3
      % perl5 -e '@a = (1,2,3); for (grep(/./, @a)) { $_++ }; print "@a\n"'
      2 3 4
  To retain prior Perl semantics, you'd need to explicitly assign your list to a temporary array and then iterate over that. For example, you might need to change:
      foreach $var (grep /x/, @list) { ... }
  to:
      foreach $var (my @tmp = grep /x/, @list) { ... }
  Otherwise changing $var will clobber the values of @list. (This most often happens when you use $_ for the loop variable and call subroutines in the loop that don't properly localize $_.)
- Some error messages and warnings will be different.
- Some bugs may have been inadvertently removed.
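The foreach aliasing change described above, along with the notes on ** and negative subscripts, can be checked with a few lines of Perl 5. This is a minimal sketch; the subroutine names bump_in_place and bump_copies are invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: the loop variable aliases the elements that grep
# passes through, so incrementing $_ changes @values itself.
sub bump_in_place {
    my @values = @_;
    for (grep { /./ } @values) { $_++ }
    return @values;
}

# Hypothetical helper: assigning to a temporary array first means the
# loop only touches the copies held in @tmp.
sub bump_copies {
    my @values = @_;
    for (my @tmp = grep { /./ } @values) { $_++ }
    return @values;
}

print join(" ", bump_in_place(1, 2, 3)), "\n";   # 2 3 4
print join(" ", bump_copies(1, 2, 3)), "\n";     # 1 2 3

# ** binds more tightly than unary minus, and negative subscripts
# count from the end of the array.
my $power = -2 ** 2;          # parsed as -(2 ** 2), so -4
my @list  = (10, 20, 30);
my $last  = $list[-1];        # 30
print "$power $last\n";       # -4 30
```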