Alternation
Inside a pattern or subpattern, use the | metacharacter to specify a set of possibilities, any one of which could match. For instance:

    /Gandalf|Saruman|Radagast/

matches Gandalf or Saruman or Radagast. The alternation extends only as far as the innermost enclosing parentheses (whether capturing or not):

    /prob|n|r|l|ate/       # Match prob, n, r, l, or ate
    /pro(b|n|r|l)ate/      # Match probate, pronate, prorate, or prolate
    /pro(?:b|n|r|l)ate/    # Match probate, pronate, prorate, or prolate

The second and third forms match the same strings, but the second form captures the variant character in $1 and the third form does not.
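As a rough illustration of how far each alternation reaches, the following sketch runs the three patterns against a few sample words. The word list and the %report hash are hypothetical, introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical sample words, chosen only for illustration.
my @words = qw(probate pronate prostate n);

my %report;
for my $word (@words) {
    # No parentheses: the five alternatives are prob, n, r, l, and ate.
    my $bare = $word =~ /prob|n|r|l|ate/ ? 1 : 0;

    # Capturing parentheses: only the variant character alternates,
    # and it is saved in $1 when the match succeeds.
    my $variant = $word =~ /pro(b|n|r|l)ate/ ? $1 : undef;

    # Non-capturing parentheses: same matches, but nothing is saved.
    my $grouped = $word =~ /pro(?:b|n|r|l)ate/ ? 1 : 0;

    $report{$word} = [$bare, $variant, $grouped];
    printf "%-9s bare=%d variant=%s grouped=%d\n",
        $word, $bare, defined $variant ? $variant : "(none)", $grouped;
}
```

Note that the unparenthesized pattern accepts "prostate" and even a lone "n", because any single alternative is enough for it to succeed.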
At any given position, the Engine tries to match the first alternative, and then the second, and so on. The relative length of the alternatives does not matter, which means that in this pattern:

    /(Sam|Samwise)/

$1 will never be set to Samwise no matter what string it's matched against, because Sam will always match first. When you have overlapping matches like this, put the longer ones at the beginning.
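A minimal sketch of that ordering effect, assuming a small helper named first_capture (hypothetical, not part of Perl) that returns whatever $1 holds after a successful match:

```perl
use strict;
use warnings;

# Hypothetical helper: return what $1 captures for a pattern, or undef.
sub first_capture {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? $1 : undef;
}

my $name = "Samwise";

# Shorter alternative listed first: it wins at the first position it fits.
my $short_first = first_capture($name, qr/(Sam|Samwise)/);

# Longer alternative listed first: it gets the first chance to match.
my $long_first  = first_capture($name, qr/(Samwise|Sam)/);

print "Sam|Samwise captured: $short_first\n";
print "Samwise|Sam captured: $long_first\n";
```

Both patterns succeed on the same string; only the order of the alternatives decides how much of it ends up in $1.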
But the ordering of the alternatives only matters at a given position. The outer loop of the Engine does left-to-right matching, so the following always matches the first Sam:

    "'Sam I am,' said Samwise" =~ /(Samwise|Sam)/;      # $1 eq "Sam"

But you can force right-to-left scanning by making use of greedy quantifiers, as discussed earlier in "Quantifiers":

    "'Sam I am,' said Samwise" =~ /.*(Samwise|Sam)/;    # $1 eq "Samwise"

You can defeat any left-to-right (or right-to-left) matching by including any of the various positional assertions we saw earlier, such as \G, ^, and $. Here we anchor the pattern to the end of the string:

    "'Sam I am,' said Samwise" =~ /(Samwise|Sam)$/;     # $1 eq "Samwise"

That example factors the $ out of the alternation (since we already had a handy pair of parentheses to put it after), but in the absence of parentheses you can also distribute the assertions to any or all of the individual alternatives, depending on how you want them to match. This little program displays lines that begin with either a __DATA__ or __END__ token:

    #!/usr/bin/perl
    while (<>) {
        print if /^__DATA__|^__END__/;
    }

But be careful with that. Remember that the first and last alternatives (before the first | and after the last one) tend to gobble up the other elements of the regular expression on either side, out to the ends of the expression, unless there are enclosing parentheses. A common mistake is to ask for:

    /^cat|dog|cow$/

when you really mean:

    /^(cat|dog|cow)$/

The first matches "cat" at the beginning of the string, or "dog" anywhere, or "cow" at the end of the string. The second matches any string consisting solely of "cat" or "dog" or "cow". It also captures $1, which you may not want. You can also say:

    /^cat$|^dog$|^cow$/

We'll show you another solution later.
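To see the difference between the three cat/dog/cow patterns in practice, here is a short sketch that tries each one on a handful of strings. The input list and the labels loose, grouped, and distributed are hypothetical names used only for this example.

```perl
use strict;
use warnings;

# Hypothetical test strings, chosen only for illustration.
my @inputs = ("cat", "catalog", "hotdog stand", "moscow", "dog");

my %matched;
for my $input (@inputs) {
    $matched{$input} = {
        # ^ applies only to cat, and $ applies only to cow.
        loose       => ($input =~ /^cat|dog|cow$/     ? 1 : 0),
        # Parentheses confine the alternation, so both anchors apply to all three.
        grouped     => ($input =~ /^(cat|dog|cow)$/   ? 1 : 0),
        # Anchors repeated on every alternative; no parentheses, no capture.
        distributed => ($input =~ /^cat$|^dog$|^cow$/ ? 1 : 0),
    };
    printf "%-13s loose=%d grouped=%d distributed=%d\n",
        $input, @{ $matched{$input} }{qw(loose grouped distributed)};
}
```

The unparenthesized pattern accepts every string in this list, while the other two accept only the exact words "cat" and "dog".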
An alternative can be empty, in which case it always matches.

    /com(pound|)/;         # Matches "compound" or "com"
    /com(pound(s|)|)/;     # Matches "compounds", "compound", or "com"

This is much like using the ? quantifier, which matches 0 times or 1 time:

    /com(pound)?/;         # Matches "compound" or "com"
    /com(pound(s?))?/;     # Matches "compounds", "compound", or "com"
    /com(pounds?)?/;       # Same, but doesn't use $2

There is one difference, though. When you apply the ? to a subpattern that captures into a numbered variable, that variable will be undefined if there's no string to go there. If you used an empty alternative, it would still be false, but would be a defined null string instead.
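The undefined-versus-empty distinction can be checked directly with defined. The sketch below matches the bare string "com" both ways; the variable names are hypothetical and exist only for this example.

```perl
use strict;
use warnings;

my $word = "com";

# Optional group: (pound)? takes no part in the match, so $1 is undefined.
my $with_quantifier = $word =~ /com(pound)?/ ? $1 : "no match";

# Empty alternative: the group does match, but it matches the empty string.
my $with_empty_alt  = $word =~ /com(pound|)/ ? $1 : "no match";

print "quantifier: ",
    (defined $with_quantifier ? "defined" : "undefined"), "\n";
print "empty alt:  ",
    (defined $with_empty_alt
        ? "defined, length " . length($with_empty_alt)
        : "undefined"), "\n";
```

This matters mostly under warnings, where interpolating the undefined capture into a string would trigger an "uninitialized value" warning and the empty string would not.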