Regular Expressions
Each character matches itself, unless it is one of the special characters + ? . * ^ $ ( ) [ ] { } | \. The special meaning of these characters can be escaped using a \.
- .
- Matches an arbitrary character, but not a newline unless it is a single-line match (see m//s).
- (...)
- Groups a series of pattern elements to a single element.
- ^
- Matches the beginning of the target. In multiline mode (see m//m) also matches after every newline character.
- $
- Matches the end of the line. In multiline mode also matches before every newline character.
- [...]
- Denotes a class of characters to match. [^...] negates the class.
- (... | ... | ...)
- Matches one of the alternatives.
- (?# text)
- Comment.
- (?: regexp)
- Like (regexp) but does not make back-references.
- (?= regexp)
- Zero width positive look-ahead assertion.
- (?! regexp)
- Zero width negative look-ahead assertion.
- (? modifier)
- Embedded pattern-match modifier. modifier can be one or more of i, m, s, or x.
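The elements above can be tried out with a short script. This is a minimal sketch in Perl; the sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# . does not match a newline by default, so .* stops at the end of line one.
my ($dot_default) = $text =~ /(f.*)/;       # "first line"

# With /s, . also matches newlines, so .* runs to the end of the string.
my ($dot_single) = $text =~ /(f.*)/s;       # "first line\nsecond line\n"

# With /m, ^ also matches after each newline inside the string.
my @starts = $text =~ /^(\w+)/mg;           # ("first", "second")

# (?: ) groups the alternatives without capturing; only the outer ( ) captures.
my ($word) = "colour" =~ /(col(?:o|ou)r)/;  # "colour"

# Look-ahead assertions test what follows without consuming it.
my ($before_bar)  = "foobar" =~ /(foo)(?=bar)/;       # "foo"
my $not_followed  = "foobaz" =~ /foo(?!bar)/ ? 1 : 0; # 1

# (?i) embeds the case-insensitive modifier in the pattern itself.
my $embedded = "HELLO" =~ /(?i)hello/ ? 1 : 0;        # 1
```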
Quantified subpatterns match as many times as possible. When followed with a ? they match the minimum number of times. These are the quantifiers:
- +
- Matches the preceding pattern element one or more times.
- ?
- Matches zero or one times.
- *
- Matches zero or more times.
- {n,m}
- Denotes the minimum n and maximum m match count. {n} means exactly n times; {n,} means at least n times.
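The difference between maximal and minimal matching is easiest to see side by side. The following sketch uses invented sample strings and variable names.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: .+ takes as much as possible, so the match runs to the last >.
my ($greedy) = $html =~ /(<.+>)/;    # "<b>bold</b> and <i>italic</i>"

# Minimal: .+? takes as little as possible, so the match stops at the first >.
my ($minimal) = $html =~ /(<.+?>)/;  # "<b>"

# ? makes the preceding element optional.
my ($optional) = "color" =~ /(colou?r)/;   # "color"

# {n} asks for exactly n repetitions.
my ($year) = "2024-01-15" =~ /(\d{4})/;    # "2024"

# {n,m} bounds the repetition count from both sides.
my $in_range  = "aaa"  =~ /^a{2,3}$/ ? 1 : 0;   # 1
my $too_many  = "aaaa" =~ /^a{2,3}$/ ? 1 : 0;   # 0
```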
A \ escapes any special meaning of the following character if non-alphanumeric, but it turns most alphanumeric characters into something special:
- \w
- Matches alphanumeric, including _; \W matches non-alphanumeric.
- \s
- Matches whitespace; \S matches non-whitespace.
- \d
- Matches numeric; \D matches non-numeric.
- \A
- Matches the beginning of the string; \Z matches the end.
- \b
- Matches word boundaries; \B matches non-boundaries.
- \G
- Matches where the previous m//g search left off.
- \n, \r, \f, \t, etc.
- Have their usual meaning.
- \w, \s, and \d
- May be used within character classes; \b denotes a backspace in this context.
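A few of these escapes in action. This is an illustrative sketch only; the input strings and variable names are assumptions chosen for the example.

```perl
use strict;
use warnings;

my $s = "order 66, item_7";

# \w includes the underscore, so "item_7" is a single word.
my @words = $s =~ /(\w+)/g;     # ("order", "66", "item_7")
my @nums  = $s =~ /(\d+)/g;     # ("66", "7")

# \b keeps "cat" from matching inside "concat".
my $cats = () = "cat concat cat" =~ /\bcat\b/g;   # 2

# \A is anchored to the start of the string even in multiline mode,
# whereas ^ with /m also matches after a newline.
my $anchored  = "a\nb" =~ /\Ab/m ? 1 : 0;   # 0
my $multiline = "a\nb" =~ /^b/m  ? 1 : 0;   # 1

# \G continues where the previous m//g match on the same string stopped.
my $input = "aaabbb";
$input =~ /a+/g;                                # leaves pos($input) at 3
my $rest = $input =~ /\G(b+)/g ? $1 : undef;    # "bbb"

# Inside a character class, \b means a backspace character (0x08).
my $has_backspace = "a\x08c" =~ /a[\b]c/ ? 1 : 0;   # 1
```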
Back-references:
- \1...\9
- Refer to matched subexpressions, grouped with ( ), inside the match.
- \10 and up
- Can also be used if the pattern matches that many subexpressions.
See also $1...$9, $+, $&, $`, and $' in the section called "Special Variables".
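As a brief illustration, the sketch below uses \1 inside a pattern and then reads the related variables after a successful match. The sample strings and variable names are invented for the example.

```perl
use strict;
use warnings;

# \1 inside the pattern must match the same text that group 1 captured,
# which makes it handy for spotting a doubled word.
my ($dup) = "this is is a test" =~ /\b(\w+) \1\b/;   # "is"

my ($key, $value, $last_group, $matched, $before, $after);
if ("say: key=value!" =~ /(\w+)=(\w+)/) {
    ($key, $value) = ($1, $2);   # captured groups: "key", "value"
    $last_group = $+;            # last group that matched: "value"
    $matched    = $&;            # the whole match: "key=value"
    $before     = $`;            # text before the match: "say: "
    $after      = $';            # text after the match: "!"
}
```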
With modifier x, whitespace can be used in the patterns for readability purposes.
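For instance, a date pattern might be laid out as follows. This is a sketch with hypothetical names; note that under x a literal space has to be escaped (or placed in a character class) to be matched.

```perl
use strict;
use warnings;

# With /x, whitespace and # comments inside the pattern are ignored.
my $date_re = qr/
    (\d{4})     # year
    -
    (\d{2})     # month
    -
    (\d{2})     # day
/x;

my ($year, $month, $day) = "2024-01-15" =~ $date_re;   # ("2024", "01", "15")

# An unescaped space is dropped under /x, so /a b/x behaves like /ab/.
my $plain_space   = "a b" =~ /a b/x  ? 1 : 0;   # 0
my $escaped_space = "a b" =~ /a\ b/x ? 1 : 0;   # 1
```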