Pattern Matching Quick Reference with Examples
A separate article gives a tutorial introduction to regular expressions. This one is for readers who just need a quick listing of regular expression syntax as a refresher from time to time. It also includes some simple examples. The characters in Table 26.6 have special meaning only in search patterns.
| Pattern | What Does it Match? |
|---|---|
| . | Match any single character except newline. |
| * | Match any number (or none) of the single character that immediately precedes it. The preceding character can also be a regular expression. For example, since . (dot) means any character, .* means "match any number of any character." |
| ^ | Match the following regular expression at the beginning of the line. |
| $ | Match the preceding regular expression at the end of the line. |
| [ ] | Match any one of the enclosed characters. A hyphen (-) indicates a range of consecutive characters. A caret (^) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or a right square bracket (]) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list. |
| \{n,m\} | Match a range of occurrences of the single character that immediately precedes it. The preceding character can also be a regular expression. \{n\} matches exactly n occurrences; \{n,\} matches at least n occurrences; and \{n,m\} matches any number of occurrences between n and m. |
| \ | Turn off the special meaning of the character that follows. |
| \( \) | Save the pattern enclosed between \( and \) into a special holding space. Up to nine patterns can be saved on a single line. They can be "replayed" in substitutions by the escape sequences \1 to \9. |
| \< \> | Match characters at beginning (\<) or end (\>) of a word. |
| + | Match one or more instances of preceding regular expression. |
| ? | Match zero or one instances of preceding regular expression. |
| | | Match the regular expression specified before or after. |
| ( ) | Apply a match to the enclosed group of regular expressions. |
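As a rough illustration of a few rows of the table, the sketch below runs grep over a small word list. The sample words and the variable names are made up for this example; grep -E is assumed as the modern spelling of egrep for the | and ( ) forms.

```shell
# Hypothetical sample input: one word per line.
words='bag
big
bug
beg
bags
company
companies'

# . matches any single character: b.g picks up bag, big, bug, and beg.
dot_matches=$(printf '%s\n' "$words" | grep '^b.g$')

# \{n,m\} (basic syntax): one or two lowercase letters after "ba".
interval_matches=$(printf '%s\n' "$words" | grep '^ba[a-z]\{1,2\}$')

# | and ( ) need the extended syntax (grep -E, formerly egrep).
alt_matches=$(printf '%s\n' "$words" | grep -E '^compan(y|ies)$')

printf '%s\n' "$dot_matches" "$interval_matches" "$alt_matches"
```

The patterns are anchored with ^ and $ so that each one has to account for the whole line, which makes it easier to see exactly what a metacharacter contributes.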
The characters in Table 26.7 have special meaning only in replacement patterns.
| Pattern | What Does it Do? |
|---|---|
| \ | Turn off the special meaning of the character that follows. |
| \n | Restore the nth pattern previously saved by \( and \). n is a number from 1 to 9, with 1 starting on the left. |
| & | Re-use the search pattern as part of the replacement pattern. |
| ~ | Re-use the previous replacement pattern in the current replacement pattern. |
| \u | Convert first character of replacement pattern to uppercase. |
| \U | Convert replacement pattern to uppercase. |
| \l | Convert first character of replacement pattern to lowercase. |
| \L | Convert replacement pattern to lowercase. |
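A minimal sketch of the replacement metacharacters that sed shares with ex, using made-up input strings and variable names. The ~ and case-conversion escapes are left out on the assumption that they are not available in every sed.

```shell
# & stands for the whole matched text.
wrapped=$(printf '%s\n' 'chapter' | sed 's/.*/[&]/')

# \( \) save pieces of the match; \1 and \2 replay them in the replacement.
swapped=$(printf '%s\n' 'last first' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')

# A backslash turns off the special meaning of & in the replacement.
literal=$(printf '%s\n' 'AT' | sed 's/AT/R\&D/')

printf '%s\n' "$wrapped" "$swapped" "$literal"
```

Each sed script is in single quotes so that the shell passes the backslashes and the & through untouched.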
Examples of Searching
When used with grep or egrep, regular expressions are surrounded by quotes. (If the pattern contains a $, you must use single quotes; e.g., 'pattern'.) When used with ed, ex, sed, and awk, regular expressions are usually surrounded by / (although any delimiter works). Table 26.8 has some example patterns.
| Pattern | What Does it Match? |
|---|---|
| bag | The string bag. |
| ^bag | bag at beginning of line. |
| bag$ | bag at end of line. |
| ^bag$ | bag as the only word on line. |
| [Bb]ag | Bag or bag. |
| b[aeiou]g | Second letter is a vowel. |
| b[^aeiou]g | Second letter is a consonant (or uppercase or symbol). |
| b.g | Second letter is any character. |
| ^...$ | Any line containing exactly three characters. |
| ^\. | Any line that begins with a . (dot). |
| ^\.[a-z][a-z] | Same, followed by two lowercase letters (e.g., troff requests). |
| ^\.[a-z]\{2\} | Same as previous, grep or sed only. |
| ^[^.] | Any line that doesn't begin with a . (dot). |
| bugs* | bug, bugs, bugss, etc. |
| "word" | word in quotes. |
| "*word"* | word, with or without quotes. |
| [A-Z][A-Z]* | One or more uppercase letters. |
| [A-Z]+ | Same, egrep or awk only. |
| [A-Z].* | An uppercase letter, followed by zero or more characters. |
| [A-Z]* | Zero or more uppercase letters. |
| [a-zA-Z] | Any letter. |
| [^0-9A-Za-z] | Any symbol (not a letter or a number). |
| [567] | One of the numbers 5, 6, or 7. |
| egrep or awk pattern: | |
| five|six|seven | One of the words five, six, or seven. |
| [23]?86 | One of the numbers 86, 286, or 386. |
| compan(y|ies) | One of the words company or companies. |
| ex or vi pattern: | |
| \<the | Words like theater or the. |
| the\> | Words like breathe or the. |
| \<the\> | The word the. |
| sed or grep pattern: | |
| 0\{5,\} | Five or more zeros in a row. |
| [0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\} | US social security number (nnn-nn-nnnn). |
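To try a handful of these search patterns, the sketch below feeds a few invented lines to grep. The sample text and variable names are assumptions for illustration only; every pattern is single-quoted, which covers the case where it contains a $.

```shell
# Hypothetical sample lines.
lines='bag
bag of tricks
.ce 2
Total: 00000
SSN 123-45-6789
the end'

# ^bag$ : bag as the only word on the line (single quotes protect the $).
only_bag=$(printf '%s\n' "$lines" | grep '^bag$')

# ^\.[a-z]\{2\} : a dot followed by two lowercase letters, as in troff requests.
requests=$(printf '%s\n' "$lines" | grep '^\.[a-z]\{2\}')

# 0\{5,\} : five or more zeros in a row.
zeros=$(printf '%s\n' "$lines" | grep '0\{5,\}')

# nnn-nn-nnnn : three digits, two digits, four digits, separated by hyphens.
ssn=$(printf '%s\n' "$lines" | grep '[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}')

printf '%s\n' "$only_bag" "$requests" "$zeros" "$ssn"
```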
Examples of Searching and Replacing
The following examples show the metacharacters available to sed or ex. (ex commands begin with a colon.) A space is marked by ␣; a TAB is marked by tab.
| Command | Result |
|---|---|
| s/.*/( & )/ | Redo the entire line, but add parentheses. |
| s/.*/mv & &.old/ | Change a wordlist into mv commands. |
| /^$/d | Delete blank lines. |
| :g/^$/d | ex version of previous. |
| /^[␣tab]*$/d | Delete blank lines, plus lines containing only spaces or TABs. |
| :g/^[␣tab]*$/d | ex version of previous. |
| s/␣␣*/␣/g | Turn one or more spaces into one space. |
| :%s/␣␣*/␣/g | ex version of previous. |
| :s/[0-9]/Item &:/ | Turn a number into an item label (on the current line). |
| :s | Repeat the substitution on the first occurrence. |
| :& | Same. |
| :sg | Same, but for all occurrences on the line. |
| :&g | Same. |
| :%&g | Repeat the substitution globally. |
| :.,$s/Fortran/\U&/g | Change word to uppercase, on current line to last line. |
| :%s/.*/\L&/ | Lowercase entire file. |
| :s/\<./\u&/g | Uppercase first letter of each word on current line (useful for titles). |
| :%s/yes/No/g | Globally change a word to No. |
| :%s/Yes/~/g | Globally change a different word to No (previous replacement). |
| s/die or do/do or die/ | Transpose words. |
| s/\([Dd]ie\) or \([Dd]o\)/\2 or \1/ | Transpose, using hold buffers to preserve case. |
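The sed forms of several commands above can be tried from the shell roughly as follows. This is a sketch with made-up input and variable names; the ex-only forms (those beginning with a colon) are not covered.

```shell
# Turn a word list into mv commands: & repeats the matched line.
mv_cmds=$(printf '%s\n' 'notes' 'todo' | sed 's/.*/mv & &.old/')

# Delete blank lines and lines holding only spaces or TABs.
tab=$(printf '\t')
no_blanks=$(printf 'one\n\n \t \ntwo\n' | sed "/^[ $tab]*\$/d")

# Squeeze runs of spaces down to a single space.
squeezed=$(printf '%s\n' 'a   b    c' | sed 's/  */ /g')

# Transpose two words, keeping each word's original case.
transposed=$(printf '%s\n' 'Die or do' | sed 's/\([Dd]ie\) or \([Dd]o\)/\2 or \1/')

printf '%s\n' "$mv_cmds" "$no_blanks" "$squeezed" "$transposed"
```

The blank-line command is double-quoted so that the shell can substitute a real TAB character into the bracket expression; the $ anchor is escaped there to keep the shell from treating it specially.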
- DG from Anonymous' UNIX tutorial (SVR4/Solaris)