A Brief Introduction to Regular Expressions

Bash Unix Shell Scripting:
Chapter 18. Regular Expressions


An expression is a string of characters. Characters that have an interpretation above and beyond their literal meaning are called metacharacters. A quote symbol, for example, may denote speech by a person, ditto, or a meta-meaning for the symbols that follow. Regular Expressions are sets of characters and/or metacharacters that match (or specify) patterns.
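To see the difference between a literal character and a metacharacter, here is a small sketch that runs grep with the dot used three ways: unescaped (a metacharacter), escaped with a backslash, and with grep -F, which turns off RE interpretation. The two input lines are made up for illustration.

```bash
#!/bin/bash
# Two sample lines, invented for illustration.
input='abc
a.c'

# Unescaped, "." is a metacharacter matching any single character,
# so both lines match.
printf '%s\n' "$input" | grep 'a.c'

# A backslash strips the metacharacter meaning: only the literal "a.c" matches.
printf '%s\n' "$input" | grep 'a\.c'

# grep -F treats the whole pattern as a fixed string, so "." is literal here too.
printf '%s\n' "$input" | grep -F 'a.c'
```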

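A Regular Expression contains one or more of the following:

A character set. These are characters that retain their literal meaning. The simplest type of Regular Expression consists only of a character set, with no metacharacters.

An anchor. Anchors designate (anchor) the position in the line of text that the Regular Expression is to match. For example, ^ and $ are anchors.

Modifiers. These expand or narrow (modify) the range of text the Regular Expression is to match. Modifiers include the asterisk, brackets, and the backslash.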

The main uses for Regular Expressions (REs) are text searches and string manipulation. An RE matches a single character or a set of characters -- a string or a part of a string.
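A minimal sketch of matching "a string or a part of a string": grep reports a line whenever the RE matches anywhere in it, and the anchors ^ and $ restrict where that match may occur. The sample words are made up, and -o is an extension found in GNU and BSD grep rather than a POSIX option.

```bash
#!/bin/bash
# Sample input, invented for illustration.
words='cat
concatenate
dog'

# The RE "cat" matches part of a line, so both "cat" and "concatenate" are printed.
printf '%s\n' "$words" | grep 'cat'

# -o (GNU/BSD extension) prints only the matched part of each line.
printf '%s\n' "$words" | grep -o 'cat'

# Anchors pin the match: ^ = start of line, $ = end of line.
# Only a line consisting exactly of "cat" is printed.
printf '%s\n' "$words" | grep '^cat$'
```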

The only way to be certain that a particular RE works is to test it.

TEST FILE: tstfile                          # No match.
                                            # No match.
Run   grep "1133*"  on this file.           # Match.
                                            # No match.
                                            # No match.
This line contains the number 113.          # Match.
This line contains the number 13.           # No match.
This line contains the number 133.          # No match.
This line contains the number 1133.         # Match.
This line contains the number 113312.       # Match.
This line contains the number 1112.         # No match.
This line contains the number 113312312.    # Match.
This line contains no numbers at all.       # No match.

bash$ grep "1133*" tstfile
Run   grep "1133*"  on this file.           # Match.
This line contains the number 113.          # Match.
This line contains the number 1133.         # Match.
This line contains the number 113312.       # Match.
This line contains the number 113312312.    # Match.
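The test can be reproduced with a short script. This sketch writes a subset of the tstfile lines to a temporary file (created with mktemp, an assumption about the environment, not part of the original example) and runs the same grep. Because * applies only to the single character before it, "1133*" means the literal "113" followed by zero or more additional "3"s. Since grep only asks whether a line contains a match somewhere, it selects the same lines as plain "113".

```bash
#!/bin/bash
# Temporary copy of some tstfile lines; removed automatically on exit.
tmpfile=$(mktemp) || exit 1
trap 'rm -f "$tmpfile"' EXIT

cat > "$tmpfile" << 'EOF'
This line contains the number 113.
This line contains the number 13.
This line contains the number 133.
This line contains the number 1133.
This line contains the number 113312.
This line contains the number 1112.
This line contains no numbers at all.
EOF

# "1133*" = "113" followed by zero or more "3"s.
grep "1133*" "$tmpfile"

# -c prints only the number of matching lines: 3 for both patterns,
# because the trailing "3*" is allowed to match nothing.
grep -c "1133*" "$tmpfile"
grep -c "113" "$tmpfile"
```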
       
Note

Some versions of sed, ed, and ex support escaped versions of the extended Regular Expression operators (for example, \+ and \? in place of + and ?), as do the GNU utilities.
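A short sketch of the escaped operators, assuming GNU sed (in a basic RE, \+ and \? are GNU extensions rather than POSIX features). The same patterns are then written as extended REs with sed -E, where no backslashes are needed.

```bash
#!/bin/bash
# Assumes GNU sed: \+ and \? inside a basic RE are GNU extensions, not POSIX.

# \+ means "one or more" of the preceding character: prints bAd
echo "baaad" | sed 's/a\+/A/'

# \? means "zero or one" of the preceding character: prints X X
echo "color colour" | sed 's/colou\?r/X/g'

# With -E (extended REs), the same operators are written without backslashes.
echo "baaad" | sed -E 's/a+/A/'
echo "color colour" | sed -E 's/colou?r/X/g'
```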

Text-processing utilities used as filters in scripts take REs as arguments when "sifting" or transforming files or I/O streams.
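As a sketch of both uses, assuming the standard grep, sed, and awk utilities and a few made-up log lines: grep sifts a stream, sed transforms the matching part of each line, and awk uses an RE to choose which lines an action runs on.

```bash
#!/bin/bash
# Sample stream, invented for illustration.
log='INFO start
ERROR disk full
INFO retry
ERROR timeout'

# Sifting: keep only lines that match the RE ^ERROR.
printf '%s\n' "$log" | grep '^ERROR'

# Transforming: sed rewrites the matched text, leaving other lines untouched.
printf '%s\n' "$log" | sed 's/^ERROR/E:/'

# Selecting: awk runs { print $2 } only on lines matching ^INFO.
printf '%s\n' "$log" | awk '/^INFO/ { print $2 }'
```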

The standard reference on this complex topic is Friedl's Mastering Regular Expressions. Sed & Awk, by Dougherty and Robbins, also gives a very lucid treatment of REs. See the bibliography for more information on these books.

Notes

A meta-meaning is the meaning of a term or expression on a higher level of abstraction. For example, the literal meaning of "regular expression" is an ordinary expression that conforms to accepted usage. The meta-meaning is drastically different, as discussed at length in this chapter.

Since sed, awk, and other line-oriented tools process their input one line at a time, there will usually not be a newline to match. In those cases where there is a newline in a multiple-line expression, the dot will match the newline.

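#!/bin/bash
# sed: the N command appends the next input line to the pattern space,
# joined by an embedded newline; ".*" then matches across that newline,
# and "&" in the replacement stands for the whole matched text.
sed -e 'N;s/.*/[&]/' << EOF   # Here Document
line1
line2
EOF
# OUTPUT:
# [line1
# line2]
echo
# awk: rebuild $0 as field 1, a newline, then field 2;
# the dot in /line.1/ matches that embedded newline.
awk '{ $0=$1 "\n" $2; if (/line.1/) {print}}' << EOF
line 1
line 2
EOF
# OUTPUT:
# line
# 1
# Thanks, S.C.
exit 0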
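A complementary sketch, assuming GNU grep and GNU sed: when each line is matched on its own there is no newline for the dot to match, so grep finds nothing, while sed's N command builds a two-line pattern space in which the dot can match the newline.

```bash
#!/bin/bash
# grep matches one line at a time, with the newline removed, so "a.b"
# cannot span the line break: the count printed is 0.
# (grep exits with status 1 when nothing matches, hence "|| true".)
printf 'a\nb\n' | grep -c 'a.b' || true

# GNU sed: N joins the two lines with an embedded newline,
# which the dot matches, so the substitution prints X.
printf 'a\nb\n' | sed 'N;s/a.b/X/'
```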