Quick Reference: awk
This article also covers nawk and gawk. With the exception of array subscripts, values in [brackets] are optional; don't type the [ or ].
Command-line Syntax
awk can be invoked in two ways:
awk [options] 'script' [var=value] [file(s)]
awk [options] -f scriptfile [var=value] [file(s)]
You can specify a script directly on the command line, or you can store a script in a scriptfile and specify it with -f. In most versions, the -f option can be used multiple times. The variable var can be assigned a value on the command line. The value can be a literal, a shell variable ($name), or a command substitution (`cmd`), but the assignment takes effect only when awk reaches that argument, so the value is not yet available in the BEGIN procedure. awk operates on one or more file(s). If none are specified (or if - is specified), awk reads from the standard input.
The other recognized options are:
-Fc
Set the field separator to character c. This is the same as setting the system variable FS. nawk allows c to be a regular expression. Each record (by default, one input line) is divided into fields by white space (blanks or tabs) or by some other user-definable field separator. Fields are referred to by the variables $1, $2,... $n. $0 refers to the entire record. For example, to print the first three (colon-separated) fields on separate lines:
% awk -F: '{print $1; print $2; print $3}' /etc/passwd
-v var=value
Assign a value to variable var. This allows assignment before the script begins execution. (Available in nawk and gawk, not in the original awk.)
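The difference between -v and a var=value argument can be sketched as follows. The variable name n and the one-line input are made up for illustration; this is a minimal demonstration, not the only way to pass values in.

```shell
# The variable name n and the input "data" are hypothetical.
program='BEGIN { print "BEGIN sees n=" n } { print "record sees n=" n }'

# -v assigns n before the BEGIN procedure runs.
with_v=$(echo data | awk -v n=5 "$program")

# A var=value argument is processed only when awk reaches it on the
# command line, so n is still empty in BEGIN. The - stands for standard input.
with_operand=$(echo data | awk "$program" n=5 -)

printf '%s\n' "$with_v" "$with_operand"
```

In the second run, the BEGIN line prints an empty value for n, while the record line sees 5 in both runs.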
Patterns and Procedures
awk scripts consist of patterns and procedures:
pattern { procedure }
Both are optional. If pattern is missing, {procedure} is applied to all records. If {procedure} is missing, the matched record is written to the standard output.
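The two defaults can be seen side by side in a small sketch. The sample function and its three input lines are hypothetical.

```shell
# Hypothetical three-line input shared by both commands.
sample() { printf '%s\n' 'alpha 1' 'beta 2' 'gamma 3'; }

# Pattern only: each matching record is written unchanged.
pattern_only=$(sample | awk '/^b/')

# Procedure only: the procedure is applied to every record.
procedure_only=$(sample | awk '{ print $2 }')

printf '%s\n' "$pattern_only" "$procedure_only"
```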
Patterns
pattern can be any of the following:
/regular expression/
relational expression
pattern-matching expression
BEGIN
END
- Expressions can be composed of quoted strings, numbers, operators, functions, defined variables, or any of the predefined variables described later under the section "awk System Variables."
- Regular expressions use the extended set of metacharacters. In addition, ^ and $ can be used to refer to the beginning and end of a field, respectively, rather than the beginning and end of a record (line).
- Relational expressions use the relational operators listed under the section "Operators" later in this article. Comparisons can be either string or numeric. For example, $2 > $1 selects records for which the second field is greater than the first.
- Pattern-matching expressions use the operators ~ (match) and !~ (don't match). See the section "Operators" later in this article.
- The BEGIN pattern lets you specify procedures that will take place before the first input record is processed. (Generally, you set global variables here.)
- The END pattern lets you specify procedures that will take place after the last input record is read.
Except for BEGIN and END, patterns can be combined with the Boolean operators || (OR), && (AND), and ! (NOT). A range of lines can also be specified using comma-separated patterns:
pattern,pattern
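Combined patterns and ranges might be exercised like this. The sample function and its numbered input lines are assumptions chosen to make each result easy to check.

```shell
# Hypothetical input: a record number followed by a keyword.
sample() {
    printf '%s\n' '1 start' '2 keep' '3 stop' '4 keep' '5 skip'
}

# Range: from the first record matching /start/ through the next one matching /stop/.
range=$(sample | awk '/start/,/stop/')

# AND: records that contain "keep" and whose first field is greater than 2.
both=$(sample | awk '/keep/ && $1 > 2')

# NOT: records that do not contain "keep".
negated=$(sample | awk '!/keep/')

printf '%s\n' "$range" "$both" "$negated"
```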
Procedures
procedure can consist of one or more commands, functions, or variable assignments, separated by newlines or semicolons (;), and contained within curly braces ({}). Commands fall into four groups:
- Variable or array assignments
- Printing commands
- Built-in functions
- Control-flow commands
Simple Pattern-Procedure Examples
- Print first field of each line:
{ print $1 }
- Print all lines that contain pattern:
/pattern/
- Print first field of lines that contain pattern:
/pattern/ { print $1 }
- Print records containing more than two fields:
NF > 2
- Interpret input records as a group of lines up to a blank line:
BEGIN { FS = "\n"; RS = "" } {...process records...}
- Print fields 2 and 3 in switched order, but only on lines whose first field matches the string URGENT:
$1 ~ /URGENT/ { print $3, $2 }
- Count and print the number of lines on which pattern is found:
/pattern/ { ++x } END { print x }
- Add numbers in second column and print total:
{ total += $2 } END { print "column total is", total }
- Print lines that contain fewer than 20 characters:
length($0) < 20
- Print each line that begins with Name: and that contains exactly seven fields:
NF == 7 && /^Name:/
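A few of the examples above can be tried from the shell as follows. The records function and its three input lines are invented for illustration.

```shell
# Hypothetical input: a tag, an amount, and an optional note.
records() {
    printf '%s\n' 'URGENT 10 fix' 'normal 20' 'URGENT 5 call'
}

# Fields 2 and 3 in switched order, only where the first field matches URGENT.
urgent=$(records | awk '$1 ~ /URGENT/ { print $3, $2 }')

# Count the lines on which the pattern is found.
count=$(records | awk '/URGENT/ { ++x } END { print x }')

# Add the numbers in the second column and print the total.
total=$(records | awk '{ total += $2 } END { print "column total is", total }')

# Records containing more than two fields.
wide=$(records | awk 'NF > 2')

printf '%s\n' "$urgent" "$count" "$total" "$wide"
```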
awk System Variables
nawk supports all awk variables. gawk supports both nawk and awk.
| Version | Variable | Description |
|---|---|---|
| awk | FILENAME | Current filename |
| | FS | Field separator (default is whitespace) |
| | NF | Number of fields in current record |
| | NR | Number of the current record |
| | OFMT | Output format for numbers (default is %.6g) |
| | OFS | Output field separator (default is a blank) |
| | ORS | Output record separator (default is a newline) |
| | RS | Record separator (default is a newline) |
| | $0 | Entire input record |
| | $n | nth field in current record; fields are separated by FS |
| nawk | ARGC | Number of arguments on command line |
| | ARGV | An array containing the command-line arguments |
| | ENVIRON | An associative array of environment variables |
| | FNR | Like NR, but relative to the current file |
| | RSTART | First position in the string matched by match function |
| | RLENGTH | Length of the string matched by match function |
| | SUBSEP | Separator character for array subscripts (default is \034) |
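As a small sketch of how several of these variables interact, the command below sets FS and OFS in BEGIN and prints NR, NF, the first field, and the last field of each record. The colon-separated input is made up.

```shell
# Hypothetical colon-separated input with a different field count per line.
fields=$(printf 'a:b:c\nd:e\n' |
    awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1, $NF }')

printf '%s\n' "$fields"
```

Because the print arguments are separated by commas, the output fields are joined with OFS; $NF refers to the last field of the current record.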
Operators
The table below lists the operators, in order of increasing precedence, that are available in awk:
| Symbol | Meaning |
|---|---|
| = += -= *= /= %= ^= | Assignment (^= only in nawk and gawk) |
| ?: | Conditional expression (nawk and gawk) |
| \|\| | Logical OR |
| && | Logical AND |
| ~ !~ | Match regular expression and negation |
| < <= > >= != == | Relational operators |
| (blank) | Concatenation |
| + - | Addition, subtraction |
| * / % | Multiplication, division, and modulus |
| + - ! | Unary plus and minus, and logical negation |
| ^ | Exponentiation (nawk and gawk) |
| ++ -- | Increment and decrement, either prefix or postfix |
| $ | Field reference |
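The precedence order in the table can be checked with a few expressions. This is only an illustrative sketch with arbitrary numbers.

```shell
precedence=$(awk 'BEGIN {
    print 2 + 3 * 4                 # multiplication binds tighter than addition
    print 2 * 3 ^ 2                 # exponentiation binds tighter than multiplication
    print 1 " " 2 + 3               # addition binds tighter than concatenation
    print (5 > 3 ? "yes" : "no")    # conditional expression, parenthesized so > is not a redirection
}')

printf '%s\n' "$precedence"
```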
Variables and Array Assignments
Variables can be assigned a value with an equal sign (=). For example:
FS = ","
Expressions using the operators +, -, *, /, and % (modulo) can be assigned to variables.
Arrays can be created with the split function (see below), or they can simply be named in an assignment statement. Array elements can be subscripted with numbers (array[1],...array[n]) or with names. For example, to count the number of occurrences of a pattern, you could use the following script:
/pattern/ { array["pattern"]++ } END { print array["pattern"] }
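Subscripting by name extends naturally to counting several different values at once. In this sketch the array name seen and the word list are hypothetical.

```shell
# Hypothetical input: one word per line.
counts=$(printf '%s\n' apple pear apple fig apple pear |
    awk '{ seen[$1]++ }
         END {
             # Elements are looked up by name, in an order chosen here explicitly.
             print "apple", seen["apple"]
             print "pear", seen["pear"]
             print "fig", seen["fig"]
         }')

printf '%s\n' "$counts"
```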
Group Listing of awk Commands
awk commands may be classified as follows:
| Arithmetic Functions | String Functions | Control Flow Statements | Input/Output Processing |
|---|---|---|---|
| atan2* | gsub* | break | close* |
| cos* | index | continue | delete* |
| exp | length | do/while* | getline* |
| int | match* | exit | next |
| log | split | for | print |
| rand* | sub* | if | printf |
| sin* | substr | return* | sprintf |
| sqrt | tolower* | while | system* |
| srand* | toupper* | | |
An asterisk (*) marks the commands that the summary below labels (nawk), that is, those not found in the original awk.
Alphabetical Summary of Commands
The following alphabetical list of statements and functions includes all that are available in awk, nawk, or gawk. Unless otherwise mentioned, the statement or function is found in all versions. New statements and functions introduced with nawk are also found in gawk.
atan2
atan2(y,x)
Return the arctangent of y/x in radians. (nawk)

break
Exit from a while, for, or do loop.

close
close(filename-expr)
close(command-expr)
In some implementations of awk, you can have only ten files open simultaneously and one pipe; modern versions allow more than one pipe open. Therefore, nawk provides a close statement that allows you to close a file or a pipe. close takes as an argument the same expression that opened the pipe or file. (nawk)

continue
Begin next iteration of while, for, or do loop immediately.

cos
cos(x)
Return cosine of x (in radians). (nawk)

delete
delete array[element]
Delete element of array. (nawk)

do
do body while (expr)
Looping statement. Execute statements in body, then evaluate expr. If expr is true, execute body again. More than one command must be put inside braces ({}). (nawk)

exit
exit [expr]
Do not execute remaining instructions and do not read new input. END procedure, if any, will be executed. The expr, if any, becomes awk's exit status.

exp
exp(arg)
Return the natural exponent of arg.

for
for ([init-expr]; [test-expr]; [incr-expr]) command
C-language-style looping construct. Typically, init-expr assigns the initial value of a counter variable. test-expr is a relational expression that is evaluated each time before executing the command. When test-expr is false, the loop is exited. incr-expr is used to increment the counter variable after each pass. A series of commands must be put within braces ({}). Example:
for (i = 1; i <= 10; i++) printf "Element %d is %s.\n", i, array[i]
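The looping statements described so far (for, do, break, and continue) can be combined in a short sketch. The counter values are arbitrary.

```shell
loops=$(awk 'BEGIN {
    # for: skip 3 with continue, leave the loop at 6 with break
    for (i = 1; i <= 10; i++) {
        if (i == 3) continue
        if (i == 6) break
        out = out i
    }
    print out

    # do: the body runs once before the test is evaluated
    n = 0
    do {
        n++
    } while (n < 0)
    print "do body ran", n, "time(s)"
}')

printf '%s\n' "$loops"
```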

for
for (item in array) command
For each item in an associative array, do command. More than one command must be put inside braces ({}). Refer to each element of the array as array[item].

getline
getline [var] [<file]
or
command | getline [var]
Read next line of input. Original awk does not support the syntax to open multiple input streams. The first form reads input from file, and the second form reads the standard output of a UNIX command. Both forms read one line at a time, and each time the statement is executed it gets the next line of input. The line of input is assigned to $0, and it is parsed into fields, setting NF, NR, and FNR. If var is specified, the result is assigned to var and $0 is not changed. Thus, if the result is assigned to a variable, the current line does not change. getline is actually a function and it returns 1 if it reads a record successfully, 0 if end-of-file is encountered, and -1 if for some reason it is otherwise unsuccessful. (nawk)

gsub
gsub(r,s[,t])
Globally substitute s for each match of the regular expression r in the string t. Return the number of substitutions. If t is not supplied, defaults to $0. (nawk)

if
if (condition) command [else command]
If condition is true, do command(s), otherwise do command(s) in else clause (if any). condition can be an expression that uses any of the relational operators <, <=, ==, !=, >=, or >, as well as the pattern-matching operators ~ or !~ (e.g., if ($1 ~ /[Aa].*[Zz]/)). A series of commands must be put within braces ({}).

index
index(str,substr)
Return position of first substring substr in string str or 0 if not found.

int
int(arg)
Return integer value of arg.

length
length(arg)
Return the length of arg.

log
log(arg)
Return the natural logarithm of arg.

match
match(s,r)
Function that matches the pattern, specified by the regular expression r, in the string s and returns either the position in s where the match begins or 0 if no occurrences are found. Sets the values of RSTART and RLENGTH. (nawk)

next
Read next input line and start new cycle through pattern/procedures statements.
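A brief sketch of index, match, gsub, and the command | getline form follows. The input string and the echo command are assumptions picked so the results are easy to verify by hand.

```shell
string_funcs=$(echo 'hello world' | awk '{
    print index($0, "world")      # position of the first occurrence of the substring
    if (match($0, /o+/))
        print RSTART, RLENGTH     # start and length of the first match
    n = gsub(/o/, "0")            # t is omitted, so the substitution acts on $0
    print n, $0
}')

# Read the output of a (hypothetical) command one line at a time.
from_cmd=$(awk 'BEGIN {
    while (("echo one; echo two" | getline line) > 0)
        n++
    print n, line
}')

printf '%s\n' "$string_funcs" "$from_cmd"
```

The getline loop tests for a return value greater than 0, so it stops both at end-of-file (0) and on an error (-1).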

print
print [args] [destination]
Print args on output, followed by a newline. args is usually one or more fields, but may also be one or more of the predefined variables, or arbitrary expressions. If no args are given, prints $0 (the current input line). Literal strings must be quoted. Fields are printed in the order they are listed. If separated by commas (,) in the argument list, they are separated in the output by the OFS character. If separated by spaces, they are concatenated in the output. destination is a UNIX redirection or pipe expression (e.g., > file) that redirects the default standard output.

printf
printf format [, expression(s)] [destination]
Formatted print statement. Fields or variables can be formatted according to instructions in the format argument. The number of expressions must correspond to the number specified in the format sections. format follows the conventions of the C-language printf statement. Here are a few of the most common formats:
- %s: A string.
- %d: A decimal number.
- %n.mf: A floating-point number, where n is the minimum total field width and m is the number of digits after the decimal point.
- %[-]nc: n specifies minimum field length for format type c, while - left justifies value in field; otherwise value is right justified.
format can also contain embedded escape sequences: \n (newline) or \t (tab) are the most common. destination is a UNIX redirection or pipe expression (e.g., > file) that redirects the default standard output. Example: Using the script:
{printf "The sum on line %s is %d.\n", NR, $1+$2}
a first input line whose two fields add up to 10 produces this output, followed by a newline:
The sum on line 1 is 10.
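The width and justification rules can be made visible by bracketing each formatted value. The numbers and strings in this sketch are arbitrary.

```shell
formatted=$(awk 'BEGIN {
    printf "[%5d]\n", 42              # right justified in a field of 5
    printf "[%-5d]\n", 42             # left justified in a field of 5
    printf "[%6.2f]\n", 3.14159       # field of 6, two digits after the decimal point
    printf "[%-8s|%s]\n", "ab", "cd"  # left-justified string followed by a plain string
}')

printf '%s\n' "$formatted"
```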

rand
rand()
Generate a random number between 0 and 1. This function returns the same series of numbers each time the script is executed, unless the random number generator is seeded using the srand() function. (nawk)

return
return [expr]
Used at end of user-defined functions to exit the function, returning value of expression expr, if any. (nawk)

sin
sin(x)
Return sine of x (in radians). (nawk)

split
split(string,array[,sep])
Split string into elements of array array[1],... array[n]. string is split at each occurrence of separator sep. (In nawk, the separator may be a regular expression.) If sep is not specified, FS is used. The number of array elements created is returned.

sprintf
sprintf(format[,expression(s)])
Return the value of expression(s), using the specified format (see printf). Data is formatted but not printed.

sqrt
sqrt(arg)
Return square root of arg.

srand
srand(expr)
Use expr to set a new seed for random number generator. Default is time of day. Returns the old seed. (nawk)

sub
sub(r,s[,t])
Substitute s for first match of the regular expression r in the string t. Return 1 if successful; 0 otherwise. If t is not supplied, defaults to $0. (nawk)

substr
substr(string,m[,n])
Return substring of string beginning at character position m and consisting of the next n characters. If n is omitted, include all characters to the end of string.

system
system(command)
Function that executes the specified UNIX command and returns its status. The status of the command that is executed typically indicates its success (0) or failure (non-zero). The output of the command is not available for processing within the nawk script. Use command | getline to read the output of the command into the script. (nawk)

tolower
tolower(str)
Translate all uppercase characters in str to lowercase and return the new string. (nawk)

toupper
toupper(str)
Translate all lowercase characters in str to uppercase and return the new string. (nawk)

while
while (condition) command
Do command while condition is true (see if for a description of allowable conditions). A series of commands must be put within braces ({}).
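Several of the string functions above, together with a while loop, might be used like this. The date string, the array name part, and the other literals are hypothetical.

```shell
summary=$(awk 'BEGIN {
    n = split("2024-01-15", part, "-")   # returns the number of elements created
    i = 1
    while (i <= n) {
        printf "part[%d]=%s\n", i, part[i]
        i++
    }
    print substr("abcdef", 2, 3)         # three characters starting at position 2
    s = "foo bar foo"
    sub(/foo/, "baz", s)                 # only the first match is replaced
    print s
    print toupper("awk") sprintf("-%03d", 7)   # sprintf formats without printing
}')

printf '%s\n' "$summary"
```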
- DG from Anonymous' UNIX tutorial (SVR4/Solaris)