Quick Reference: awk

This article also covers nawk and gawk. With the exception of array subscripts, values in [brackets] are optional; don't type the [ or ].

Command-line Syntax

awk can be invoked in two ways:

awk [options] 'script' [var=value] [file(s)]
awk [options] -f scriptfile [var=value] [file(s)]

You can specify a script directly on the command line, or you can store a script in a scriptfile and specify it with -f. In most versions, the -f option can be used multiple times. The variable var can be assigned a value on the command line. The value can be a literal, a shell variable ($name), or a command substitution (`cmd`), but the value becomes available only when awk starts reading input (i.e., after the BEGIN procedure has run). awk operates on one or more file(s). If none are specified (or if - is specified), awk reads from the standard input.
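
The timing of command-line assignments is easy to miss, so here is a minimal sketch of it. The variable name n and the sample input are made up for illustration, and the trailing - asks awk to read the standard input explicitly. The second command assumes a nawk, gawk, or other POSIX awk that accepts -v.

```shell
# n is assigned as an operand, so it is still empty while BEGIN runs
# and holds 5 by the time the first input line is processed.
timing=$(printf 'a\nb\n' |
  awk 'BEGIN { print "BEGIN n=[" n "]" } { print $0 " n=[" n "]" }' n=5 -)
echo "$timing"

# With -v, the assignment happens before BEGIN runs (assumes -v is supported).
early=$(awk -v n=5 'BEGIN { print "BEGIN n=[" n "]" }')
echo "$early"
```

The first command prints an empty n in the BEGIN line and n=[5] on each input line; the second shows 5 already inside BEGIN.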

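The other recognized options are:

-Fc
    Set the field separator to character c. This is the same as setting the system variable FS.
-v var=value
    Assign a value to variable var before the script begins execution (nawk and gawk only).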

Patterns and Procedures

awk scripts consist of patterns and procedures:

pattern {procedure}

Both are optional. If pattern is missing, {procedure} is applied to all records. If {procedure} is missing, the matched record is written to the standard output.

Patterns

pattern can be any of the following:

/regular expression/
relational expression
pattern-matching expression
BEGIN
END


Except for BEGIN and END, patterns can be combined with the Boolean operators || (OR), && (AND), and ! (NOT). A range of lines can also be specified using comma-separated patterns:

pattern,pattern
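
As a rough illustration of the pattern forms above, the following sketch runs three patterns over a few made-up records. The sample data and the shell variable names are assumptions chosen only for this example.

```shell
# Hypothetical sample input: one record per line.
data='start 1
alpha 2
stop 3
beta 4 extra'

# Pattern only, no procedure: matching records are printed as they are.
regex_out=$(printf '%s\n' "$data" | awk '/^alpha/')
echo "$regex_out"

# Boolean combination: at least three fields AND not starting with "s".
bool_out=$(printf '%s\n' "$data" | awk 'NF >= 3 && !/^s/')
echo "$bool_out"

# Range: from the first record matching /start/ through the next one matching /stop/.
range_out=$(printf '%s\n' "$data" | awk '/start/,/stop/')
echo "$range_out"
```

The range pattern prints the start and stop records themselves as well as everything between them.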

Procedures

procedure can consist of one or more commands, functions, or variable assignments, separated by newlines or semicolons (;), and contained within curly braces ({}). Commands fall into four groups:

  * Variable or array assignments
  * Printing commands
  * Built-in functions
  * Control-flow commands

Simple Pattern-Procedure Examples

  1. Print first field of each line:

    { print $1 }
    


  2. Print all lines that contain pattern:

    /pattern/
    


  3. Print first field of lines that contain pattern:

    /pattern/{ print $1 }
    


  4. Print records containing more than two fields:

    NF > 2
    


  5. Interpret input records as a group of lines up to a blank line:

    BEGIN {
        FS = "\n"; RS = ""
    }
    {
        ...process records...
    }
    


  6. Print fields 2 and 3 in switched order, but only on lines whose first field matches the string URGENT:

    $1 ~ /URGENT/ {
        print $3, $2
    }
    


  7. Count and print the number of lines that contain pattern:

    /pattern/ {
        ++x
    }
    END {
        print x
    }
    


  8. Add numbers in second column and print total:

    { total += $2 }
    END {
        print "column total is", total
    }
    


  9. Print lines that contain fewer than 20 characters:

    length($0) < 20
    


  10. Print each line that begins with Name: and that contains exactly seven fields:

    NF == 7 && /^Name:/
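
To see a few of the examples above in action, the sketch below feeds examples 6, 8, and 5 some made-up input from printf. The input lines and shell variable names are assumptions for illustration only, and the procedure used for example 5 is a stand-in for the elided "process records" step.

```shell
# Example 6: swap fields 2 and 3 on lines whose first field matches URGENT.
urgent=$(printf 'URGENT disk full\nnormal cpu idle\nURGENT fan failed\n' |
  awk '$1 ~ /URGENT/ { print $3, $2 }')
echo "$urgent"

# Example 8: add up the second column and print the total.
total=$(printf 'apples 3\npears 4\nplums 5\n' |
  awk '{ total += $2 } END { print "column total is", total }')
echo "$total"

# Example 5: records are groups of lines separated by a blank line,
# and each line of a group is one field.
names=$(printf 'Ann\nParis\n\nBob\nOslo\n' |
  awk 'BEGIN { FS = "\n"; RS = "" } { print $1 " lives in " $2 }')
echo "$names"
```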
    


awk System Variables

nawk supports all awk variables. gawk supports both nawk and awk variables.

Version  Variable  Description
awk      FILENAME  Current filename
         FS        Field separator (default is whitespace)
         NF        Number of fields in current record
         NR        Number of the current record
         OFMT      Output format for numbers (default is %.6g)
         OFS       Output field separator (default is a blank)
         ORS       Output record separator (default is a newline)
         RS        Record separator (default is a newline)
         $0        Entire input record
         $n        nth field in current record; fields are separated by FS
nawk     ARGC      Number of arguments on command line
         ARGV      An array containing the command-line arguments
         ENVIRON   An associative array of environment variables
         FNR       Like NR, but relative to the current file
         RSTART    First position in the string matched by match function
         RLENGTH   Length of the string matched by match function
         SUBSEP    Separator character for array subscripts (default is \034)
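
The sketch below shows several of these variables side by side. The two temporary files and their contents are invented for the example, and FNR and SUBSEP assume nawk, gawk, or another POSIX awk.

```shell
# Create two small hypothetical input files in a scratch directory.
tmp=$(mktemp -d)
printf 'a b\nc d e\n' > "$tmp/one.txt"
printf 'f\n' > "$tmp/two.txt"

# NR keeps counting across files, FNR restarts for each file,
# and OFS is placed between the comma-separated print arguments.
vars=$(cd "$tmp" &&
  awk 'BEGIN { OFS = "|" } { print FILENAME, NR, FNR, NF, $NF }' one.txt two.txt)
echo "$vars"
rm -r "$tmp"

# A subscript written as ["x", "y"] is stored as one string joined by SUBSEP.
subs=$(awk 'BEGIN {
  a["x", "y"] = 1
  for (k in a) { split(k, part, SUBSEP); print part[1], part[2] }
}')
echo "$subs"
```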

Operators

The table below lists the operators, in order of increasing precedence, that are available in awk:

Symbol               Meaning
= += -= *= /= %= ^=  Assignment (^= only in nawk and gawk)
?:                   Conditional expression (nawk and gawk)
||                   Logical OR
&&                   Logical AND
~ !~                 Match regular expression and negation
< <= > >= != ==      Relational operators
(blank)              Concatenation
+ -                  Addition, subtraction
* / %                Multiplication, division, and modulus
+ - !                Unary plus and minus, and logical negation
^                    Exponentiation (nawk and gawk)
++ --                Increment and decrement, either prefix or postfix
$                    Field reference
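
A few one-line expressions make the effect of this ordering visible. This is only a sketch: the input line is made up, and the ^ operator assumes nawk or gawk.

```shell
prec=$(echo '10 20 30' | awk '{
  print 1 " " 2 + 3          # + binds tighter than concatenation
  print -2 ^ 2               # ^ binds tighter than unary minus
  print 2 ^ 3 ^ 2            # ^ groups from right to left: 2 ^ 9
  print $NF - 1, $(NF - 1)   # $ binds tighter than subtraction
}')
echo "$prec"
```

When in doubt, parentheses such as $(NF - 1) make the intended grouping explicit.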

Variables and Array Assignments

Variables can be assigned a value with an equal sign (=). For example:

FS = ","

Expressions using the operators +, -, *, /, and % (modulo) can be assigned to variables.

Arrays can be created with the split function (see below), or they can simply be named in an assignment statement. Array elements can be subscripted with numbers (array[1],...array[n]) or with names. For example, to count the number of occurrences of a pattern, you could use the following script:

/pattern/ {
    array["pattern"]++
}
END {
    print array["pattern"]
}
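
Subscripting by name is most useful when the subscript comes from the input itself. The following sketch, using made-up input and the hypothetical array name count, tallies each distinct first field; the output is piped through sort because the order in which for (item in array) visits elements is not defined.

```shell
counts=$(printf 'red\nblue\nred\ngreen\nred\n' |
  awk '{ count[$1]++ }                       # one element per distinct word
       END { for (color in count) print color, count[color] }' |
  sort)
echo "$counts"
```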


Group Listing of awk Commands

awk commands may be classified as follows:

Arithmetic  String     Control Flow  Input/Output
Functions   Functions  Statements    Processing
atan2*      gsub*      break         close*
cos*        index      continue      delete*
exp         length     do/while*     getline*
int         match*     exit          next
log         split      for           print
rand*       sub*       if            printf
sin*        substr     return*       sprintf
sqrt        tolower*   while         system*
srand*      toupper*
*Not in original awk
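
As a quick tour of the string and output columns, here is a minimal sketch that calls several of the listed functions on a made-up string. gsub and toupper are starred above, so this assumes nawk or gawk; the variable names s, n, and part are chosen only for the example.

```shell
funcs=$(awk 'BEGIN {
  s = "quick reference"
  print length(s)                  # number of characters in s
  print index(s, "ref")            # position where "ref" starts
  print substr(s, 1, 5)            # first five characters
  print toupper(substr(s, 7))      # from character 7 to the end, uppercased
  n = split("a:b:c", part, ":")    # fills part[1] through part[n]
  print n, part[3]
  gsub(/e/, "E", s)                # replace every e in s
  print s
  printf "%05.1f\n", 3.14159       # zero-padded, one decimal place
}')
echo "$funcs"
```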

Alphabetical Summary of Commands

The following alphabetical list of statements and functions includes all that are available in awk, nawk, or gawk. Unless otherwise mentioned, the statement or function is found in all versions. New statements and functions introduced with nawk are also found in gawk.

- DG from Anonymous' UNIX tutorial (SVR4/Solaris)