gawk scripts consist of patterns and procedures:
pattern {procedure}
Both are optional. If pattern is missing, {procedure} is applied to all records. If {procedure} is missing, the matched record is printed. By default, each line of input is a record, but you can specify a different record separator through the RS variable.
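The following sketch illustrates these rules. It assumes a POSIX shell and any POSIX awk; gawk should behave the same. The sample input and the shell variable names are made up for illustration. The first script has only a pattern, the second has only a procedure, and the third sets RS so that a semicolon, not a newline, ends each record.

```shell
# Sample input and shell variable names are hypothetical, for illustration only.
# Any POSIX awk should work; substitute gawk to run GNU awk explicitly.
input='alpha 1
beta 2
gamma 3'

# Pattern only: each matching record is printed unchanged.
pattern_only=$(printf '%s\n' "$input" | awk '/beta/')

# Procedure only: the action runs for every record.
procedure_only=$(printf '%s\n' "$input" | awk '{ print $2 }')

# Custom record separator: each ";"-terminated chunk is one record.
custom_rs=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

printf '%s\n' "$pattern_only" "---" "$procedure_only" "---" "$custom_rs"
```

The first command prints `beta 2`, the second prints the numbers 1 through 3, and the third prints `1: a`, `2: b` and `3: c`.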
A pattern can be any of the following:
/regular expression/
relational expression
pattern-matching expression
pattern,pattern
BEGIN
END
Some rules regarding patterns include:
Expressions can be composed of quoted strings, numbers, operators, functions, defined variables, or any of the predefined variables described later under "gawk System Variables."
Regular expressions use the extended set of metacharacters and are described in Chapter 9.
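In addition, ^ and $ anchor a match to the beginning and end of the string being tested. When a regular expression is matched against a field (for example, $2 ~ /^b/), they refer to the beginning and end of that field rather than the beginning and end of the whole record.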
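Relational expressions use the relational operators listed under "Operators" later in this chapter. Comparisons can be either string or numeric. If both operands are numbers, or input fields that look like numbers, the comparison is numeric. Otherwise it is a string comparison. For example, $2 > $1 selects lines for which the second field is greater than the first.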
Pattern-matching expressions use the operators ~ (match) and !~ (don't match). See "Operators" later in this chapter.
The BEGIN pattern lets you specify procedures that take place before the first input record is processed. (Generally, you set global variables here.)
The END pattern lets you specify procedures that take place after the last input record is read.
If there are multiple BEGIN or END patterns, their associated actions are taken in the order in which they appear in the script.
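pattern,pattern specifies a range of records. The range starts at a record that matches the first pattern and continues through the next record that matches the second pattern, including both of those records. This syntax cannot include BEGIN or END as a pattern.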
Except for BEGIN and END, patterns can be combined with the Boolean operators || (OR), && (AND), and ! (NOT).
In addition to other regular-expression operators, gawk supports POSIX character lists, which are useful for matching non-ASCII characters in languages other than English. These lists are recognized only inside bracket expressions ([ ]), so the brackets appear twice. A typical use is [[:lower:]], which in an English locale is equivalent to [a-z]. See Chapter 9 for a complete list of POSIX character lists.
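The rules above can be checked with a few small invocations. This sketch assumes a POSIX shell and any POSIX awk that supports bracket classes such as [[:upper:]]; gawk does. The input lines and shell variable names are invented for illustration. The sketch shows anchoring inside a field, numeric and string comparison, a range pattern, BEGIN rules running in script order, and patterns combined with && and !.

```shell
# Input lines and shell variable names are hypothetical, for illustration only.
# Any POSIX awk should work; substitute gawk to run GNU awk explicitly.

# ^ anchors to the start of the string being matched: the whole record
# for a bare /regex/, or one field when used with ~.
record_anchor=$(printf 'a bc\nbx y\n' | awk '/^b/')        # -> bx y
field_anchor=$(printf 'a bc\nbx y\n' | awk '$2 ~ /^b/')    # -> a bc

# Fields that look numeric compare as numbers, so 10 > 9 here.
numeric_cmp=$(echo '10 9' | awk '$1 > $2 { print "10 > 9" }')
# String constants compare as strings, so "10" sorts before "9".
string_cmp=$(awk 'BEGIN { if ("10" < "9") print "yes" }')

# Range: from a record matching /start/ through the next record matching /stop/.
range=$(printf 'x\nstart\ny\nstop\nz\n' | awk '/start/,/stop/')

# Two BEGIN rules run in the order written; the middle rule combines
# a character-list match with && and !.
combined=$(printf 'A1\nb2\nC3\n' | awk '
    BEGIN { printf "first " }
    BEGIN { print "second" }
    /[[:upper:]]/ && !/3/ { print "upper:", $0 }
    END   { print "records:", NR }')

printf '%s\n' "$record_anchor" "$field_anchor" "$numeric_cmp" \
    "$string_cmp" "$range" "$combined"
```

In the last script, C3 contains an uppercase letter but is rejected by !/3/. Only A1 is printed before the END summary.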
Procedures consist of one or more commands, functions, or variable assignments, separated by newlines or semicolons and contained within curly braces. Commands fall into four groups:
Variable or array assignments
Printing commands
Built-in functions
Control-flow commands
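The following sketch puts all four groups of commands in one script. It assumes a POSIX shell and any POSIX awk. The fruit data and the names name, count and total are hypothetical and used only for illustration.

```shell
# Input data and variable names are hypothetical, for illustration only.
out=$(printf 'apple 3\nfig 12\n' | awk '
{
    name = toupper($1)          # variable assignment using a built-in function
    count[name] = $2            # array assignment
    if ($2 > 5)                 # control flow
        print name, "is large"  # printing command
    else
        print name, "is small"
}
END {
    for (n in count) total += count[n]   # control flow over an array
    print "total:", total
}')
printf '%s\n' "$out"
```

The script prints `APPLE is small`, `FIG is large` and `total: 15`. Because the END rule only sums the array, the output does not depend on the unspecified order of `for (n in count)`.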
Simple pattern-procedure examples:
Print first field of each line (no pattern specified):
{ print $1 }
Print all lines that contain "Linux":
/Linux/
Print first field of lines that contain "Linux":
/Linux/{ print $1 }
Print records containing more than two fields:
NF > 2
Treat each group of lines up to a blank line as a single input record, with each line as a separate field:
BEGIN { FS = "\n"; RS = "" }
Print fields 2 and 3 in switched order, but only on lines whose first field contains the string "URGENT":
$1 ~ /URGENT/ { print $3, $2 }
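Count and print the number of lines that contain "ERR". A line with several occurrences counts once, and adding 0 prints 0 instead of an empty line when nothing matches:
/ERR/ { ++x }; END { print x + 0 }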
Add numbers in second column and print total:
{ total += $2 }; END { print "column total is", total }
Print lines that contain fewer than 20 characters:
length() < 20
Print each line that begins with "Name:" and that contains exactly seven fields:
NF == 7 && /^Name:/
Print the fields of each line in reverse order, one field per line:
{ for (i = NF; i >= 1; i--) print $i }
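Several of the examples above can be tried against sample data. This sketch assumes a POSIX shell and any POSIX awk; gawk included. The input text and shell variable names are invented for illustration. It runs the paragraph-mode, ERR-count and column-total examples. It also adds one variant that is not in the list: it reverses the fields on a single line by printing OFS between fields and ORS after the last one.

```shell
# Input text and shell variable names are hypothetical, for illustration only.

# Paragraph mode: blank lines separate records, and each line is a field.
para=$(printf 'Ann\n555-1234\n\nBob\n555-9876\n' |
    awk 'BEGIN { FS = "\n"; RS = "" } { print $1 " -> " $2 }')

# Count lines containing ERR; "+ 0" prints 0 rather than an empty line if none match.
errs=$(printf 'ok\nERR disk\nERR net\n' | awk '/ERR/ { ++x }; END { print x + 0 }')

# Sum the second column.
total=$(printf 'a 1\nb 2\nc 3\n' |
    awk '{ total += $2 }; END { print "column total is", total }')

# Variant: reverse the fields but keep them on one line.
reversed=$(echo 'one two three' |
    awk '{ for (i = NF; i >= 1; i--) printf "%s%s", $i, (i > 1 ? OFS : ORS) }')

printf '%s\n' "$para" "$errs" "$total" "$reversed"
```

The parentheses around the conditional expression in the printf statement matter. Inside a print or printf argument list, an unparenthesized > would be read as output redirection.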
Copyright © 2003 O'Reilly & Associates. All rights reserved.