Helpful Awk commands

Published: Aug 24, 2022 by Darren Foley

GNU awk, referred to as gawk for short, is a text processing tool for tabular, delimited data. On many Linux distributions awk is simply a link to gawk, so I will refer to it as awk throughout the code samples.

Official GNU Awk Documentation

Calling awk from the command line

This is the most common use case: filtering the output of another command by piping it into awk. The code below grabs the process IDs (the second column of ps -ef output) of the java processes running on the system.

# The [j]ava bracket expression stops grep from matching its own entry in the process list

$ ps -ef | grep "[j]ava" | awk '{ print $2 }'

# To kill them, for example, pass the IDs to kill with xargs

$ ps -ef | grep "[j]ava" | awk '{ print $2 }' | xargs kill
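Since awk patterns are regular expressions, the grep step can be folded into awk itself. To keep the sketch runnable anywhere, it filters a made-up sample of ps -ef output rather than live processes. Against real ps output, the [j]ava pattern likewise avoids matching the awk command's own entry.

```shell
# Hypothetical ps -ef output (columns: UID PID PPID C STIME TTY TIME CMD)
sample='alice 1201 1 0 10:00 ? 00:00:05 /usr/bin/java -jar app.jar
alice 1305 1 0 10:01 ? 00:00:00 /usr/bin/python3 worker.py'

# The regex acts as the filter and $2 is the PID column
printf '%s\n' "$sample" | awk '/[j]ava/ { print $2 }'

# On a live system the equivalent would be: ps -ef | awk '/[j]ava/ { print $2 }'
```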

Calling awk from a script

awk is primarily a command line tool, but you can also save awk code to a text file and run it as a script. Add a shebang line such as "#!/usr/bin/awk -f" that points to your awk binary (check its location with "command -v awk"). Without a shebang, run the file with "awk -f filename.awk". The .awk extension is a convention, not a requirement.

Run the script like so (assuming the file is executable, e.g. after chmod +x test.awk):

$ ./test.awk data.txt

or

$ awk -f test.awk data.txt

# If you don't have any input data, use an empty here string (a bash/zsh feature);
# the main block then runs once on a single empty line

$ ./test.awk <<< ""
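Here is a small end-to-end sketch of the script workflow. It writes a throwaway program to a temporary directory (the file name and sample lines are made up) and runs it with awk -f, so it does not depend on where the awk binary lives. The /dev/null redirect is a portable alternative to the here string: the main block runs zero times instead of once, while END still runs.

```shell
# Write a small example program to a temporary directory (test.awk is just an example name)
dir=$(mktemp -d)
cat > "$dir/test.awk" <<'EOF'
#!/usr/bin/awk -f
# Number each input line, then report how many lines were read
{ print NR ": " $0 }
END { print "lines read: " NR }
EOF

# awk -f treats the shebang line as an ordinary comment, so no chmod is needed here
printf 'alpha\nbeta\n' | awk -f "$dir/test.awk"

# With /dev/null as input the main block never runs, but END still does
awk -f "$dir/test.awk" < /dev/null

# Clean up with: rm -r "$dir"
```

The first command prints the two numbered lines followed by "lines read: 2"; the second prints only "lines read: 0".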

Special variables

$0 Current line
$1 First field
$2 Second field
$n nth field
NF Number of fields in the current line ($NF is the last field)
NR Number of records (lines) read so far
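A quick way to see these variables in action is to print them for some sample input (made up for illustration). Note the parentheses in $(NF-1): without them, $NF-1 would take the last field and subtract 1 from it numerically.

```shell
# Print the record number, the field count, the last field and the second-to-last field
printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF, $(NF-1) }'
# 1 3 c b
# 2 2 e d
```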

Split input by delimiter

awk is line oriented and expects data to be delimited in some way. By default it splits each line on runs of whitespace (spaces and tabs). You can specify a different delimiter with the "-F" flag.

# Print the second field of each line of a colon-delimited file

$ awk -F':' '{ print $2 }' data.txt
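-F is not the only way to set the delimiter. The sketch below, using a made-up /etc/passwd-style line, shows -F':' next to the equivalent assignment to the built-in FS variable in a BEGIN block, plus OFS, which controls the separator print places between output fields.

```shell
# A made-up /etc/passwd-style record: user:password:UID:GID:comment:home:shell
line='root:x:0:0:root:/root:/bin/bash'

# -F sets the input field separator from the command line
printf '%s\n' "$line" | awk -F':' '{ print $1, $7 }'

# Equivalent: assign the built-in FS variable before any input is read
printf '%s\n' "$line" | awk 'BEGIN { FS = ":" } { print $1, $7 }'

# OFS is the separator print puts between fields (a single space by default)
printf '%s\n' "$line" | awk -F':' -v OFS=',' '{ print $1, $7 }'
```

The first two commands print "root /bin/bash" and the last prints "root,/bin/bash".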

Structure of awk programs

An awk program has three kinds of block: a BEGIN block, a main processing block and an END block. The BEGIN block runs once before any input is read, the main block runs once for each line of input, and the END block runs once after the last line has been processed.

#!/usr/bin/awk -f

BEGIN{

# Initialise variables here

} 
{

# Do processing here

}
END{

# Print results, do something after
}

None of these blocks is mandatory; a program consisting only of a BEGIN block, for example, never reads any input. Because awk loops over the input rows for you, per-line logic goes in the main block {} and you never write the read loop yourself.
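To make the skeleton concrete, here is a sketch that totals the first column of some sample numbers (made-up input). total is just an illustrative variable name; NR in the END block holds the number of lines read.

```shell
printf '10\n20\n12\n' | awk '
BEGIN { total = 0 }                            # runs once, before the first line is read
{ total += $1 }                                # runs once for every input line
END { print "total:", total, "lines:", NR }    # runs once, after the last line
'
# total: 42 lines: 3
```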

Filter Rows by a pattern

# Print the second and third fields of every line matching the regex /pattern/

$ awk ' /pattern/ { print $2,$3 } ' data.txt
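A pattern does not have to be a regex over the whole line. In the sketch below (hypothetical name/age data) a numeric comparison on $2 acts as the filter, and the ~ operator matches a regex against a single field.

```shell
# Hypothetical sample data: name and age
data='alice 30
bob 17
carol 45'

# A comparison on a field works as a pattern too: prints alice and carol
printf '%s\n' "$data" | awk '$2 >= 18 { print $1 }'

# ~ tests one field against a regex instead of the whole line: prints "carol 45"
printf '%s\n' "$data" | awk '$1 ~ /^c/ { print $0 }'
```

When the action is omitted, awk prints the whole matching line, so awk '$1 ~ /^c/' behaves the same as the second command.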

User defined Functions

awk has a large catalog of built-in functions for manipulating strings, numbers and arrays, but you can also define your own user-defined functions (UDFs) for custom logic, like so:

#!/usr/bin/awk -f

function my_func(x, y){
  return x*y
}

{
  # Main processing here; use the return value, e.g. by printing it
  print my_func(3, 4)   # prints 12 once for each input line
}
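awk has no keyword for declaring local variables. The usual convention is to list extra parameters that callers never pass, set apart by extra spaces, and use those as locals. The join_fields function below is hypothetical, written only to show the idiom: it joins fields from..to with a separator.

```shell
printf 'x a b c\n' | awk '
# Hypothetical helper: join fields from..to with sep.
# i and out are never passed by callers, so they act as local variables.
function join_fields(from, to, sep,    i, out) {
  out = $from
  for (i = from + 1; i <= to; i++)
    out = out sep $i
  return out
}
{ print join_fields(2, NF, "-") }
'
# a-b-c
```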

Conditional checking

If the first field equals the string "text", print the entire line; otherwise print "not found".

$ awk ' { if($1=="text"){ print; } else { print "not found"; }; } ' data.txt
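The same check fits in a single print with the ternary operator, and if you only want the matching lines, the comparison can stand on its own as a pattern. The sample input is made up.

```shell
# Ternary form of the if/else above: prints "text hello" then "not found"
printf 'text hello\nother line\n' | awk '{ print ($1 == "text" ? $0 : "not found") }'

# Matching lines only: a pattern with no action prints the whole line
printf 'text hello\nother line\n' | awk '$1 == "text"'
```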

Looping

#!/usr/bin/awk -f
# awk has C-like looping syntax; this prints "do something" 4 times per input line

{
  var = 4
  for (i = 1; i <= var; i++) {
    print "do something"
  }
}
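Two loops you will reach for often: a C-style loop over the fields of a line using NF, and the for (key in array) form for walking awk's associative arrays. The input is made up and count is an illustrative array name. Because the traversal order of for-in is unspecified, the second command pipes its output through sort.

```shell
# Reverse the fields of each line with a C-style loop that counts down from NF
printf 'a b c\nd e\n' | awk '{
  for (i = NF; i > 0; i--)
    printf "%s%s", $i, (i > 1 ? OFS : ORS)   # OFS between fields, ORS after the last one
}'
# c b a
# e d

# Count how often each word appears, then walk the array with for-in
printf 'x y x\nz x\n' | awk '
  { for (i = 1; i <= NF; i++) count[$i]++ }
  END { for (w in count) print w, count[w] }
' | sort
# x 3
# y 1
# z 1
```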