comp.lang.awk FAQ

The material in this FAQ originates from the comp.lang.awk FAQ.

What is Awk?

awk is an extraction and reporting language, named after its three original authors:

  • Alfred V. Aho
  • Peter J. Weinberger
  • Brian W. Kernighan

They write:

  Awk is a convenient and expressive programming language that can be
  applied to a wide variety of computing and data-manipulation tasks.

The title of the book uses `AWK', but the contents of the book use `awk' (except at the beginning of sentences, as above). I will attempt to do the same (except perhaps at the beginning of sentences, as above).

Most implementations of awk are interpreters which read your awk source program and parse it and act on it directly.

Some vendors have developed awk compilers which will produce an executable that may be run stand-alone -- thus, the end user does not have access to the source code. There are also various awk->C converters which allow you to achieve the same functionality (by compiling the resulting C code later).

One of the most popular compilers, tawk from Thompson Automation, continues to be the subject of many positive posts in the group comp.lang.awk.

    I don't really want to start a reviews section, but it may be
    appropriate.  I think it's of general interest, and a good thing
    for the FAQ, but I don't want to be given any grief by a negative
    review I didn't write just because I'm distributing it.

    If you have a review you'd like me to put a pointer to, please
    inform me -- I already have some pointers of this form listed.

comp.lang.awk is not particularly about sed. For sed-related issues, there is a newsgroup, alt.comp.lang.sed. See the sed FAQ (and other documents) for answers to common questions and group recommendations.

This all seems unrelated to AWK Engineering AG.

(This text was originally imported from the comp.lang.awk faq)

Implementation Timeline

  1. 1977-1985: awk, now also known as 'old awk' or (confusingly) 'awk': the original version of the language, lacking many of the features that make it fun to play with now.
  2. 1985-1996: awk, often called 'new awk', 'nawk' or 'BWK awk': the second major incarnation of the language, reflecting the language as it is currently known and loved.
  3. gawk came to be sometime around 1986 according to the gawk-manual; in any case, it is still actively maintained now. The newest released version is 3.1.6. The GNU Project's Savannah site hosts CVS repositories for both the stable version (what will come after 3.1.6) and the development version (3.2.0 or 4.x; the gawk maintainer hasn't decided yet).
  4. 1991: Mike Brennan announces mawk on Usenet.
  5. 1996: BWK awk was released under an open license. Huzzah! This version is also still maintained and is available from Brian Kernighan's home pages at Bell Labs and Princeton.
  6. Sometime before the present: xgawk, jawk, awkcc, Kernighan's nameless awk-to-C++ compiler, awka, tawk and busybox awk came to be.

It's a bit embarrassing to note that the exact origins of each are a bit hazy. This whole section requires further work.

Awk systems published under closed licenses are uninteresting.

(there may or may not be a WartAndWishList detailing the annoying bits of awk, and those bits that are annoying because they are missing...)

How can I access shell or environment variables in an awk script?


The examples using quoting are intended for use with any standard (sh-compatible-quoting) Unix shell. As with all complex quoting, all these examples become much easier to work with (or under DOS and MS-Windows, less impossible) when put in a file and invoked with `awk -f filename.awk' instead.

Non-sh-compatible shells will require different quoting. If you're not even using Unix (or a ported Unix shell), just ignore the whole section on quoting.

Environment variables in general

Answer 1:

On Unix, use "alternate quoting", e.g.

        awk -F: '$1 ~ /'"$USER"'/ {print $5}' /etc/passwd

Any standard Unix shell will send the quoted part as one long argument (with embedded spaces) to awk, for instance:

        $1 ~ /bwk/ {print $5}

Note that there must not be any spaces between the quoted parts. Otherwise, you wouldn't end up with a single, long script argument, because Unix shells break arguments on spaces (unless they are `escaped' with `\', or in '' or "", as the above example shows).

This approach should be avoided in general, unless it is the only one supported by your version of awk (which, in that case, should be upgraded anyway, and not just for this reason). The problem is that it cannot be trusted to work in general, and the outcomes are highly dependent on the actual content of the shell variables you are using. Some examples follow (taken from discussions on comp.lang.awk):

$ var="#"
$ awk 'BEGIN{ print '"$var"' }'
awk: cmd. line:1: BEGIN{ print # }
awk: cmd. line:1:              ^ syntax error

The above can be "corrected" by using double quotes in the awk program:

$ var="#"
$ awk 'BEGIN{ print "'"$var"'" }'

However, there are cases where even that is not enough:

$ var="hello
world"
$ awk 'BEGIN{ print "'"$var"'" }'
awk: BEGIN{ print "hello
awk:              ^ unterminated string

That case, however, works with -v:

$ var="hello
world"
$ awk -v var="$var" 'BEGIN{ print var }'

See the next answer for a description of -v.

Answer 2:

RTFM to see if and how your awk supports variable definitions on the command line, e.g.,

  awk -F: -v name="$USER" '$1 ~ name {print $5}' /etc/passwd

Answer 3:

RTFM to see if your awk can access environment variables. Then perhaps

  awk -F: '$1 ~ ENVIRON["USER"] {print $5}' /etc/passwd

Always remember for your /bin/sh scripts that it's easy to put things into the environment for a single command run:

        name=felix age=56 awk '... ENVIRON["name"] .....'

This also works with ksh and some other shells.

The first approach is extremely portable, but doesn't work with awk "-f" script files. In that case, it's better to use a shell script and stretch a long awk command argument in '...' across multiple lines if need be.

Also note: /bin/csh requires a \ before an embedded newline; /bin/sh does not.

See [1] for a very complete discussion of passing shell variables values to awk programs.
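
The -v and ENVIRON approaches above can be compared side by side. The following is a minimal sketch, assuming a POSIX sh and an awk that supports both -v and ENVIRON; the helper names show_with_v and show_with_environ are made up for illustration. It shows one practical difference: a value assigned with -v has its backslash escapes processed by awk, while a value read from ENVIRON arrives untouched.

```shell
#!/bin/sh
# Hypothetical helpers: print a shell value from inside awk in two ways.

# -v assignment: awk processes backslash escapes in the value.
show_with_v() {
    awk -v val="$1" 'BEGIN { print val }'
}

# Environment: the value reaches awk verbatim through ENVIRON.
show_with_environ() {
    val="$1" awk 'BEGIN { print ENVIRON["val"] }'
}

show_with_v 'a\tb'          # prints a, a TAB character, then b
show_with_environ 'a\tb'    # prints a\tb literally
```

If the shell value may contain backslashes (Windows paths, regular expressions), the ENVIRON route is probably the less surprising one.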

Unix Shell Quoting

Quoting can be such a headache for the novice, in shell programming, and especially in awk.

Art Povelones posted a long tutorial on shell quoting on 1999/09/30 which is probably too much detail to repeat in the FAQ; if you could use it, search for it in the Usenet archives.

Tim Maher offered his <>.

This approach is probably the best, and easiest to understand and maintain, for most purposes: (the '@@' is quoted to ensure the shell will copy verbatim, not interpreting environment variable substitutions etc.)

    cat <<'@@' > /tmp/never$$.awk
    { print "Never say can't." }
    @@
    awk -f /tmp/never$$.awk; rm /tmp/never$$.awk

If you enjoy testing your shell's quoting behavior frequently, you could try these:

   (see below for a verbose explanation of the first one, with 7 quotes)

    awk 'BEGIN { q="'"'"'";print "Never say can"q"t."; exit }'
    nawk -v q="'" 'BEGIN { print "Never say can"q"t."; exit }'
    awk 'BEGIN { q=sprintf("%c",39); print "Never say can"q"t."; exit }'
    awk 'BEGIN { q=sprintf("%c",39); print "Never say \"can"q"t.\""; exit }'

However, you would also have to know why you could not use this:

    awk 'BEGIN { q="\'"; print "Never say \"can"q"t.\""; exit }'

Explanation of the 7-quote example:

Note that it is quoted three different ways:

    awk 'BEGIN { q="'
                     "'"
                        '";print "Never say can"q"t."; exit }'

and that argument comes out as the single string (with embedded spaces)

    BEGIN { q="'";print "Never say can"q"t."; exit }

which is the same as

    BEGIN { q="'"; print "Never say can" q "t."; exit }
                          ^^^^^^^^^^^^^  ^  ^^
                          |           |  |  ||
                          |           |  |  ||
                          vvvvvvvvvvvvv  |  ||
                          Never say can  v  ||
                                         '  vv

which, quite possibly with too much effort to be worth it, gets you

                          Never say can't.

ENVIRON[] and "env"|getline

Modern versions of new awk (gawk, mawk, Bell Labs awk, any POSIX awk) all provide an array named ENVIRON. The array is indexed by environment variable name; the value is that variable's value. For instance, ENVIRON["HOME"] might be "/home/chris". To print out all the names and values, use a simple loop:

        for (i in ENVIRON)
                printf("ENVIRON['%s'] = '%s'\n", i, ENVIRON[i])

What if my awk doesn't have ENVIRON[]?

Short answer, get a better awk. There are many freely available versions.

Longer answer, on Unix you can use a pipe from the `env' or `printenv' commands, but this is less pretty, and may be a problem if the values contain newlines:

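        # test this on your system before you depend on it!
        while ( ("env" | getline line) > 0 ) {
                varname = line;  sub(/=.*$/, "", varname)     # text before the first =
                varvalue = line; sub(/^[^=]*=/, "", varvalue) # text after the first =
                print "var [" varname "]='" varvalue "'"
        }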

exporting environment variables back to the parent process

How can I put values into the environment of the program that called my awk program?

Short answer, you can't. Unix ain't Plan 9, and you can't tweak the parent's address space.

(DOS isn't even Unix, so it lets any program overwrite any memory location, including the parent's environment space. But the details are [obviously] going to be fairly icky. Avoid.)

Longer answer, write the results in a form the shell can parse to a temporary file, and have the shell "source" the file after running the awk program:

        awk 'BEGIN { printf("NEWVAR='%s'\n", somevalue) }' > /tmp/awk.$$
        . /tmp/awk.$$        # sh/ksh/bash/pdksh/zsh etc
        rm /tmp/awk.$$

With many shells, you can use `eval', but this is also cumbersome:

        eval `awk 'BEGIN { print "NEWVAR=" somevalue }'`

Csh syntax and more robust use of quotation marks are left as exercises for the reader.
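
For sh-compatible shells, the `eval' approach with more careful quoting might look like the sketch below. It is only an illustration: emit_assignment is a hypothetical helper name, NEWVAR is the variable name used above, and the value is passed through the environment so that awk does not reinterpret backslashes in it.

```shell
#!/bin/sh
# Sketch: have awk emit a shell assignment, then eval it in the calling shell.
# emit_assignment is a hypothetical helper name.
emit_assignment() {
    value="$1" awk 'BEGIN {
        q = "\047"                              # a single quote character
        n = split(ENVIRON["value"], part, q)    # pieces between single quotes
        out = part[1]
        for (i = 2; i <= n; i++)
            out = out q "\\" q q part[i]        # rejoin with quote, backslash, quote, quote
        printf("NEWVAR=%s%s%s\n", q, out, q)
    }'
}

eval "$(emit_assignment "it's a test")"
echo "$NEWVAR"
```

The awk program only prints text; it is the calling shell that performs the assignment, which is why this works where changing the parent's environment directly cannot.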

Why would anyone still use awk instead of perl?

A valid question, since awk is a subset of perl (functionally, not necessarily syntactically); also, the authors of perl have usually known awk (and sed, and C, and a host of other Unix tools) very well, and still decided to move on.

There are some things that perl has built-in support for that almost no version of awk can do without great difficulty (if at all); if you need to do these things, there may be no choice to make. For instance, no reasonable person would try to write a web server in awk instead of using perl or even C, if the actual socket programming has to be written in traditional awk. However, gawk 3.1.0's /inet and ftwalk's built-in networking primitives may remove this particular limitation.

However, there are some things in awk's favor compared to perl:

  • awk is simpler (especially important if deciding which to learn first)
  • awk syntax is far more regular (another advantage for the beginner, even without considering syntax-highlighting editors)
  • you may already know awk well enough for the task at hand
  • you may have only awk installed
  • awk can be smaller, thus much quicker to execute for small programs
  • awk variables don't have `$' in front of them :-)
  • clear perl code is better than unclear awk code; but NOTHING comes close to unclear perl code

Tom Christiansen wrote:

  > Awk is a venerable, powerful, elegant, and simple tool that everyone
  > should know.  Perl is a superset and child of awk, but has much more
  > power that comes at expense of sacrificing some of that simplicity.

How do I report a bug in gawk?

This is described in great detail in the gawk documentation. In brief:

  1. Make sure what you've discovered is really a bug by checking the documentation and, if possible, comparing with nawk and mawk.
  2. Cut down the program and data to as small as possible a test case that will illustrate the bug.
  3. Optionally post to comp.lang.awk; this allows others to confirm or deny the behavior, and its incorrectness (or lack thereof).
  4. Send mail to the bug-reporting address given in the gawk documentation. This automatically sends a copy to Arnold Robbins. Do not JUST post in comp.lang.awk; Arnold's readership there is sporadic, and of course any Usenet article can be missed, killed, or dropped.

Is there an easy way to determine if you have oawk or nawk?

You can determine whether you have oawk or nawk by using the following in a BEGIN rule:

        if (ARGC == 0) {
                # old awk
        } else {
                # new awk
        }

How does awk deal with multiple files?

Version warning

Some of these techniques will require non-ancient versions of awk.

How can awk test for the existence of a file?

The most portable way to test for the existence of a file is to simply try and read from the file.

        function exists(file,        dummy, ret)
        {
                ret = 0;
                if ( (getline dummy < file) >=0 ) {
                        # file exists (possibly empty) and can be read
                        ret = 1;
                        close(file);
                }
                return ret;
        }

[ I've read reports that earlier versions of mawk would write to stderr as well as getline returning <0 -- is this still true? ]

On Unix, you can probably use the `test' utility

        if (system("test -r " file) == 0) {
            # file is readable
        } else {
            # file is not readable
        }
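
The getline-based test can be tried from the shell as follows. This is a minimal sketch: file_exists is a hypothetical wrapper name, and the temporary path is only an example.

```shell
#!/bin/sh
# file_exists is a hypothetical wrapper; it prints 1 if awk can open the file, else 0.
file_exists() {
    awk -v file="$1" '
        function exists(f,    dummy, ret) {
            ret = 0
            if ((getline dummy < f) >= 0) {   # -1 means the file could not be opened
                ret = 1
                close(f)
            }
            return ret
        }
        BEGIN { print exists(file) }'
}

tmp="${TMPDIR:-/tmp}/exists_demo.$$"
: > "$tmp"                # create an empty file
file_exists "$tmp"        # prints 1
rm -f "$tmp"
file_exists "$tmp"        # prints 0
```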

How can I get awk to read multiple files?

awk automatically reads multiple files (under Unix at least) -- use something like:

    awk '/^#include/ {print $2}' *.c *.h

How can I tell from which file my input is coming?

The file name is stored in the built-in variable FILENAME:

    awk '/^#include/ {print FILENAME,$2}' *.c *.h

How can I get awk to open multiple files (selected at runtime)?

You can open files dynamically using `getline', `close', and `print EXPR > FILENAME', like:

    # assumes input file has at least 1 line, output file writeable
    function double(infilename,outfilename,    aline)
    {
      while ( (getline aline < infilename) >0 )
        print(aline aline) > outfilename;
      close(infilename);
      close(outfilename);
    }

How can I treat the first file specially?

You can tell if awk is parsing the first file given on the command line using FILENAME, thusly:

    BEGIN { rulesfile="" }
    rulesfile == "" { rulesfile = FILENAME; }
    FILENAME == rulesfile { build_rule($0); }
    FILENAME != rulesfile { apply_rule($0); }


Suppose you have a text-line "database" and you want to make some batch changes to it, by replacing some old lines with new lines.

    BEGIN { rulesfile="" }
    rulesfile == "" { rulesfile = FILENAME; }
    rulesfile == FILENAME { replace[$1] = $0; }
    rulesfile != FILENAME \
    {
            if ($1 in replace)
                    print replace[$1];
            else
                    print;
    }

Another way, using ARGV:

    (FILENAME == ARGV[1]) { replace[$1] = $0; next }
    ($1 in replace) { print replace[$1]; next }
    { print }
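
To see the ARGV variant at work, here is a small sketch with made-up sample data. apply_replacements is a hypothetical helper name and the file names are arbitrary.

```shell
#!/bin/sh
# apply_replacements is a hypothetical helper: lines in the first file replace
# lines in the second file that share the same first field.
apply_replacements() {
    awk '
        FILENAME == ARGV[1] { replace[$1] = $0; next }    # first file: collect rules
        ($1 in replace)     { print replace[$1]; next }   # later files: substitute
                            { print }
    ' "$1" "$2"
}

dir="${TMPDIR:-/tmp}/awkfaq.$$"
mkdir -p "$dir"
printf 'b NEW-b\n'               > "$dir/rules"
printf 'a one\nb two\nc three\n' > "$dir/data"
apply_replacements "$dir/rules" "$dir/data"
rm -rf "$dir"
```

With this sample data the second line of the output is the replacement line from the rules file; the other two lines pass through unchanged.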

How can I explicitly pass in a filename to treat specially?

You can use `-v rulesfile=filename' to process a file differently, like you would any other variable, and then use a `getline' loop (and `close') in your BEGIN statement.

    BEGIN {
      if (rulesfile=="") {
        print "must use -v rulesfile=filename";
        exit(1);
      }
      while ( (getline < rulesfile) >0 )
        replace[$1]=$0;
      close(rulesfile);
    }

    {
      if ($1 in replace) print replace[$1];
      else print;
    }

How many elements were created by split()?

When I do a split on a field, e.g.,

        split($1, x, "string")

how can I find out how many elements x has (I mean other than testing for null string or doing a `for (n in x)' test)?

split() is a function; use its return value:

        n = split($1, x, "string")
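
A quick way to check this from the shell is sketched below; count_parts is a hypothetical helper name and ":" is just an example separator.

```shell
#!/bin/sh
# count_parts is a hypothetical helper: it splits its first argument on ":"
# and reports how many pieces split() produced.
count_parts() {
    printf '%s\n' "$1" | awk '{ n = split($0, x, ":"); print n }'
}

count_parts "usr:local:bin"    # prints 3
count_parts ""                 # prints 0
```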

How can I split a string into characters?

In portable POSIX awk, the only way to do this is to use substr to pull out each character, one by one. This is painful. However, gawk, mawk, and the newest version of the Bell Labs awk all allow you to set FS = "" and use "" as the third argument of split.

So, split("chars",anarray,"") results in the array anarray containing 5 elements -- "c", "h", "a", "r", "s".

If you don't have any ^As in your string, you could try:

        gsub(".", "&\001", string)
        n=split(string, anarray, "\001") - 1   # ignore the empty piece after the last ^A
        for (i=1;i<=n;i++)
            print "character " i " is '" anarray[i] "'";
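
The portable substr approach mentioned above can be sketched like this (chars_of is a hypothetical helper name). It assumes single-byte characters; how multibyte characters are counted depends on the awk and the locale.

```shell
#!/bin/sh
# chars_of is a hypothetical helper: print each character of a string on its
# own line using only substr() and length().
chars_of() {
    printf '%s\n' "$1" | awk '{
        n = length($0)
        for (i = 1; i <= n; i++)
            print substr($0, i, 1)
    }'
}

chars_of "chars"
```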

Why does SunOS/Solaris awk behave oddly?

I want to use the tolower() function with SunOS nawk, but all I get is

        nawk: calling undefined function tolower

The SunOS nawk is from a time before awk acquired the tolower() and toupper() functions. Either use one of the freely available awks, or use /usr/xpg4/bin/awk (if you have it), or write your own function to do it using index, substr, and gsub.

An example of such a function is in O'Reilly's _Sed & Awk_.
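
For illustration only (this is not the function from the book), a hand-written replacement built on index and substr might look like the following sketch. The names lower and my_tolower are hypothetical, and only the 26 ASCII letters are handled.

```shell
#!/bin/sh
# lower is a hypothetical shell wrapper around a hand-written awk function,
# my_tolower, for awks that lack the built-in tolower().
lower() {
    printf '%s\n' "$1" | awk '
        function my_tolower(s,    up, lo, out, i, c, p) {
            up = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            lo = "abcdefghijklmnopqrstuvwxyz"
            out = ""
            for (i = 1; i <= length(s); i++) {
                c = substr(s, i, 1)
                p = index(up, c)                  # position in the upper-case alphabet, or 0
                out = out (p ? substr(lo, p, 1) : c)
            }
            return out
        }
        { print my_tolower($0) }'
}

lower "Hello, World"     # prints hello, world
```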

Patrick TJ McPhee writes:

> SunOS includes three versions of awk. /usr/bin/awk is the old
> (pre-1989) version. /usr/bin/nawk is the new awk which appeared
> in 1989, and /usr/xpg4/bin/awk is supposed to conform to the single
> unix specification. No one knows why Sun continues to ship old awk.

How do I have dynamic-width printf strings, like C?

With modern awks, you can just do it like you would in C (though the justification is less clear; C doesn't have the trivial in-line string concatenation that awk does), like so:

        maxlen=0;
        for (i in arr)
          if (maxlen<length(arr[i]))
            maxlen=length(arr[i]);

        for (i in arr)
          printf("%-*s %s\n",maxlen,arr[i],i)

With old awks, just do it like you would do if you didn't know about %* (this would be much more painful to do in C), like so:

        maxlen=0;
        for (i in arr)
          if (maxlen<length(arr[i]))
            maxlen=length(arr[i]);

        printfstring="%-" maxlen "s %s\n";
        for (i in arr)
          printf(printfstring,arr[i],i)
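
The format-string-building approach can be tried on a small two-column input as sketched below. pad_columns is a hypothetical helper name; it stores the lines by record number so that the output order is predictable, unlike `for (i in arr)'.

```shell
#!/bin/sh
# pad_columns is a hypothetical helper: left-justify the first field of every
# line to the width of the longest first field, then print the second field.
pad_columns() {
    awk '
        BEGIN { maxlen = 0 }
        {
            key[NR] = $1; val[NR] = $2
            if (length($1) > maxlen) maxlen = length($1)
        }
        END {
            fmt = "%-" maxlen "s %s\n"        # e.g. "%-5s %s\n"
            for (i = 1; i <= NR; i++)
                printf(fmt, key[i], val[i])
        }'
}

printf 'a 1\nbbbbb 2\ncc 3\n' | pad_columns
```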

Why doesn't "\\$" behave like /\\$/ ? Why don't parentheses match?

Because "\\$" is a string and /\\$/ is not; in strings, some of the escape characters get eaten up (like \" to escape a double-quote within the string).

/\\$/ => regular expression:  literal backslash at end-of-expression
"\\$" => string: \$ => regular expression:  literal dollar sign

To get behavior like the first case in a string, use "\\\\$" .
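
The difference can be checked with two small filters, sketched here under the assumption of a POSIX awk; the function names are hypothetical. Both select lines that end in a literal backslash, one with a regular expression constant and one with a string used as a dynamic regular expression.

```shell
#!/bin/sh
# Hypothetical filters: print input lines whose last character is a backslash.

# Regular expression constant: \\ is one literal backslash.
ends_with_backslash_regex() {
    awk '/\\$/'
}

# String form: "\\\\$" becomes the regular expression \\$ after string processing.
ends_with_backslash_string() {
    awk '$0 ~ "\\\\$"'
}

printf '%s\n' 'keep\' 'drop' 'a$' | ends_with_backslash_regex     # prints keep\
printf '%s\n' 'keep\' 'drop' 'a$' | ends_with_backslash_string    # prints keep\
```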

There are other, less obvious characters which need the same attention; under-quoting or over-quoting should be avoided:

  • Parentheses are special for alternation:
/\(test\)/ => 6 characters `(test)'
"\(test\)" => /(test)/ => 4 characters `test' (with unused grouping)

An example of trying to match some diagonal compass directions:

/(N|S)(E|W)/ => `NE' or `NW' or `SE' or `SW' (correct)
"(N|S)(E|W)" => /(N|S)(E|W)/ (correct)
"\(N|S\)\(E|W\)" => /(N|S)(E|W)/ (correct) (NOTE:  all \ had no effect)
"\(N\|S\)\(E\|W\)" => /(N|S)(E|W)/ (correct) (NOTE:  all \ had no effect)
  • Expressions that look similar but behave totally differently:
/\(N|S\)\(E|W\)/ => `(N' or `S)(E' or `W)'
/\(N\|S\)\(E\|W\)/ => `(N|S)(E|W)' only

There is also confusion regarding different forms of special characters; POSIX requires that `\052' be treated as any other `*', even though it is written with 4 bytes instead of 1. In compatibility mode, gawk will treat it as though it were escaped, namely `\*'.

What is awk's exit code?

Normally, the `exit' command exits with a value of zero.

You can supply an optional numeric value to the `exit' command to make it exit with a value:

    if (whatever)
        exit 12;

If you have an END block, control first transfers there. Within the END block, an `exit' command exits immediately; if you had previously supplied a value, that value is used. But, if you give a new value to `exit' within the END block, the new value is used. This is documented in the GNU Awk User's Guide (gawk.texi).

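If you have an END block you want to be able to skip sometimes, you may have to do something like this:

BEGIN { exitcode=0 }
# normal rules processing...
{
  if (fatal) { exitcode=12; exit(exitcode) }
}
END {
  if (exitcode!=0)
    exit(exitcode)
  # normal END processing...
}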
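
A runnable sketch of this pattern follows. run_check is a hypothetical helper, and treating a line that reads "bad" as the fatal condition is purely an assumption for the demonstration.

```shell
#!/bin/sh
# run_check is a hypothetical helper: exit with status 12 when an input line
# is "bad", skipping the normal END processing.
run_check() {
    awk '
        BEGIN { exitcode = 0 }
        $0 == "bad" { exitcode = 12; exit exitcode }   # control still passes to END
        END {
            if (exitcode != 0)
                exit exitcode                          # leave before the summary
            print "all " NR " lines ok"
        }'
}

printf 'good\nfine\n' | run_check                       # prints all 2 lines ok
printf 'good\nbad\n'  | run_check || echo "status: $?"  # prints status: 12
```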

How can I get awk to be case-insensitive?

Use tolower()

  • portable
  • must be explicitly used for each comparison

Instead of:

  if (avar=="a" || avar=="A") { ... }

use:

  if (tolower(avar)=="a") { ... }

Or at the beginning of your code, add a line like:

  { for (i=0;i<=NF;i++) $i=tolower($i) }

or:

  { $0=tolower($0); }   # modern awks will rebuild $1..$NF also

Use IGNORECASE=1

  • gawk only
  • used for all comparisons, regex comparisons, index() function
  • not used for array indexing
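
The portable tolower() approach can be sketched as a small filter; match_fruit is a hypothetical name and the input is sample data.

```shell
#!/bin/sh
# match_fruit is a hypothetical filter: print lines whose first field is
# "apple" in any mix of upper and lower case.
match_fruit() {
    awk 'tolower($1) == "apple"'
}

printf 'Apple pie\nBANANA split\naPPLE tart\n' | match_fruit
```

Only the comparison is lower-cased here, so the matching lines are printed with their original capitalization.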

How can I force a numeric/non-numeric comparison?

These are the canonical, work-in-all-versions snippets. There are many others, most longer, some shorter (but possibly less portable).

To compare two variables as numbers ONLY, use

 if (0+var1 == 0+var2)

To compare two variables as non-numeric strings ONLY, use

 if ("" var1 == "" var2)
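
To see the two forms disagree, compare values that are numerically equal but spelled differently. The sketch below uses a hypothetical helper, compare_both, and passes the values through the environment.

```shell
#!/bin/sh
# compare_both is a hypothetical helper: compare two values first as numbers,
# then as strings, and report each result.
compare_both() {
    a="$1" b="$2" awk 'BEGIN {
        a = ENVIRON["a"]; b = ENVIRON["b"]
        print "numeric: " ((0 + a == 0 + b) ? "equal" : "different")
        print "string:  " (("" a == "" b) ? "equal" : "different")
    }'
}

compare_both 010 10
```

For 010 and 10 the numeric comparison reports "equal" and the string comparison reports "different".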

Why does { FS=":"; print $1 } not split the first record?

Basically, you should set FS before it may be called upon to split $0 into fields. Once awk encounters a `{', it is probably too late.

Some awk implementations set the fields at the beginning of the block, and don't re-parse just because you changed FS. To get the desired behavior, you must set FS _before_ reading in a line, e.g.,

  BEGIN { FS=":" }
  { print $1 }

or

  awk -F: '{ print $1 }'

If you run code like this

  { FS=":"; print $1 }

on this data:

  first:second:third but not last:fourth
  First:Second:Third But Not Last:Fourth

you may get either:

  this:       or this:
  ----        -------
  first       first:second:third
  First       First

Perhaps more surprisingly, code like

  { FS=":"; }
  { print $1; }

will also behave in the same way.
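
For comparison, setting FS in a BEGIN rule gives the same result for every record, including the first. A minimal sketch on the sample data above (first_field is a hypothetical helper name):

```shell
#!/bin/sh
# first_field is a hypothetical helper that sets FS before any record is read.
first_field() {
    awk 'BEGIN { FS = ":" } { print $1 }'
}

printf 'first:second:third but not last:fourth\nFirst:Second:Third But Not Last:Fourth\n' | first_field
```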

Why does awk 'BEGIN { print 6 " " -22 }' lose the space?

You'd expect `6 -22', but you get `6-22'. It's because the `" " -22' is grouped first, as a subtraction instead of a concatenation, resulting in the numeric value `-22'; then it is concatenated with `6', giving the string `6-22'. Gentle application of parentheses will avoid this.
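
A sketch of the parenthesized form (show_concat is a hypothetical helper name):

```shell
#!/bin/sh
# show_concat is a hypothetical helper; the parentheses keep the minus sign
# from being parsed as a binary subtraction.
show_concat() {
    awk 'BEGIN { print 6 " " (-22) }'
}

show_concat     # prints 6 -22
```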


original faq:

These people have contributed to the well-being of the FAQ:

  arnold [at] (Arnold D. Robbins)
  walkerj [at] (James G. Walker)
  jland [at] (Jim Land)
  yuli.barcohen [at] (Yuli Barcohen)
  johnd [at] (John DeHaven)
  amnonc [at] (Amnon Cohen)
  saguyami [at] (Shay)
  hankedr [at] (Darrel Hankerson)
  mark [at] (Mark Katz)
  brennan [at] (Michael D. Brennan)
  neitzel [at] (Martin Neitzel)
  pjf [at] (Peter Jaspers-Fayer)
  dmckeon [at] (Denis McKeon)
  neil_mahoney [at] (Neil Mahoney)
  dzubera [at] CS.ColoState.EDU (Zube)
  allen [at] (John L. Allen)
  jerabek [at] (Martin Jerabek)
  thull [at] (Tom Hull)
  bmarcum [at] (Bill Marcum)
  thobe [at] (Glenn Thobe) 
  boffi [at] (giacomo boffi)
  hastinga [at] (Austin Hastings)
  konrad [at] (Konrad Hambrick)
  jmccann [at] (James McCann)
  eia018 [at] (Dr Andrew Wilson)
  Alex.Schoenmakers [at]
  rwab1 [at] (Ralph Becket)
  jesusmc [at] (Jesus M. Castagnetto)
  monty [at] (Jim Monty)
  epement [at] (Eric Pement)
  gavin [at] (Gavin Wraith)
  pierre [at] (Gianni Rondinini)
  lothar [at] (Lothar M. Schmitt)
  morrisl [at] (Larry D. Morris)
  Juergen.Kahrs [at]
  kahrs [at] (Juergen Kahrs)
  tim [at] (Tim Maher/CONSULTIX)
  phil [at] (Philip Brown)
  andrew_sumner [at] (Andrew Sumner)
  jblaine [at] (Jeff Blaine)
  dmeier.esperanto [at] (Detlef Meier)
  heiner.steven [at] (Heiner Steven)
  joe [at]
  hstein [at] (Harry Stein)
  ptjm [at] (Patrick TJ McPhee)
  db21 [at] (David Beyerl)
  art [at] (Art Povelones)
  jari.aalto [at] (Jari Aalto)
  jlaiho [at] (Juha Laiho)
  walter [at] (Walter Briscoe)
  SimonN [at] (Nicole Simon)
  peter.tillier [at] (Peter S Tillier)
  churchyh [at] (Henry Churchyard)
  Ferran.Jorba [at] (Ferran Jorba)
  Kalle.Tuulos [at] (Kalle Tuulos)
  rms [at] (Rafal Sulejman)
  pjfarley [at] (Peter J. Farley III)
  neel [at]
  afu [at]
  pez68 [at] (Peter Stromberg)
  edgar.j.ramirez [at] (Edgar J. Ramirez)
  pholzleitner [at] (Peter HOLZLEITNER)
  bps03z [at] (Peter Saffrey)
  jidanni [at] (Dan Jacobson)
  lehalle [at] (Charles-Albert Lehalle)
  robin.moffatt [at] (Robin Moffatt)
  markus [at] (Markus B. Biewer)
  vincent [at] (Vincent de Lau)
  vjpnreddy [at] (Jaya Reddy)
  David.Billinghurst [at] (David Billinghurst)
  j-korsv [at] (Jon-Egil Korsvold)