AwkChannelWiki: comp.lang.awk FAQJapanese

The material in this FAQ originates from the comp.lang.awk FAQ, which you can find here:

http://www.faqs.org/faqs/computer-lang/awk/faq/

What is Awk?

awk is a programming language, named after its three original authors:

Alfred V. Aho
Brian W. Kernighan
Peter J. Weinberger

They write:

  Awk is a convenient and expressive programming language that can be
  applied to a wide variety of computing and data-manipulation tasks.

The title of the book uses `AWK', but the contents of the book use `awk' (except at the beginning of sentences, as above). I will attempt to do the same (except perhaps at the beginning of sentences, as above).

Most implementations of awk are interpreters which read your awk source program and parse it and act on it directly.

Some vendors have developed awk compilers which will produce an executable that may be run stand-alone -- thus, the end user does not have access to the source code. There are also various awk->C converters which allow you to achieve the same functionality (by compiling the resulting C code later).

One of the most popular compilers, from Thompson Automation, continues to be the subject of many positive posts in the group.

    I don't really want to start a reviews section, but it may be
    appropriate.  I think it's of general interest, and a good thing
    for the FAQ, but I don't want to be given any grief by a negative
    review I didn't write just because I'm distributing it.

    if you have a review you'd like me to put a pointer to, please
    inform me -- I already have some pointers of this form listed.

comp.lang.awk is not particularly about sed; for sed discussion, see the sed FAQ (and other documents) for answers to common questions and group recommendations.

All of this seems unrelated to AWK Engineering AG at http://www.awk.ch


How can I access shell or environment variables in an awk script?

Shells

The examples using quoting are intended for use with any standard (sh-compatible-quoting) Unix shell. As with all complex quoting, all these examples become much easier to work with (or under DOS and MS-Windows, less impossible) when put in a file and invoked with `awk -f filename.awk' instead.

Non-sh-compatible shells will require different quoting. If you're not even using Unix (or a ported Unix shell), just ignore the whole section on quoting.


Environment variables in general

Answer 1:

On Unix, use "alternate quoting", for example:

        awk -F: '$1 ~ /'"$USER"'/ {print $5}' /etc/passwd
                ^^^^^^^^*******^^^^^^^^^^^^^^

Any standard Unix shell will send the underlined part to awk as one long argument (with embedded spaces), for instance:

        $1 ~ /bwk/ {print $5}

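Note that there must not be any spaces between the quoted parts. Otherwise you would not end up with one long script argument, because Unix shells break arguments on spaces (unless they are escaped with `\', or enclosed in '' or "", as the example above shows).

This approach is generally best avoided unless your version of awk supports nothing better (in which case you should upgrade, and not only for this reason). It is generally considered unreliable, because the result depends heavily on the actual contents of the shell variable you are trying to use. Some examples (from discussions on comp.lang.awk):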

$ var="#"
$ awk 'BEGIN{ print '"$var"' }'
awk: cmd. line:1: BEGIN{ print # }
awk: cmd. line:1:              ^ syntax error

The example above can be "fixed" by wrapping the value in double quotes inside the awk program:

$ var="#"
$ awk 'BEGIN{ print "'"$var"'" }'
#

However, this is still not enough:

$ var="hello
world"
$ awk 'BEGIN{ print "'"$var"'" }'
awk: BEGIN{ print "hello
awk:              ^ unterminated string

Using -v makes it work:

$ var="hello
world"
$ awk -v var="$var" 'BEGIN{ print var }'
hello
world

The -v approach is covered in the next answer.

Answer 2:

Check your manual to see whether and how your awk supports variable definitions on the command line, for example:

  awk -F: -v name="$USER" '$1 ~ name {print $5}' /etc/passwd

Answer 3:

Check your manual to see whether your awk can access environment variables. If it can, then perhaps:

  awk -F: '$1 ~ ENVIRON["USER"] {print $5}' /etc/passwd

Always remember that in /bin/sh scripts it is easy to put things into the environment for a single command:

        name=felix age=56 awk '... ENVIRON["name"] .....'

This also works with ksh and some other shells.

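The first approach is very portable, but does not work with awk "-f" script files. In that case, it is better to use a shell script and stretch the long awk command argument in '...' across multiple lines if need be.

Also note that /bin/csh requires a \ before an embedded newline, whereas /bin/sh does not.

For a very in-depth discussion of using shell variable values in awk, see [1].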
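As a rough sketch of how answers 2 and 3 differ in practice, the two helpers below pass the same value once with -v and once through the environment. The function names via_v and via_environ are made up for this illustration. The point it tries to show is that a -v assignment is read like an awk string literal (so backslash escapes are processed), while ENVIRON delivers the value untouched.

```shell
#!/bin/sh
# via_v and via_environ are hypothetical helper names used only for this sketch.

via_v() {
    # -v assignments are parsed like awk string literals, so \t becomes a tab
    awk -v val="$1" 'BEGIN { print val }'
}

via_environ() {
    # the assignment before the command exports val to this one awk run only;
    # ENVIRON holds the raw value, backslashes included
    val="$1" awk 'BEGIN { print ENVIRON["val"] }'
}

via_v 'plain text'
via_environ 'plain text'
via_v 'a\tb'          # prints a, a tab character, then b
via_environ 'a\tb'    # prints the four characters a \ t b
```

If the value may contain backslashes (Windows paths, regular expressions), the ENVIRON route is probably the safer default.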


Unix Shell Quoting

Quoting can be a real headache for the novice in shell programming, and especially in awk.

Art Povelones posted a long tutorial on shell quoting on 1999/09/30, which is probably too detailed to repeat in the FAQ; if you could use it, search for it via <http://groups.google.com/>.

Tim Maher offered his <http://www.consultix-inc.com/quoting.txt>.

The following approach is probably the best, and the easiest to understand and maintain, for most purposes. (The '@@' is quoted to ensure that the shell copies the text verbatim, without interpreting environment variable substitutions and the like.)

    cat <<'@@' > /tmp/never$$.awk
    { print "Never say can't." }
    @@
    awk -f /tmp/never$$.awk; rm /tmp/never$$.awk

If you enjoy testing your shell's quoting behavior frequently, you could try these:

   (see below for a verbose explanation of the first one, which uses 7 quotes)

    awk 'BEGIN { q="'"'"'";print "Never say can"q"t."; exit }'
    nawk -v q="'" 'BEGIN { print "Never say can"q"t."; exit }'
    awk 'BEGIN { q=sprintf("%c",39); print "Never say can"q"t."; exit }'
    awk 'BEGIN { q=sprintf("%c",39); print "Never say \"can"q"t.\""; exit }'

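However, you should also know why you cannot use the following (inside '...' the shell treats a backslash literally, so the \' ends the quoted string instead of escaping the quote):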

    awk 'BEGIN { q="\'"; print "Never say \"can"q"t.\""; exit }'

Explanation of the 7-quote example: note that it is quoted in three different ways:

    awk 'BEGIN { q="'
                     "'"
                        '";print "Never say can"q"t."; exit }'

That argument comes out as a single string (with embedded spaces):

    BEGIN { q="'";print "Never say can"q"t."; exit }

which is the same as:

    BEGIN { q="'"; print "Never say can" q "t."; exit }
                          ^^^^^^^^^^^^^  ^  ^^
                          |           |  |  ||
                          |           |  |  ||
                          vvvvvvvvvvvvv  |  ||
                          Never say can  v  ||
                                         '  vv
                                            t.

which, with enough effort, gets you:

                          Never say can't.


ENVIRON[] and "env"|getline

Modern versions of new awk (gawk, mawk, Bell Labs awk, any POSIX awk) all provide an array named ENVIRON. The array is indexed by environment variable name; the value is that variable's value. For instance, ENVIRON["HOME"] might be "/home/chris". To print out all the names and values, use a simple loop:

        for (i in ENVIRON)
                printf("ENVIRON['%s'] = '%s'\n", i, ENVIRON[i])
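Because ENVIRON is an ordinary awk array, the `in' operator can also be used to tell an unset variable from one that is set to the empty string. A minimal sketch follows; the helper name show_var and the variable FAQ_DEMO are invented for the example.

```shell
#!/bin/sh
# show_var and FAQ_DEMO are hypothetical names used only for this sketch.

show_var() {
    awk -v name="$1" 'BEGIN {
        # (name in ENVIRON) tests for existence without creating an element
        if (name in ENVIRON)
            printf("%s is set to \"%s\"\n", name, ENVIRON[name])
        else
            printf("%s is not set\n", name)
    }'
}

FAQ_DEMO=hello
export FAQ_DEMO
show_var FAQ_DEMO     # FAQ_DEMO is set to "hello"

unset FAQ_DEMO
show_var FAQ_DEMO     # FAQ_DEMO is not set
```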

What if my awk doesn't have ENVIRON[]?

Short answer, get a better awk. There are many freely available versions.

Longer answer, on Unix you can use a pipe from the `env' or `printenv' commands, but this is less pretty, and may be a problem if the values contain newlines:

        # test this on your system before you depend on it!
        while ( ("env" | getline line) >0 )
        {
                varname=line
                varvalue=line
                sub(/=.*$/,"",varname)
                sub(/^[^=]*=/,"",varvalue)
                print "var [" varname "]='" varvalue "'"
        }


Exporting environment variables back to the parent process

How can I put values into the environment of the program that called my awk program?

Short answer, you can't. Unix ain't Plan 9, and you can't tweak the parent's address space.

(DOS isn't even Unix, so it lets any program overwrite any memory location, including the parent's environment space. But the details are [obviously] going to be fairly icky. Avoid.)

Longer answer, write the results in a form the shell can parse to a temporary file, and have the shell "source" the file after running the awk program:

        # \047 is the octal escape for a single quote; a literal ' would end the shell-quoted script
        awk 'BEGIN { printf("NEWVAR=\047%s\047\n", somevalue) }' > /tmp/awk.$$
        . /tmp/awk.$$        # sh/ksh/bash/pdksh/zsh etc
        rm /tmp/awk.$$

With many shells, you can use `eval', but this is also cumbersome:

        eval `awk 'BEGIN { print "NEWVAR=" somevalue }'`

Csh syntax and more robust use of quotation marks are left as exercises for the reader.
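For sh-compatible shells, one possible take on the "more robust use of quotation marks" is sketched below. It is only an illustration: emit_assignment and shquote are made-up names, and the value is passed through ARGV rather than computed by a real awk program. The idea is to wrap the value in single quotes and rewrite every embedded single quote as '\'' so that eval reads it back verbatim.

```shell
#!/bin/sh
# emit_assignment and shquote are hypothetical names used only for this sketch.

emit_assignment() {
    # $1 = variable name, $2 = value; prints NAME=VALUE with VALUE shell-quoted
    awk 'function shquote(s) {
             # \047 is a single quote: close the quote, add an escaped one, reopen
             gsub(/\047/, "\047\\\047\047", s)
             return "\047" s "\047"
         }
         BEGIN { printf("%s=%s\n", ARGV[1], shquote(ARGV[2])) }' "$1" "$2"
}

# the value contains a quote, a dollar sign and a semicolon
eval "$(emit_assignment NEWVAR "it's \$HOME; not expanded")"
printf '%s\n' "$NEWVAR"
```

Since the program has only a BEGIN rule, awk never tries to open the two operands as input files.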


Why would anyone still use awk instead of perl?

A valid question, since awk is a subset of perl (functionally, not necessarily syntactically); also, the authors of perl have usually known awk (and sed, and C, and a host of other Unix tools) very well, and still decided to move on.

There are some things that perl has built-in support for that almost no version of awk can do without great difficulty (if at all); if you need to do these things, there may be no choice to make. For instance, no reasonable person would try to write a web server in awk instead of using perl or even C, if the actual socket programming has to be written in traditional awk. However, gawk 3.1.0's /inet and ftwalk's built-in networking primitives may remove this particular limitation.

However, there are some things in awk's favor compared to perl:

awk is simpler (especially important if deciding which to learn first)
awk syntax is far more regular (another advantage for the beginner, even without considering syntax-highlighting editors)
you may already know awk well enough for the task at hand
you may have only awk installed
awk can be smaller, thus much quicker to execute for small programs
awk variables don't have `$' in front of them :-)
clear perl code is better than unclear awk code; but NOTHING comes close to unclear perl code

Tom Christiansen wrote in Message-ID: <3766d75e@cs.colorado.edu>

  > Awk is a venerable, powerful, elegant, and simple tool that everyone
  > should know.  Perl is a superset and child of awk, but has much more
  > power that comes at expense of sacrificing some of that simplicity.


How do I report a bug in gawk?

This is described in great detail in the gawk documentation. In brief:

Make sure what you've discovered is really a bug by checking the documentation and, if possible, comparing with nawk and mawk.
Cut down the program and data to as small as possible a test case that will illustrate the bug.
Optionally post to comp.lang.awk; this allows others to confirm or deny the behavior, and its incorrectness (or lack thereof).
Send mail to <mailto:bug-gawk@gnu.org>. This automatically sends a copy to Arnold Robbins. Do not JUST post in comp.lang.awk; Arnold's readership there is sporadic, and of course any Usenet article can be missed, killed, or dropped.


Is there an easy way to determine if you have oawk or nawk?

You can determine whether you have oawk or nawk with the following test in a BEGIN rule:

        if (ARGC == 0)
                print "old awk"
        else
                print "new awk"


How does awk deal with multiple files?

Version warning

Some of these techniques will require non-ancient versions of awk.

How can awk test for the existence of a file?

The most portable way to test for the existence of a file is to simply try and read from the file.

        function exists(file,        dummy, ret)
        {
                ret=0;
                if ( (getline dummy < file) >=0 )
                {
                        # file exists (possibly empty) and can be read
                        ret = 1;
                        close(file);
                }
                return ret;
        }
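To try the function from a shell prompt, something like the following wrapper might do. The name file_exists is invented here, and the temporary file comes from mktemp, which is assumed to be available on your system.

```shell
#!/bin/sh
# file_exists is a hypothetical wrapper name used only for this sketch.

file_exists() {
    # exit status 0 if awk can read from the file named by $1, 1 otherwise
    awk 'function exists(file,    dummy, ret) {
             ret = 0
             if ((getline dummy < file) >= 0) {
                 # file exists (possibly empty) and can be read
                 ret = 1
                 close(file)
             }
             return ret
         }
         BEGIN { exit (exists(ARGV[1]) ? 0 : 1) }' "$1"
}

tmp=$(mktemp)
if file_exists "$tmp"; then echo "readable: $tmp"; fi
rm -f "$tmp"
if file_exists "$tmp"; then :; else echo "missing: $tmp"; fi
```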

[ I've read reports that earlier versions of mawk would write to stderr as well as getline returning <0 -- is this still true? ]

On Unix, you can probably use the `test' utility:

        # note: file names containing spaces or shell metacharacters need extra quoting
        if (system("test -r " file) == 0)
            print file " is readable"
        else
            print file " is not readable"


How can I get awk to read multiple files?

awk automatically reads multiple files (under Unix at least) -- use something like:

    awk '/^#include/ {print $2}' *.c *.h


How can I tell from which file my input is coming?

The file name is stored in the built-in variable FILENAME:

    awk '/^#include/ {print FILENAME,$2}' *.c *.h
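FILENAME combines naturally with FNR, the record number within the current file (a standard variable not discussed above), to produce grep-style locations. The sketch below uses two throwaway files with invented contents; list_includes is a made-up helper name.

```shell
#!/bin/sh
# list_includes and the sample files are hypothetical, for illustration only.

list_includes() {
    # prints file:line:header for every #include line in the given files
    awk '/^#include/ { print FILENAME ":" FNR ":" $2 }' "$@"
}

dir=$(mktemp -d)
printf '#include <stdio.h>\nint x;\n' > "$dir/a.c"
printf '/* header */\n#include "a.h"\n' > "$dir/b.h"
list_includes "$dir/a.c" "$dir/b.h"
rm -r "$dir"
```

FNR restarts at 1 for each file, whereas NR keeps counting across all of them.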


How can I get awk to open multiple files (selected at runtime)?

You can open files dynamically using `getline', `close', and `print EXPR > FILENAME', like:

    # assumes input file has at least 1 line, output file writeable
    function double(infilename,outfilename,    aline)
    {
      while ( (getline aline < infilename) >0 )
        print(aline aline) > outfilename;
      close(infilename);
      close(outfilename);
    }


How can I treat the first file specially?

You can tell whether awk is reading the first file given on the command line by using FILENAME, like this:

    BEGIN { rulesfile="" }
    rulesfile == "" { rulesfile = FILENAME; }
    FILENAME == rulesfile { build_rule($0); }
    FILENAME != rulesfile { apply_rule($0); }

Example:

Suppose you have a text-line "database" and you want to make some batch changes to it, by replacing some old lines with new lines.

    BEGIN { rulesfile="" }
    rulesfile == "" { rulesfile = FILENAME; }
    rulesfile == FILENAME { replace[$1] = $0; }
    rulesfile != FILENAME \
    { 
            if ($1 in replace) 
                    print replace[$1];
            else
                    print;
    }

Another way, using ARGV:

    (FILENAME == ARGV[1]) { replace[$1] = $0; next }
    ($1 in replace) { print replace[$1]; next }
    { print }
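Here is a small worked run of the ARGV variant, under the assumption that the rules file comes first on the command line. The wrapper name apply_rules, the file names, and the sample records are all invented for the illustration.

```shell
#!/bin/sh
# apply_rules, rules.txt and db.txt are hypothetical, for illustration only.

apply_rules() {
    # $1 = rules file (replacement lines), $2 = database file
    awk 'FILENAME == ARGV[1] { replace[$1] = $0; next }
         ($1 in replace)     { print replace[$1]; next }
         { print }' "$1" "$2"
}

dir=$(mktemp -d)
printf 'bob 555-0199\n' > "$dir/rules.txt"
printf 'alice 555-0100\nbob 555-0101\ncarol 555-0102\n' > "$dir/db.txt"
apply_rules "$dir/rules.txt" "$dir/db.txt"
rm -r "$dir"
```

Only the line keyed by "bob" is replaced; the other two database lines pass through unchanged.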


How can I explicitly pass in a filename to treat specially?

You can use `-v rulesfile=filename' to process a file differently, like you would any other variable, and then use a `getline' loop (and `close') in your BEGIN statement.

    BEGIN \
    {
      if (rulesfile=="")
      {
        print "must use -v rulesfile=filename";
        exit(1);
      }
      while ( (getline < rulesfile) >0 )
        replace[$1]=$0;
      close(rulesfile);
    }

    {
      if ($1 in replace)
        print replace[$1];
      else
        print;
    }


How many elements were created by split()?

When I split a field, for example:

        split($1,x,"string")

how can I find out how many elements x has (other than by testing for the null string or using a `for (n in x)' loop)?

split() is a function, so use its return value:

        n = split($1, x, "string")
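The return value is also what makes an ordered walk over the pieces possible, since `for (n in x)' visits elements in an unspecified order. A minimal sketch, with count_fields as a made-up helper name and the string and separator passed in through ARGV:

```shell
#!/bin/sh
# count_fields is a hypothetical helper name used only for this sketch.

count_fields() {
    # $1 = string, $2 = separator; prints the element count, then each element
    awk 'BEGIN {
        n = split(ARGV[1], x, ARGV[2])
        print n
        for (i = 1; i <= n; i++)
            print i ": " x[i]
    }' "$1" "$2"
}

count_fields "usr:local:bin" ":"
```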


How can I split a string into characters?

In portable POSIX awk, the only way to do this is to use substr to pull out each character, one by one. This is painful. However, gawk, mawk, and the newest version of the Bell Labs awk all allow you to set FS = "" and use "" as the third argument of split.

So, split("chars",anarray,"") results in the array anarray containing 5 elements -- "c", "h", "a", "r", "s".

If you don't have any ^As in your string, you could try:

        string=$0;
        gsub(".", "&\001", string)
        # every character is now followed by ^A, so the last element is empty
        n=split(string, anarray, "\001") - 1
        for (i=1;i<=n;i++)
            print "character " i " is '" anarray[i] "'";
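For comparison, the portable substr approach mentioned at the start of this answer might look like the sketch below. The helper name chars is invented; whether a "character" means a byte or a multibyte character depends on your awk and locale.

```shell
#!/bin/sh
# chars is a hypothetical helper name used only for this sketch.

chars() {
    # prints each character of $1 on its own line
    awk 'BEGIN {
        s = ARGV[1]
        n = length(s)
        for (i = 1; i <= n; i++)
            print substr(s, i, 1)
    }' "$1"
}

chars "chars"
```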


Why does SunOS/Solaris awk behave oddly?

I want to use the tolower() function with SunOS nawk, but all I get is

        nawk: calling undefined function tolower

The SunOS nawk is from a time before awk acquired the tolower() and toupper() functions. Either use one of the freely available awks, or use /usr/xpg4/bin/awk (if you have it), or write your own function to do it using index, substr, and gsub.

An example of such a function is in O'Reilly's _Sed & Awk_.
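A home-made replacement could be sketched roughly as follows. This is not the version from the book; my_tolower and the wrapper lower are made-up names, it uses only index and substr, and it handles the 26 ASCII letters only.

```shell
#!/bin/sh
# lower and my_tolower are hypothetical names used only for this sketch.

lower() {
    # reads lines on standard input and prints them in lower case (ASCII only)
    awk 'function my_tolower(s,    up, lo, out, i, c, p) {
             up = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
             lo = "abcdefghijklmnopqrstuvwxyz"
             out = ""
             for (i = 1; i <= length(s); i++) {
                 c = substr(s, i, 1)
                 p = index(up, c)          # position in the alphabet, or 0
                 out = out (p ? substr(lo, p, 1) : c)
             }
             return out
         }
         { print my_tolower($0) }'
}

echo 'Hello, AWK World!' | lower
```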

Patrick TJ McPhee writes:

> SunOS includes three versions of awk. /usr/bin/awk is the old
> (pre-1989) version. /usr/bin/nawk is the new awk which appeared
> in 1989, and /usr/xpg4/bin/awk is supposed to conform to the single
> unix specification. No one knows why Sun continues to ship old awk.


How do I have dynamic-width printf strings, like C?

With modern awks, you can just do it like you would in C (though the justification is less clear; C doesn't have the trivial in-line string concatenation that awk does), like so:

        maxlen=0

        for (i in arr)
          if (maxlen<length(arr[i]))
            maxlen=length(arr[i])

        for (i in arr)
          printf("%-*s %s\n",maxlen,arr[i],i)

With old awks, just do it like you would do if you didn't know about %* (this would be much more painful to do in C), like so:

        maxlen=0

        for (i in arr)
          if (maxlen<length(arr[i]))
            maxlen=length(arr[i])

        printfstring="%-" maxlen "s %s\n";
        for (i in arr)
          printf(printfstring,arr[i],i)
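To see the old-awk variant produce aligned output, here is a small self-contained run. The helper name align and the sample input are invented; the output is piped through sort because `for (i in arr)' visits elements in an unspecified order.

```shell
#!/bin/sh
# align and the sample data are hypothetical, for illustration only.

align() {
    # reads "key value" lines and prints "value key" with the values padded
    awk '{ arr[$1] = $2 }
         END {
             maxlen = 0
             for (i in arr)
                 if (maxlen < length(arr[i]))
                     maxlen = length(arr[i])
             # build the format string by concatenation, e.g. "%-5s %s\n"
             printfstring = "%-" maxlen "s %s\n"
             for (i in arr)
                 printf(printfstring, arr[i], i)
         }' | sort
}

printf 'a apple\nb fig\n' | align
```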
