The material in this FAQ originates from the comp.lang.awk FAQ, which you can find here:
awk is a programming language, named after its three original authors:
They write:
Awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data-manipulation tasks.
The title of the book uses `AWK', but the contents of the book use `awk' (except at the beginning of sentences, as above). I will attempt to do the same (except perhaps at the beginning of sentences, as above).
Most implementations of awk are interpreters which read your awk source program and parse it and act on it directly.
Some vendors have developed awk compilers which will produce an executable that may be run stand-alone -- thus, the end user does not have access to the source code. There are also various awk->C converters which allow you to achieve the same functionality (by compiling the resulting C code later).
One of the most popular compilers, from Thompson Automation, continues to be the subject of many positive posts in the group.
I don't really want to start a reviews section, but it may be appropriate. I think it's of general interest, and a good thing for the FAQ, but I don't want to be given any grief over a negative review I didn't write just because I'm distributing it. If you have a review you'd like me to put a pointer to, please inform me -- I already have some pointers of this form listed.
comp.lang.awk is not particularly about sed; for sed discussion, see the sed FAQ (and other documents) for answers to common questions and group recommendations:
This all seems unrelated to AWK Engineering AG at http://www.awk.ch
The examples using quoting are intended for use with any standard (sh-compatible-quoting) Unix shell. As with all complex quoting, all these examples become much easier to work with (or under DOS and MS-Windows, less impossible) when put in a file and invoked with `awk -f filename.awk' instead.
Non-sh-compatible shells will require different quoting. If you're not even using Unix (or a ported Unix shell), just ignore the whole section on quoting.
Answer 1:

On Unix, use "alternate quoting", e.g.:

    awk -F: '$1 ~ /'"$USER"'/ {print $5}' /etc/passwd
            ^^^^^^^^*******^^^^^^^^^^^^^^

Any standard Unix shell will send the underlined part to awk as one long argument (with embedded spaces), for instance:

    $1 ~ /bwk/ {print $5}

Note that there must not be any spaces between the quoted parts. Otherwise you would not end up with a single, long script argument, because Unix shells split arguments on spaces (unless they are escaped with `\' or enclosed in '' or "", as the example above shows).

This method is generally best avoided unless it is the only one your version of awk supports (in which case you should upgrade, and not only for this reason). It is generally unreliable, because the result depends heavily on the actual contents of the shell variable being used. Some examples (discussed in comp.lang.awk):

    $ var="#"
    $ awk 'BEGIN{ print '"$var"' }'
    awk: cmd. line:1: BEGIN{ print # }
    awk: cmd. line:1:              ^ syntax error

The example above might be "corrected" by adding double quotes inside the awk program:

    $ var="#"
    $ awk 'BEGIN{ print "'"$var"'" }'
    #

However, this is still not enough. For example, if the shell expansion is left unquoted and the value contains a space:

    $ var="hello world"
    $ awk 'BEGIN{ print "'$var'" }'
    awk: BEGIN{ print "hello
    awk:              ^ unterminated string

Using -v, on the other hand, works:

    $ var="hello world"
    $ awk -v var="$var" 'BEGIN{ print var }'
    hello world

The -v approach is covered in the next answer.
Answer 2:

Read your awk's manual to find out whether it supports defining variables on the command line, e.g.:

    awk -F: -v name="$USER" '$1 ~ name {print $5}' /etc/passwd
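Answer 3:

Read your awk's manual to find out whether it can access environment variables. It probably can, like this:

    awk -F: '$1 ~ ENVIRON["USER"] {print $5}' /etc/passwd

It is handy to remember that in a /bin/sh script you can easily put things into the environment for a single command:

    name=felix age=56 awk '... ENVIRON["name"] .....'

This also works with ksh and some other shells.

The first approach is portable, but it does not work when the awk script is given in a file with "-f". In that case, use a shell script and, if need be, stretch the long '...' awk command argument across multiple lines.

Also note that /bin/csh requires a \ before each embedded line break, whereas /bin/sh does not.

For a very in-depth discussion of using shell variable values in awk, see [1].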
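One practical difference between Answers 2 and 3 may be worth a small demonstration: POSIX awk processes backslash escape sequences in a `-v' assignment, but leaves ENVIRON values untouched. The following is only a sketch, assuming a POSIX awk on the PATH; the names `value', `via_v', `via_env' and the environment variable FAQ_VALUE are made up for illustration.

```shell
#!/bin/sh
# Hypothetical value containing a backslash sequence (\t).
value='C:\temp'

# Answer 2: -v assignment. awk applies escape processing,
# so the \t arrives as a TAB character.
via_v=$(awk -v v="$value" 'BEGIN { print v }')

# Answer 3: environment, set for this one command only.
# The value arrives byte for byte.
via_env=$(FAQ_VALUE="$value" awk 'BEGIN { print ENVIRON["FAQ_VALUE"] }')

printf '%s\n' "-v:      $via_v" "ENVIRON: $via_env"
```

If the value may contain backslashes (Windows paths, regular expressions), ENVIRON is probably the safer of the two.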
Quoting in shell programming, and quoting for awk in particular, is a headache for beginners.

Art Povelones posted a long tutorial on shell quoting on 1999-09-30. It is probably too detailed to reproduce in the FAQ. If you want to read it, search for it at <http://groups.google.com/>.

Tim Maher suggested <http://www.consultix-inc.com/quoting.txt>.

The following approach is probably the best for most purposes: it is easy to understand and easy to maintain. (The '@@' is quoted to make sure the shell copies the text verbatim and does not interpret things such as environment variable substitution.)

    cat <<'@@' > /tmp/never$$.awk
    { print "Never say can't." }
    @@
    awk -f /tmp/never$$.awk; rm /tmp/never$$.awk

If you like to test your shell's quoting behavior, try these. (A verbose explanation of the first one, which uses 7 quote characters, follows below.)

    awk 'BEGIN { q="'"'"'";print "Never say can"q"t."; exit }'
    nawk -v q="'" 'BEGIN { print "Never say can"q"t."; exit }'
    awk 'BEGIN { q=sprintf("%c",39); print "Never say can"q"t."; exit }'
    awk 'BEGIN { q=sprintf("%c",39); print "Never say \"can"q"t.\""; exit }'

However, you should know that the following does not work:

    awk 'BEGIN { q="\'"; print "Never say \"can"q"t.\""; exit }'

Explanation of the 7-quote example: note that it is quoted in three different ways (the spaces separating the three parts here are for illustration only):

    awk 'BEGIN { q="' "'" '";print "Never say can"q"t."; exit }'

The shell joins these parts, and awk sees them as a single string argument (embedded spaces included):

    BEGIN { q="'";print "Never say can"q"t."; exit }

which is the same as:

    BEGIN { q="'"; print "Never say can" q "t."; exit }
                          ^^^^^^^^^^^^^  ^  ^^
                          |           |  |  ||
                          |           |  |  ||
                          vvvvvvvvvvvvv  |  ||
                          Never say can  v  ||
                                         '  vv
                                            t.

With enough effort, it can be done:

    Never say can't.
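As a quick check of the variants above, the following sketch runs three of them and compares the results. It assumes a POSIX awk installed as `awk' (rather than `nawk'). The fourth variant, using the octal escape \047 for the apostrophe, is an addition not taken from the list above; the shell variable names are made up.

```shell
#!/bin/sh
# Seven-quote version: ' ... "'"'"'" ... '
a=$(awk 'BEGIN { q="'"'"'"; print "Never say can" q "t."; exit }')

# Apostrophe handed in with -v
b=$(awk -v q="'" 'BEGIN { print "Never say can" q "t."; exit }')

# Apostrophe built from its ASCII code
c=$(awk 'BEGIN { q=sprintf("%c",39); print "Never say can" q "t."; exit }')

# Assumption: octal escape \047 inside the awk string (POSIX awk)
d=$(awk 'BEGIN { print "Never say can\047t."; exit }')

printf '%s\n' "$a" "$b" "$c" "$d"
```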
Modern versions of new awk (gawk, mawk, Bell Labs awk, any POSIX awk) all provide an array named ENVIRON. The array is indexed by environment variable name; the value is that variable's value. For instance, ENVIRON["HOME"] might be "/home/chris". To print out all the names and values, use a simple loop:
for (i in ENVIRON) printf("ENVIRON['%s'] = '%s'\n", i, ENVIRON[i])
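To try the loop from a shell command line, the single quotes in the format string get in the way (see the quoting section), so this sketch uses escaped double quotes instead. It runs awk in an otherwise empty environment via `env -i', filters on the made-up prefix FAQ_, and sorts the result because the order of `for (i in ...)' is unspecified. FAQ_A and FAQ_B are hypothetical variables.

```shell
#!/bin/sh
listing=$(env -i PATH="$PATH" FAQ_A=1 FAQ_B='two words' awk '
BEGIN {
    for (i in ENVIRON)
        if (i ~ /^FAQ_/)    # skip PATH and anything else
            printf("ENVIRON[\"%s\"] = \"%s\"\n", i, ENVIRON[i])
}' | sort)

printf '%s\n' "$listing"
```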
What if my awk doesn't have ENVIRON[]?
Short answer, get a better awk. There are many freely available versions.
Longer answer, on Unix you can use a pipe from the `env' or `printenv' commands, but this is less pretty, and may be a problem if the values contain newlines:
    # test this on your system before you depend on it!
    while ( ("env" | getline line) >0 )
    {
        varname=line
        varvalue=line
        sub(/=.*$/,"",varname)
        sub(/^[^=]*=/,"",varvalue)
        print "var [" varname "]='" varvalue "'"
    }
How can I put values into the environment of the program that called my awk program?
Short answer, you can't. Unix ain't Plan 9, and you can't tweak the parent's address space.
(DOS isn't even Unix, so it lets any program overwrite any memory location, including the parent's environment space. But the details are [obviously] going to be fairly icky. Avoid.)
Longer answer, write the results in a form the shell can parse to a temporary file, and have the shell "source" the file after running the awk program:
    awk 'BEGIN { printf("NEWVAR='%s'\n", somevalue) }' > /tmp/awk.$$
    . /tmp/awk.$$    # sh/ksh/bash/pdksh/zsh etc
    rm /tmp/awk.$$
With many shells, you can use `eval', but this is also cumbersome:
eval `awk 'BEGIN { print "NEWVAR=" somevalue }'`
Csh syntax and more robust use of quotation marks are left as exercises for the reader.
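For sh-compatible shells, one possible take on the "more robust quotation marks" exercise is sketched below. It is not the FAQ's own solution: the helper `shquote' and the sample value are made up, and the approach (wrap the value in single quotes, replacing each embedded single quote with '\'') is just one common convention. \047 is the octal escape for the apostrophe, used so that the awk program itself contains no single quotes.

```shell
#!/bin/sh
assignment=$(awk '
# shquote(): wrap s in single quotes; each embedded single
# quote becomes: quote, backslash, quote, quote
function shquote(s,    n, i, parts, out) {
    n = split(s, parts, "\047")
    out = "\047" parts[1]
    for (i = 2; i <= n; i++)
        out = out "\047\\\047\047" parts[i]
    return out "\047"
}
BEGIN {
    somevalue = "it\047s a test"    # hypothetical value
    print "NEWVAR=" shquote(somevalue)
}')

eval "$assignment"
printf '%s\n' "$NEWVAR"
```

Using `$(...)' with a quoted `eval' argument keeps the shell from re-splitting the awk output before it is evaluated.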
A valid question, since awk is a subset of perl (functionally, not necessarily syntactically); also, the authors of perl have usually known awk (and sed, and C, and a host of other Unix tools) very well, and still decided to move on.
There are some things that perl has built-in support for that almost no version of awk can do without great difficulty (if at all); if you need to do these things, there may be no choice to make. For instance, no reasonable person would try to write a web server in awk instead of using perl or even C, if the actual socket programming has to be written in traditional awk. However, gawk 3.1.0's /inet and ftwalk's built-in networking primitives may remove this particular limitation.
However, there are some things in awk's favor compared to perl:
Tom Christiansen wrote in Message-ID: <3766d75e@cs.colorado.edu>

> Awk is a venerable, powerful, elegant, and simple tool that everyone
> should know. Perl is a superset and child of awk, but has much more
> power that comes at expense of sacrificing some of that simplicity.
This is described in great detail in the gawk documentation. In brief:
To determine whether you have oawk or nawk, the following in a BEGIN rule will do the trick:

    if (ARGC == 0) {
        # old awk
    } else {
        # new awk
    }
Some of these techniques will require non-ancient versions of awk.
The most portable way to test for the existence of a file is to simply try and read from the file.
    function exists(file,    dummy, ret)
    {
        ret=0;
        if ( (getline dummy < file) >=0 )
        {
            # file exists (possibly empty) and can be read
            ret = 1;
            close(file);
        }
        return ret;
    }
[ I've read reports that earlier versions of mawk would write to stderr as well as getline returning <0 -- is this still true? ]
On Unix, you can probably use the `test' utility:

    if (system("test -r " file) == 0) {
        # file is readable
    } else {
        # file is not readable
    }
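To see the getline-based check in action, here is a small sketch that wraps the exists() function in a shell script. The temporary file name and the shell variable names are made up; stderr is discarded in case an awk prints a diagnostic for the missing file.

```shell
#!/bin/sh
tmp=${TMPDIR:-/tmp}/faq_exists.$$
: > "$tmp"    # create an empty but readable file

result=$(awk -v present="$tmp" -v absent="$tmp.missing" '
function exists(file,    dummy, ret) {
    ret = 0
    if ((getline dummy < file) >= 0) {
        ret = 1
        close(file)
    }
    return ret
}
BEGIN { print exists(present), exists(absent) }' 2>/dev/null)

rm -f "$tmp"
printf '%s\n' "$result"
```

The empty file counts as existing because getline returns 0 (end of file) rather than a negative value.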
awk automatically reads multiple files (under Unix at least) -- use something like:
awk '/^#include/ {print $2}' *.c *.h
The file name is stored in the built-in variable FILENAME:
awk '/^#include/ {print FILENAME,$2}' *.c *.h
You can open files dynamically using `getline', `close', and `print EXPR > FILENAME', like:
    # assumes input file has at least 1 line, output file writeable
    function double(infilename, outfilename,    aline)
    {
        while ( (getline aline < infilename) >0 )
            print(aline aline) > outfilename;
        close(infilename);
        close(outfilename);
    }
You can tell whether awk is parsing the first file given on the command line by using FILENAME, thus:

    BEGIN { rulesfile="" }
    rulesfile == "" { rulesfile = FILENAME; }
    FILENAME == rulesfile { build_rule($0); }
    FILENAME != rulesfile { apply_rule($0); }
Example:
Suppose you have a text-line "database" and you want to make some batch changes to it, by replacing some old lines with new lines.
    BEGIN { rulesfile="" }
    rulesfile == "" { rulesfile = FILENAME; }
    rulesfile == FILENAME { replace[$1] = $0; }
    rulesfile != FILENAME \
    {
        if ($1 in replace)
            print replace[$1];
        else
            print;
    }
Another way, using ARGV:
    (FILENAME == ARGV[1]) { replace[$1] = $0; next }
    ($1 in replace) { print replace[$1]; next }
    { print }
You can use `-v rulesfile=filename' to process a file differently, like you would any other variable, and then use a `getline' loop (and `close') in your BEGIN statement.
    BEGIN \
    {
        if (rulesfile=="")
        {
            print "must use -v rulesfile=filename";
            exit(1);
        }
        while ( (getline < rulesfile) >0 )
            replace[$1]=$0;
        close(rulesfile);
    }

    {
        if ($1 in replace)
            print replace[$1];
        else
            print;
    }
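The batch-replacement idea can be tried end to end with two small files. This sketch uses the ARGV variant; the directory, the file names `rules' and `db', and their contents are invented for the demonstration.

```shell
#!/bin/sh
dir=${TMPDIR:-/tmp}/faq_rules.$$
mkdir "$dir" || exit 1

# First file: replacement lines, keyed on field 1.
printf '%s\n' 'bob 42 new' 'eve 7 new' > "$dir/rules"
# Second file: the "database" to be updated.
printf '%s\n' 'alice 1 old' 'bob 2 old' 'carol 3 old' > "$dir/db"

updated=$(awk '
FILENAME == ARGV[1] { replace[$1] = $0; next }
($1 in replace)     { print replace[$1]; next }
                    { print }
' "$dir/rules" "$dir/db")

rm -r "$dir"
printf '%s\n' "$updated"
```

Only lines whose key already exists in the database are replaced; the `eve' rule has no match and is silently dropped.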
How many elements (of the array) were created by split()?

For example, when splitting a field with

    split($1,x,"string")

how can I find out the number of elements in x (other than by testing for the null string or using a `for (n in x)' loop)?

split() is a function, so use its return value:

    n = split($1, x, "string")
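A minimal sketch of using the return value, with a made-up input line and `:' as the separator:

```shell
#!/bin/sh
count=$(printf '%s\n' 'usr:local:bin' |
    awk '{ n = split($1, x, ":"); print n, x[n] }')

printf '%s\n' "$count"
```

Because the elements are numbered 1 through n, x[n] is always the last piece.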
In portable POSIX awk, the only way to do this is to use substr to pull out each character, one by one. This is painful. However, gawk, mawk, and the newest version of the Bell Labs awk all allow you to set FS = "" and use "" as the third argument of split.
So, split("chars",anarray,"") results in the array anarray containing 5 elements -- "c", "h", "a", "r", "s".
If you don't have any ^As in your string, you could try:
    string=$0;
    gsub(".", "&\001", string)
    n=split(string, anarray, "\001") - 1    # the trailing ^A yields one empty extra element
    for (i=1;i<=n;i++)
        print "character " i " is '" anarray[i] "'";
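For completeness, the portable substr approach mentioned at the start of this answer might look like the sketch below. The sample string and the comma-separated output format are only for illustration.

```shell
#!/bin/sh
chars=$(awk 'BEGIN {
    s = "chars"
    n = length(s)
    # pull out one character at a time
    for (i = 1; i <= n; i++)
        anarray[i] = substr(s, i, 1)
    for (i = 1; i <= n; i++)
        printf("%s%s", anarray[i], (i < n ? "," : "\n"))
}')

printf '%s\n' "$chars"
```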
I want to use the tolower() function with SunOS nawk, but all I get is

    nawk: calling undefined function tolower

The SunOS nawk is from a time before awk acquired the tolower() and toupper() functions. Either use one of the freely available awks, or use /usr/xpg4/bin/awk (if you have it), or write your own function to do it using index, substr, and gsub.
An example of such a function is in O'Reilly's _Sed & Awk_.
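A home-grown replacement could be sketched roughly as follows. This is not the version from the book: the name `my_tolower' is made up, it needs only index and substr, and it assumes plain ASCII letters.

```shell
#!/bin/sh
lowered=$(awk '
function my_tolower(s,    upper, lower, i, c, pos, out) {
    upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    lower = "abcdefghijklmnopqrstuvwxyz"
    out = ""
    for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        pos = index(upper, c)    # 0 if c is not an upper-case letter
        out = out (pos ? substr(lower, pos, 1) : c)
    }
    return out
}
BEGIN { print my_tolower("Hello, AWK World 42!") }')

printf '%s\n' "$lowered"
```

Swapping the roles of `upper' and `lower' would give a matching toupper() replacement.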
Patrick TJ McPhee writes:

> SunOS includes three versions of awk. /usr/bin/awk is the old
> (pre-1989) version. /usr/bin/nawk is the new awk which appeared
> in 1989, and /usr/xpg4/bin/awk is supposed to conform to the single
> unix specification. No one knows why Sun continues to ship old awk.
With modern awks, you can just do it like you would in C (though the justification is less clear; C doesn't have the trivial in-line string concatenation that awk does), like so:
    maxlen=0
    for (i in arr)
        if (maxlen<length(arr[i])) maxlen=length(arr[i])
    for (i in arr)
        printf("%-*s %s\n",maxlen,arr[i],i)
With old awks, just do it like you would do if you didn't know about %* (this would be much more painful to do in C), like so:
    maxlen=0
    for (i in arr)
        if (maxlen<length(arr[i])) maxlen=length(arr[i])
    printfstring="%-" maxlen "s %s\n";
    for (i in arr)
        printf(printfstring,arr[i],i)
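The old-awk variant can be tried with a tiny array, as in this sketch. The array contents are invented, a `|' replaces the space between the columns so that the padding is visible, and the output goes through sort because the order of `for (i in arr)' is unspecified.

```shell
#!/bin/sh
table=$(awk 'BEGIN {
    arr["x"] = "short"
    arr["y"] = "much longer"
    maxlen = 0
    for (i in arr)
        if (maxlen < length(arr[i])) maxlen = length(arr[i])
    # build the format string by concatenation: here "%-11s|%s\n"
    printfstring = "%-" maxlen "s|%s\n"
    for (i in arr)
        printf(printfstring, arr[i], i)
}' | sort)

printf '%s\n' "$table"
```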