The usual (and correct) answer to questions about backreferences in awk (for example, the one you will get on #awk) is: "you can't do backreferences in awk". That is only partly true.

If you need to match a pattern using a regular expression with backreferences, as you can do in sed with

sed -n '/\(foo\)\(bar\).*\2\1/p'  # prints lines with "foobar" and "barfoo" later in the line

or something similar, then you can't easily do that with awk.

But if you are using backreferences during string substitution, to insert text previously captured by a capture group, then you will almost certainly be able to get what you want with awk. Here are some hints.

If you are using GNU awk, you can use gensub(), as in this example:

# swap a letter and the digit that follows it, inserting "+" between them, when the letter is "a" or "c"
$ echo 'a1-b2-c3-a5-s6-a7-f8-e9-a0' | gawk '{print gensub(/([ac])([0-9])/,"\\2+\\1","g",$0)}'
1+a-b2-3+c-5+a-s6-7+a-f8-e9-0+a

Note that gensub(), unlike sub() and gsub(), returns the modified string and leaves the original untouched. The third parameter works much like the match number in sed's s/pattern/replacement/ command: it can be a number, meaning that only that specific match is replaced, or the string "g" (as in the example), meaning that all matches are replaced. See the gawk manual for more information (including why backslashes must be escaped in the replacement text).
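
For awks without gensub(), the same substitution can be approximated with a match()/substr() loop. The following is only a sketch, assuming a POSIX awk; the function name swap_portable and the variable names are made up for illustration, and the loop relies on the pattern always matching exactly two characters.

```shell
# Hypothetical helper: emulates the gensub() example without GNU extensions.
swap_portable() {
    awk '{
        rest = $0; out = ""
        # match() sets RSTART and RLENGTH for the leftmost match in rest
        while (match(rest, /[ac][0-9]/)) {
            letter = substr(rest, RSTART, 1)       # plays the role of \1
            digit  = substr(rest, RSTART + 1, 1)   # plays the role of \2
            out  = out substr(rest, 1, RSTART - 1) digit "+" letter
            rest = substr(rest, RSTART + RLENGTH)  # continue after the match
        }
        print out rest
    }'
}

result=$(echo 'a1-b2-c3-a5-s6-a7-f8-e9-a0' | swap_portable)
echo "$result"   # prints 1+a-b2-3+c-5+a-s6-7+a-f8-e9-0+a
```

Unlike gensub(), this has to split each match by hand, so it only suits patterns whose captured pieces have a known position and length.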

If you are using sed-style backreferences only to extract a part of the line, for example:

echo 'foo123bar' | sed 's/[^0-9]*\([0-9]\{1,\}\).*/\1/'
echo 'blah <a href="">blah blah</a>' | sed 's/.*"\([^"]*\)".*/\1/'

Both can be done in awk (and in sed as well!) without backreferences. Just delete the parts of the line you don't need:

awk '{gsub(/^[a-z]*|[a-z]*$/,"");print}'   # 1st example
awk '{gsub(/^[^"]*"|"[^"]*$/,"");print}'   # 2nd example
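
Another way to extract a single substring, instead of deleting what surrounds it, is match() together with substr(). This is a minimal sketch, assuming a POSIX awk; the function names and the second sample input are invented for illustration.

```shell
# Hypothetical helpers: print only the matched part of each line.
extract_digits() {
    # RSTART and RLENGTH describe the leftmost-longest match of the regexp
    awk 'match($0, /[0-9]+/) { print substr($0, RSTART, RLENGTH) }'
}

extract_quoted() {
    # match the quotes too, then skip them when printing
    awk 'match($0, /"[^"]*"/) { print substr($0, RSTART + 1, RLENGTH - 2) }'
}

digits=$(echo 'foo123bar' | extract_digits)
quoted=$(echo 'say "hello" now' | extract_quoted)
echo "$digits"   # prints 123
echo "$quoted"   # prints hello
```

Lines that contain no match print nothing here, whereas the gsub() versions above print such lines unchanged.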

Generally speaking, however, the methods above (both sed and awk) require that there be only one substring to extract per line. For the same purpose, some awks (see AwkFeatureComparison) let you assign a regexp to RS, which you can use to "pull out" substrings from the input without the limit of one match per line. See the last part of Pulling out things for more information and examples.
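
Where a regexp RS is not available, several substrings per line can still be pulled out with a match() loop. Note that this is a different technique from the RS approach described above, not an implementation of it. It is a sketch assuming a POSIX awk, and pull_numbers is a made-up name.

```shell
# Hypothetical helper: print every run of digits, one per output line.
pull_numbers() {
    awk '{
        rest = $0
        while (match(rest, /[0-9]+/)) {
            print substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)  # drop everything up to the end of the match
        }
    }'
}

numbers=$(echo 'a1-b22-c333' | pull_numbers)
echo "$numbers"   # prints 1, 22 and 333 on separate lines
```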