
Frequently Asked Questions

Some entries of this page have been copied from the comp.lang.awk FAQ (Credits).


How do I print a RangeOfFields, eg from field 2 to the end?

Printing a range of fields - all fields but the first, for example, or fields 3 through 8 - is a surprisingly fiddly little problem.

No field offsets are stored

Although awk performs field splitting, it does not maintain a record (or at least not one that is accessible to user code) of the offsets into the line where the splitting actually took place.

This means that you are limited to workarounds. You can, for example, assign an empty string to the fields preceding and following the range that you want ( {$1 = $2 = ""; for (i = 9; i <= NF; i++) $i = ""; print} ).

This will, however, cause awk to recompute the line, inserting OFS between the fields, including the now-empty fields at the beginning and at the end. In the above example, with the default FS and OFS, the whitespace between fields is squeezed to single spaces and two spaces appear at the front of every line. You can remove the leading spaces with $0=substr($0, 1+length(OFS) * 2) . Another possibility is to shift the fields and adjust NF, eg to keep fields 3 to 8:

for (i=3;i<=8;i+=1) $(i-2)=$i # shift $1=$3 $2=$4 ...
NF=8-3+1 # only keep the first six fields
print
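
The shift-and-truncate idea can be tried out as follows. This is a minimal sketch, assuming the default FS and OFS and an awk that rebuilds $0 when NF is assigned (gawk, mawk and busybox awk do); keep_3_to_8 is a hypothetical helper name and the sample line is made up for illustration.

```shell
# Keep fields 3 to 8 by shifting them to positions 1 to 6 and truncating NF.
# keep_3_to_8 is a hypothetical helper name used only for this example.
keep_3_to_8() {
  awk '{
    for (i = 3; i <= 8; i++) $(i - 2) = $i   # $1=$3, $2=$4, ...
    NF = 8 - 3 + 1                           # drop everything after the sixth field
    print
  }'
}

echo 'a b c d e f g h i j' | keep_3_to_8
```

With this input the command prints "c d e f g h". Note that the original spacing is not preserved: the fields come out separated by OFS.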

Drawbacks of using a loop

In some cases, the desired behaviour is to remove everything before a certain field and then print out the rest of the line. The typical tactic is then to use a for loop, such as this:

awk '{sep="";for (i=2;i<=NF;i++) {printf "%s%s",sep, $i;sep=" "}; printf "\n"}' file

# or, avoids using sep
awk '{for (i=2;i<=NF;i++) {printf "%s%s",(i>2?" ":""), $i}; printf "\n"}' file

# uses OFS and ORS
awk '{for (i=2; i<=NF; i++) printf("%s%s", $i, i==NF ? ORS : OFS)}' file

A loop to select individual fields will also cause anything that appears between the fields to be replaced with " ". This is often not the desired behaviour.
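
To see the effect, here is a small sketch that feeds a line with uneven spacing through the OFS/ORS loop shown above; from_field_2 is a hypothetical wrapper name and the input is invented.

```shell
# from_field_2 is a hypothetical wrapper around the OFS/ORS loop shown above
from_field_2() {
  awk '{for (i=2; i<=NF; i++) printf("%s%s", $i, i==NF ? ORS : OFS)}'
}

# the input has runs of several spaces between its fields
printf 'a   b    c  d\n' | from_field_2
```

The output is "b c d": every run of spaces has been replaced by a single OFS.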

Using sub()

If the separator is the default, you can use a direct sub() on $0 to remove the fields that you don't want. This has the advantage that original spacing between fields will be preserved. For example:

awk '{sub(/^[[:blank:]]*([^[:blank:]]+[[:blank:]]+){n}/,""); print}' file

will remove the first "n" fields from the line ("n" must be replaced by the actual number of fields that you want to remove). If you want to remove the last n fields with the same technique, then

awk '{sub(/([[:blank:]]+[^[:blank:]]+){n}[[:blank:]]*$/,""); print}' file

should do the job. If you want to keep from field n to field m (meaning "remove from field 1 to n-1 and from m+1 to NF"), just combine the above two techniques, using the appropriate values for the repetition operator.

Keep in mind that the {n} repetition operator in regexes, while specified by POSIX, is not supported by all implementations of awk. With GNU awk versions before 4.0, you have to use the --re-interval command line switch (or --posix to get full POSIX compatibility, which includes repetition operators); from gawk 4.0 onwards, interval expressions are enabled by default.
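
For awks without interval expressions, one workaround is to apply a one-field sub() n times. This is a sketch rather than a drop-in equivalent: drop_first_n is a hypothetical helper name, and [ \t] is used in place of [[:blank:]] on the assumption that some older awks lack POSIX character classes.

```shell
# drop_first_n is a hypothetical helper: drop_first_n N  (reads standard input)
drop_first_n() {
  awk -v n="$1" '{
    # each pass removes one leading field plus the blanks that follow it
    for (i = 0; i < n; i++)
      sub(/^[ \t]*[^ \t]+[ \t]+/, "")
    print
  }'
}

printf 'one  two   three    four\n' | drop_first_n 2
```

This prints "three    four", with the original spacing between the remaining fields intact.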

If FS is not the default, but it's still a single character (for example, "#"), it is simpler and you can do something like

awk '{sub(/^([^#]*#){n}/,""); print}' file

The example that removes the last n fields can be adapted similarly from the code that uses the default FS.

Finally, if FS is a full regular expression, then the problem is not trivial and it's better to use some other technique among those described here.

Using cut

If the field separator is a single character, the cut utility may be used to select a range of fields. There is documentation for the GNU version of cut, which appears in the coreutils package, but which leaves out mention of the range feature. The Open Group spec describes cut in more detail, and the Examples section offers a more detailed guide to the syntax.

cut cannot use field delimiters that are longer than a single character. By default, the delimiter is the tab character. To select fields one, three, and from field 9 to the end of the line, one could write: cut -f1,3,9- . Fields cannot be reordered in this way: cut -f3,1,9- is equivalent to the previous example.

If a single-character delimiter limitation is not a restriction, and if the fields do not need to be reordered, and if no other computation needs to take place, consider using cut instead of awk: it is small, simple and fast, and this simplicity makes its purpose immediately clear to anyone reading the command line.
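
As a quick illustration of the range syntax (the sample input is made up; -d and -f are the standard cut options):

```shell
# Fields are tab-separated by default; -d selects another single-character delimiter.
printf 'a\tb\tc\td\n' | cut -f2-          # fields 2 to the end
printf 'a:b:c:d:e\n'  | cut -d: -f1,3-4   # fields 1, 3 and 4
printf 'a:b:c:d:e\n'  | cut -d: -f3,1     # still prints field 1 before field 3
```

The last command prints "a:c", which shows that the order of the field list does not reorder the output.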

With gawk-devel's optional fourth parameter of split()

To print from the third to the last field with gawk-devel's split(), when the field separator is a full regular expression:

awk '
 {
   # fs_regex stands for the field separator regular expression you want to use
   nf = split($0, fld, fs_regex, delim)
   for (i = 3; i <= nf; ++i)
     printf "%s%s", fld[i], ((i < nf) ? delim[i] : "\n")
 }'

The fourth (optional) argument delim of split() is an array where delim[i] gets filled with the delimiter string between fld[i] and fld[i+1]. If the field separator is " " (a blank), delim[0] is the delimiter prefix and delim[nf] the delimiter suffix; for a regular expression field separator delim[0] and delim[nf] don't exist.

Function using match() and substr()

This approach uses match() to get the position and length of each field separator, then substr() to either trim it from the beginning or add that field and its succeeding separator to an output string. Afterwards, it appends the last field without the separator to the output string and returns said string.

# usage: extract_range(string, start, stop)
# extracts fields "start" through "stop" from "string", based on FS, with the
# original field separators intact. returns the extracted fields.
function extract_range(str, start, stop,     i, re, out) {
  # if FS is the default, trim leading and trailing spaces from "string" and
  # set "re" to the appropriate regex
  if (FS == " ") {
    gsub(/^[[:space:]]+|[[:space:]]+$/, "", str);
    re = "[[:space:]]+";
  } else {
    re = FS;
  }

  # remove fields 1 through start - 1 from the beginning
  for (i=1; i<start; i++) {
    if (match(str, re)) {
      str = substr(str, RSTART + RLENGTH);

    # there's no FS left, therefore the range is empty
    } else {
      return "";
    }
  }

  # add fields start through stop - 1 to the output var
  for (i=start; i<stop; i++) {
    if (match(str, re)) {
      # append the field to the output
      out = out substr(str, 1, RSTART + RLENGTH - 1);

      # remove the field from the line
      str = substr(str, RSTART + RLENGTH);

    # no FS left, just append the rest of the line and return
    } else {
      return out str;
    }
  }

  # append the last field and return
  if (match(str, re)) {
    return out substr(str, 1, RSTART - 1);
  } else {
    return out str;
  }
}

# example use to print $3 through the end: awk '{print extract_range($0, 3, NF)}'

Using index(), substr(), length() and printf()

This approach finds the length of each field, starting from the end of the last field (meaning that any whitespace is included). It then uses printf to space-pad the field to that length, therefore preserving the original whitespace. It works best when the fields are separated by spaces, although it could be adapted for other field separators. Note that tabs will be treated as one character, and replaced with a single space.

# You could hardcode the numbers, replacing "s" and "e" in the code.
# If you wanted "3 to the end", as above, simply replace "e" with "NF".
awk -v s="$start" -v e="$end" '
{
  # the ending offset of the last field, from the beginning, is stored in "prev"
  prev = 0;
  # first, just add the lengths of fields 1 through s - 1 to prev
  for (i=1; i<s; i++) {
    prev += index(substr($0, prev + 1), $i) + length($i) - 1;
  }

  # add the space between s-1 and s to prev
  prev += index(substr($0, prev + 1), $s) - 1;

  # loop over the fields we want to print
  for (i=s; i<=e; i++) {
    # get the length, from the end of the last field to the end of the current
    len = index(substr($0, prev + 1), $i) + length($i) - 1;

    # print the field, padded to that length
    printf("%*s%s", len, $i, i==e ? "\n" : "");

    # add the length to "prev"
    prev += len;
  }
}'

# as a (granted, long) one-liner, printing 3 through the end
awk '{p=0; for (i=1;i<3;i++) p+=index(substr($0,p+1),$i)+length($i)-1; p+=index(substr($0,p+1),$3)-1; for (i=3;i<=NF;i++) {p+=l=index(substr($0,p+1),$i)+length($i)-1; printf("%*s%s",l,$i,i==NF?"\n":"")}}'

An advantage to this method is that you could also use it to process/change fields in a table, and keep the format pretty much the same. The following script doesn't actually remove any fields from the output, but allows you to change the second field and still pad everything the same way:

#!/usr/bin/awk -f

# store the length of each field from the end of the previous
{
  prev = 0;
  for (i=1; i<=NF; i++) {
    lens[i] = index(substr($0, prev + 1), $i) + length($i) - 1;
    # advance past this field so the next search starts after it
    prev += lens[i];
  }
}

# do processing, reassignments, whatever here
{
  $2 = "new";
}

# print fields, padded appropriately
{
  for (i=1; i<=NF; i++) {
    printf("%*s%s", lens[i], $i, i==NF ? "\n" : "");
  }
}

Some other approaches

Eric Pement put together a short list of tactics.



How do I print the LastField or the n'th field in a record?

awk performs a number of actions automatically when it parses lines: it updates the variable NF, which contains the number of fields on a line; and it parses the record into a series of fields which are accessible via the variables $1, $2, $3 and so on. The variable $0 contains the entire line.

Though you might think of $1 as a variable, that is not exactly true: $ is the field reference operator, and 1 is just a number that tells awk you want to reference the first field. Fields behave a bit like an array, ie where with an array you would write fields[1], in awk you write $1.

You can replace 1 by an expression, thus $(10-9) also refers to the first field. Since the variable NF contains the number of fields on a line, and since fields are indexed starting from 1, $(NF) (or just $NF) contains the last field in any given record.

For those who won't take the time to read this whole FAQ:

 print $1      # prints the first field
 print $(10-9) # again the first field 
 i=1;print $i  # yes it prints the first field
 print $NF     # prints the last field
 print $(NF-1) # prints the field before the last one

(Note that you can assign to these fields, but that's another story.)
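
The same references can be tried on a real record. This is a small sketch with an invented input line; last_fields is a hypothetical wrapper name.

```shell
# last_fields is a hypothetical wrapper used only for this example
last_fields() {
  awk '{
    n = 2
    print $n         # field number held in a variable: the second field
    print $NF        # the last field
    print $(NF - 1)  # the field before the last one
  }'
}

echo 'alpha beta gamma delta' | last_fields
```

This prints beta, delta and gamma, one per line.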

See the GNU awk manual entries for..:



I'm trying to print a number, why do I get 1e+06 instead of 1000001.10?

Use printf and a format string instead of print. Some examples:

  BEGIN {
    printf "%f\n", 1000001.10         #prints 1000001.100000
    printf "%.3f\n", 1000001.10       #prints 1000001.100
    printf "%d\n", 1000001000000001   #prints 1000001000000001
  }

For more information about printf, see the GNU awk manual.

But why does this happen in the first place? awk does something like printf using the format string in the variable OFMT, which contains %.6g by default, when it has to print a number:

  $ echo 12.123123124 | awk '{print $1;print $1+0;OFMT="%.5g";print $1+0;}'
  12.123123124 # here it is printed as a string without conversion
  12.1231      # same as printf "%.6g",$1 ($1+0 is a number)
  12.123       # same as printf "%.5g",$1

Take care that in this example $1 is treated as a string by print, whereas it would be treated as a number in a boolean expression (see truth).

There is also another conversion that happens when a number is transformed into a string but not by print, this conversion is controlled by CONVFMT, which is also "%.6g" by default.

  $ echo 12.123123124 | awk '{CONVFMT="%.4g";print ($1+0);print ($1+0) ""}'
  12.1231 # formatted by OFMT ie "%.6g"
  12.12   # it's first converted to a string according to CONVFMT



How do I edit a file in place with awk?

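Standard awk has no way to edit a file in place (GNU awk 4.1 and later ship an "inplace" extension, used as gawk -i inplace, but it is not portable). What you should do is direct your output to a temporary file, and, if everything is fine, rename the temporary file: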

awk '{do whatever}' originalfile.txt > tmpfile.txt && mv tmpfile.txt originalfile.txt

If you have special requirements, you can of course use something more sophisticated like mktemp to create the temporary file.
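
A version using mktemp might look like the following. It is only a sketch: edit_in_place is a hypothetical helper name, and it assumes a mktemp that accepts a template argument (as the GNU and BSD versions do) so that the temporary file is created next to the original.

```shell
# edit_in_place is a hypothetical helper: edit_in_place 'awk program' file
edit_in_place() {
  # create the temporary file in the same directory as the original
  tmp=$(mktemp "$2.XXXXXX") || return 1
  # only replace the original if awk succeeded
  if awk "$1" "$2" > "$tmp"; then
    mv "$tmp" "$2"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For example, edit_in_place '{print $2, $1}' originalfile.txt swaps the first two fields of every line. Since mktemp creates the file with restrictive permissions, the result does not keep the permissions of the original file; restore them afterwards if that matters.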

(and, by the way, sed or perl, when given the -i option, create a temporary file behind the scenes and then rename it anyway).

If you are brave and are happy at the idea of losing your data in case of a crash happening while the program is running, you can do something like this:

awk '{do whatever;line[NR]=$0}
END{close(ARGV[1])
    for(i=1;i<=NR;i++){
      print line[i] > ARGV[1]
    }
   }' originalfile.txt



How do I use a variable as a regular expression?

The patterns between slashes like /pattern/ are called ERE constants, or regular expression literals. As the names imply, they can only contain fixed, constant regular expressions. If you have a variable var that contains "abc(123)?r+" and try to match something against /var/, you are matching against the literal string "var", not against the regular expression. You can still use strings in places where regular expressions are expected, like this:

var="abc(123)?r+"
if ($1 ~ var) {
  # $1 matches, do something
}

or

BEGIN{var="abc(123)?r+"}
$0 ~ var {
  # $0 matches, do something
}

Also note that when you're using a string as a regular expression you must explicitly match it against the string you want to check; you can NOT use the string alone and expect awk to understand that you mean $0 ~ string, as happens instead for RE literals. Finally, using a string as a regex produces what's called a "computed" or "dynamic" regex. For a detailed discussion of computed regexes and the issues you should be aware of when using them, see the GNU awk manual.
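
The difference between a dynamic regex and a regex literal can be checked with a small experiment. This is a sketch with invented input; regex_demo is a hypothetical wrapper name.

```shell
# regex_demo is a hypothetical wrapper; its first argument becomes the awk variable var
regex_demo() {
  awk -v var="$1" '
    $0 ~ var { print "dynamic regex matched: " $0 }
    /var/    { print "literal /var/ matched: " $0 }
  '
}

printf 'abcr\nabc123rr\nxyz\nvariable\n' | regex_demo 'abc(123)?r+'
```

The lines abcr and abc123rr are matched by the dynamic regex, while /var/ only matches the line that contains the text "var".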



How do I pass a shell variable to awk?

The common solution is to use the -v option to define an awk variable giving it the value of the shell variable:

# correct quoting for a bourne like shell:
shellvariable=foo
awk -v awkvar="$shellvariable" 'BEGIN{print awkvar}' 

If you want to pass a pattern as a variable, take care that the pattern is a string, so the backslashes are interpreted twice (ie "\." defines the string '.'), while they are only interpreted once within / /.

#version using a constant
awk '/foo\./{print}'
#version with a variable
pattern='foo\\.'  
awk -v pattern="$pattern" '$0 ~ pattern{print}'

If your variable is an environment variable then you can access it using the ENVIRON array:

export FOO=bar
awk 'BEGIN{print ENVIRON["FOO"]}'

If this is not enough have a look at the comp.lang.awk FAQ



How do I pass an array to awk?

  • You can use split to create an array from a string:
  awk -v list='foo,bar,baz' '
    BEGIN {
      n=split(list, array, /,/)
      # now: array[1] == "foo", array[2] == "bar", ...
      for (idx in array) 
        map[array[idx]] = 1   # the values of array become the keys of map
    } 
    $0 in map { ... }'
  • If you want to compare two files with awk, the following code snippet passes an array via file1 (it doesn't work if file1 is empty):
   awk '
       # cmp as awk program
       NR == FNR { array[NR] = $0; next } 
       !(FNR in array && $0 == array[FNR]) { result = 1; exit } 
       END { exit (NR != 2 * FNR || result + 0) }
  ' file1 file2

With gawk one could use ARGIND == 1 instead of NR == FNR, which also works when file1 is empty.

For an explanation of this technique see: ComparingTwoFiles
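
The split() approach from the first item can be turned into a complete filter along these lines. This is a sketch: keep_listed and the array name wanted are hypothetical, and the input is invented.

```shell
# keep_listed is a hypothetical wrapper: keep_listed 'comma,separated,list'
keep_listed() {
  awk -v list="$1" '
    BEGIN {
      n = split(list, array, ",")
      for (i = 1; i <= n; i++)
        wanted[array[i]] = 1   # use the values as keys so that "in" can test them
    }
    $0 in wanted               # print the lines that appear in the list
  '
}

printf 'foo\nqux\nbaz\n' | keep_listed 'foo,bar,baz'
```

This prints foo and baz, the two input lines that are in the list.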




Why does "print $variable" show nothing? Why doesn't "print "hello $name"" work?

A variable is a symbolic name associated with a value. A variable acts as a container, and the value it contains may be changed from within a running program, enabling data manipulation to take place from within the script.

Variables are dynamic

In awk, variables are dynamic and can hold either numeric or string values.

Variables do not need predefinition prior to use

In awk, there is no need to declare or initialize variables before they are used. By default, variables are initialized to the empty string, which evaluates to zero when converted to a number.

Initialization within a begin block is possible

It is possible to initialize variables in a BEGIN block to make them obvious and to make sure they have proper initial values.

Variable names

As in most programming languages, the name of a variable must be a sequence of letters, digits, or underscore symbols, and may not begin with a digit. The awk interpreter is case sensitive. This means that variable names that have different letter cases are distinct and separate from each other:

 # The identifiers dog,Dog and DOG represent separate variables
 BEGIN {
	dog = "Benjamin"
	Dog = "Samba"
	DOG = "Bernie"
	printf "The three dogs are named %s, %s and %s.\n", dog, Dog, DOG
 }

Special variables

Some names are used for special variables.

Variables in awk do not need a sigil

In awk, variables are referenced using only the variable name and no sigil prefix is used before the variable name:

awk '{var="foo";print var}'

Variable names inside string constants are not expanded in awk

Secondly, awk does not behave like the Unix shell. Variables inside string constants are not expanded. The interpreter has no way to distinguish words in a string constant from variable names, so this would never be possible. So "hello name" is a constant string, regardless of whether name is a variable in the AWK script. New strings can be constructed from string constants and variables, using concatenation:

# print the concatenation of a string and a variable directly
print "hello " name;

# concatenate, assign to 'newstr' and print that
newstr = "hello " name;
print newstr;

If the print statement is given several arguments (that is, they are separated by ","), it prints them separated by the OFS variable.

So, presuming OFS is " ", the following is the equivalent to the first example above:

print "hello", name;
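
The three forms can be compared side by side. This is a small sketch; greet is a hypothetical wrapper name, and OFS is set to "-" only to make the effect of the comma visible.

```shell
# greet is a hypothetical wrapper; its argument becomes the awk variable name
greet() {
  awk -v name="$1" 'BEGIN {
    print "hello $name"    # printed as-is: nothing is expanded inside a string
    print "hello " name    # concatenation
    OFS = "-"
    print "hello", name    # the comma inserts OFS between the arguments
  }'
}

greet world
```

The three lines of output are "hello $name", "hello world" and "hello-world".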



How do I find the length of an array?

POSIX does not define a way to get the length of an array. While you could use a loop to count the elements, the usual strategy is to keep track of the length yourself.

   #using a counter
   a[n++]="foo"
   a[n++]="bar"
   printf "the length of a is %d\n",n 

   #remember that split returns a value
   n=split("foo bar",a)
   printf "the length of a is %d\n",n 


   # loop on the elements of a
   # you don't always need to know the length!
   for (i in a) print a[i] 

Some awk implementations (gnu awk, true awk) allow you to use length on an array; see AwkFeatureComparison. In gawk up to and including 3.1.6, however, you cannot use length on an array passed as an argument to a function:

#!/usr/bin/gawk -f
function foo(array){
    # does not work! you need to pass the length as an extra argument
    printf "In a function, the length of array is %d\n", length(array)
}
BEGIN{
    array[1]="foo";array[2]="bar"
    printf "the length of array is %d\n", length(array)
    foo(array)
}

The above code results in:

the length of array is 2
gawk: ./length.gawk:3: fatal: attempt to use array `array (from array)' in a scalar context

This problem is fixed in the gawk-stable CVS version available from savannah.gnu.org.
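
Where length() on an array is not available, counting with a loop works in any awk, including inside functions. This is a minimal sketch; alen and count_demo are hypothetical names.

```shell
# count_demo is a hypothetical wrapper used only for this example
count_demo() {
  awk '
    # alen is a hypothetical helper: it counts the elements by looping over the keys
    function alen(arr,    k, n) {
      n = 0
      for (k in arr) n++
      return n
    }
    BEGIN {
      a["x"] = 1; a["y"] = 2; a["z"] = 3
      printf "the length of a is %d\n", alen(a)
    }'
}

count_demo
```

This prints "the length of a is 3". The cost is one pass over the array per call, which is why keeping a counter is usually preferred.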



How do I remove the newlines?

"print" prints a newline by default. If you don't want a newline, you can use printf instead. It is straightforward; just remember to use a format string and avoid putting data in it.

  printf "%s",$0 #prints the record without adding a newline

If you want to join the lines with another character, you can do something like:

   awk '{printf "%s%s",separator,$0;separator="|"}END{printf "\n"}'

"print" does print a newline by default, but that's not the whole truth: print actually appends the contents of ORS, so you can also change ORS to "remove" the newlines:

  printf "%s\n" foo bar | awk -v ORS="|" '{print $0}'

The "drawback" of this method is that a trailing ORS is always added.



How do I use backreferences in awk?

The usual (and correct) answer for backreferences in awk (for example, the answer you can get on #awk for this question) is: "you can't do backreferences in awk". That is only partly true.

If you need to match a pattern using a regular expression with backreferences, as you can in sed with

sed -n '/\(foo\)\(bar\).*\2\1/p'  # prints lines with "foobar" and "barfoo" later in the line

or similar things, then well, you can't do that easily with awk.

But if you are using backreferences during string substitution, to insert text previously captured by a capture group, then you will almost certainly be able to get what you want with awk. Following are some hints:

  • First and easiest answer (requires GNU awk): use gensub(). It supports backreferences natively. Example:
# reverse letter and following digit and insert "+" if letter is "a" or "c"
$ echo 'a1-b2-c3-a5-s6-a7-f8-e9-a0' | gawk '{print gensub(/([ac])([0-9])/,"\\2+\\1","g",$0)}'
1+a-b2-3+c-5+a-s6-7+a-f8-e9-0+a

Note that gensub(), unlike sub() and gsub(), returns the modified string without touching the original. Also note that the third parameter is much like sed's match number specification in the s/pattern/replacement/ command: it can either be a number, indicating to replace only that specific match, or the string "g" (as in the example), to indicate replacement of all matches. See the gawk manual for more information (including why backslashes must be escaped in the replacement text).

  • Second answer: sometimes you don't really need backreferences, since what you want can be accomplished without. Examples:
echo 'foo123bar' | sed 's/[^0-9]*\([0-9]\{1,\}\).*/\1/'
echo 'blah <a href="http://some.site.tld/page1.html">blah blah</a>' | sed 's/.*"\([^"]*\)".*/\1/'

Both things can be done in awk (and sed as well!) without the need of backreferences. You just delete the part of the line you don't need:

awk '{gsub(/^[a-z]*|[a-z]*$/,"");print}'   # 1st example
awk '{gsub(/^[^"]*"|"[^"]*$/,"");print}'   # 2nd example

Generally speaking, however, the above methods (both sed and awk) require that you have only one matching substring to extract per line. For the same purpose, with some awks (see AwkFeatureComparison), you can use the possibility to assign a regexp to RS to "pull out" substrings from the input (and without the limitation of at most one match per line). See the last part of Pulling out things for more information and examples.

  • Third answer: see GeneralizedTextReplacement for a detailed discussion of a framework for generalized text replacement, including an explanation on how to emulate backreferences (and much more) with awk.



How do I PrintASingleQuote character?

This question gets asked often enough that it deserves its own answer. This common question doesn't actually point to a shortcoming of awk: rather, it is almost always due to the way that shell quoting interacts with the singlequote character.

The Short Story

Use octal escape sequences ('\047') or printf ('printf "%c", 39'). Do not use hex escape sequences ('\x27') because they can interact badly with the surrounding text in different ways depending on your awk implementation.

The Rambling Tale

In order to print out the string "it said 'Hello, World!' and then returned 0", one can run the following program:

BEGIN {
	print "it said 'Hello, World!' and then returned 0"
	exit 0
}

However, when one attempts something similar on the command line:

awk 'BEGIN{print "it said 'Hello, World!' and then returned 0";exit 0}'

...this fails: the shell treats the inner single quotes as the end and the start of quoted strings, so Hello, World! is left unquoted, the command line is split into several words, and awk receives a mangled program.

The first thought one might have is to surround the program fragment in double quotes rather than single quotes, but this interacts very badly with awk's literal string syntax and the "$" field reference operator.

Hex Escapes: Bad Juju

Frustratingly, the next most obvious solution - using hex-escaped characters - seems to work at first:

awk 'BEGIN{print "it said \x27Hello, World!\x27 and then returned 0";exit 0}'

...but this consistency is a sham. Try the following fragment instead in gawk, mawk and busybox awk and compare the results:

awk 'BEGIN{print "\x27foo!\x27"}'

Note that mawk and busybox awk print the expected string, but that gawk returns a multibyte character. As mentioned in paragraph 3 of the Rationale section of the Open Group Base Specifications issue 6, and as reiterated in the GNU awk manual in section 2.2 ("Escape Sequences"), the '\xHH' hexadecimal notation is ambiguous because it allows more than two successive hex bytes to be specified. Unfortunately the precise behaviour when more than two bytes are given is allowed to be implementation dependent.

Octal Escapes: Great Personality, but...

Fortunately we can always regress to stone knives and bearskins: octal escape sequences are required to have a fixed length.

awk 'BEGIN{print "\047foo!\047"}'

Uses and Abuses of printf

Or we could use printf:

awk 'BEGIN{printf "%cfoo!%c\n", 39, 39}'

...but then we have to start counting to make sure that all the escape sequences have a corresponding number. gawk features a printf extension for re-using printf arguments according to a position specified in the string sequence..:

awk 'BEGIN{printf "%1$cfoo!%1$c\n", 39}'

...but that compromise is far too ugly for polite company, so let's pretend we didn't mention it.

Explicit Concatenation (oh my!)

There is also the old fallback of putting a single quote character in its own variable and then using explicit string concatenation:

awk 'BEGIN{q="\047";print q"foo!"q}'

...but that gets ugly when dealing with a long string that contains many single quote characters.

Being Creative

Other ways include: escaping the single quote in the shell ('\'') and writing the hex character at the end of a string:

awk 'BEGIN{print "it said '\''Hello, World!'\'' and then returned 0"}'
awk 'BEGIN{print "it said \x27""Hello, World!\x27"" and then returned 0"}'

Do The Right Thing

The cleanest way is simply to write the program in its own file. There may also be shell-specific ways for working around the quoting problem: please feel free to add them to this page if you know any.
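
For instance, the program can be written to a file from the shell with a quoted here-document and then run with -f. This is a sketch; quote.awk is a hypothetical file name, created in the current directory.

```shell
# Write the program to a file; the quoted delimiter ('EOF') stops the shell
# from interpreting anything inside the here-document.
cat > quote.awk <<'EOF'
BEGIN {
	print "it said 'Hello, World!' and then returned 0"
	exit 0
}
EOF

awk -f quote.awk
```

Because the shell never parses the contents of the file, the single quotes need no special treatment.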

Feed the quote as a variable to awk

Another way is to provide the quote to awk as a variable:

--single quote

awk -v q="'" 'BEGIN{print "it said " q "Hello, World!" q " and then returned 0"}'

--double quote

awk -v q='"'  'BEGIN{print "it said " q "Hello, World!" q " and then returned 0"}'

Using bash's quoting($'string')

awk $'BEGIN{print "it said \'Hello, World!\' and then returned 0";exit 0}'



How do I find the LargestAccurateNumber that my awk can use?

Most awk implementations use floating point double precision to represent every kind of numeric value. However, this can cause worry when one is trying to sum up large numbers in very large log files: when is it safe to rely on awk's numbers and when should one shell out to dc or bc for arbitrary precision arithmetic?

The easiest way to investigate loss of accuracy is to find out when some number N is no longer distinct from N+1:

awk 'BEGIN{for (i = 0; i < 64; i++) printf "%s\t%19.0f\t%s\n", i, 2^i, (((2^i+1) == (2^i))? "in" : "") "accurate"}'

This will print out a list of numbers.

The largest reliable value that this process finds for my instance of gawk 3.1.5 running under 32-bit Linux is 2^53-1. The 53 comes from the 52 bits of the stored fraction of an IEEE 754 double precision number, plus 1 for the implicit leading bit of the mantissa.
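
The boundary can also be checked directly. This is a sketch that assumes an awk using IEEE 754 doubles for its numbers; precision_demo is a hypothetical wrapper name.

```shell
# precision_demo is a hypothetical wrapper used only for this example
precision_demo() {
  awk 'BEGIN {
    n = 2^53
    printf "%.0f %.0f\n", n - 1, n    # 2^53-1 and 2^53 are still distinct
    printf "%.0f %.0f\n", n, n + 1    # 2^53+1 is stored as 2^53
    print ((n + 1 == n) ? "2^53+1 == 2^53" : "still distinct")
  }'
}

precision_demo
```

With doubles, the second line shows the same number twice (9007199254740992) and the last line reads "2^53+1 == 2^53".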

Technical mumbo-jumbo

IEEE 754 double precision floating point numbers are formatted thusly:

1 bit   11 bits    52 bits
sign    exponent   fraction

Note that it says "fraction" above, not "mantissa". This is because the fraction field is interpreted differently in different circumstances.

If all of the exponent bits are 0, the value is either zero or a subnormal number, and the fraction is used as-is with no implicit leading bit. (The sign bit still gives the overall sign, so yes, this means there's +0 and -0. Thanks, IEEE!) If the exponent field has any non-zero bits (and is not all ones, which is reserved for infinities and NaNs), the number is normalized so that the highest bit of the mantissa is 1. Since that highest bit is always 1, there's no need to actually store it. The mantissa is therefore effectively 53 bits wide, and every integer up to 2^53 can be represented exactly. Above 2^53 the gap between adjacent representable values is 2, so you lose precision as N and N+1 get encoded into the same representation: 2^53 and 2^53+1 both encode as the same value. The following table shows the in-memory representation (in hexadecimal) of several illustrative values:

value     sign+exponent   fraction
2^51      432             0000000000000
2^52      433             0000000000000
2^53-1    433             FFFFFFFFFFFFF
2^53      434             0000000000000
2^53+1    434             0000000000000

Notice how the last two values are the same (9007199254740992 in decimal)? Starting with 2^53 you do not know what the actual intended value is going to be. You lose precision.

See also

What are floating point numbers?

Not all numbers can be represented accurately using floating point



Why would anyone still use awk instead of perl?

A valid question, since awk is a subset of perl (functionally, not necessarily syntactically); also, the authors of perl have usually known awk (and sed, and C, and a host of other Unix tools) very well, and still decided to move on.

There are some things that perl has built-in support for that almost no version of awk can do without great difficulty (if at all); if you need to do these things, there may be no choice to make. For instance, no reasonable person would try to write a web server in awk instead of using perl or even C, if the actual socket programming has to be written in traditional awk. However, gawk 3.1.0's /inet and ftwalk's built-in networking primitives may remove this particular limitation.

However, there are some things in awk's favor compared to perl:

  • awk is simpler (especially important if deciding which to learn first)
  • awk syntax is far more regular (another advantage for the beginner, even without considering syntax-highlighting editors)
  • you may already know awk well enough for the task at hand
  • you may have only awk installed
  • awk can be smaller, thus much quicker to execute for small programs
  • awk variables don't have `$' in front of them :-)
  • clear perl code is better than unclear awk code; but NOTHING comes close to unclear perl code

Tom Christiansen wrote in Message-ID: <3766d75e@cs.colorado.edu>

  > Awk is a venerable, powerful, elegant, and simple tool that everyone
  > should know.  Perl is a superset and child of awk, but has much more
  > power that comes at expense of sacrificing some of that simplicity.



Why does SunOS/Solaris awk behave oddly?

I want to use the tolower() function with SunOS nawk, but all I get is

        nawk: calling undefined function tolower

The SunOS nawk is from a time before awk acquired the tolower() and toupper() functions. Either use one of the freely available awks, or use /usr/xpg4/bin/awk (if you have it), or write your own function to do it using index, substr, and gsub.

An example of such a function is in O'Reilly's _Sed & Awk_.
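
A home-made replacement could look like the following. This is a sketch and not the function from the book: my_tolower and lower are hypothetical names, only index and substr are used, and only the 26 ASCII letters are handled.

```shell
# lower is a hypothetical wrapper that lowercases its standard input
lower() {
  awk '
    # my_tolower is a hypothetical stand-in for tolower(), ASCII letters only
    function my_tolower(s,    i, c, p, out) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        p = index("ABCDEFGHIJKLMNOPQRSTUVWXYZ", c)   # 0 if c is not an upper-case letter
        out = out (p ? substr("abcdefghijklmnopqrstuvwxyz", p, 1) : c)
      }
      return out
    }
    { print my_tolower($0) }'
}

echo 'Hello, World!' | lower
```

This prints "hello, world!". Characters that are not upper-case ASCII letters are copied unchanged.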

Patrick TJ McPhee writes:

> SunOS includes three versions of awk. /usr/bin/awk is the old
> (pre-1989) version. /usr/bin/nawk is the new awk which appeared
> in 1989, and /usr/xpg4/bin/awk is supposed to conform to the single
> unix specification. No one knows why Sun continues to ship old awk.



What is a PasteBin?

A PasteBin is a site that allows one to paste chunks of code or text. The snippet is given a unique URL that may or may not be permanent; some advertise new pastes to IRC via one or more in-channel bots.

It is considered good etiquette to paste code or text into a PasteBin rather than directly into IRC to avoid interrupting conversation, to facilitate versioning, and to make it easy for people wanting to reconstruct test cases to copy and paste code without removing extraneous text inserted by IRC clients.

List of Channel PasteBins

There is a lengthy ListOfPastebins from which to choose.
