AwkChannelWiki: Largest Accurate Number

Most awk implementations use double-precision floating point to represent every kind of numeric value. This can be a worry when summing large numbers from very large log files: when is it safe to rely on awk's numbers, and when should one shell out to dc or bc for arbitrary-precision arithmetic?
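As a rough safeguard for the log-summing case, a running total can be checked against 2^53, the limit worked out in the rest of this page. The following is only a sketch: safe_sum is a hypothetical helper name, it assumes the first field of every line is an integer, and the check is deliberately conservative (a total of exactly 2^53 is flagged even though it might be exact).

```shell
# safe_sum: hypothetical helper, not part of awk.
# Sums the first field of each input line and exits non-zero if the
# running total ever reaches 2^53 in magnitude, where doubles stop
# being able to represent every integer.
safe_sum() {
    awk '
        BEGIN { limit = 2^53 }
        {
            sum += $1
            if (sum >= limit || sum <= -limit) unsafe = 1
        }
        END {
            printf "%.0f\n", sum
            if (unsafe) {
                print "safe_sum: total reached 2^53, result may be inexact" > "/dev/stderr"
                exit 1
            }
        }
    ' "$@"
}

# Fits in 53 bits: prints 9007199254740991 and succeeds.
printf '%s\n' 4503599627370496 4503599627370495 | safe_sum

# Reaches 2^53: prints the (possibly wrong) total, warns, and fails.
printf '%s\n' 9007199254740992 1 | safe_sum || echo "use bc or dc for this input"
```

A non-zero exit status is the cue to redo the sum with dc or bc.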

The easiest way to investigate loss of accuracy is to find out when some number N is no longer distinct from N+1:

awk 'BEGIN{for (i = 0; i < 64; i++) printf "%s\t%19.0f\t%s\n", i, 2^i, (((2^i+1) == (2^i))? "in" : "") "accurate"}'

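For each exponent i from 0 to 63, this prints i, the value of 2^i, and "accurate" or "inaccurate" depending on whether 2^i+1 is still distinct from 2^i.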

The largest reliable value that this process finds for my instance of gawk 3.1.5 running under 32-bit Linux is 2^53-1: it is the largest N for which N and N+1 are still distinct. The 53 is the 52-bit size of the stored mantissa, plus 1 for the leading bit that IEEE 754 double precision numbers imply rather than store.
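To look more closely at what happens around that limit, the values next to 2^53 can be printed directly. This is a small sketch, assuming an awk that stores numbers as IEEE 754 doubles with the default round-to-nearest-even mode; near_limit is a hypothetical helper name.

```shell
# near_limit: hypothetical helper, used only for this illustration.
# Prints 2^53+d for d = -1..4 as awk actually stores it.
near_limit() {
    awk 'BEGIN {
        base = 2^53
        for (d = -1; d <= 4; d++) {
            label = "2^53" ((d < 0) ? d : ("+" d))
            printf "%s\t%.0f\n", label, base + d
        }
    }'
}

near_limit
# 2^53-1	9007199254740991
# 2^53+0	9007199254740992
# 2^53+1	9007199254740992   (rounded down)
# 2^53+2	9007199254740994
# 2^53+3	9007199254740996   (rounded up)
# 2^53+4	9007199254740996
```

Above 2^53 only even integers survive; the odd ones are rounded to a neighbour.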

Technical mumbo-jumbo

IEEE 754 double precision floating point numbers are formatted thusly:

1 bit	11 bits	52 bits
sign	exponent	fraction

Note that it says "fraction" above, not "mantissa". This is because the fraction field is interpreted differently in different circumstances.

If all of the exponent bits are 0, the number is zero or subnormal: the fraction is used as-is, with no implied leading bit, and scaled by the smallest exponent. (The sign bit gives the overall sign, so yes, this means there's +0 and -0. Thanks, IEEE!) If the exponent field holds anything else (apart from all ones, which is reserved for infinity and NaN), the number is normalized so that the highest bit of the mantissa is 1. Since that highest bit is always 1, there's no need to actually store it, so the mantissa is effectively 53 bits wide.

The exponent field is stored with a bias of 1023. For values from 2^52 up to 2^53-1 it holds 1023+52 = 0x433, and the lowest fraction bit is worth exactly 1, so every integer in that range has its own representation. From 2^53 on, the exponent field is 0x434 and the lowest fraction bit is worth 2, so you lose precision as N and N+1 get encoded into the same representation: 2^53 and 2^53+1 both encode as the same value. The following table shows the in-memory representation of several illustrative values (in hex, with the sign and exponent as three digits and the fraction as thirteen):

value	sign+exponent	fraction
2^51	432	0000000000000
2^52	433	0000000000000
2^53-1	433	FFFFFFFFFFFFF
2^53	434	0000000000000
2^53+1	434	0000000000000

Notice how the last two values are the same (exactly 9007199254740992 in decimal)? Starting with 2^53, consecutive integers no longer each get their own representation, so you cannot tell which value was actually intended. You lose precision.
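The fields of a given number can also be worked out with plain arithmetic, without looking at memory. The sketch below is one way to do that in awk; ieee_fields is a hypothetical helper name, and it assumes a positive, normalized value, the IEEE 754 double precision exponent bias of 1023, and an awk that stores numbers as doubles. It prints the stored value, then the sign+exponent and fraction fields in hex.

```shell
# ieee_fields: hypothetical helper, not part of awk.
# Usage: ieee_fields DECIMAL_NUMBER   (positive, normalized values only)
ieee_fields() {
    awk -v v="$1" 'BEGIN {
        v += 0                              # force a numeric value
        if (v <= 0) exit 1                  # sign bit and zero not handled here
        m = v; e = 0
        while (m >= 2) { m /= 2; e++ }      # scale so that 1 <= m < 2
        while (m < 1)  { m *= 2; e-- }
        frac = (m - 1) * 2^52               # drop the implied 1, keep 52 bits
        hi   = int(frac / 2^32)             # top 20 bits
        rest = frac - hi * 2^32
        mid  = int(rest / 2^16)             # next 16 bits
        lo   = rest - mid * 2^16            # last 16 bits
        # The fraction is printed in three pieces to stay well inside
        # the integer range that every awk printf handles.
        printf "%.0f\t%03X\t%05X%04X%04X\n", v, e + 1023, hi, mid, lo
    }'
}

# 2^51, 2^52, 2^53-1, 2^53 and 2^53+1 written out in decimal
for v in 2251799813685248 4503599627370496 9007199254740991 \
         9007199254740992 9007199254740993; do
    ieee_fields "$v"
done
```

The last input, 9007199254740993, is already rounded to 9007199254740992 when awk reads it, so its line comes out identical to the one before it.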
