FileAndBitJoinClone

Problem Description

This was difficult to hash out. Eventually the task was described as: "For each value in $3 of file.txt, print out the line from bit.txt where bit.txt:$1 == file.txt:$3."

Since $1 of bit.txt is always equal to NR - that is, the first field on each line of bit.txt is simply the line number - the problem becomes quite easy.
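
To make the task concrete, here is a minimal sketch on made-up data. The three-line bit.txt and file.txt below are hypothetical stand-ins, not the real input files, and the temporary directory only keeps the example self-contained. Because the first field of bit.txt is assumed to equal the line number, the sketch uses NR itself as the lookup key.

```shell
# Hypothetical sample data, invented for illustration (not the real input files).
tmp=$(mktemp -d)
printf '%s\n' '1 alpha' '2 beta' '3 gamma' > "$tmp/bit.txt"
printf '%s\n' 'x y 3' 'x y 1' 'x y 3' > "$tmp/file.txt"

# First file: remember every line of bit.txt under its line number.
# Second file: print the remembered line whose number matches $3.
result=$(awk 'NR == FNR { BL[NR] = $0; next } $3 in BL { print BL[$3] }' \
    "$tmp/bit.txt" "$tmp/file.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Each line of the sample file.txt selects the bit.txt line whose number matches its third field, so the output is "3 gamma", "1 alpha", "3 gamma", in the order of file.txt.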

The Code

#!/usr/bin/awk -f

BEGIN {
	BIT  = "bit.txt"
	FILE = "file.txt"
	# No trailing "\n" here: print already appends ORS to the message.
	ERRSTR = "Guarantee violated, no value for %s in %s (%s, line %s)"

	while ((getline $0 < BIT) > 0)
		BL[$1] = $0
	close(BIT)

	while ((getline $0 < FILE) > 0)
		print (($3 in BL) ? BL[$3] : sprintf(ERRSTR, $3, BIT, FILE, $1))
	close(FILE)

# roughly equivalent in terms of semantics (output order and format differ):
#
#  sort -nk 3,3 file.txt | join -1 3 -2 1 -e "EMPTY" - bit.txt
#
# Note that the above command assumes that bit.txt is already sorted on the first
# field. Since that first field is supposed to be the line number, this assumption
# is more or less reasonable; however, join expects both inputs to be in the same
# (by default lexical) order on the join field, and numeric order differs from
# that as soon as the keys have different numbers of digits. In the interest of
# avoiding stupid errors, this should be checked. And then double-checked.
#
# The output format will be different, of course, but this is a matter of messing
# around with the '-o FORMAT' option of the GNU join command, which is documented
# at http://www.gnu.org/software/coreutils/manual/coreutils.html#join-invocation.
}
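
The whole approach rests on the guarantee that the first field of bit.txt equals the line number, so it may be worth verifying before trusting the output. The following is a minimal sketch of such a check; the function name check_bit and the two sample files are hypothetical and exist only for this illustration.

```shell
# check_bit FILE: succeed only if field 1 of every line equals its line number.
# (check_bit is a hypothetical helper name, not part of the original script.)
check_bit() {
    awk '$1 != NR { printf "line %d: first field is %s\n", NR, $1; bad = 1 }
         END { exit (bad ? 1 : 0) }' "$1"
}

# Hypothetical sample files: one that honours the guarantee, one that does not.
tmp=$(mktemp -d)
printf '%s\n' '1 a' '2 b' > "$tmp/good.txt"
printf '%s\n' '1 a' '3 b' > "$tmp/bad.txt"

if check_bit "$tmp/good.txt"; then good_ok=yes; else good_ok=no; fi
if bad_report=$(check_bit "$tmp/bad.txt"); then bad_ok=yes; else bad_ok=no; fi
printf 'good.txt ok: %s\nbad.txt ok: %s (%s)\n' "$good_ok" "$bad_ok" "$bad_report"
rm -rf "$tmp"
```

On the real data, running check_bit bit.txt before the main script would flag any line that violates the guarantee.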

Note that the problem might also be solved using the classical awk idiom to process two files:

awk 'NR==FNR{BL[$1]=$0;next}$3 in BL{print BL[$3]}' bit.txt file.txt

For more information on awk idioms (including the previous one), see Typical awk idioms (http://awk.freeshell.org/AwkTips#toc1).
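
The one-liner is dense, so here is the same idiom spread out with comments. It is a sketch on hypothetical sample data, not the real input files. Unlike the one-liner, which silently skips keys that have no match, this variant adds a third rule that reports them; writing to "/dev/stderr" is an assumption that holds for the common awk implementations on Linux.

```shell
# Hypothetical sample data, invented for illustration.
tmp=$(mktemp -d)
printf '%s\n' '1 alpha' '2 beta' > "$tmp/bit.txt"
printf '%s\n' 'a b 2' 'c d 9' 'e f 1' > "$tmp/file.txt"

joined=$(awk '
    # NR counts records across all files; FNR restarts at 1 for each file.
    # The two are equal only while the first file (bit.txt) is being read.
    NR == FNR { BL[$1] = $0; next }

    # From here on we are in the second file (file.txt).
    $3 in BL  { print BL[$3]; next }

    # Added for illustration: report keys with no match instead of skipping them.
    { printf "no entry for key %s at line %d\n", $3, FNR > "/dev/stderr" }
' "$tmp/bit.txt" "$tmp/file.txt" 2> "$tmp/err.txt")
missing=$(cat "$tmp/err.txt")

printf '%s\n' "$joined"
printf 'stderr: %s\n' "$missing"
rm -rf "$tmp"
```

One caveat of the idiom: if the first file is empty, NR == FNR also holds while the second file is read, so the lookup table would be filled from the wrong file.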

The Input Files

The input files are Base64-encoded below. To create the two files 'file.txt' and 'bit.txt' in the current directory, simply run:

wget -qO- http://awk.freeshell.org/FileAndBitJoinClone | sed -ne '/^begin-base64/,/^====$/p' | uudecode -o- | tar -zxv

begin-base64 644 ajai-files.tar.gz
H4sICIbrakgCA2FqYWktZmlsZXMudGFyAO2aa67cyLGE9Xe0it7AEJWZ9VyO
DdiGAT+Ae8eAl+9kRUqoJMP2rzFsQA1IOmKcJqOCxeBX7P7tH3+5fvn7L19+
zVfxV6/1/lf9z/mvv8RaL19Eimnrpr19KSLD7MunfPkPvP72/7/85v8+ny9/
+Mtf//zXv/zz3/t3+v/oq/zkL/z1VY6f9fjZjp/r8XM7fu7Hz+P4eR4/r/NY
6cDnkeU8tJzHlvPgch5dzsPLeXw5DcjpQE8HmsZ+OtDTgZ4O9HSgpwM9Hejp
QE8Hdjqw04Gl+E8Hdjqw04GdDux0YKcDOx3U00E9HdTTQU0zYDuYa/8jbX6t
p4t6uqini3q6qKeLVh57bKeTdjpp9vzVNCHbUz3dtAHVvquno3Y66uXxq307
GnNhQ/vaT1fdnr9+uurbVYMXUfva+3PDmVM/XfWFPeu3PY/zjI3tCpn4bobm
/55nbZyORuQk3/eaLt/tRjAdavs65uP/Z1Lz9DPP8zbPhCYSwqV5H3FiFsn3
gU14Ku37htPTxLnbb1L9Os+M5ulnlfSL63S0TkfrTGed6aynk3U6Wee5Wqne
1uN9UlLJle2lY7JIGb4lVV1JXVdS2WGn6b39tSX1XknFV7Y3HXFlqFduLuDU
wLmCcwfnEs4tnGsYJ7tjY/c4chejjAVVZ+a9X54b8m0hmUItV5xoa74h+Urd
LKmcBe18vjX5ShUtqaMllbRYvlOlmFJPSypqSU0tqaoldbWkspbU1pLqWlJf
S823zeQl1bWkvpZU2JIaW1JlS0teUmELGrvD0lq+Idlp+Tae7KTClpbspLqW
1NfSk52e7PSnnZ7soKtxWZv4/zNY9IeaLKWuFpT1cSCU9bkhGRtPY6mxBRfr
Kbfnhow94yknd6m2JfW2pOKW1Nwyk6mZTt5MWaGxFe0xu2/IJJbspNaWleyg
tY8dxf0PLVX9NKT2lrjhYxTDy20lX+jvU06+UOKnnDGxZFnR4+eGxIupxjXV
uJaEjCUxY6pvTfWtJTlCdbcVt1DfkOgVsZ2yPTckU+jwU06+Ek4rOvz85fXY
kLkaHX7KT3f6dIcmPzc8DWbYRp+fcspOnwZTpas9DVqm/6c7e7pL3a7o9lNO
8aV611TvmupdU71rqneteU2Szmaqd61PStBU8YqKT3rylGpeU81rAnNNHa8t
L5Se9KLt5St1vaau19T1mrpeU9drwnJNRa8JyrXn1VuaUankNZW89uQl4bim
eteRvKRu14TjOpKXkZeSyUuqdE2VrqnSNVW6pkrXVOmaKl1TpWuCcJ15XZu8
pD7X1OeaKFwThmtqck0grqnGNWG4pg5XdPjB3rrySjsttUtaa6f6tlTflurb
ynNlYKjwtCWtvMt86ckXqjxI4JZR5eeG5C6xuKHHh31fVVnicUORJz25S0Bu
6emIpRo3zU8n9Lnf9JTEEotbYnFL3W3pSYml4rb8rAStfR4zPzGxl6f83MRe
WeWnJ/bKKj9DQXMnPfmrL3+pwe1ocKj5qU5KLHW41Zez1OGWMN3Q3+dvpw63
1OHWXqmlHjf0eICXr6p8S3ttSWc09bilHrfU44Ye7ygVX+r4luQtuP3Ukzf0
edLba0vyhl5PevKXut1St1vqdhsvb6nfbby8pY630V96yg09n/T13JL63tD3
SU/Ngc5Peppz85XdfHmc+aliyi71v63XuU33AFuv/NJ9wNYrv/Xyl+4Hlu4H
tl75rfzc83m1VtwT0hZ9bbHXlvro9/q6R9SE+f9tz/9//8c//e7X/gDoX3/+
4+yl+v3zH7u3iy+N2o/Pf/4Tr8/9kp9+nl2v6vH3Ln7jLP3jN6yvW9Sffl7D
xTrGFKm+CJgfKQui/fSznzi5Zl+z9eor9/XpAq2GtqrD7yxjrfppA1ojWi/Q
+tbatfwmIw5EfiP+tAZtbK1fXj7LFwJuyJ22CnFSMd65brGuq7TSHGdmmeUz
oDmEudbK5WNu0xl7jP7pGKHf527Nx998AVGkFasfx02ISkSnRIhGxQ6xUjF2
24joVQOxUzGOOYi4QptMi1HudJpdbfml6nf0IeXjeLlFLVTEQFSYqJg7qkw0
nGc1JrZ4ZyWitzbERkVMO+1UjN0OKhrEHVCXq4zpiyLZM72Hdgfk8+ca9xXg
ryrjU5GPFaI1WDVhGvZpdzri89Uvmul8utYYnxnaHY40uZpfka26k+bzwzAK
u8OR0a46mq+ftAy/6nwHEBsTK86z3eH43Pfx+y3S12y1ysdhBuJgYkx1u8NR
HVedqwwnVF/GfTpmli2mYX7UOxxfwF+zqvNXXVqHX1SwU4WJiujqHY8v06++
VnO8KT53PwP9UY1pcFrvdHwpfpXp0DV9LFU/mBu1MSkO16kW4qAicquTijiR
dVER56oVKmKMTaiIQTZlIkbZjGoYSaPpfHsnzUcwzEYDEgyz0YCiQBoNSDB5
Gg0o2qXTgAQZdBoQzHaaj2KYnQYUndVpQIqRdBqQYiSdBqQxEhqQxkhoQIqL
pNOAEPug+UTBDpqPIYNBA4qCGTQgi2PSgKJ9Bg3IENCgARkCGjQgw7UwaECG
9AYPCOlNmhDMThpQRXqTBlSR3qQBBTFNGlBFepMGFHeSSQOqSG/SgCrSmzSg
ivQmDagivcUDQnqLJgQ/iwYU971FA2pIb9GA4q64aECBeIsGFIi3aEANAS0a
UODoogG1HdD9IT4TF0QaUIdGAwLl3h/2M1Eg0oCAIvdXAZjYINKA+oBIA+ox
TBpQvJHmM2BWaD4DZoXmMxCQ0IBwGxehAQ3ELjSgeCPNZyJ2ofnMGAnNZypE
ms+MYdKA5p60ojSgiROmNKCJ3JUGNDFOpQFNBKQ0oNBoQAsBKQ1oYZhKA1ox
EhrQipHQgLD6EGMBIXVj8QAQxZRq8UajIsZhlYo4XdaoiEFapyLmsw0qIgGb
VMS5tEVFxFNpPJiyleYjIdKAcPOTSgOK6qIMrdEGlKJBT0IpWuOEUYpWjUPS
fABIQilacTIpRCsYSChEKzBHKEQrYEUoRSt4RChFK3hEKEUreEQoRYMuhUK0
1hgJzQc8IhSiFTwiFKIVPCIUohU8IpSiFTwilKIVyCGUohVUIZSiNfZK8wFV
CIVoBVUIhWhtkQENCMghlKIVyCGUohXgIJSiNcCBUrQGOFCKjkuBQrQGOFCI
1gAHCtHakQGFaA2qoBCtHRlQiNZADkrRiuW/UIrW4BFK0Ro8MnlASI9StMZe
aUAD6VGI1iAZCtGKpyNCIVoDcyhE60B6FKIVzx6FUrQGIFGKVjwDEUrRGvS0
eEBIj1J0NC2FaA16ohCtQU8UohX0pBSiFfSkFKIV9KSUohX0pJSidXaINCCg
lVKKVqCVFh7QgkgTCo0GBLRSStG6dnpKKVoX0qMUrYAypRStC+lRilYQm1KK
1oX0KEYrcE4pRitwTilGh1eWD27VShna8ABOKUMbHvkqZWgDCCplaAMIKmVo
AwgqZWgDzimFaAOxKYVoC7M0HQmzNB8JszQgrMaVMrSB2JRStGG9pBSjDWsF
pRgNfFJK0QbWU0rRBmpVStEGnlNK0QZEUkrRppEBDQg3VaUUbbhnKMVoQyUq
xWiLK55iNJ6UKaVoi3lAKdq+vZMGBDJVitFmMRIakIVZGhCSpRht4EulGG1A
SKUYbTV2S/MBQirFaANCKsVow1MrpRgd+VCMNiCkUoy2FsOk+YASlWK04fGS
Uow2UKJSjDZQolKMNlCiUoy2OCTNByCoFKOtx0hoQD12SwPqYZYGBChTitEx
LSlFG6BMKUUboEwpRRu4SylFG9BKKUbbCEM0IIySUrTNMEvzCcyhFG1BMpSi
LXiEUrQFj1CKtuARStFxwVOItkAOCtEWVEEp2oINKEVbsAGlaAs2oBSNeChD
Y/GrFKErHrsoRegoLkrQWBcrBejoNMrP0VoUn2t4YclEoVF6jj6j8IzlslF2
bpBYLigzo+CM5bBRbsaC1yg2o+aMUjOKzCg0txgCiyV2yVLpYZOlgjWpUWDG
wtIoL6PCjOJyvI2lgo4yyspoIaOoPMImSwXLO6OgjDWaUVKOEbBU0E5GORkL
NKOYjOIySslYRxmFZDSTUUZGMRlF5JBYKitsslRW2GSpoLGM4jGWQUbpGGVm
FI7RZUbZGN+IsEDjfq3Sy5JRXf7MON6OZcm1Ru9tlFlX961xwJ3Lsmv5bXW0
1mQOv4niBAKNn+IKcUez2uXnsve+erX7mzghTioi1I3GXgluVqTN3uoaH9xh
bZPxS0PgG4z9sr98HbVKqXp/NQyLb9tc7Neov6+2Vme3+ws8celuLn6LiHWD
8UuMc7zB+C0i2A3Gfg3v78yI84jfOO+hQxxMxC3ENhi/RZySDcZ+jV9VfIxO
/EWqowKOucnYr5Bryv21snX/9cFt2zYYP7U4W5uLfTpfpfnm1h137g9ocMTN
xS8xLoHNxbc41/01nOZJ+PTBqt02F/skuqSUuaT6/Fr3Z1cQ+zfRm8cXIK32
+ws+ccxBRYS3wfgWe68+2dfq0u/nuBAXFbHbDcY+cS+34pOiD9/v/WAL4g6o
2uX3ldVsueNbxG43GD9FxareNhh7TtfwK6/7gtv8hMk3sVIRCW0yfovhdifk
l3RrWkafdY7pIaCXNhn7eb968ZH6Vm+OezUGcVIR8W0yfon4ZNo2Gb9F7Haj
sfPRVd3KKrN52vdnExCViXGNbTSupVy+0dd2zde4t4ihbDR+iXE322j8FmO3
nYqYYJuNa/fLqLXlc7GNe/Zh6W6bjd9ivHNt0Yfi58ZHMqXJ/dxri5uN3yIM
bTZ+izjmZuOXiEWtbTauc16zVpXqXb9bKMSdkFetLyOcWmWsbvfBIN4JtWJ+
lbX7W3TSmt1P5iH2LS6/kLw0yhpN7094cD43HDcvmvv7ex6Bt/G4n/dDnFv0
Gd+keqZl9n5PWIiLiXhIahuP3yLeufn4FoffkX161W7zfs4LUamIebsB+Skq
nrrZJuRW5fJiUr/h3b9wPxmE2KgYu+1UxNTckPwS8WzINiW/RcyEjclvcSdU
Nye/xQpRqNggKhUHRKPigkgTwiOeWmhCFm6RUL+K3+DEK8qb6J5ZX7/8eP14
/Xj9eP14/Xj96q9/AGe3mC8AUAAA
====