Sometimes it is useful to compare two files. To do this in awk, the trick is to load the data from the first file into an array.
Let's say for instance we have a list of first names in file1, one per line:
John
Mary
and a list of complete names in file2:
John Smith
Mark Fo
Mary Bar
We want to find the lines in file2 whose first name appears in file1. This can be done compactly like this:
awk 'FNR==NR {arr[$0];next} $1 in arr' file1 file2
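To try the command on the sample data, the following sketch recreates both files in a scratch directory and runs it. The scratch directory and the `matches` variable are only conveniences for this illustration, not part of the technique.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Recreate the two sample files from the text.
printf '%s\n' John Mary > file1
printf '%s\n' 'John Smith' 'Mark Fo' 'Mary Bar' > file2

# Capture the matching lines of file2, then display them.
matches=$(awk 'FNR==NR {arr[$0];next} $1 in arr' file1 file2)
printf '%s\n' "$matches"
# prints:
# John Smith
# Mary Bar
```

"Mark Fo" is dropped because Mark is not listed in file1.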
Some explanations:
- FNR == NR: FNR is the record number within the current file, while NR is the total number of records read so far across all files. The two are equal only while awk is reading the first file (provided file1 is not empty); in the second file, NR is equal to the number of lines of file1 + FNR.
- arr[$0]: This is a classic technique to create an array element indexed by the whole line. Merely referencing the element creates it, with an empty value, so this builds an array whose indices are the first names of file1.
- next: This skips to the next record, so the rest of the program is never run on the lines of file1.
- $1 in arr: This pattern is only evaluated for the records of file2, because of the next. If $1 is present in arr, i.e. in file1, the default action is executed and the line is printed.
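The behaviour of the two counters is easier to see by printing them. The sketch below, which assumes the same sample files (recreated in a scratch directory), prints the file name, NR and FNR for every record; the `counters` variable is just for this illustration.

```shell
# Work in a scratch directory and recreate the sample files.
cd "$(mktemp -d)"
printf '%s\n' John Mary > file1
printf '%s\n' 'John Smith' 'Mark Fo' 'Mary Bar' > file2

# Print the current file name, the global record number (NR)
# and the per-file record number (FNR) for each record.
counters=$(awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' file1 file2)
printf '%s\n' "$counters"
# prints:
# file1 NR=1 FNR=1
# file1 NR=2 FNR=2
# file2 NR=3 FNR=1
# file2 NR=4 FNR=2
# file2 NR=5 FNR=3
```

NR and FNR agree on the two lines of file1 and differ on every line of file2, which is what makes FNR==NR usable as a "still reading the first file" test.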
Note: For this example, join(1) is a working alternative, provided both files are sorted on the first field.
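As a rough sketch of the join(1) route, assuming the same sample data: join expects its inputs sorted on the join field, so the files are sorted first. The `.sorted` file names and the `joined` variable are made up for this illustration.

```shell
# Work in a scratch directory and recreate the sample files.
cd "$(mktemp -d)"
printf '%s\n' John Mary > file1
printf '%s\n' 'John Smith' 'Mark Fo' 'Mary Bar' > file2

# join needs both inputs sorted on the join field (the first field by default).
sort file1 > file1.sorted
sort file2 > file2.sorted

# Keep only the lines whose first field appears in both files.
joined=$(join file1.sorted file2.sorted)
printf '%s\n' "$joined"
# prints:
# John Smith
# Mary Bar
```

Unlike the awk version, this prints the result in sorted order rather than in the original order of file2, and join rebuilds each output line from its fields instead of printing the line unchanged.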