ComparingTwoFiles

Sometimes it is useful to compare two files. To do this in awk, the trick is to load the data from the first file into an array, then look up each line of the second file in that array.

Say, for instance, that we have a list of first names in file1, one per line:

John
Mary

and a list of complete names in file2:

John Smith
Mark Fo
Mary Bar

We want to find the lines of file2 whose first name is listed in file1. This can be done compactly like this:

awk 'FNR==NR {arr[$0];next} $1 in arr' file1 file2
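To try the one-liner on the sample data above, here is a minimal sketch for a POSIX-like shell. It writes the two sample files into a scratch directory, assuming mktemp -d is available (as on common Linux and BSD systems), and runs exactly the same awk program on them.

```shell
#!/bin/sh
# Scratch directory for the sample files, removed when the script exits
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# file1: one first name per line
printf '%s\n' John Mary > "$tmp/file1"
# file2: complete names, with the first name in field 1
printf '%s\n' 'John Smith' 'Mark Fo' 'Mary Bar' > "$tmp/file2"

# Keep the lines of file2 whose first field appears as a line of file1
matches=$(awk 'FNR==NR {arr[$0];next} $1 in arr' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$matches"
# prints:
# John Smith
# Mary Bar
```

"Mark Fo" is dropped because "Mark" is not a line of file1.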

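Some explanations:

* FNR == NR: FNR is the record number within the current file, and NR is the record number counted across all input files. The two are equal only while awk is reading the first file; for the second file, NR equals the number of lines in file1 plus FNR. (If file1 is empty, the test is also true for every line of file2, so the idiom assumes file1 is not empty.)
* arr[$0]: This is a classic technique to create an array element indexed by the whole line, since merely referencing an element creates it. This builds an array whose indices are the first names from file1.
* next: This skips to the next record, so no further patterns are tested against the lines of file1.
* $1 in arr: Because of the next, this test is only reached for the records of file2. It is true if the first field, $1, is an index of arr, i.e. a line of file1; in that case the default action (print the whole line) is executed.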
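Written out one step per line, with each step commented, the same awk program might look like the sketch below (again using the sample files in a scratch directory, assuming mktemp -d is available). The second command is a variant that is not part of the original example: negating the final test keeps the lines of file2 whose first name does not appear in file1.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' John Mary > "$tmp/file1"
printf '%s\n' 'John Smith' 'Mark Fo' 'Mary Bar' > "$tmp/file2"

# Long form of the one-liner, one step per line
matches=$(awk '
    FNR == NR {     # true only while reading the first file (file1)
        arr[$0]     # referencing the element creates it, keyed by the whole line
        next        # stop here for file1 lines and read the next record
    }
    $1 in arr       # file2 only: no action given, so matching lines are printed
' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$matches"

# Variant: keep the lines of file2 whose first name is NOT a line of file1
others=$(awk 'FNR == NR { arr[$0]; next } !($1 in arr)' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$others"
```

With the sample data, the first command prints "John Smith" and "Mary Bar", and the variant prints "Mark Fo".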

Note: For this example, join(1) is a working alternative, because both files are already sorted on the first field, which join requires.
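A minimal sketch of the join(1) alternative on the same sample data. By default join pairs lines whose first fields are equal and prints the join field followed by the remaining fields of each file, so the output here matches the awk version. Setting LC_ALL=C is an assumption made so that the comparison order does not depend on the user's locale.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' John Mary > "$tmp/file1"
printf '%s\n' 'John Smith' 'Mark Fo' 'Mary Bar' > "$tmp/file2"

# join matches on field 1 of each file; both inputs must be sorted on that field.
# Unsorted data would have to be sorted first (e.g. sort -k1,1 on file2).
joined=$(LC_ALL=C join "$tmp/file1" "$tmp/file2")
printf '%s\n' "$joined"
```

The awk one-liner needs no sorting and prints the matching lines in file2's original order, which is why it is often preferred when the inputs are not already sorted.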