= Field Separator =

The [[special variable]] FS is the field separator, which determines how [[awk]] splits each [[record]] into [[field]]s.

== The single space field separator ==

The default value for the field separator is a single space character. In data files, fields are often separated by multiple whitespace characters rather than by a single space, so the awk interpreter treats a single space separator as matching any run of whitespace (spaces and tabs). This prevents two consecutive spaces from being interpreted as delimiting an empty field. Note that only a single space is treated as a special case in this manner. With any other single character separator, two consecutive separators delimit an empty field:

{{{
#!/bin/sh

# This gives only three fields, because double spaces do not delimit an empty field
echo "Rhubarb  Custard  Bananas" | awk 'BEGIN { FS = " " } { for (i=1;i<=NF;i++) print i ":" $i }'

# This gives five fields, because consecutive occurrences of any other delimiter separate empty fields
echo "Rhubarb##Custard##Bananas" | awk 'BEGIN { FS = "#" } { for (i=1;i<=NF;i++) print i ":" $i }'
}}}

== Matching a single space delimiter ==

If you really want each individual space to act as a delimiter, so that consecutive spaces delimit an empty field, use a regular expression that encloses the space in square brackets:

{{{
# This gives five fields
echo "Rhubarb  Custard  Bananas" | awk 'BEGIN { FS = "[ ]" } { for (i=1;i<=NF;i++) print i ":" $i }'
}}}

== Leading and trailing whitespace ==

When the default single space field separator is being used, the awk interpreter strips leading and trailing whitespace from the record before it is split into fields.
However, with other field separators the whitespace is not stripped and is treated as part of the record:

{{{
#!/bin/sh
echo ' Rhubarb Custard Bananas ' | awk 'BEGIN { FS = " " } { print $2 }'    # This gives Custard
echo ' Rhubarb Custard Bananas ' | awk 'BEGIN { FS = "[ ]" } { print $2 }'  # This gives Rhubarb
}}}

Note that the second awk command gives a different result, because the leading space is not stripped and acts as a delimiter after an empty first field.

== The field separator can be changed by assignment ==

The field separator can be changed by [[assign]]ment like any other [[variable]]. A new value takes effect from the next record that is read, so the assignment is usually made in a BEGIN block. Note that the field separator can be a single character or a regular expression:

{{{
# Change the field separator to a colon
FS = ":"
}}}

=== Using a regular expression as a field separator ===

In the following example, we use the regular expression '''l.[^l]''', which matches a letter l followed by any single character and then by a character other than l (a double l not followed by a third l would instead be written '''ll[^l]'''):

{{{
#!/bin/sh
echo Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch | awk 'BEGIN { FS = "l.[^l]" } { for (i=1;i<=NF;i++) print i ":" $i }'
}}}

== The default field separator can be changed by a command line switch ==

The -F command line switch sets the field separator from the command line, before any input is read.

=== Example ===

The /etc/passwd file has its data [[field]]s separated by colons. Here we change the field separator to a colon, in order to extract [[field]] information from the /etc/passwd file:

{{{
>awk -F: '{print $1,$3 }' /etc/passwd
}}}

=== Special behaviour ===

==== The empty string value ====

The behaviour of the awk interpreter when the field separator is set to an [[empty string]] is undefined. Some implementations of awk treat the record as having only one field; others put each character of the record in a separate field. See [[AwkFeatureComparison]].

==== The single letter t ====

** Which awk has this behaviour?
All the awk implementations I have access to seem to treat t as just a normal letter. **

This behaviour is implementation-dependent; the gawk manual, for example, describes it only for gawk's compatibility mode.

{{{
If the -F command line switch is given a single letter t as a parameter, the awk interpreter assumes that the fields should be separated by tab characters, rather than by a letter t. This behaviour is implemented to avoid complications with shell quoting, where backslash escape characters can become lost. If the letter t should really be used as a field separator, then it is necessary to use a regular expression that encloses the letter t in square brackets:
}}}

{{{
>awk -F\[t\]
}}}
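
Because implementations differ on how they treat a single letter t, it can be useful to check how the local awk behaves. The following is a minimal sketch rather than a definitive test: the helper name first_field and the sample input are made up for illustration. It feeds awk a record containing both a letter t and a tab character and prints the first field:

{{{
#!/bin/sh

# first_field is a hypothetical helper: it passes its arguments to awk
# and prints the first field of a record containing both a t and a tab.
first_field() {
    printf 'at\tb\n' | awk "$@" '{ print $1 }'
}

# With -Ft the result depends on the implementation:
# "at" means the record was split on the tab,
# "a" means it was split on the letter t.
case "$(first_field -Ft)" in
    at) echo "-Ft splits on tab characters" ;;
    a)  echo "-Ft splits on the letter t" ;;
    *)  echo "unexpected result" ;;
esac

# These two forms do not rely on the special case.
first_field -F'[t]'    # splits on the letter t
first_field -F'\t'     # splits on a tab character
}}}

Quoting the separator, as in the last two commands, keeps the shell from removing the backslash, so the intended separator reaches awk regardless of how a bare t is handled.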