RS

Last edit

Summary: update from markhobley.yi.org


Record Separator

The special variable RS is a record separator that is used to determine how awk divides its input into records.

The default record separator is a newline character

The default record separator is a newline character, so by default each line of input is treated as a separate record. The following dataset contains four records:

Annie 3
Bobby 2
Charlie 4
Dave 3
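As a quick check, the dataset above can be piped into awk and the records counted with the built-in NR variable. This is only a minimal sketch; the shell variable name count is chosen for illustration.

```sh
#!/bin/sh
# Feed the four-line dataset to awk and count the records using the
# default record separator (a newline).
count=$(printf 'Annie 3\nBobby 2\nCharlie 4\nDave 3\n' |
  awk 'END { print NR }')
echo "$count"    # prints 4
```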

The record separator can be changed by assignment or command line switch

The record separator can be changed by assignment like any other variable. This is often done in a BEGIN block, before any input is read:

BEGIN {
  # Change the record separator to an exclamation mark
  RS = "!"
}
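A minimal sketch of the BEGIN block in use is shown here. The input string and the shell variable name records are made up for the example.

```sh
#!/bin/sh
# Split "one!two!three" into records on "!" by setting RS in a BEGIN block,
# then print each record with its record number.
records=$(printf 'one!two!three' |
  awk 'BEGIN { RS = "!" } { print NR ": " $0 }')
echo "$records"
# Prints:
# 1: one
# 2: two
# 3: three
```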

The -v command line switch allows the record separator to be set at invocation time, overriding the default value. Here we change the record separator to an empty string:

#!/bin/sh
awk -v RS='' '{ print $1, $3 }' /home/accounts/transdata.fil

Remember that the shell processes quotes and expands variables before awk sees its arguments, so shell interpolation needs to be considered when passing command line parameters this way.
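The following sketch passes the separator through a shell variable, which is one place where quoting matters. The variable names sep and out, and the sample input, are assumptions made for the example.

```sh
#!/bin/sh
# Hold the separator in a shell variable. The double quotes around "$sep"
# let the shell expand the variable while keeping it a single argument;
# the awk program itself stays in single quotes so the shell leaves it alone.
sep=':'
out=$(printf 'alpha:beta:gamma' | awk -v RS="$sep" '{ print NR ": " $0 }')
echo "$out"
# Prints:
# 1: alpha
# 2: beta
# 3: gamma
```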

Changing the record separator in the middle of processing an input file

If the value of the record separator is changed in the middle of processing an input file, the new value is used as the delimiter for subsequent records. The record currently being processed and any previous records are not affected by the change.
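The behaviour described above can be sketched as follows. The sample input is invented for the example, and the output shown is what common implementations are expected to produce; it is worth checking against the awk in use.

```sh
#!/bin/sh
# The first record is read with the default newline separator. Its action
# switches RS to a colon, which only applies to records read afterwards.
out=$(printf 'first line\nb:c:d' |
  awk 'NR == 1 { RS = ":" } { print NR ": " $0 }')
echo "$out"
# Expected output:
# 1: first line
# 2: b
# 3: c
# 4: d
```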

Multiline Records

The awk interpreter supports multiline records when the record separator is set to an empty string. When multiline records are being used, records are separated by one or more blank lines, and the newline character acts as a field separator in addition to any separators defined by FS, so each line can be treated as a field of data. An empty line is interpreted as the end of the record, and multiple consecutive blank lines are treated as a single record separator. Following a blank line, the next record does not begin until a nonempty line is read. The following example dataset contains two multiline records:

Annette Baxby
23 Luthton Road
London

Bobby Lewis
48 Dockside Row
Merseyside

Note that blank lines must be completely empty to be considered a record separator. Lines containing whitespace are treated as part of a record. The end of the file is always treated as the end of the record. If the last record is not followed by a blank line, the final newline is discarded.
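As an illustration, the address dataset can be processed with an empty record separator. This sketch also sets FS to a newline so that each line becomes exactly one field; that FS setting, the two blank lines between the records, and the variable name out are choices made for the example.

```sh
#!/bin/sh
# RS = "" enables multiline records; FS = "\n" makes each line one field.
# The two consecutive blank lines in the input count as a single separator.
out=$(printf 'Annette Baxby\n23 Luthton Road\nLondon\n\n\nBobby Lewis\n48 Dockside Row\nMerseyside\n' |
  awk 'BEGIN { RS = ""; FS = "\n" } { print $1 " lives in " $3 }')
echo "$out"
# Prints:
# Annette Baxby lives in London
# Bobby Lewis lives in Merseyside
```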

The newline character always acts as a field separator when multiline records are being used. There is no way to prevent this behaviour, but it is possible to use the split function to extract fields as desired.
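One possible way of using split here is sketched below. It leaves FS at its default, so NF counts whitespace-separated words, and uses split to recover the individual lines. The array name line and the sample data are assumptions for the example.

```sh
#!/bin/sh
# With RS = "" and the default FS, fields are split on spaces and newlines.
# split() with "\n" as the separator recovers the lines of each record.
out=$(printf 'Annette Baxby\n23 Luthton Road\nLondon\n\nBobby Lewis\n48 Dockside Row\nMerseyside\n' |
  awk 'BEGIN { RS = "" }
       {
         n = split($0, line, "\n")   # line[1] .. line[n] hold the lines
         print NF " fields, " n " lines, first line: " line[1]
       }')
echo "$out"
# Prints:
# 6 fields, 3 lines, first line: Annette Baxby
# 6 fields, 3 lines, first line: Bobby Lewis
```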

Setting the record separator to a NUL character

Note that in some implementations of awk, it may not be possible to set the record separator to a literal NUL character, because NUL is treated as a string terminator in the underlying C library. This causes the record separator to be interpreted as an empty string.