ReadDirectory

Getting a list of files in a directory is a tricky process. One might be tempted to use ls and getline, but parsing the output of ls is a bad idea. A file name can contain any character other than "/" (slash) and "\0" (null). POSIX awk strings cannot contain null (the behavior is undefined), and RS can only be a single character. This leaves us with "/" as the only reasonable separator.

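BEGIN {
    RS = "/"    # "/" cannot appear in a file name, so it is a safe separator
    cmd = "cd \"" dir "\" && printf '%s/' *"
    # Compare against "" explicitly; a bare $0 test would skip a file named "0"
    while ((cmd | getline) > 0) if ($0 != "") files[n++] = $0
    close(cmd)
    # Walk the indices in order; for (f in files) visits them in unspecified order
    for (i = 0; i < n; i++) printf("/%s/\n", files[i])
}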

This assumes that dir is passed in to awk:

awk -v dir="$HOME" -f readfiles
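
To see why the choice of separator matters, here is a small sketch comparing the two approaches on a directory that contains a file name with an embedded newline. The helper names count_by_newline and count_by_slash are hypothetical and exist only for this comparison. On typical ls implementations the newline-based count reports more entries than actually exist, while the slash-based count does not.

```shell
# Hypothetical helpers for comparison; neither is part of the original script.

# Count entries by splitting the output of ls on newlines (the fragile way).
count_by_newline() {
    ( cd "$1" && ls ) | awk 'END { print NR }'
}

# Count entries by splitting a slash-terminated list on "/" (the robust way).
count_by_slash() {
    ( cd "$1" && printf '%s/' * ) |
        awk 'BEGIN { RS = "/" } $0 != "" { n++ } END { print n + 0 }'
}
```

Like the script above, count_by_slash still reports one entry for an empty directory, because an unmatched glob is left as the literal "*".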

If you only want directories, use "*/" instead of "*". Every other record will be empty due to the double "/", but that's why we check that the record is nonempty before adding it to the array.

You can also prepend dir (or ENVIRON["PWD"], when dir is relative to the current directory) when storing the names to get a fuller path.
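
The two variations above (directories only, and storing a fuller path) can be combined roughly as follows. This is a minimal sketch, not a hardened implementation: list_dirs is a hypothetical helper name, and it assumes that dir contains no double quotes or backslashes and that the directory has at least one subdirectory.

```shell
# list_dirs DIR: print the full path of each non-hidden subdirectory of DIR.
# Hypothetical helper; assumes DIR has no double quotes or backslashes.
list_dirs() {
    awk -v dir="$1" '
    BEGIN {
        RS = "/"
        # The "*/" glob matches directories only and yields "name//",
        # so every other record is empty and must be skipped.
        cmd = "cd \"" dir "\" && printf \"%s/\" */"
        n = 0
        while ((cmd | getline) > 0)
            if ($0 != "") files[n++] = dir "/" $0   # store the fuller path
        close(cmd)
        for (i = 0; i < n; i++) print files[i]
    }'
}
```

For example, list_dirs "$HOME" prints one path per line. Printing one path per line is only for display; the array inside awk holds each name intact.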

In gawk, RS can be a regex and can contain null. This makes things a little simpler.

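BEGIN {
    RS = "\0\0"    # never matches (names are nonempty), so getline reads everything
    cmd = "cd \"" dir "\" && printf '%s\\0' *"
    cmd | getline
    close(cmd)
    # patsplit returns the number of matches; elements are numbered from 1
    n = patsplit($0, files, "[^\0]+")
    for (i = 1; i <= n; i++) printf("/%s/\n", files[i])
}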

Here we can read the entire list at once (limited only by RAM, since gawk can read records longer than the LINE_MAX that POSIX requires). Note the "\\0": awk reduces it to a backslash followed by 0 in the command string, so it is printf(1) that produces the null byte. Writing "\0" would make awk put an actual null byte into the command, which the shell cannot handle. Also note the use of patsplit with "[^\0]+", which avoids the empty element that split would produce at the end because of the trailing null.

There is the tricky question of what to do when there are no files. How can one tell the difference between no files and a single file with the literal name "*"? That and error checking are left as an exercise for the reader.
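
One possible approach to the empty-directory question, offered as a sketch rather than a complete answer: let the shell check that the first glob result really exists before printing anything. An unmatched glob is left as the literal "*", which fails the existence test unless a file with that name is actually present. The helper name list_files is hypothetical, and the sketch assumes that dir contains no double quotes or backslashes. Error checking is still not handled.

```shell
# list_files DIR: print each non-hidden name in DIR wrapped in slashes.
# Hypothetical helper; prints nothing when DIR has no matching files.
list_files() {
    awk -v dir="$1" '
    BEGIN {
        RS = "/"
        # set -- * stores the glob results in the positional parameters.
        # The -e test fails for an unmatched literal "*"; the -L test keeps
        # dangling symlinks, which -e alone would reject.
        cmd = "cd \"" dir "\" && set -- * && " \
              "{ [ -e \"$1\" ] || [ -L \"$1\" ]; } && printf \"%s/\" \"$@\""
        n = 0
        while ((cmd | getline) > 0) if ($0 != "") files[n++] = $0
        close(cmd)
        for (i = 0; i < n; i++) printf("/%s/\n", files[i])
    }'
}
```

With this check, an empty directory produces no output, while a directory holding a single file named "*" produces /*/.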