FindAllMatches

This function finds all non-overlapping matches of a regular expression in a string, scanning from left to right, and stores the matched text in the given array (arr[1], arr[2], and so on). It returns the number of matches found.

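This duplicates part of the functionality of gawk's extension to match(): when called as match(str, re, arr), gawk stores the matched text in arr, but only for the first match in the string. FindAllMatches collects every match and uses only standard awk features (match(), substr(), RSTART and RLENGTH).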

You can use the function as is and get just the list of matches in arr[]. If you also want the position of each match within the original string stored in a start[] array, uncomment the three commented-out lines that use eaten and start[], and change the argument list to FindAllMatches(str, re, arr, start,   j, a, b, eaten) so that start[] is passed in and eaten stays a local variable.

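function FindAllMatches(str, re, arr,   j, a, b) {
    # To also record match positions, use this header instead:
    # function FindAllMatches(str, re, arr, start,   j, a, b, eaten) {

    split("", arr)              # empty arr so entries from an earlier call don't linger
    j = 0
    # eaten = 0                 # optional: used if start[] is needed
    a = RSTART; b = RLENGTH     # save caller's values to avoid unexpected side effects

    # Note: this loops forever if re can match the empty string (see TODO below).
    while (match(str, re) > 0) {
        arr[++j] = substr(str, RSTART, RLENGTH)

        # start[j] = RSTART + eaten       # optional: save position of match in the original string
        # eaten += RSTART + RLENGTH - 1   # optional: number of characters chopped off so far

        # resume the search just after the end of this match
        str = substr(str, RSTART + RLENGTH)
    }
    RSTART = a; RLENGTH = b     # restore the caller's values
    return j
}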
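A minimal sketch of the start[] variant in use, with the optional lines uncommented and eaten made local. The shell function find_all is a hypothetical wrapper for illustration only: it passes the string and the regex to awk with -v and prints the match count, then one "position<TAB>text" line per match. Note that awk processes escape sequences in -v values, so backslashes in either argument are interpreted.

```shell
# find_all STRING REGEX  (hypothetical wrapper, for illustration)
# Prints the number of matches, then "position<TAB>text" for each match.
# Caveat: awk -v interprets backslash escapes in STRING and REGEX.
find_all() {
    awk -v s="$1" -v r="$2" '
    function FindAllMatches(str, re, arr, start,   j, a, b, eaten) {
        split("", arr); split("", start)   # clear results of any earlier call
        j = 0
        eaten = 0                          # characters already cut off str
        a = RSTART; b = RLENGTH
        while (match(str, re) > 0) {
            arr[++j] = substr(str, RSTART, RLENGTH)
            start[j] = RSTART + eaten      # position in the original string
            eaten += RSTART + RLENGTH - 1
            str = substr(str, RSTART + RLENGTH)
        }
        RSTART = a; RLENGTH = b
        return j
    }
    BEGIN {
        n = FindAllMatches(s, r, m, pos)
        print n
        for (i = 1; i <= n; i++)
            print pos[i] "\t" m[i]
    }'
}

find_all "foo=1, bar=22, baz=333" "[0-9]+"
```

This prints 3, followed by the matches 1, 22 and 333 at positions 5, 12 and 20 of the original string. The eaten counter is what turns RSTART, which is relative to the shortened str, back into a position in the string the caller passed in.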

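TODO: correctly handle 0-length matches. If re can match the empty string (for example "x*"), match() succeeds with RLENGTH equal to 0, str never gets shorter, and the loop never terminates.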
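One possible way to address this TODO, as a sketch only: empty matches are skipped rather than stored, the scan steps one character past them so str always shrinks, and the loop stops once str is empty. Skipping (instead of recording) empty matches is an assumption about the desired behavior, and find_nonempty is a hypothetical wrapper name. The one-character step relies on awk's match() being leftmost-longest: an empty match at RSTART means no longer match starts there.

```shell
# find_nonempty STRING REGEX  (hypothetical wrapper, for illustration)
# Prints the number of non-empty matches followed by each one in brackets.
find_nonempty() {
    awk -v s="$1" -v r="$2" '
    function FindAllMatches(str, re, arr,   j, a, b) {
        split("", arr)
        j = 0
        a = RSTART; b = RLENGTH
        # stop once str is empty: an empty-matching re would match it forever
        while (str != "" && match(str, re) > 0) {
            if (RLENGTH > 0) {
                arr[++j] = substr(str, RSTART, RLENGTH)
                str = substr(str, RSTART + RLENGTH)
            } else {
                # match() is leftmost-longest, so an empty match at RSTART
                # means no longer match starts there: skip that character
                str = substr(str, RSTART + 1)
            }
        }
        RSTART = a; RLENGTH = b
        return j
    }
    BEGIN {
        n = FindAllMatches(s, r, m)
        printf "%d", n
        for (i = 1; i <= n; i++)
            printf " [%s]", m[i]
        print ""
    }'
}

find_nonempty "baaxa" "a*"
```

With "baaxa" and "a*", the empty matches in front of "b" and "x" are skipped, so this prints 2 [aa] [a]. A caller that wants empty matches reported as well could store them and still advance by one character.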

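TODO: handle anchored REs (e.g., if re is "^foo" and str is "foofoo", return only one "foo"). Currently the function returns two, because each pass of the loop applies re to the shortened remainder of str, where ^ matches again. This is hard to do in general.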
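A partial sketch covering only the common case, assuming the sole anchor to worry about is a leading ^ (a ^ inside a group or alternation, such as "a|^b", is not detected): such an re can only match at the very start of the original string, so the loop stops after the first match. The wrapper find_anchored is hypothetical.

```shell
# find_anchored STRING REGEX  (hypothetical wrapper, for illustration)
# Prints the number of matches followed by each one in brackets.
find_anchored() {
    awk -v s="$1" -v r="$2" '
    function FindAllMatches(str, re, arr,   j, a, b, anchored) {
        split("", arr)
        j = 0
        anchored = (substr(re, 1, 1) == "^")   # only a leading ^ is detected
        a = RSTART; b = RLENGTH
        while (match(str, re) > 0) {
            arr[++j] = substr(str, RSTART, RLENGTH)
            if (anchored)
                break    # ^ can only match at the start of the original string
            str = substr(str, RSTART + RLENGTH)
        }
        RSTART = a; RLENGTH = b
        return j
    }
    BEGIN {
        n = FindAllMatches(s, r, m)
        printf "%d", n
        for (i = 1; i <= n; i++)
            printf " [%s]", m[i]
        print ""
    }'
}

find_anchored "foofoo" "^foo"
```

This prints 1 [foo]. REs anchored only with a trailing $, such as "foo$", are not affected by this problem, since cutting text off the front of str leaves its end unchanged.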