You cannot easily parse XML with awk, but there are several tricks for scraping a simple, regularly formatted XML file:

<toc>

=== Extract the content of <tag> </tag>

==== <tag> </tag> are on the same line

You can use a field separator that matches the tag: </?tag>. The line then looks like

{{{
field1 FS field2 FS field3
}}}

where the first FS is the opening tag and the second one is the closing tag. Extracting field2 is then easy:

{{{ sh
awk -F'</?tag>' 'NF>1{print $2}'
}}}

This can be generalized if you have more than one pair of <tag> on the same line. The contents are the even-numbered fields (the odd-numbered ones hold the text between the pairs), so step through the fields two at a time:

{{{ sh
awk -F'</?tag>' '{for(i=2;i<=NF;i+=2) print $i}'
}}}

==== <tag> </tag> are on different lines

* Simplest answer:

{{{ sh
awk '/<tag>/,/<\/tag>/'
}}}

* Often people don't want to see the enclosing tags, so:

{{{ sh
awk ' /<\/tag>/{f=0} f{print} /<tag>/{f=1}'
}}}

* The above solutions only work if nothing else is on the same line as the tags. If this is not the case, you can do something like:

{{{ sh
awk '/<tag>/{sub(/.*<tag>/,"");f=1} /<\/tag>/{f=0;sub(/<\/tag>.*/,"");print} f{print}'
}}}

=== Extracting the value of the attribute foo

==== If you want all the foo, regardless of the tag

* One possible solution is to use " as the record separator. The record you want is then the one that follows the record ending with the attribute name:

{{{ sh
awk -v RS='"' '/foo=$/{getline;print}'
}}}

* Another possibility is to use the attribute name as the FS. You are then in the same kind of situation as in the trick above for extracting the content of a tag:

{{{
something FS value" something else FS value" something else
}}}

except that every field from the second one on starts with a value, and you need to get rid of everything after the quote:

{{{ sh
awk -F'foo="' '{for(i=2;i<=NF;i++){sub(/".*/,"",$i);print $i}}'
}}}

==== All the foo attributes of a given tag

Same trick as above, but here we use > as the record separator so that we have one tag per record.

{{{ sh
awk -v RS=\> -F '<tag.*foo="' 'NF>1{sub(/".*/,"",$2);print $2}'
}}}
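
To try these tricks without a real XML file, here is a minimal sketch that wraps four of the one-liners in shell functions and feeds them small inputs with printf. The function names and the sample data are made up for illustration. The loop strides are this sketch's choice: every second field for tag contents, every field from the second on for attribute values.

```shell
#!/bin/sh
# Illustrative wrappers; the function names are hypothetical, not part of awk.

# Content of each <tag>...</tag> pair on one line (the even-numbered fields).
tag_same_line() {
    awk -F'</?tag>' '{ for (i = 2; i <= NF; i += 2) print $i }'
}

# Lines between <tag> and </tag> when each tag sits on its own line.
tag_multi_line() {
    awk '/<\/tag>/ { f = 0 }  f { print }  /<tag>/ { f = 1 }'
}

# Every value of the attribute foo, whatever the tag.
attr_any_tag() {
    awk -F'foo="' '{ for (i = 2; i <= NF; i++) { sub(/".*/, "", $i); print $i } }'
}

# Values of foo on <tag ...> elements only (one tag per record).
attr_of_tag() {
    awk -v RS='>' -F'<tag.*foo="' 'NF > 1 { sub(/".*/, "", $2); print $2 }'
}

# Made-up sample inputs:
printf '%s\n' '<tag>one</tag> and <tag>two</tag>' | tag_same_line      # one, two
printf '%s\n' '<tag>' 'first' 'second' '</tag>' | tag_multi_line       # first, second
printf '%s\n' '<tag foo="a" bar="b"/>' '<other foo="c"/>' | attr_any_tag   # a, c
printf '%s\n' '<tag foo="a" bar="b"/>' '<other foo="c"/>' | attr_of_tag    # a
```

Note that these patterns match on text alone, so they can be fooled by comments, CDATA sections, single-quoted attributes, or a tag whose name merely begins with "tag".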