Day 3: AWK builtins & defaults, basic IO, printf() / sprintf()

AWK has several built-in variables; most can be reassigned

# AWK built-in variables and (default value):
#
#   FILENAME = name of current data file, if any
#
#   NF       = number of fields in current record
#   NR       = number of records read so far, total
#   FNR      = number of records read so far, current file
#     => NF, NR, FNR set with each new record read
#     => NF can be reassigned, e.g.
#          $ echo 'a b c' |awk '{NF-- ; print $0}'
#          a b
#        (some awks don't rebuild $0 when NF is decremented)
#
#   FS       = field separator (" " ; can be regex)
#   OFS      = output field separator (" ")
#
#   RS       = record separator ("\n") *
#   ORS      = output record separator ("\n")
#     * regex sometimes supported; POSIX limits to single char
#
#   OFMT     = number format string used by print ("%.6g")
#   CONVFMT  = number -> string conversion format string ("%.6g")
#
#   RSTART   = start position of pattern match, set by match()
#   RLENGTH  = length of pattern match, set by match()
#   SUBSEP   = "multi-dim" array indices separator ("\034")
#   ARGC     = # of cmd line args (includes calling cmd)
#

AWK has two built-in arrays: ENVIRON & ARGV

  ENVIRON = array of inherited shell environment variables

# ex. print PWD env. var (note the quoted "PWD" key; a bare PWD
#     would be an empty AWK variable and never match):
#   $ awk 'BEGIN{if("PWD" in ENVIRON)print "PWD =",ENVIRON["PWD"]}'
#   PWD = /home/sdfer/Projects/AWK
#

    ENVIRON["var"] reassignments won't affect inherited environment

  ARGV = array of cmd line args ; ARGV[0] = cmd name (usually "awk") ; ARGC = total
    can indirectly access vars assigned as args in BEGIN{} via ARGV
    => see example in Day1.log

AWK has several built-in functions => see POSIX standard reference

AWK function sampling..

  index(s, t) => returns position of "t" in "s" OR 0 if not found

  match(s, e) => returns position of "e" in "s" OR 0 if not found;
                 "e" can be string OR regex; sets RSTART & RLENGTH
                 variables; often used with substr()

  substr(s, m[, n]) => returns portion of "s" from position 'm' to end,
                       or at most 'n' chars if 'n' is given

# ex. index(), match(), substr() sampling..
#   $ echo 'a_b_c' |awk '!index($0,"Z"){print " => no Z"}'
#    => no Z
#   $ echo 'a_b_c' |awk '{print substr($0,index($0,"b"))}'
#   b_c
#   $ echo 'a_b_c' |awk '{print substr($0,index($0,"b")-1,3)}'
#   _b_
#   $ echo 'a_b_c' |awk '{match($0,"b");print substr($0,RSTART)}'
#   b_c
#   $ echo 'a_b_c' |awk '{match($0,"b");print substr($0,RSTART,1)}'
#   b
#   $ echo 'a_b_c' |awk '{match($0,".b.");print substr($0,RSTART,RLENGTH)}'
#   _b_
#   $ echo 'a_b_c' |awk '{match($0,".b.");print substr($0,++RSTART,RLENGTH-2)}'
#   b
#

AWK has two destructive functions: sub() and gsub()

  sub(e, r[, v])  => replace "e" with "r" in $0 or 'v' ; just once
  gsub(e, r[, v]) => replace "e" with "r" in $0 or 'v' ; every match
    for both, "e" can be string or regex ; assumes $0 if 'v' omitted
    both change the target in place and return the # of replacements
    '&' in "r" stands for the matched text

# ex. sub() and gsub() sample..
#   $ echo 'a - b - c' |awk '{sub("-", "+")};//'
#   a + b - c
#   $ echo 'a - b - c' |awk '{sub("-", "+", $2)};//'
#   a + b - c
#   $ echo 'a - b - c' |awk '{sub("-", "+", $4)};//'
#   a - b + c
#   $ echo 'a - b - c' |awk '{gsub("-", "+")};//'
#   a + b + c
#   $ echo 'a - b - c' |awk '{gsub("[a-z]", "&&&")};//'
#   aaa - bbb - ccc
#   $ echo 'a - b - c' |awk '{gsub("[a-z]", $2)};//'
#   - - - - -
#

... onto IO ...

AWK special filenames: "/dev/stdin", "/dev/stdout", "/dev/stderr"
  => for POSIX systems; check manpage for proper referencing

terminal IO: "-" as a file arg == "/dev/stdin" ;
  "/dev/tty" = the controlling terminal, which is the same place as
  "/dev/stdout" only while stdout isn't redirected

# eg. woot all the way down..
#
#   $ echo 'woot' |awk '//'
#   woot
#   $ echo 'woot' |awk '//' -
#   woot
#   $ echo 'woot' |awk '//' /dev/stdin
#   woot
#   $ echo 'woot' |awk '{print}'
#   woot
#   $ echo 'woot' |awk '{print > "/dev/tty"}'
#   woot
#   $ echo 'woot' |awk '{print > "/dev/stdout"}'
#   woot
#

AWK can use '<' (w/ getline), '>', '>>', and '|' for external files & cmds

# ex. random data sorted numerically ascending
#     (plain "sort" is lexical; "sort -n" makes it numeric):
#
#   $ shuffle -n7 |awk '{print |"sort -n"}END{close("sort -n")}'
#   0
#   1
#   2
#   3
#   4
#   5
#   6
#

note: the first 'print >"file"' in a run truncates the file; later
  prints to that same open file append
  => safer to use "shell rules" and stick to '>>' for appending

note: while AWK supports "/dev/stderr" it mostly lacks error handling
  => ie. no way to determine if a file is writable beforehand
  for finer-grained control use test(1) via system() instead:

# ex. test if file test-ro is writable:
#   $ awk 'FNR==1 && system("test -w " FILENAME){print "not writable"}' test-ro
#   not writable
#
#   note: system() returns only the cmd's exit status (0 == success),
#         so a non-zero result here means "not writable";
#         mind the space after "-w" when concatenating FILENAME
#

... onto printf() / sprintf() ...

AWK's printf() is modelled on printf(3) from the C language
  basic form: printf ("format_str", arg, arg, ...) ; '()' are optional
AWK's sprintf() is similar but returns the string instead of printing it ;
  '()' are required

both printf() and sprintf() support dynamic field sizing:

# ex. print f.p. value w/ field width of 7 & precision of 2, 0 padded:
#   $ awk 'BEGIN{printf "%0*.*f\n", 7, 2, 123.456789}'
#   0123.46
#
#   note: zero padding only works w/ numbers
#

printf() quoted format strings can be assigned to variables and used
  => $ awk 'BEGIN{Fmt="%0*.*f\n"; printf Fmt, 7, 2, 123.456789}'

sprintf() often useful for string concatenation

# ex. string concatenation 2 different ways..
#   $ awk 'BEGIN{S = "a" ":" "b" ; print S}'
#   a:b
#   $ awk 'BEGIN{S = sprintf("%s:%s", "a", "b") ; print S}'
#   a:b
#
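
The dynamic sizing and sprintf() notes above can be combined into one small helper. This is only a sketch: fmt_num is a made-up shell function name, and w, p, n are made-up variable names passed in with -v. It assumes an awk that accepts '*' in format strings, which the one-liners above already rely on.

```shell
# fmt_num WIDTH PRECISION NUMBER  (hypothetical helper, not a standard tool)
# Builds the zero-padded string with sprintf() first, then prints it,
# to show that sprintf() returns a value where printf would just print.
fmt_num() {
    awk -v w="$1" -v p="$2" -v n="$3" 'BEGIN {
        s = sprintf("%0*.*f", w, p, n)   # width and precision taken from args
        print s
    }'
}

fmt_num 7 2 123.456789    # 0123.46
fmt_num 8 3 3.14159       # 0003.142
```

Passing width and precision with -v keeps the awk program itself fixed, so the shell never has to build a format string by hand.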