Day 5: arrays and user-defined functions

... AWK Arrays ...

like variables, arrays are never declared in AWK
arrays are 1-dimensional and associative, and may be sparse
arrays are global: AWK has a single namespace, and the only locals are
  function parameters (so a local array must be listed as a parameter)
array indices are always treated as strings, even when purely numeric (a[1] is a["1"])
values need not be assigned: an array can simply be used as a set
as with variables, an unassigned array element is "" (null)
ENVIRON and ARGV are the two standard (POSIX) built-in arrays
#   ex. for (v in ENVIRON) print ENVIRON[v]          => prints the env. var values
#   ex. for (i = 1; i < ARGC; i++) print ARGV[i]     => prints the CLI args
#                                                       (ARGV[0] is the program name)
a variable and an array can't share the same name
arrays and/or array elements can be deleted with the 'delete' statement
#   ex. delete Arr[42]   => deletes the element Arr[42] but not Arr
#       delete Arr       => deletes every element; Arr remains an (empty) array
#                           (portable alternative: split("", Arr))
#
#   note: a deleted array's name is NOT reusable as a (scalar) variable
#
bulk assignment can be done with the split() function
#   ex. split("AWK is different", arr)           # splits on FS (default)
#       => arr[1] = "AWK", arr[2] = "is", ...
#
#       split("hello, world", arr, "[, ]+")      # splits on a regex
#       => arr[1] = "hello", arr[2] = "world"
#
#   => numerical indices created by split() start at 1
#
bulk assignment with non-numeric indices is possible indirectly
#   ex. populate a[] with x y z as indices and 1 2 3 as values:
#
#       N = split("x y z", tmp)
#       for (i = 1; i <= N; i++) { a[tmp[i]] = i }
#
#       => a["x"] = 1, a["y"] = 2, a["z"] = 3
#
AWK simulates multi-dimensional arrays via comma-separated subscripts
  => still plain key-value pairs, i.e. Arr[1,2] == Arr[1 SUBSEP 2]
     (SUBSEP defaults to "\034", a non-printing character)
#   ex. arr[1, 2, 3] = "awk is different"
#       => index = "1" SUBSEP "2" SUBSEP "3"  (a 5-character string, NOT "123")
#       => length(arr) = 1
#
#       for (i in arr) { N = split(i, ind, "") }    # "" = split into single chars
#                                                   # (gawk/mawk/BWK awk extension)
#       length(ind) = 5
#       ind[1] = 1
#       ind[2] = SUBSEP
#       ind[3] = 2
#       ind[4] = SUBSEP
#       ind[5] = 3
#
#       print arr[1 SUBSEP 2 SUBSEP 3]   => awk is different
#       print arr[1, 2, 3]               => awk is different
#       print arr[123]                   => "" (and creates a new element arr["123"])
#
some array fun: an array indexed by regexes, whose values are printf formats
#   --
#   #!/usr/bin/awk -f
#   #
#   # array_test.awk
#   #
#   BEGIN {
#       Pfpn = "^[-]?[[:digit:]]+[.][[:digit:]]+$" ; Pfmt[Pfpn] = " %8.2f = fpn\n"
#       Pint = "^[-]?[[:digit:]]+$"                ; Pfmt[Pint] = " %8d = int\n"
#       Pstr = ".*[^-.[:digit:]].*"                ; Pfmt[Pstr] = " %8s = str\n"
#   }
#   # each index i is a regex string, matched dynamically against $1
#   { for (i in Pfmt) if ($1 ~ i) printf Pfmt[i], $1 }
#   --
#
#   input data:
#   $ tr '\n' ' ' < test.data
#   3.14 awk -57 4evar 42 -3.14 *$@&#!
#
#   $ ./array_test.awk test.data
#        3.14 = fpn
#         awk = str
#         -57 = int
#       4evar = str
#          42 = int
#       -3.14 = fpn
#      *$@&#! = str

... user-defined functions ...

the same naming rules as for variables and arrays apply to function names
basic form: function fname(params) { ... }
  note: when CALLING a user-defined function, no space is allowed between
  fname and "(" (built-in functions tolerate the space)
the parameter list holds BOTH the function's args and any local variables
  => a caller passing a different number of args can turn a local variable
     into an argument (or leave an intended argument uninitialized)!
  B. Kernighan lists this syntax decision as his biggest AWK regret
it is customary to put several extra spaces between the args and the locals;
prefixing locals with underscore(s) also helps tell them apart
#   ex. function fubar(fu, bar,    _i, _j, _k) {...}
#
#   alt.
#       function fubar(fu, bar,
#                      _i, _j, _k) {...}
#
locals (and params) can have the same name as a global variable => the global is shadowed
scalar args are passed 'by value'; array args are passed 'by reference'
  => changes made to an array parameter affect the caller's array!
end a function with 'return expr' whenever a value is expected:
  the expr (and its parentheses) is optional, but the returned value is then
  undefined (in practice "", i.e. 0 in a numeric context)
#   ex. function oddp(n) { return (n % 2 != 0 ? 1 : 0) }   # returns 1 (true) if n is odd
#
user-defined functions can call other functions as well as themselves (recursion)
#   ex. recursion example - mimic the rev(1) command:
#
#       # rev.awk - from Effective AWK Programming, p. 169
#       function rev(str, start) {
#           if (start == 0)
#               return ""
#           return substr(str, start, 1) rev(str, start - 1)
#       }
#
#       $ echo 'taktik' | gawk --source '{print rev($0, length($0))}' -f rev.awk
#       kitkat
#
#   where,
#     - args: $0 is passed to 'str'; length($0) is passed to 'start'
#     - if start == 0, return "" => base case, the recursion is complete
#     - otherwise return the character at position 'start' concatenated with
#       rev(str, start-1); the recursion ends once start reaches 0
#     - program text on the command line can't be mixed with -f unless it is
#       marked with gawk's --source (or -e); with nawk, put the main rule in
#       a second file and pass both files with -f
#
must use 'getline' with care within user-defined functions (next time)
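The array notes above say that an array can be used as a plain set and that an unassigned element reads as "". In practice the two ways of "looking" at an element behave differently, and the sketch below shows how. Testing with `(key in arr)` does not create anything, but any ordinary reference such as `arr[key] == ""` silently creates the element. `set_demo` is a hypothetical wrapper name, and only POSIX awk features are assumed.

```shell
#!/bin/sh
# Sketch: an AWK array used as a set, and "in" versus a plain reference.
# set_demo is a hypothetical helper name chosen for this example.
set_demo() {
    awk 'BEGIN {
        split("red green blue", tmp)
        for (i = 1; i <= 3; i++)
            colors[tmp[i]]                 # bare reference: creates the element, value ""

        n = 0; for (k in colors) n++
        print "size:", n                   # 3 elements

        if ("pink" in colors) print "pink is a member"   # "in" does NOT create colors["pink"]
        n = 0; for (k in colors) n++
        print "size:", n                   # still 3

        if (colors["pink"] == "") print "pink is empty"  # this reference DOES create it
        n = 0; for (k in colors) n++
        print "size:", n                   # now 4
    }'
}

set_demo
```

Use `in` whenever you only want to ask "is this key present?". Otherwise lookups quietly grow the array.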
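The notes name ENVIRON and ARGV as the two POSIX built-in arrays. The minimal sketch below reads both from a `BEGIN`-only program. Because only `BEGIN` is present, the command-line arguments are never opened as input files. The wrapper `show_args` and the environment variable `GREETING` are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: reading ARGV and ENVIRON from a BEGIN-only awk program.
# show_args and GREETING are hypothetical names for this example.
show_args() {
    GREETING=hello awk 'BEGIN {
        # ARGV[0] is the program name, so start at 1;
        # ARGC counts all elements, ARGV[0] included
        for (i = 1; i < ARGC; i++)
            print i ": " ARGV[i]
        print "GREETING=" ENVIRON["GREETING"]
    }' "$@"
}

show_args one "two words" three
```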
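The sketch below combines the split() "inversion" trick from the notes with `delete`. It builds `a["x"]`, `a["y"]` and `a["z"]`, removes one element, and then empties the whole array using the portable `split("", a)` idiom. The `split_demo` wrapper is a hypothetical name. The values are printed explicitly rather than with `for (k in a)`, because for-in order is unspecified.

```shell
#!/bin/sh
# Sketch: split() for bulk assignment, inverting into string indices,
# then deleting one element and clearing the array.
# split_demo is a hypothetical helper name chosen for this example.
split_demo() {
    awk 'BEGIN {
        N = split("x y z", tmp)            # tmp[1]="x", tmp[2]="y", tmp[3]="z"
        for (i = 1; i <= N; i++)
            a[tmp[i]] = i                  # a["x"]=1, a["y"]=2, a["z"]=3
        print "x=" a["x"], "y=" a["y"], "z=" a["z"]

        delete a["y"]                      # remove a single element
        if ("y" in a) print "y present"; else print "y deleted"

        split("", a)                       # portable idiom: empty the whole array
        n = 0; for (k in a) n++
        print "elements left:", n
    }'
}

split_demo
```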
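The SUBSEP notes explain that `arr[1, 2, 3]` is stored under a single string key. When you need the individual indices back, a common approach is to split the key on SUBSEP itself rather than on "". The sketch below does that. It also checks membership with the POSIX `(i, j, k) in arr` form and confirms that the key is not "123". `subsep_demo` is a hypothetical wrapper name.

```shell
#!/bin/sh
# Sketch: how a "multi-dimensional" subscript is stored, and how to
# recover its indices. subsep_demo is a hypothetical helper name.
subsep_demo() {
    awk 'BEGIN {
        arr[1, 2, 3] = "awk is different"   # key is "1" SUBSEP "2" SUBSEP "3"

        for (k in arr) {
            n = split(k, ind, SUBSEP)       # split on SUBSEP to get the indices back
            print n, ind[1], ind[2], ind[3]
        }

        if ((1, 2, 3) in arr) print "(1, 2, 3) is in arr"
        if (!("123" in arr))  print "123 is not in arr"   # "in" does not create it
        print "length(SUBSEP) =", length(SUBSEP)          # default SUBSEP is "\034"
    }'
}

subsep_demo
```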
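The function notes make three related claims: scalars are passed by value, arrays by reference, and extra parameters (set apart by spaces and an underscore) act as locals. The sketch below checks all three at once. `bump()` and `scope_demo` are hypothetical names. The final loop shows that the global `_seen` is untouched by the local array of the same name inside `bump()`.

```shell
#!/bin/sh
# Sketch: by-value scalars, by-reference arrays, and locals via extra
# parameters. bump and scope_demo are hypothetical names.
scope_demo() {
    awk '
    function bump(n, arr,    _seen) {   # _seen is a local, not an argument
        n++                             # scalar param: only the local copy changes
        arr["count"]++                  # array param: the array in the caller changes
        _seen["x"] = 1                  # a local used as an array is a local array
        return n
    }
    BEGIN {
        x = 1
        A["count"] = 0
        r = bump(x, A)
        print "x=" x, "r=" r, "count=" A["count"]
        r = bump(x, A)
        print "x=" x, "r=" r, "count=" A["count"]
        n = 0; for (k in _seen) n++     # the global _seen was never touched
        print "global _seen:", n
    }'
}

scope_demo
```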
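Most awks won't accept a program-text argument together with `-f` unless gawk's `--source`/`-e` is used. A portable way to run the rev.awk example is therefore to put the main rule in a second file and pass both files with `-f`, as in the sketch below. The temporary directory, the `main.awk` name and the `rev_via_files` wrapper are assumptions for illustration. The sketch also relies on `mktemp -d`, which is widely available but not required by POSIX.

```shell
#!/bin/sh
# Sketch: run rev.awk with two -f files instead of mixing program text and -f.
# rev_via_files, main.awk and the temp dir are hypothetical; mktemp -d is assumed.
rev_via_files() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/rev.awk" <<'EOF'
function rev(str, start) {
    if (start == 0)
        return ""
    return substr(str, start, 1) rev(str, start - 1)
}
EOF
    printf '%s\n' '{ print rev($0, length($0)) }' > "$dir/main.awk"
    awk -f "$dir/rev.awk" -f "$dir/main.awk"   # both files form one program; reads stdin
    status=$?
    rm -rf "$dir"
    return $status
}

echo 'taktik' | rev_via_files
```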