Day 1: Basic intro, pattern-action, BEGIN / END blocks, Unix OS interaction
---------------------------------------------------------------------------

AWKs currently installed on SDF: native awk(1), mawk(1), gawk(1)
NetBSD's pkgsrc also has original-awk (OTA) and misc. heirloom AWKs
=> can request on bboard>>REQUESTS

AWK dates back to the early days of Unix development at Bell Labs,
developed initially as an extended grep(1) for report generation, etc.
AWK is interpreted (no compiling) with no type declarations.
A.R. calls AWK "data-driven"; describe the data (via regex), then take action:
=> pattern { action }
By default each line is a "record" w/ fields separated by "whitespace".

AWK + POSIX toolkit: AWK is not really a stand-alone, general-purpose language
=> GNU AWK attempts being general-purpose via C-based extensions

The 3 AWK code blocks: BEGIN{} , body + user-defined funcs., END{}
Order mostly doesn't matter; multiple BEGIN/END blocks are processed in order.
# ex.
# $ awk -f begin.awk -f body.awk -f end.awk -f funcs.awk

AWK works best with semi-uniform line-based text; CSV supported.
Data can be read via stdin (piped||typed) or files.
# ex.
# $ echo 'hello' |awk '//' ; awk '//' world.txt

Output can be written to stdout, files or other POSIX tools.
# ex.
# $ echo 'AWKward..' |awk -vMoo='cowsay' '{print |Moo}END{close(Moo)}'
#  ___________
# < AWKward.. >
#  -----------
#         \   ^__^
#          \  (oo)\_______
#             (__)\       )\/\
#                 ||----w |
#                 ||     ||

# other ways to run AWK (use chmod(1) to make scripts executable)
# ex. executable awk script:
# #! /usr/bin/awk -f
# BEGIN { print "hello, world" }
#
# ex. shell wrapper script:
# #! /bin/sh -
# AWKCODE='BEGIN { print "hello, world" }'
# awk "$AWKCODE"

BEGIN is mostly for var setup; FILENAME and vars set as args are *NOT*
accessible there.. ..HOWEVER, cmd line args *ARE* accessible via the
ARGV[] array!
# ex.
# $ awk 'BEGIN{print "say =", say; \
#     for(i in ARGV)printf "ARGV[%d] = %s\n", i, ARGV[i]}' cow say=Moo 42
# say =
# ARGV[2] = say=Moo
# ARGV[3] = 42
# ARGV[0] = awk
# ARGV[1] = cow

END is mostly for wrap-up: process/print the data collected in the body.
# ex.
# $ seq 0 9 | awk '{Sum = Sum + $1} END{print "Sum =", Sum}'
# Sum = 45

The body is a "gauntlet"; matching continues to the end of the block
unless interrupted.
# ex. 'pat_1 { action_1 } ; pat_2 { action_2 } ;...

A pattern with no action simply prints the match to stdout.
# ex. '//' or 'NR' => prints all lines

An action with no pattern applies to all data.
# ex. '{ print $1, $NF }' => prints first & last field of each line

--
Some simple code to try out:

# ex.1) find all the SDF DBA members from the /etc/group file:
#
# # uids obfuscated for privacy
# $ awk -F':' '/^dba/{print $4}' /etc/group
# blw****,bri****,cam****,can****,car****,...
# cop****,crea****,cro****,cry****,cyb****,...
# ...

# explainer: '-F' is used to redefine FS, the field separator, to ":",
since files like /etc/passwd and /etc/group are structured that way.
/etc/group has four (4) such fields; the first field is the group name,
the last field is a comma-separated list of group members. We ignore the
2nd and 3rd fields (pwd & GID) since we just want the DBA members.

# ex.2) generate a sorted list of logged in users beginning w/ s, d, or f:
#
# $ users |tr ' ' '\n' |sort | awk '/^[sdf]/'   # uids obfuscated
# d*****b
# f********z
# f***d
# f******l
# s****a
# s******n
# s*****i

# explainer: While sorting routines can be written in AWK, sort(1) is
POSIX and does it much better and faster. Since users(1) spits all users
out on one line, we use tr(1) to turn the data into newline-separated
records for sort(1) and awk(1). /^[sdf]/ is a regex matching strings
beginning with a letter of interest. Since no { action } is paired with
the pattern, awk(1) just prints each match to stdout.
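The body-as-gauntlet and END wrap-up ideas combine naturally; here is a
minimal sketch on made-up data (the 'alpha'/'beta' records and the sums
are purely illustrative, not from the examples above):

```shell
# Every record runs the whole gauntlet: it is tested against each
# pattern in turn, and END reports the totals after all input is read.
printf 'alpha 1\nbeta 2\nalpha 3\n' | awk '
  /^alpha/ { a += $2 }                          # fires on alpha records
  /^beta/  { b += $2 }                          # fires on beta records
  END      { print "alpha =", a, "beta =", b }  # wrap-up
'
# alpha = 4 beta = 2
```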
# ex.3) emulate the head(1) command in awk(1):
#
# $ head -n5 /usr/share/misc/acronyms
# $NetBSD: acronyms,v 1.287.2.3 2020/06/21 10:28:20 martin Exp $
# 10Q thank you
# 10X thanks
# 1337 elite ("leet")
# 224 today, tomorrow, forever
#
# $ awk -vn=5 'NR <= n' /usr/share/misc/acronyms   # 'n' passed via '-v'
# $NetBSD: acronyms,v 1.287.2.3 2020/06/21 10:28:20 martin Exp $
# 10Q thank you
# 10X thanks
# 1337 elite ("leet")
# 224 today, tomorrow, forever
#
# $ awk 'NR <= n' n=5 /usr/share/misc/acronyms     # 'n' passed as arg
# $NetBSD: acronyms,v 1.287.2.3 2020/06/21 10:28:20 martin Exp $
# 10Q thank you
# 10X thanks
# 1337 elite ("leet")
# 224 today, tomorrow, forever

# explainer: head(1) output provided for reference. Since the awk(1)
examples are just a simple pattern w/o an action in the body block, the
'n' variable can be passed either via '-v' or as just another arg along
with the data file.

# ex.4) explore the effect of redefining FS (the field separator):
#
# $ echo " welcome to AWK " | \
#     awk '{for(i=1; i<=NF; i++)printf "F%d = %s\n", i, $i}'
# F1 = welcome
# F2 = to
# F3 = AWK
#
# $ echo " welcome to AWK " | \
#     awk -F'[ ]' '{for(i=1; i<=NF; i++)printf "F%d = %s\n", i, $i}'
# F1 =
# F2 = welcome
# F3 = to
# F4 = AWK
# F5 =

# explainer: The default FS value is a single space, " ", which AWK
treats specially: fields are split as if by the regex "[ \t\n]+", with
leading and trailing whitespace ignored. Once FS is redefined to
anything else, leading and trailing separators count as delimiting
(empty) fields. The 2nd instance above redefines FS to the regex "[ ]",
i.e. exactly one literal space (a plain -F' ' would not work, since
FS=" " just means the default behavior again), hence there are now 5
fields found in the data record instead of 3.
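A side note on ex.3, as my own addition rather than part of the notes
above: because the body is a gauntlet, a bare 'NR <= n' keeps reading to
the end of the file even after the first n lines have printed. An
explicit exit stops early, which matters on large inputs; seq(1) stands
in for a big data file here:

```shell
# Like ex.3, but bail out of the gauntlet once n lines have printed,
# so awk never scans the rest of the input.
# 'NR <= n;' is a pattern with no action, so it prints the record.
seq 1 1000000 | awk -v n=5 'NR <= n; NR >= n { exit }'
# 1
# 2
# 3
# 4
# 5
```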