Day 1: Basic intro, pattern-action, BEGIN / END blocks, Unix OS interaction
---------------------------------------------------------------------------

AWKs currently installed on SDF: native awk(1), mawk(1), gawk(1)
NetBSD's pkgsrc also has original-awk (OTA) and misc. heirloom AWKs
=> can request on bboard>>REQUESTS

AWK dates back to the early days of Unix development at Bell Labs,
developed initially as an extended grep(1) for report generation, etc.
AWK is interpreted (no compiling) with no type declarations.
A.R. calls AWK "data-driven"; describe the data (via regex), then take action:
=> pattern { action }
By default each line is a "record" w/ fields separated by "whitespace".

AWK + POSIX toolkit: AWK is not really a stand-alone, general-purpose language
=> GNU AWK attempts being general-purpose via C-based extensions

The 3 AWK code blocks: BEGIN{} , body + user-defined funcs., END{}
Order mostly doesn't matter; multiple BEGIN/END blocks are processed in order.
# ex.
# $ awk -f begin.awk -f body.awk -f end.awk -f funcs.awk

AWK works best with semi-uniform line-based text; CSV supported.
Data can be read via stdin (piped||typed) or files.
# ex.
# $ echo 'hello' |awk '//' ; awk '//' world.txt

Output can be written to stdout, files or other POSIX tools.
# ex.
# $ echo 'AWKward..' |awk -vMoo='cowsay' '{print |Moo}END{close(Moo)}'
#  ___________
# < AWKward.. >
#  -----------
#         \   ^__^
#          \  (oo)\_______
#             (__)\       )\/\
#                 ||----w |
#                 ||     ||

# other ways to run AWK (use chmod(1) to make scripts executable)
# ex. executable awk script:
# #! /usr/bin/awk -f
# BEGIN { print "hello, world" }
#
# ex. shell wrapper script:
# #! /bin/sh -
# AWKCODE='BEGIN { print "hello, world" }'
# awk "$AWKCODE"

BEGIN is mostly for var setup; FILENAME and vars set as args are *NOT*
accessible there.. ..HOWEVER, cmd line args *ARE* accessible via the
ARGV[] array!
# ex.
# $ awk 'BEGIN{print "say =", say; \
#     for(i in ARGV)printf "ARGV[%d] = %s\n", i, ARGV[i]}' cow say=Moo 42
# say =
# ARGV[2] = say=Moo
# ARGV[3] = 42
# ARGV[0] = awk
# ARGV[1] = cow

END is mostly for wrap-up: process/print the data collected in the body.
# ex.
# $ seq 0 9 | awk '{Sum = Sum + $1} END{print "Sum =", Sum}'
# Sum = 45

The body is a "gauntlet"; matching continues to the end of the block
unless interrupted.
# ex. 'pat_1 { action_1 } ; pat_2 { action_2 } ;...

A pattern with no action simply prints the match to stdout.
# ex. '//' or 'NR' => prints all lines

An action with no pattern applies to all data.
# ex. '{ print $1, $NF }' => prints first & last field of each line

--
Some simple code to try out:

# ex.1) find all the SDF DBA members from the /etc/group file:
#
# # uids obfuscated for privacy
# $ awk -F':' '/^dba/{print $4}' /etc/group
# blw****,bri****,cam****,can****,car****,...
# cop****,crea****,cro****,cry****,cyb****,...
# ...

# explainer: '-F' is used to redefine FS, the field separator, to ":",
since files like /etc/passwd and /etc/group are structured that way.
/etc/group has four (4) such fields; the first field is the group name,
the last field is a comma-separated list of group members. We ignore the
2nd and 3rd fields (pwd & GID) since we just want the DBA members.

# ex.2) generate a sorted list of logged in users beginning w/ s, d, or f:
#
# $ users |tr ' ' '\n' |sort | awk '/^[sdf]/'   # uids obfuscated
# d*****b
# f********z
# f***d
# f******l
# s****a
# s******n
# s*****i

# explainer: While sorting routines can be written in AWK, sort(1) is
POSIX and does it much better and faster. Since users(1) spits all users
out on one line, we use tr(1) to turn the data into newline-separated
records for sort(1) and awk(1). /^[sdf]/ is a regex matching strings
beginning with a letter of interest. Since no { action } is paired with
the pattern, awk(1) just prints each match to stdout.
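The body-as-gauntlet and END wrap-up ideas combine naturally; here is a
minimal sketch on made-up data (the 'alpha'/'beta' records and the sums
are purely illustrative, not from the examples above):

```shell
# Every record runs the whole gauntlet: it is tested against each
# pattern in turn, and END reports the totals after all input is read.
printf 'alpha 1\nbeta 2\nalpha 3\n' | awk '
  /^alpha/ { a += $2 }                          # fires on alpha records
  /^beta/  { b += $2 }                          # fires on beta records
  END      { print "alpha =", a, "beta =", b }  # wrap-up
'
# alpha = 4 beta = 2
```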
# ex.3) emulate the head(1) command in awk(1):
#
# $ head -n5 /usr/share/misc/acronyms
# $NetBSD: acronyms,v 1.287.2.3 2020/06/21 10:28:20 martin Exp $
# 10Q thank you
# 10X thanks
# 1337 elite ("leet")
# 224 today, tomorrow, forever
#
# $ awk -vn=5 'NR <= n' /usr/share/misc/acronyms   # 'n' passed via '-v'
# $NetBSD: acronyms,v 1.287.2.3 2020/06/21 10:28:20 martin Exp $
# 10Q thank you
# 10X thanks
# 1337 elite ("leet")
# 224 today, tomorrow, forever
#
# $ awk 'NR <= n' n=5 /usr/share/misc/acronyms     # 'n' passed as arg
# $NetBSD: acronyms,v 1.287.2.3 2020/06/21 10:28:20 martin Exp $
# 10Q thank you
# 10X thanks
# 1337 elite ("leet")
# 224 today, tomorrow, forever

# explainer: head(1) output provided for reference. Since the awk(1)
examples are just a simple pattern w/o an action in the body block, the
'n' variable can be passed either via '-v' or as just another arg along
with the data file.

# ex.4) explore the effect of redefining FS (the field separator):
#
# $ echo " welcome to AWK " | \
#     awk '{for(i=1; i<=NF; i++)printf "F%d = %s\n", i, $i}'
# F1 = welcome
# F2 = to
# F3 = AWK
#
# $ echo " welcome to AWK " | \
#     awk -F'[ ]' '{for(i=1; i<=NF; i++)printf "F%d = %s\n", i, $i}'
# F1 =
# F2 = welcome
# F3 = to
# F4 = AWK
# F5 =

# explainer: The default FS value is a single space, " ", which AWK
treats specially: fields are split as if by the regex "[ \t\n]+", with
leading and trailing whitespace ignored. Once FS is redefined to
anything else, leading and trailing separators count as delimiting
(empty) fields. The 2nd instance above redefines FS to the regex "[ ]",
i.e. exactly one literal space (a plain -F' ' would not work, since
FS=" " just means the default behavior again), hence there are now 5
fields found in the data record instead of 3.
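A side note on ex.3, as my own addition rather than part of the notes
above: because the body is a gauntlet, a bare 'NR <= n' keeps reading to
the end of the file even after the first n lines have printed. An
explicit exit stops early, which matters on large inputs; seq(1) stands
in for a big data file here:

```shell
# Like ex.3, but bail out of the gauntlet once n lines have printed,
# so awk never scans the rest of the input.
# 'NR <= n;' is a pattern with no action, so it prints the record.
seq 1 1000000 | awk -v n=5 'NR <= n; NR >= n { exit }'
# 1
# 2
# 3
# 4
# 5
```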