3. awk
a generic text processor where
“A file is treated as a sequence of records, and by
default each line is a record.” - Alfred V. Aho
developed in 1977 by Alfred Aho, Peter Weinberger, and Brian
Kernighan @ Bell Labs
uses AWK as its programming language
    awk 'BEGIN { print "Hello World!" }'
procedural
interpreted
a program is a series of pattern-action pairs
4. Why another awk?
“Whenever faced with a problem, some people say
`Lets use AWK.' Now, they have two problems.” - D.
Tilbrook
avoid the AWK programming language
use a generic language, not a DSL
    BEGIN{split("a b c c a",a);for(i in a)b[a[i]]=1;r="";for(i in b)r=r" "i;print r}
    nub $ words "a b c c a"
procedural (imperative) vs functional programming for stream
processing
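The functional one-liner above can be tried as a standalone program. This is a minimal sketch that assumes only `Data.List` from base; the name `uniqueWords` is hypothetical and does not come from the slides.

```haskell
import Data.List (nub)

-- Hypothetical helper: the distinct words of a string,
-- in order of first appearance.
uniqueWords :: String -> [String]
uniqueWords = nub . words

main :: IO ()
main = putStrLn (unwords (uniqueWords "a b c c a"))  -- prints "a b c"
```

Unlike the AWK loop, whose `for (i in b)` traversal order is unspecified, `nub` keeps the order of first occurrence.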
5. Haskell-awk (Hawk)
a generic text processor where
“A stream is treated as a sequence of records, and
by default each line is a record.”
the same philosophy of awk!
developed in 2013 by me and Samuel Gélineau, the name is a tribute
to awk
uses Haskell as its programming language
    hawk '"Hello World!"'
functional
(incrementally) compiled
a program is a Haskell expression
6. Why Haskell
expressive, clean and concise
    > filter odd [1,2,3,4]
    [1,3]
functions as composable building blocks
    > let wordCount = sum . map (length . words) . lines
    > :type wordCount
    wordCount :: String -> Int
    > wordCount "1 2 3\n4 5 6\n7 8 9"
    9
partial application
    > :type map
    map :: (a -> b) -> [a] -> [b]
    > :type not
    not :: Bool -> Bool
    > :type map not
    map not :: [Bool] -> [Bool]
    > map not [True,False]
    [False,True]
point-free style, laziness ...
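The last two items can be illustrated with a short sketch. The names `countEven`, `countEven'` and `firstSquares` are hypothetical, chosen only for this example.

```haskell
-- Point-free: the argument is never named, the function is a pipeline.
countEven :: [Int] -> Int
countEven = length . filter even

-- The same function written with an explicit argument.
countEven' :: [Int] -> Int
countEven' xs = length (filter even xs)

-- Laziness: only the first three squares of the infinite list are computed.
firstSquares :: [Int]
firstSquares = take 3 (map (\x -> x * x) [1 ..])

main :: IO ()
main = do
  print (countEven [1, 2, 3, 4])  -- 2
  print firstSquares              -- [1,4,9]
```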
8. Modes
evaluate an expression
    $ hawk '1'
    1
    $ hawk '[1,2]'
    1
    2
    $ hawk '[[1,2],[3,4]]'
    1 2
    3 4
apply an expression to the input
    $ echo '1\n2\n3' | hawk -a 'L.reverse'
    3
    2
    1
map an expression to each record of the input
    $ echo '1 2\n3 4' | hawk -m 'L.reverse'
    2 1
    4 3
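The difference between the apply and map modes can be modelled with plain functions. This is a simplified sketch, not Hawk's implementation: it uses `String` and `words`/`lines` instead of ByteStrings and configurable delimiters, and the names `Record`, `parseInput`, `render`, `applyMode` and `mapMode` are hypothetical.

```haskell
-- A record is a list of fields; the input is a list of records.
type Record = [String]

-- Parse with newline between records and whitespace between fields.
parseInput :: String -> [Record]
parseInput = map words . lines

-- Render records back to text, one record per line.
render :: [Record] -> String
render = unlines . map unwords

-- Apply mode (-a): the expression sees the whole input at once.
applyMode :: ([Record] -> [Record]) -> String -> String
applyMode f = render . f . parseInput

-- Map mode (-m): the expression sees one record at a time.
mapMode :: (Record -> Record) -> String -> String
mapMode f = render . map f . parseInput

main :: IO ()
main = do
  putStr (applyMode reverse "1\n2\n3\n")  -- reverses the order of the records
  putStr (mapMode reverse "1 2\n3 4\n")   -- reverses the fields in each record
```

The same `reverse` yields different results depending on whether it is handed the list of records or each record in turn, which is the distinction between `-a` and `-m`.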
9. IO format
The input is, by default, a list of lists of strings, where lines are
separated by \n and words by spaces
    $ echo '1 2\n3 4' | hawk -a 'show'
    [["1","2"],["3","4"]]
Options -d (word delimiter) and -D (line delimiter) change the
delimiters or set them to empty
    $ echo '1,2;3,4' | hawk -a -d',' -D';' 'show'
    [["1","2"],["3","4"]]
    $ echo '1 2\n3 4' | hawk -a -d'' 'show'
    ["1 2","3 4"]
    $ echo '1 2\n3 4' | hawk -a -d'' -D'' 'show'
    "1 2\n3 4\n"
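The delimiter handling shown above can be approximated in a few lines. This is a sketch under stated assumptions: single-character delimiters, `String` instead of ByteString, and hypothetical names (`splitOn`, `splitWith`, `parseWith`). It also keeps one result type throughout, whereas Hawk, as the examples show, changes the type seen by the expression when a delimiter is empty.

```haskell
-- Split a string on a single-character delimiter.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest

-- Nothing models an empty delimiter: no splitting at that level.
splitWith :: Maybe Char -> String -> [String]
splitWith Nothing  s = [s]
splitWith (Just d) s = splitOn d s

-- Split into records first, then into fields. A single trailing
-- record delimiter does not start a new, empty record.
parseWith :: Maybe Char -> Maybe Char -> String -> [[String]]
parseWith fieldDelim recordDelim =
  map (splitWith fieldDelim) . splitWith recordDelim . dropTrailing
  where
    dropTrailing s = case (recordDelim, reverse s) of
      (Just d, c : rest) | c == d -> reverse rest
      _                           -> s

main :: IO ()
main = do
  print (parseWith (Just ' ') (Just '\n') "1 2\n3 4\n")  -- [["1","2"],["3","4"]]
  print (parseWith (Just ',') (Just ';') "1,2;3,4")      -- [["1","2"],["3","4"]]
  print (parseWith Nothing (Just '\n') "1 2\n3 4\n")     -- [["1 2"],["3 4"]]
```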
The output can be any type that instantiates the typeclass Rows
    class (Show a) => Rows a where
      repr :: ByteString -> a -> [ByteString]
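To make the role of the class concrete, here is a simplified stand-in, not Hawk's real instances. It assumes `String` in place of `ByteString`, treats the first argument of `repr` as the field delimiter, and defines only three illustrative instances that mirror the outputs of the evaluation mode.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Data.List (intercalate)

-- Simplified stand-in for the Rows class: String replaces ByteString.
-- repr takes a field delimiter and a value and returns the output lines.
class Show a => Rows a where
  repr :: String -> a -> [String]

-- A single value is printed on one line.
instance Rows Int where
  repr _ n = [show n]

-- A list is printed with one element per line.
instance Rows [Int] where
  repr _ = map show

-- A list of lists is printed with one inner list per line,
-- its elements joined by the field delimiter.
instance Rows [[Int]] where
  repr d = map (intercalate d . map show)

main :: IO ()
main = do
  mapM_ putStrLn (repr " " (1 :: Int))
  mapM_ putStrLn (repr " " [1, 2 :: Int])
  mapM_ putStrLn (repr " " [[1, 2], [3, 4 :: Int]])
```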
10. Examples
get all users of a UNIX system
    $ cat /etc/passwd | hawk -d: -m 'L.head'
    root
    daemon
    ...
select username and userid
    $ cat /etc/passwd | hawk -d: -o'\t' -m '\l -> (l !! 0, l !! 2)'
    root    0
    daemon  1
    ...
sort by username (instead of by user ID)
    $ cat /etc/passwd | hawk -d: -a 'L.sortBy (compare `on` L.head)'
    bin:x:2:2:bin:/bin:/bin/sh
    daemon:x:1:1:daemon:/usr/sbin:/bin/sh
    ...
get the number of users using each shell
    $ cat /etc/passwd | hawk -ad: 'L.map (L.head &&& L.length) . L.group . L.sort . L.map L.last'
    /bin/bash:1
    ...
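The last pipeline can also be read as an ordinary Haskell function. The sketch below is standalone and makes some assumptions: the helper names `fields` and `shellCounts` are hypothetical, and the sample lines are made up for illustration rather than taken from a real system.

```haskell
import Control.Arrow ((&&&))
import Data.List (group, sort)

-- Split a passwd-style line on ':'.
fields :: String -> [String]
fields s = case break (== ':') s of
  (f, [])       -> [f]
  (f, _ : rest) -> f : fields rest

-- Count how many records share each login shell (the last field):
-- take the last field, sort, group equal neighbours, then pair
-- each group's representative with its size.
shellCounts :: [String] -> [(String, Int)]
shellCounts = map (head &&& length) . group . sort . map (last . fields)

-- Made-up sample input.
sample :: [String]
sample =
  [ "root:x:0:0:root:/root:/bin/bash"
  , "daemon:x:1:1:daemon:/usr/sbin:/bin/sh"
  , "bin:x:2:2:bin:/bin:/bin/sh"
  ]

main :: IO ()
main = mapM_ print (shellCounts sample)
```

Sorting before `group` matters, because `group` only merges adjacent equal elements.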
11. Context
Hawk can be customized using files inside the context directory (by
default ~/.hawk)
The most important file is prelude.hs that contains the "runtime
context"
    $ cat ~/.hawk/prelude.hs
    {-# LANGUAGE ExtendedDefaultRules, OverloadedStrings #-}
    import Prelude
    import qualified Data.ByteString.Lazy.Char8 as B
    import qualified Data.List as L
for instance, we can add a function for taking elements in an
interval
    $ echo 'takeBetween s e = L.take (e - s) . L.drop s' >> ~/.hawk/prelude.hs
    $ seq 0 10 | hawk -a 'takeBetween 2 4'
    2
    3
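Outside of Hawk, the same function can be checked on its own. In this sketch the type signature is an addition for illustration (the prelude line has none) and the Prelude's `take` and `drop` stand in for `L.take` and `L.drop`.

```haskell
-- Elements at positions s (inclusive) up to e (exclusive), counting from 0.
takeBetween :: Int -> Int -> [a] -> [a]
takeBetween s e = take (e - s) . drop s

main :: IO ()
main = print (takeBetween 2 4 [0 .. 10 :: Int])  -- [2,3]
```

The interval is half-open, which is why `takeBetween 2 4` returns two elements rather than three.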
13. Hawk must be fast
cache the context
use the timestamp to check whether the context has changed since the
last run
compile it with ghc
use locks to compile only once when multiple Hawk instances
are running
    hawk '[1..]' | hawk -a 'L.take 3'
use ByteString instead of String
...
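The timestamp check can be sketched as a pure decision function. This illustrates the idea only and is not Hawk's cache code: the name `needsRecompile` is hypothetical, and the timestamp is left abstract as any ordered type.

```haskell
-- Decide whether the context must be recompiled.
-- First argument: when the context was last modified.
-- Second argument: when it was last compiled, if ever.
needsRecompile :: Ord t => t -> Maybe t -> Bool
needsRecompile _        Nothing         = True                 -- never compiled
needsRecompile modified (Just compiled) = modified > compiled  -- changed since

main :: IO ()
main = do
  print (needsRecompile (5 :: Int) Nothing)   -- True
  print (needsRecompile (5 :: Int) (Just 7))  -- False
  print (needsRecompile (9 :: Int) (Just 7))  -- True
```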
14. Parse and interpret Haskell
Hawk combines two Haskell libraries
haskell-src-exts to deal with Haskell source code
    > import Language.Haskell.Exts.Parser
    > getTopPragmas "{-# LANGUAGE NoImplicitPrelude,OverloadedStrings #-}\n"
    ParseOk [LanguagePragma (SrcLoc
      {srcFilename = "unknown.hs", srcLine = 1, srcColumn = 1})
      [Ident "NoImplicitPrelude",Ident "OverloadedStrings"]]
hint to interpret the user expression
    > import Language.Haskell.Interpreter
    > runInterpreter $ setImports ["Data.Int"] >> interpret "1" (as :: Int)
    Right 1
    > runInterpreter $ setImports ["Data.Int"] >> interpret "foo" (as :: Int)
    Left (WontCompile [GhcError {errMsg = "Not in scope: `foo'"}])