8. WHY?
FIND LARGEST FILE IN A DIRECTORY
▸ ls -S | head -1
▸ takes about 4 seconds to type
▸ takes about 45 seconds to google
▸ largest () { ls -S | head -1; }
▸ takes about 15 seconds to type...
▸ and then 2 seconds forever after
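A minimal sketch of making that function permanent, assuming a bash- or zsh-style shell with a startup file such as ~/.bashrc. The optional directory argument and the `head -n 1` spelling are additions for illustration, not part of the slide.

```shell
# Sketch: place this in a shell startup file (e.g. ~/.bashrc) so it is
# defined in every new session.
# The optional directory argument is an illustrative addition; it defaults to ".".
largest() {
  # ls -S sorts entries by size, largest first; head keeps only the first line.
  ls -S -- "${1:-.}" | head -n 1
}
```

After reloading the shell, `largest` reports on the current directory and `largest /some/dir` on another one. Note that `ls -S` also lists subdirectories, so this is a quick convenience rather than a strict "files only" search.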
28. WHEN?
CHESS GAME WINNERS
▸ ~14 million games == ~10 gigs
▸ Read in files, aggregate winners, report stats == grep / awk
▸ Sounds like a job for: THE COMMAND LINE!
29. find . -type f -name '*.pgn' -print0 |
  xargs -0 -n4 -P4 mawk '/Result/ {
    split($0, a, "-"); res = substr(a[1], length(a[1]), 1);
    if (res == 1) white++; if (res == 0) black++; if (res == 2) draw++
  } END { print white+black+draw, white, black, draw }' |
  mawk '{ games += $1; white += $2; black += $3; draw += $4 }
    END { print games, white, black, draw }'
Thanks to Adam Drake for figuring this out!
http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html
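To see what the per-file awk step does, here is a small sketch run on four made-up PGN result lines. It assumes any POSIX `awk` in place of `mawk`; the `count_results` helper name and the sample input are hypothetical, and `+ 0` is added so that empty counters print as 0. The trick is that splitting on "-" leaves the character just before the dash at the end of a[1]: 1 for a white win, 0 for a black win, 2 (from "1/2") for a draw.

```shell
# Hypothetical helper: tally game results from PGN text on stdin.
count_results() {
  awk '/Result/ {
    split($0, a, "-")                     # "[Result \"1-0\"]" -> a[1] = "[Result \"1"
    res = substr(a[1], length(a[1]), 1)   # last character before the dash
    if (res == 1) white++                 # 1-0      -> white wins
    if (res == 0) black++                 # 0-1      -> black wins
    if (res == 2) draw++                  # 1/2-1/2  -> draw
  } END { print white + black + draw, white + 0, black + 0, draw + 0 }'
}

# Made-up sample input: two white wins, one black win, one draw.
printf '%s\n' '[Result "1-0"]' '[Result "0-1"]' '[Result "1/2-1/2"]' '[Result "1-0"]' |
  count_results
# prints: 4 2 1 1
```

The second awk stage in the pipeline simply sums these four columns across the parallel xargs workers.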
32. HOW?
I HAVE NO IDEA WHAT I'M DOING, BUT THIS IS WHAT I DO
1. Identify constraints
‣ Money, Business Goals, Time, Performance, Beauty
2. Pursue mitigating factors within constraints
3. Relentlessly pursue constraints until the laws of physics
tell you to stop
4. Iterate
34. CONSTRAINTS
HARDWARE
▸ AWS EBS gp2 volumes have max ~160 MiB/s*
▸ AWS EBS io2 volumes have max ~320 MiB/s*
▸ Realistic max network performance of ~800 MiB/s*
▸ CPU performance varies by instance type, but up to 8
teraflops per instance
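As a rough worked example of why these ceilings matter, the sketch below estimates how long a single sequential pass over a dataset takes at each of the throughput figures above. It assumes the ~10 gig chess dataset from earlier is 10 GiB, that sustained throughput is the only bottleneck, and that the `scan_seconds` helper name is hypothetical.

```shell
# Hypothetical helper: seconds to stream SIZE_GIB of data at RATE_MIB MiB/s.
scan_seconds() {
  awk -v gib="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gib * 1024 / rate }'
}

# Throughput figures taken from the slide (MiB/s).
for rate in 160 320 800; do
  echo "$rate MiB/s: $(scan_seconds 10 "$rate") s"
done
# 160 MiB/s: 64.0 s
# 320 MiB/s: 32.0 s
# 800 MiB/s: 12.8 s
```

Under those assumptions, no amount of CPU gets a single-volume read of the dataset below about a minute on gp2, which is the kind of hard limit worth knowing before optimizing anything else.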
35. CONSTRAINTS
BUSINESS CONSTRAINTS (MONEY)
▸ How much money can we spend on this problem?
▸ How much time can we spend on this problem?
▸ Will a partially automated solution allow us to kick this
problem down the road?
▸ Is it cheaper/easier/more accurate to hire humans?
41. CLOSING THOUGHTS
INFRASTRUCTURE AS CODE == AUTOMATION AS CODE
▸ Chef, Puppet, Ansible or something similar is vital
(you already know this)
▸ AWS CloudFormation lets you launch entire
FLEETS and NETWORKS in moments. This system
is free...
▸ Heterogeneity is the death of automation (cattle,
not pets)
42. CLOSING THOUGHTS
STOP DOING THE THINGS YOU ALREADY KNOW HOW TO DO
▸ You already know how to run a reverse proxy -- write the
code once and be done with it (Amazon API Gateway).
▸ You already know how to maintain your database (AWS
RDS) -- set up protocols and be done with it
▸ You already know how to manage security and
permissions -- don't set it up yet again
43. CLOSING THOUGHTS
▸ Identify the biggest automation wins
▸ Pursue your automation by pursuing constraints
▸ Add humanity to the process
▸ Don't overthink it, but definitely think about it