You're building your startup and you know it will be big. You don't want to spend a lot of time on infrastructure, but you also don't want to be putting out fires after you get mentioned on Hacker News. In this session, we will give you real, practical tips that you can take home with you for building an infrastructure that will scale quickly with minimal up-front work on your part, using time-tested techniques in infrastructure as code, SaaS, and serverless, among other things.
7. Takeaways
• Infrastructure as Code
• Microservices/Serverless
• Queueing Theory
• Chaos Engineering
• Logs
• Incident reviews
8. Infrastructure as Code
• Changes are routine, small, easy, and repeatable
• Resources are easily managed by users and disposable
• Enables continuous deployment and improvement
• Solutions can be easily tested, measured, and then rolled back
9. Infrastructure as Code Challenges
• Losing track of servers and resources
• Configuration drift
• Snowflakes
• Fear of a fully automated system (lack of trust in oneself)
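Configuration drift is easier to reason about with a concrete check. The following is a minimal sketch, assuming the desired state lives in code as a plain dictionary and the actual state has already been collected from a running server. The function name `find_drift` and all setting names are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, Tuple


def find_drift(desired: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (desired_value, actual_value)} for every setting that differs.

    A setting missing on one side is reported with None on that side.
    """
    drift = {}
    for key in desired.keys() | actual.keys():
        want = desired.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical example data: what the code says vs. what the server reports.
desired_state = {"instance_type": "t2.small", "port": 443, "log_level": "info"}
actual_state = {"instance_type": "t2.small", "port": 8443, "debug": True}

drift_report = find_drift(desired_state, actual_state)
for key in sorted(drift_report):
    want, have = drift_report[key]
    print(f"{key}: desired={want!r} actual={have!r}")
```

Running a check like this on a schedule is one way to spot snowflakes early, before a manual change becomes something nobody dares to touch.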
10. Automate all the things!
http://hyperboleandahalf.blogspot.com/2010/06/this-is-why-ill-never-be-adult.html
11. Automate all the things!
• Application startup
• Configuration
• Code deployment
• System deployment
19. Physical Servers
Test and prod are different
Prod is in need of constant updates
Slow iteration and deployment
Polyglot unfriendly
Deploy in weeks, live for years
20. Virtual Machines
Prod is immutable
Rapid iteration and deployment
Multi-tenancy
Polyglot friendly
Deploy in minutes, live for weeks
21. Containers
Test and prod are the same
Prod is immutable
Rapid(er) iteration and deployment
High multi-tenancy
Polyglot friendly
Deploy in seconds, live for hours
22. Serverless (λ)
Smallest unit of compute
Super scalable
Rapid iteration
Extreme multi-tenancy
Very polyglot friendly
Easier to collaborate
Deploy independently, live for seconds
23. A whole lot of choices
Amazon’s Ecosystem
Hodgepodge of services
26. What is serverless anyway?
• There are still servers, you just don’t manage them anymore
• It also means you don’t access them anymore
• So you don’t need to (or get to) optimize them.
27. Serverless computing is all about speeding up development by allowing rapid iteration and removing management overhead
28. Choosing your unit of compute
• VMs (EC2)
Machine as the unit of scale
Abstracts the hardware
• Containers (ECS)
Application as the unit of scale
Abstracts the OS
• Serverless (Lambda)
Functions as the unit of scale
Abstracts the language runtime
29. How do I choose?
• VMs (EC2)
“I want to configure machines, storage, networking, and my OS”
• Containers (ECS)
“I want to run servers, configure applications, and control scaling”
• Serverless (Lambda)
“Run my code when it’s needed”
I didn’t write
the software
myself
31. Advantages to a Monorepo
• No worrying about dependencies
• Don’t have to account for data movement
• Deployments are simple
• Coordination is easy
37. Advantages to a Service-Oriented Architecture
• Easier auto-scaling
• Easier capacity planning
• Identify problematic code paths more easily
• Narrow down the effects of a change
• More efficient local caching
38. Highly aligned, loosely coupled
• Services are built by different teams who work together to figure out what each service will provide.
• The service owner publishes an API that anyone can use and that returns proper response codes
39. Distributed Computing and a Distributed Workforce
• The two go hand in hand when you have a good distributed systems culture
• Microservices and Micro Teams
50. Benefits of AWS Lambda
Continuous scaling
No servers to manage
Never pay for idle – No cold servers (only happy accountants)
51. What does Lambda do for you?
• Scales server capacity automatically
• API to trigger execution
• Ensures function is executed in parallel and at scale
• Logging, monitoring, etc.
• Easy pricing
55. Cost Comparison
There are about 2.6M seconds in a 30-day month, so 3M requests is about 1.2 per second
A t2.small costs $18.98 a month, which is already more than Lambda
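The arithmetic behind this comparison can be sketched as follows. The request-rate calculation follows directly from the slide. The cost function is only an illustration: the average duration, memory size, and both prices passed to it are placeholder assumptions, not figures from the talk, so check the current Lambda pricing page before relying on any result.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds in a 30-day month


def requests_per_second(requests_per_month: int) -> float:
    """Average request rate implied by a monthly request count."""
    return requests_per_month / SECONDS_PER_MONTH


def lambda_monthly_cost(
    requests: int,
    avg_duration_s: float,
    memory_gb: float,
    price_per_million_requests: float,
    price_per_gb_second: float,
) -> float:
    """Per-request charge plus per-GB-second compute charge (free tier ignored)."""
    request_cost = requests / 1_000_000 * price_per_million_requests
    compute_cost = requests * avg_duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost


rate = requests_per_second(3_000_000)

# ASSUMPTIONS: 200 ms average duration, 128 MB of memory, and placeholder prices.
example_cost = lambda_monthly_cost(
    requests=3_000_000,
    avg_duration_s=0.2,
    memory_gb=0.125,
    price_per_million_requests=0.20,
    price_per_gb_second=0.0000166667,
)

print(f"{rate:.2f} requests/second, about ${example_cost:.2f}/month under these assumptions")
```

The useful part is the shape of the formula: Lambda cost grows with requests times duration times memory, while a VM costs the same whether it is busy or idle.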
59. Lambda lets you manage your code and infrastructure in the same place
60. Lambda lets your developers manage their code and your infrastructure in the same place
61. All the problems you have with microservices are multiplied 10X with serverless
62. Problems with Serverless
• efficient dependency usage
• local dev environments
• making sure everyone has the same dependencies
• knowing when someone else is deploying the same function
66. Testing
• You can’t test the network, but a good application test should obviate the need to do so.
• Not really a solved problem. Can do local testing.
• Can also send JSON to the function and compare the results.
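The last bullet, sending JSON to the function and comparing the results, can be done locally without any AWS tooling. This is a minimal sketch, assuming a Python handler with the usual `(event, context)` signature. The handler `sum_handler` and its event shape are hypothetical examples, not code from the talk.

```python
import json


def sum_handler(event, context):
    """Hypothetical Lambda-style handler: adds up the numbers in the event."""
    numbers = event.get("numbers", [])
    return {"statusCode": 200, "body": json.dumps({"total": sum(numbers)})}


def invoke_with_json(event_json: str):
    """Feed a JSON document to the handler the way a trigger would, then decode the reply."""
    response = sum_handler(json.loads(event_json), None)
    return response["statusCode"], json.loads(response["body"])


status, body = invoke_with_json('{"numbers": [1, 2, 3]}')
print(status, body)
```

Keeping the sample events as JSON files next to the function makes it easy to replay the same cases against a deployed version later.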
67. Tips and Tricks
• Limit your function size (JVM startup time especially)
• Remember execution is async
• Don’t assume function container reuse but take advantage of it
68. Tips and Tricks
• Remember the limited space in /tmp (512 MB by default)
• Use function aliases
• Use the included logger
69. Tips and Tricks
• Set up alarms on all Lambda CloudWatch metrics
• Avoid throttling by putting SNS between Lambda and any triggering service, such as S3
• Beware of infinite loops caused by functions calling each other.
70. Avoiding Infinite Loops
• With a distributed team, this is an easy mistake to make
• To avoid it, pass a call stack and check for self in the stack
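The call stack idea can be sketched as follows, assuming every function forwards a list of function names inside the event it passes on. The `call_stack` key, the `enter_function` helper, and the function names are hypothetical conventions that a team would have to agree on.

```python
class CallLoopError(Exception):
    """Raised when a function finds its own name in the incoming call stack."""


def enter_function(event: dict, function_name: str) -> dict:
    """Check the incoming call stack for a loop and return the event to forward."""
    stack = list(event.get("call_stack", []))
    if function_name in stack:
        raise CallLoopError(f"{function_name} already in call stack: {stack}")
    stack.append(function_name)
    # Copy the event so the caller's dictionary is left untouched.
    return {**event, "call_stack": stack}


# Simulate resize -> notify -> resize, which should be refused at the third hop.
event = enter_function({"key": "photo.jpg"}, "resize")
event = enter_function(event, "notify")
try:
    enter_function(event, "resize")
    loop_detected = False
except CallLoopError:
    loop_detected = True
```

Refusing the call is the simplest policy. A team that needs legitimate recursion could cap the stack depth instead.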
71. So where does that leave us?
Serverless or containers?
Services or monorepo?
77. Monitoring and Alerting
Self Serve is the Key
• Let developers choose what metrics to submit
• What graphs they put on their dashboards
• What to alert on
• They are closest to the app, so they know best
78. Monitoring and Alerting
Alert on increase of failure, not lack of success
👍 Increase in 500s
👎 Decrease in 200s
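One way to read this slide is that the alert condition should look at the share of 5xx responses rather than the count of 200s, so a quiet period with little traffic does not page anyone. This is a minimal sketch under that reading. The function names and the 5% threshold are assumptions chosen for illustration.

```python
from typing import Sequence


def error_rate(status_codes: Sequence[int]) -> float:
    """Fraction of responses in the window that were server errors (5xx)."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)


def should_alert(status_codes: Sequence[int], max_error_rate: float = 0.05) -> bool:
    """Alert on an increase in failures; low traffic alone never triggers."""
    return error_rate(status_codes) > max_error_rate


quiet_night = [200, 200]              # few successes, no failures
bad_deploy = [200] * 90 + [500] * 10  # 10% of requests failing

print(should_alert(quiet_night), should_alert(bad_deploy))
```

A drop in 200s can mean an outage or simply fewer users, while a rise in 500s is almost always something to look at.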
85. Queuing
• Queue anything you are writing to a data store
• Monitor your queue lengths for great insight and scaling!
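Both bullets can be sketched with the standard library: writes go onto a queue, a worker drains it into the data store, and the queue depth is the number worth monitoring. The in-memory `datastore` list is a stand-in assumption for a real database, and the record shape is made up.

```python
import queue
import threading

write_queue = queue.Queue()
datastore = []  # stand-in for a real data store


def writer_worker():
    """Drain the queue into the data store until a None sentinel arrives."""
    while True:
        record = write_queue.get()
        if record is None:
            write_queue.task_done()
            break
        datastore.append(record)
        write_queue.task_done()


# Producers enqueue instead of writing to the data store directly.
for i in range(5):
    write_queue.put({"id": i})

depth_before = write_queue.qsize()  # the value to export as a metric

worker = threading.Thread(target=writer_worker, daemon=True)
worker.start()
write_queue.join()                  # wait until every queued write is stored
depth_after = write_queue.qsize()

write_queue.put(None)               # tell the worker to stop
worker.join()
```

A queue depth that keeps growing means the writers are falling behind, which is a signal to add workers before users notice latency.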
88. Capacity utilization increases queues nonlinearly
• Every time you cut the excess capacity in half, you roughly double the average queue size.
• This has a direct effect on the ratio of wait time to work time for a single work unit
• Use this to balance cost vs. latency
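The doubling rule can be checked with a small calculation. This sketch assumes the simplest textbook model, an M/M/1 queue (one server, random arrivals, random service times), where the average number of work units in the system is utilization / (1 - utilization). The talk does not name a model, so treat the choice as an assumption.

```python
def mm1_avg_in_system(utilization: float) -> float:
    """Average number of work units in an M/M/1 system at the given utilization."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1 - utilization)


# Halve the excess capacity at each step: 50% -> 25% -> 12.5% -> 6.25%.
for utilization in (0.5, 0.75, 0.875, 0.9375):
    excess = 1 - utilization
    print(f"excess capacity {excess:.4f}: average in system {mm1_avg_in_system(utilization):.1f}")
```

Under this model the averages come out to 1, 3, 7, and 15, so each halving of excess capacity roughly doubles the queue. That is the cost vs. latency trade-off from the last bullet.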
89. The price of variability
• Variability increases queue sizes linearly
• Operating at high utilization increases variability
95. Incident Reviews
Ask the key questions:
• What went wrong?
• How could we have detected it sooner?
• How could we have prevented it?
• How can we prevent this class of problem in the future?
• How can we improve our behavior for next time?