More Related Content Similar to From Monolith to Microservices (And All the Bumps along the Way) (CON360-R1) - AWS re:Invent 2018 (20) More from Amazon Web Services (20) From Monolith to Microservices (And All the Bumps along the Way) (CON360-R1) - AWS re:Invent 20182. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
From Monolith to Microservices
(And All the Bumps along the Way)
Max Blaze
Staff Operations Engineer
Duolingo
C O N 3 6 0 - R
Brent Langston
Sr. Developer Advocate
Amazon Web Services
3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How do I prepare?
4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
People
WorkflowTechnical
5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What are some hurdles to prepare for?
People
6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What are some hurdles to prepare for?
Old ways in contrast to new ways
People
7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What are some hurdles to prepare for?
Old ways in contrast to new ways
“How can I develop locally?”
People
8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What are some hurdles to prepare for?
Old ways in contrast to new ways
“How can I develop locally?”
“Deployments are painful. Will that get worse?”
People
9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What are some hurdles to prepare for?
Old ways in contrast to new ways
“How can I develop locally?”
“Deployments are painful. Will that get worse?”
“My config management already handles everything
perfectly!”
People
10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What are some hurdles to prepare for?
Old ways in contrast to new ways
“How can I develop locally?”
“Deployments are painful. Will that get worse?”
“My config management already handles everything
perfectly!”
“What do you mean I have to be on call?”
People
11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How can AWS help? People
12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How does Docker help with the migration?
Workflow
13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How does Docker help with the migration?
• Packaging
It becomes the same, no matter the language.
Docker multi-stage builds—tests in one container, copies the final artifact to a newer,
smaller container.
Workflow
14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How does Docker help with the migration?
• Packaging
It becomes the same, no matter the language.
Docker multi-stage builds—tests in one container, copies the final artifact to a newer,
smaller container.
• Distribution
It becomes the same, no matter the language.
Docker pulls the desired image tag, and Docker runs the container. Language doesn’t
matter.
Workflow
15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How does Docker help with the migration?
• Packaging
It becomes the same, no matter the language.
Docker multi-stage builds—tests in one container, copies the final artifact to a newer,
smaller container.
• Distribution
It becomes the same, no matter the language.
Docker pulls the desired image tag, and Docker runs the container. Language doesn’t
matter.
• Scaling up and down becomes fast
Docker images start quickly, so we can add them and remove them in response to current
demand.
Workflow
16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How does Docker help with the migration?
• Packaging
It becomes the same, no matter the language.
Docker multi-stage builds—tests in one container, copies the final artifact to a newer
smaller container.
• Distribution
It becomes the same, no matter the language.
Docker pulls the desired image tag, and Docker runs the container. Language doesn’t
matter.
• Scaling up and down becomes fast.
Docker images start quickly, so we can add them and remove them in response to current
demand.
• Experimenting becomes easier, so it’s encouraged.
Since packaging and distribution are trivially easy, we can experiment with code changes,
and roll them back, or iterate on them with ease, typically without any additional demand
for resources.
Workflow
Workflow
17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How can AWS help?
Workflow
Workflow
18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What problems does orchestration solve?
Technical
19. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What problems does orchestration solve?
Is the process running?
Replaces monit, supervisord, systemd
Technical
20. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What problems does orchestration solve?
Is the process healthy?
Include load balancer healthchecks
Include Docker healthcheck
Only healthcheck one layer
Technical
21. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What problems does orchestration solve?
How do I get traffic to my processes?
Integration with ALB/NLB
Technical
22. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What problems does orchestration solve?
How does one service find another?
Service discovery
Technical
23. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What problems does orchestration solve?
When should I scale up (or down)?
Metrics monitoring
TPS calculation
Auto scaling
Technical
24. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What problems does orchestration solve?
How do I deploy new code?
Docker replaces Fabricator/Chef/Ansible/and the like
No more copying files, moving symlinks, restarting
processes
No custom deploy logic to maintain
tag with git-sha, deploy new tag
Blue green deploys for most things
Some processes should be red/black
Technical
25. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What problems does orchestration solve?
How do I add, replace, or remove instances?
Instances are now simply resources, nothing special
No “special snowflakes”
Configuration is immutable
Replace, don’t reconfigure
quickly add or drain/replace/remove instances
Technical
26. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What problems does orchestration solve?
How do I save money?
Replace “special snowflakes” with generic instances that are
resource pools
Docker makes it easier to co-locate services
Instances can run at higher utilization
Fewer instances are necessary
Can use instance store for disk space, rather than Amazon EBS
Amazon ECS integrates with spot fleet for even cheaper
instances
Technical
27. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How can AWS help?
Technical
28. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Introducing:
Max Blaze
Staff Operations Engineer
Duolingo
C O N 3 6 0 - R
29. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
30. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Free and accessible language education for all
31. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
The most downloaded education app in the world
32. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
30+ languages/80+ courses
33. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
34. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
300M+ users worldwide
35. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
150 employees
36. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Duolingo growth
37. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
A brief history
Launch
Config
management
AWS Auto Scaling
First microservice
Centralized
dashboards + logging
Infrastructure as code
First microservice on Amazon ECS
38. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Why move to microservices?
Scalability
Cost savings
Velocity
ReliabilityFlexibility
39. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How do you decide what to carve out of your monolith first?
• Start with a small, but impactful feature
• Move up in size, complexity, and risk
• Consider dependencies
40. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
0.99
Monolith
System availability
41. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Chained microservices
0.99 * 0.99 * 0.99 = 0.97
0.990.990.99
42. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Independent microservices
1 - (1 - 0.99)3 = 0.999999
0.990.990.99
43. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Why use Docker for microservices?
• Standardizes the build process and encapsulates dependencies
• Local development environment similar to production
• Quick deployments and rollbacks
• Flexible resource allocation
44. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Simplifying local development setup (old way)
1. Clone this repository.
2. Set up and activate a virtualenv and install requirements using pip install -r requirements.txt.
3. Download and install Postgres: brew install postgresql
4. Run Postgres locally: postgres -D /usr/local/var/postgres
5. Download pgAdmin3 (not totally necessary, but will make life easier).
6. Using pgAdmin3, create a new login role under your local server with name "admin" and password
"somepassword".
7. Create a DB called “db".
8. Run the migration script in the repo using python manage.py db upgrade.
9. Check that your DB is now populated with tables.
10. Set up and run memcached: brew install memcached
11. Set up and run redis: brew install redis-server
12. Set up and run elasticsearch: brew install elasticsearch
13. Finally, try to run the server using python application.py. You can test if it’s working by going here
(you should see user info) and here (you should see {'data': []}).
45. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Simplifying local development setup (new way)
$ docker-compose build
$ docker-compose up
46. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Why use Docker with Amazon ECS?
Task auto scaling
Manageability
Amazon CloudWatch
metrics
Dynamic ALB targetsTask-level AWS IAM
47. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Microservice abstractions at Duolingo
Web service (internal or external)
Worker service (daemon or cron)
Monitoring
Amazon
Route53 ALB
Amazon
ECS
tasks
Amazon
SQS
Amazon
ECS task
Event
Data stores
Amazon RDS
Amazon DynamoDB
Redis/Memcached
AWS KMS
ELK stack
CloudWatch Grafana
PagerDuty
48. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Microservice definition in Terraform
module “duolingo-api" {
source = “repo/terraform//modules/ecs_web_service"
environment = “prod"
product = “duolingo"
service = “api"
owner = “Max Blaze"
min_count = 2
max_count = 4
cpu = 512
memory = 256
ecs_cluster = "prod"
internal = "true"
container_port = 5000
version = "${var.version}"
}
Billing tags
Resources
Auto scaling
49. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Aurora database cluster definition in Terraform
module “duolingo-api—db“ {
source = “repo/terraform//modules/ecs_web_service"
product = “duolingo”
service = “api”
subservice = “db"
owner = “Max Blaze”
cluster_identifier = “duolingo-api-db-cluster"
identifier = “duolingo-api"
engine = "aurora-postgresql"
name = "duolingo"
password = "${data.aws_kms_secret.duolingo_api_db.duolingo_api_db}"
instance_class = "db.r4.large"
num_cluster_instances = 2
}
Billing tags
Instance type
DB engine
50. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Continuous integration and deployment
Developer
GitHub
Jenkins
Terraform
Deployment
Terraform state
Amazon S3 bucket
Plan/apply with
version string
Amazon ECS
Amazon
ECR repo
Dockerfile build
Docker image push
Docker image pull
51. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Load balancing
• ALBs and CLBs operate at different network layers
• ALBs are more strict when handling malformed requests
• ALBs default to HTTP/2
• Headers are always passed as lowercase
• There are differences in Amazon CloudWatch metrics
ALB ≠ CLB
52. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Task-level AWS IAM role permissions
• Apply permissions at the service level
• Do not share permissions across microservices
• Needs to be supported by the AWS client library
53. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Standardizing microservices
• Develop a common naming scheme for repos and services
• Autogenerate as much of the initial service as possible
• Move core functionality to shared base libraries
• Provide standard alarms and dashboards
• Periodically review microservices for consistency and quality
54. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Monitoring microservices
Web service dashboard
• Local time and UTC
• Healthy, unhealthy, and
running tasks
• Latency average and
percentiles
• Number of requests
• CPU and memory utilization
(min/avg/max)
• Service errors by AZ
• ALB errors by AZ
55. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Monitoring microservices
Worker service dashboard
• Local time and UTC
• Running tasks
• CPU and memory
utilization (min/avg/max)
• Visible messages
• Deleted messages
56. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Monitoring microservices
PagerDuty integration
• Schedules and
rotations are defined
in Terraform
• Emergency alarms
page (high latency)
• Warning alarms go to
e-mail (low memory)
• Include links to
playbooks
• All pages are also
visible in Slack
57. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Grading microservices
Architecture Documentation Processes Tests
58. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Grading microservices
59. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
• Cluster
• Instance type
• Pricing options
• Auto scale
• Add/remove AZs
Web
Worker
API
• Task
• Resource allocation
• Auto scale
Cost reduction options
60. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cluster starting point
c3.2xlarge
Reserved instances
On-demand
61. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
High-CPU instance generations
Speed $/hour Disk
c3.large - 0.105 SSD
c4.large +20% of c3 0.100
None (Amazon EBS-
only)
c5d.large +25% of c4 0.096 NVMe
c5 is 50% faster than c3!
62. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Moving to a new Amazon EC2 generation
Latest instances are generally faster and cheaper but…
• “cpu” units in Amazon ECS will not be equivalent
• Auto scaling may not work properly between generations
(1 vCPU core = 1024 units)
c5.large
cpu = 1024
c3.large
cpu = 1024
c4.large
cpu = 1024
> >
63. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Fixed number of instances
c5d.2xlarge
Reserved instances
On-demand
c5d.large…c5d.18xlarge
m5d.large...m5d.24xlarge
Reserved instances
On-demand
Spot
Auto scaling
64. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Fixed number of instances
c5d.2xlarge
Reserved instances
On-demand
c5d.large…c5d.18xlarge
m5d.large...m5d.24xlarge
Reserved instances
On-demand
Spot
65. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Spotinst cluster features
• Mixes families + sizes
• Uses RIs before spot
• 15 minute spot notice
• Fits capacity to Amazon ECS
tasks
• AZ capacity heat map
66. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Spotinst cluster features
• Drains Amazon ECS tasks
• Cluster “headroom”
• Spreads capacity across AZs
• Bills on % of savings
• Terraform support
67. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Auto scaling with Spotinst
68. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What about per-microservice costs?
• Audit CPU/memory allocations for each service
• Update auto scaling and/or CPU allocations as needed
Goals
60% CPU
60–80% Memory
69. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Adjusting allocated CPU for scaling
allocatedCPU * currentUtilization = actualCPU
actualCPU/desiredUtilization = Units to set
Example:
Current utilization: 40%
Desired utilization: 60%
1024 * 40% = 409.6
409.6/60% = 682.67
Set Amazon ECS “cpu” allocation to 683
(1 vCPU core = 1024 units)
70. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Adjusting allocated memory
• Track memory usage between deployments
• Constantly increasing memory usage points to memory leaks
• Set containers to restart if memory exceeds 100%
71. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
API costs
ListAllMyBuckets + GetObject > 50% of Amazon S3 cost!
72. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cost savings
May July August September October
Amazon EC2 compute costs
> 60% reduction from May to October
> 60% reduction in compute
costs
> 30% reduction in costs per
monthly active user (MAU)
> 25% reduction total AWS bill
73. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Key results
• Scalability
• Manage ~100 microservices
• Velocity
• Teams deploy to their own services
• Flexibility
• Officially support three different programming languages
• Reliability
• 99.99% availability achieved last quarter
• Cost
• 60% reduction in compute costs
74. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
75. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Resources
• Books
• Building Microservices: Designing Fine-Grained Systems (Sam Newman)
• Microservices in Production (Susan J. Fowler)
• References
• ec2instances.info
• github.com/open-guides/og-aws
• Tools and services
• ansible.com
• docker.com
• elastic.io
• github.com
• grafana.com
• jenkins.io
• pagerduty.com
• spotinst.com
• terraform.io
77. Please complete the session
survey in the mobile app.
!
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.