John’s presentation will cover the lessons he learned running Docker in production at SalesforceIQ. Learn how to scale your registry using AWS and S3. Should you use Device Mapper or AUFS? Why run Swarm, Mesos, Kubernetes, or none of them? Finally, learn how persistent storage (Kafka, Cassandra, or SQL) can be run successfully with Docker in production.
His team focuses on Docker-based solutions to power their SaaS infrastructure and developer operations.
3. About me
● I work for SalesforceIQ (formerly RelateIQ)
● I’ve used Docker for over 2 years
● I’ve done a couple of talks on Docker
o http://blog.heavybit.com/blog/2015/3/23/dockermeetup
o https://engineering.twitter.com/university/videos/chef-versus-docker-at-relateiq
o https://www.youtube.com/watch?v=z9yNq-IjCcM
● I co-authored this book:
o http://bleedingedgepress.com/docker-in-the-trenches/
4. Docker Book
● 50% off for everyone: https://gum.co/lQGH/dockerconeu
● Only $11.50
● 200 pages
7. What is production?
Production != test/dev
Isolation, Security, Performance, Monitoring, Logging…
Scale, templates, automation…
What counts as success?
>99% uptime or a low number of outages?
Fast code deployment?
0 Security Incidents?
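An uptime target is easier to reason about as a downtime budget. The sketch below converts a percentage into hours of allowed downtime per year; it assumes a 365-day year and is only an illustration of the arithmetic behind a ">99% uptime" goal.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year (ignores leap years)


def downtime_budget_hours(uptime_pct: float) -> float:
    """Hours of downtime per year still allowed by an uptime target."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime must be a percentage between 0 and 100")
    return HOURS_PER_YEAR * (1 - uptime_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% uptime -> {downtime_budget_hours(target):.2f} h/year")
```

At 99% the budget is about 87.6 hours, roughly 3.65 days a year; each extra "nine" cuts it by a factor of ten.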
8. 100% of our web infrastructure running with Docker
Boom
9. SalesforceIQ journey into production
Timeline, 2013 to 2015+ (now): Dev Environment, Continuous Deployment in TeamCity, Web Zero Downtime Deployments, Full Stack Container, Azkaban, DockerMe, Integrations, Batch Jobs, Mesos, DockerCon 2014, Dev/Ops CLI, Craft CMS Main Website, Beanstalk, Devenv 2.0, PaaS
11. What we’ve put in containers
Database, CI/CD Server, Dev or Ops Environment, Web Server, API Server, CI/CD Agents, Batch Jobs, Integrations
Axes: Rate of Change, Dependencies
12. What we’ve put in containers
The same workloads, ranging from Stateful / Long-Life to Stateless / Short-Life
13. Zoom in a little
How Dockerized each area is (Fully / Somewhat / No) across Create, Deploy, Run, and Operate:
Persistent Storage
Middleware / Integrations / Internal Tools / Scripts / Jobs
Batch & Stream processing
Web
Monitoring
Logging
Security
Dev Environment
Ops Environment
CI / CD
15. Lots of tidbits
● Docker is production-ready, but many surrounding solutions are not (they are still alpha or beta)
o Be cautious with the new toys
● Don’t go straight to a PaaS if you're just starting out
o Kubernetes, Mesos, CoreOS, Swarm, ECS
● Keep it simple
o Know what works and what doesn’t
● Old tools still work great, and I’ll show you how
o Know how to scale what you're doing
● You're going to have to roll your own at some point (orchestration)
o Roll up your sleeves
● Learn from others; tons of people are in production now
o Read the whole internet
● You can secure running containers
o Tons of solutions now
● Get creative
17. You can use Docker with Chef, Ansible, SaltStack...
• You can use the tools you have today if you're not Dockerized already
• What…
• But those are the tools I’m already using...
• Yes, and they still work great
19. Our current prod web server
● Worked with all our existing tools!
○ Chef, Monitoring, Logging
● Security didn’t change
○ Security keys
○ Firewall
● Super easy to scale
○ Could bake an AMI with Packer
○ The shell script was super easy
● Zero downtime
● Rollbacks
Diagram: Web Container v1 and Web Container v2 behind a Hipache/Redis container, on an Amazon AMI set up with Chef; a cron job runs a shell script to orchestrate the containers.
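The slide does not show the orchestration script itself. Below is a minimal sketch of just the traffic-swap step, assuming Hipache's documented Redis layout: a list under `frontend:<hostname>` whose first element is an identifier and whose remaining elements are backend URLs. The hostname, identifier, and ports are hypothetical. Zero downtime here means starting v2, repointing the frontend in one Redis transaction, and only then stopping v1.

```python
from typing import List, Tuple

# A Redis command as a tuple of its arguments, e.g. ("DEL", "frontend:example.com").
Command = Tuple[str, ...]


def hipache_swap_commands(hostname: str, app_id: str, backends: List[str]) -> List[Command]:
    """Redis commands that repoint a Hipache frontend at a new set of backends.

    Hipache reads "frontend:<hostname>" as a list: the first element is an
    identifier, the remaining elements are backend URLs.
    """
    if not backends:
        raise ValueError("refusing to leave the frontend with no backends")
    key = f"frontend:{hostname}"
    return [
        ("MULTI",),  # queue the following commands as one transaction
        ("DEL", key),  # drop the old backend list (e.g. v1)
        ("RPUSH", key, app_id, *backends),  # identifier first, then new backends (e.g. v2)
        ("EXEC",),  # apply DEL + RPUSH atomically so Hipache never sees an empty list
    ]


def to_redis_cli(commands: List[Command]) -> str:
    """Render commands as redis-cli lines (assumes no spaces inside arguments)."""
    return "\n".join(" ".join(cmd) for cmd in commands)


if __name__ == "__main__":
    # Hypothetical: web container v2 listens on host port 8081.
    cmds = hipache_swap_commands("www.example.com", "web", ["http://127.0.0.1:8081"])
    print(to_redis_cli(cmds))
```

A rollback is the same call with v1's backend URL, which is why keeping the previous container around until the next deploy makes rollbacks cheap.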
25. File system...
“Containers wouldn’t stop correctly because Docker was not unmounting volumes
reliably. This caused containers to run forever, even after the task completed. The
workaround was unmounting volumes and deleting folders explicitly using an
elaborate set of custom scripts. Fortunately this was in the early days when we were
using docker v0.7.6. We removed this lengthy scripting once the unmount problem
was fixed in docker v0.9.0.”
“After researching and playing with devicemapper (a docker filesystem driver), we
found specifying an option that did the trick: `--storage-opt dm.blkdiscard=false`. This
option tells Docker to skip an expensive disk operation when containers are deleted,
which greatly speeds up the container shutdown process. Once the delete script
was modified, the problem went away.”
Kernel version matters!
Great visual deep dive:
http://merrigrove.blogspot.com/2015/10/visualizing-docker-containers-and-images.html?m=1
What we used over time
1. Started with AUFS; hit the 42-layer limit
2. Then moved to device mapper
a. Device/Volume not found
b. NNOOOOOOOOOO
3. Back to AUFS after bug fixes and removal of the 42-layer limit
a. Continued to fight layer and mount issues
4. Back to device mapper with Docker 1.7 dynamic binaries!
What we’ve landed on
Ubuntu = AUFS
Amazon Linux = Device mapper
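The "Ubuntu = AUFS, Amazon Linux = Device mapper" split, together with the `dm.blkdiscard=false` option quoted above, can be captured as one policy used when provisioning hosts. This is a sketch under assumptions: it reads the standard `/etc/os-release` `ID` field (`ubuntu`, `amzn`), returns Docker daemon arguments using the `--storage-driver` and `--storage-opt` flags, and refuses any other distro. Wiring it into Chef or an init script is left out.

```python
from typing import Dict, List


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines from /etc/os-release, stripping optional quotes."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def storage_daemon_args(os_release_text: str) -> List[str]:
    """Docker daemon storage flags per distro, following the slide's policy."""
    distro = parse_os_release(os_release_text).get("ID", "")
    if distro == "ubuntu":
        return ["--storage-driver=aufs"]
    if distro == "amzn":
        # Skip the expensive discard on container delete (see the quote above).
        return ["--storage-driver=devicemapper", "--storage-opt", "dm.blkdiscard=false"]
    raise ValueError(f"no storage driver policy for distro {distro!r}")


if __name__ == "__main__":
    with open("/etc/os-release") as f:
        print(" ".join(storage_daemon_args(f.read())))
```

Failing loudly on an unknown distro is a deliberate choice: given how much the kernel version matters, silently falling back to a default driver is how the layer and mount issues above tend to reappear.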
26. Get a good registry
Great options
• Hub.docker.com
• Quay.io
• Docker Trusted Registry
• Google
• Azure
• AWS
• S3 with no registry: docker save/load
1. We started with a private registry
a. Went insane with buggy releases and failed pulls/pushes
2. Went to quay.io
a. Happy, but slow, and it costs $$
3. Back to the private registry with the 0.9 release… now stable
4. Scaled it, and it’s working great
5. Now working on upgrading to Docker Registry 2.1
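The "S3, no registry" option works by streaming image archives through S3 with `docker save`/`docker load`. The sketch below only builds the shell pipelines; the bucket and key names are hypothetical, and the gzip step is an assumption to cut transfer size. It relies on `aws s3 cp` accepting `-` for stdin/stdout and on `docker save`/`docker load` using stdout/stdin by default.

```python
import shlex


def s3_url(bucket: str, key: str) -> str:
    return f"s3://{bucket}/{key}"


def save_to_s3_pipeline(image: str, bucket: str, key: str) -> str:
    """Shell pipeline that exports an image and streams it to S3."""
    # `docker save` writes a tar archive to stdout; `aws s3 cp -` uploads stdin.
    return (
        f"docker save {shlex.quote(image)} | gzip | "
        f"aws s3 cp - {shlex.quote(s3_url(bucket, key))}"
    )


def load_from_s3_pipeline(bucket: str, key: str) -> str:
    """Shell pipeline that streams the archive back from S3 into the local daemon."""
    # `aws s3 cp <url> -` downloads to stdout; `docker load` reads a tar from stdin.
    return f"aws s3 cp {shlex.quote(s3_url(bucket, key))} - | gunzip | docker load"


if __name__ == "__main__":
    print(save_to_s3_pipeline("web:v2", "my-images", "web/v2.tar.gz"))
    print(load_from_s3_pipeline("my-images", "web/v2.tar.gz"))
```

Either string can be run with `subprocess.run(cmd, shell=True, check=True)`; `shlex.quote` keeps odd image names from breaking the pipeline. The trade-off versus a registry is that every host pulls the full archive, with no layer sharing.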
28. Isolation is your friend
Single service/container per server?
• Compute: spiky processing… no problem
• Storage: out of disk… no problem
• Networking: shared bandwidth… no problem
• RAM: swapping issues… no problem
• Security Groups: least privilege… no problem
29. CI/CD with Docker
• The biggest ROI with Docker
• TeamCity
• Used to run Docker in Docker (pointer to a great blog post)
• Agents used to run in a Docker container; now built with Chef and Packer
Diagram: GitHub.com (Dockerfile), TeamCity server with three agents, Registry
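A CI server that builds an image per branch needs a tag scheme that Docker will accept. Docker tags may contain only ASCII letters, digits, underscores, periods, and hyphens, may not start with a period or hyphen, and are limited to 128 characters. The `<branch>-<build number>` scheme and the `branch_to_tag` helper below are hypothetical, a sketch rather than SalesforceIQ's actual naming.

```python
import re

MAX_TAG_LEN = 128  # Docker's tag length limit


def branch_to_tag(branch: str, build_number: int) -> str:
    """Turn a git branch name plus CI build number into a valid Docker tag."""
    # Replace every character Docker does not allow in a tag with "-".
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)
    # Tags may not start with "." or "-"; fall back to a placeholder if nothing is left.
    cleaned = cleaned.lstrip(".-") or "branch"
    suffix = f"-{build_number}"
    # Truncate the branch part so the whole tag stays within the limit.
    return cleaned[: MAX_TAG_LEN - len(suffix)] + suffix


if __name__ == "__main__":
    print(branch_to_tag("feature/login-page", 42))
```

Keeping the build number in the tag makes every artifact pushed to the registry immutable and traceable back to a TeamCity build, which is what makes redeploying an older build a simple pull.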
32. Beanstalk
• Oh wow, Beanstalk is pretty great!
• We run 50+ services on Beanstalk today
• Automagically built web container per code branch
• Corp site/Help site
• 100% automated!!
• Great for web services, SOA, storage
Beanstalk architecture: CloudFormation; EC2 servers with autoscaling, isolation, security groups, and environment variables; easy to spin up; DNS service discovery; ELB for load balancing and SSL termination; Container; RDS
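For the per-branch web containers, each Beanstalk single-container Docker environment needs a `Dockerrun.aws.json` pointing at the right image. The sketch below renders a version 1 file; the registry hostname, image name, and port are hypothetical, and the slide does not say SalesforceIQ generated these files this way.

```python
import json


def dockerrun_v1(image: str, container_port: int) -> str:
    """Render a single-container Dockerrun.aws.json (version 1) document."""
    doc = {
        "AWSEBDockerrunVersion": "1",
        # "Update": "true" asks Beanstalk to pull the image again on each deploy.
        "Image": {"Name": image, "Update": "true"},
        # The port the container listens on, which Beanstalk routes traffic to.
        "Ports": [{"ContainerPort": str(container_port)}],
    }
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    # Hypothetical private registry and per-branch tag.
    print(dockerrun_v1("registry.example.com/web:feature-login-page-42", 80))
```

Generating one such file per branch and deploying it to its own environment is one way to get "a web container per branch" fully automated.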
34. One year ago
• CoreOS... so cool
• Mesos… cool with scale
• Swarm… beta
• Deis… oooo saas
• ECS… ok now we're getting somewhere
• Kubernetes… where did that come from… looks cool too
Now…..
• Kubernetes on top of DCOS, on top of Mesos, on top of CoreOS…
facepalm
35. PaaS Overview
Comparison of CoreOS, DCOS, Kubernetes, and ECS on:
Orchestration, Scheduler, Resource Allocation, Service Discovery, More than Containers, Health Check, Storage clustering?, Live Migration?
36. DCOS
Being successful with a PaaS?
• Built an edge router
• Built a brain router
• Infra CLI
• This will run all of our stateless services
Our DCOS architecture (diagram components):
• ELB: SSL termination, DNS
• Edge Routers on Mesos public slaves: auto scaling, service discovery, public <> private DNS, can be internal as well
• Services on Mesos private slaves: auto scaling, health checks, intelligence via the Brain Router
• Storage: DB1, DB2, DB3
• Mesos Master: Marathon, health check, API
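Stateless services on this kind of stack are typically described to Marathon as app definitions with health checks, which is what lets Marathon replace unhealthy tasks. The sketch below builds a generic definition using Marathon's Docker container and HTTP health-check fields; the app id, image, `/health` path, resource sizes, and thresholds are hypothetical, not the configuration from the talk.

```python
import json
from typing import Any, Dict


def marathon_app(app_id: str, image: str, container_port: int, instances: int = 2) -> Dict[str, Any]:
    """Build a Marathon app definition for a stateless Docker service."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": 0.5,  # hypothetical sizing
        "mem": 512,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 lets Marathon assign a free host port on the slave.
                "portMappings": [{"containerPort": container_port, "hostPort": 0}],
            },
        },
        "healthChecks": [
            {
                "protocol": "HTTP",
                "path": "/health",  # hypothetical endpoint each service exposes
                "portIndex": 0,
                "gracePeriodSeconds": 60,
                "intervalSeconds": 10,
                "timeoutSeconds": 5,
                "maxConsecutiveFailures": 3,
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(marathon_app("/web/api", "registry.example.com/api:v2", 8080), indent=2))
```

The resulting JSON would be submitted with `POST /v2/apps` on the Marathon API. Because host ports are assigned dynamically, something like the edge router has to look up where each task landed, which is the service-discovery role described above.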
38. Summary
• Get the book
• Starting out? Just use the same tools you have
• You’ll need to roll up your sleeves
• Security is not hard, but you need to think about it
• Many vendors are entering the container space
• Build towards a PaaS
• There are many PaaS solutions
• Know what you're trying to solve
• Become the Jedi you want to be
• Have fun!