2. Agenda
• A little background about why we decided to
build an internal PaaS.
• Introduction to Empire.
• How we’re leveraging ECS as the backend.
• Demo
• Q&A
3. Who am I
• Eric Holmes
• Infrastructure Engineer at Remind
• I like building things for other developers
• Work mostly with Go and Ruby
• You can find my open source stuff at
https://github.com/ejholmes
4. What’s Remind?
• Remind is a messaging platform for teachers,
students and parents.
• Chat/Announcements/Files
• ~25 million users. ~350,000 new users per day
during back-to-school (BTS)
• ~5 million messages per day.
• ~50 employees. ~30 engineers.
8. Broke apart the monolith
• Sidekiq queues were I/O-bound and constantly
backed up during BTS
• Message delivery workers were tightly coupled
to the rest of the application. Difficult to scale out
horizontally
• Database would need to be sharded
• Started breaking the monolith apart into loosely
coupled services.
• Now have ~50 production services
9. Heroku
• Entirely hosted on Heroku
• Heroku has been awesome; never needed an
ops team.
• Allowed us to focus on building product.
10. But we ran into issues...
• “Internal” micro-services need to be exposed
publicly.
• Databases need to be opened up to all traffic.
• Little visibility into performance of hosts.
• No control over the routing layer.
11. What do we want?
• Want to use AWS services.
• Want to maintain operational simplicity.
• Support 12-factor apps. http://12factor.net/
• Maintain shared patterns for deployment. Faster iteration and build +
release cycles
• No ops.
• Decrease our surface area and only expose a single app publicly.
• Robust and resilient to failure. Self-healing.
• If we can, continue to use containers as a unit of deployment.
12. Why containers?
• Fast to build*
• Let us isolate dependencies as a portable,
easy-to-distribute package.
• Allow us to create better development
environments with more dev/prod parity.
• Limit the number of moving parts when we
deploy.
• Better resource utilization and cost management
13. We’re not the first company to want a PaaS
• Netflix - Asgard
• SoundCloud - Bazooka
• Every other company in our investor’s portfolio...
14. Something we can re-use?
• Flynn
–Alpha
–Undergoing many architectural changes
–Custom load balancer
• Deis
–More than it needed to be
–Nobody using it successfully in production
(that we knew of)
15. Empire was born
• Initially started as a management layer on top of
CoreOS + fleet.
• Load balancing via nginx configured through
confd + etcd.
• Unit of deployment was Docker containers
• Implemented a subset of the Heroku API
16. Therein lies the rub...
• Fleet initially worked well, until we started testing
failure modes.
• Fleet had a lot of bugs
• etcd was fragile
• We needed resilience and stability
• We didn’t want to run and operate our own
clustering.
17. ECS becomes GA
• ECS became GA while we were looking for an
alternative scheduler.
• Looked promising to serve as the scheduling
backend.
18. What is ECS?
• Pools hosts together as a single compute
resource.
• Provides a set of APIs for placing tasks on
machines
• Scheduler supports “services” for scaling tasks
horizontally and maintaining desired state.
• Services integrate with ELB for connection
draining, zero-downtime deploys, and health checks.
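
To make "maintaining desired state" concrete, here is a minimal sketch of a reconciliation step in plain Python. It is a conceptual illustration only, not how ECS is implemented and not an AWS API: the `Service` class, its fields, and the task ID format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical stand-in for a scheduler-managed service."""
    name: str
    desired_count: int
    running: list = field(default_factory=list)  # IDs of tasks currently running
    _next_id: int = 0

    def reconcile(self):
        """Start or stop tasks until the running count matches desired_count."""
        actions = []
        while len(self.running) < self.desired_count:
            self._next_id += 1
            task_id = f"{self.name}-{self._next_id}"
            self.running.append(task_id)
            actions.append(("start", task_id))
        while len(self.running) > self.desired_count:
            actions.append(("stop", self.running.pop()))
        return actions


web = Service(name="web", desired_count=3)
web.reconcile()                 # starts three tasks
web.running.remove("web-2")     # simulate a task dying
replacements = web.reconcile()  # starts one replacement task
```

The same step covers both cases on the slide: raising or lowering `desired_count` scales the service horizontally, and a task that disappears is replaced on the next pass.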
21. ECS for Empire
• Solid set of primitives to serve as the scheduling
backend
• Managed service
• Failure modes behaved as we expected them to
• ELB integration allowed us to remove custom
routing layer
• Service discovery via DNS
22. What is Empire?
• Open source internal PaaS for micro-services
• A layer of usability on top of ECS for 12-factor
apps
• Single binary. Minimal deps. Easy to run.
• Provides an API and CLI to create apps, deploy
Docker images, update configuration, run one-off
tasks, etc.
• Allows you to use Procfiles to build multiple ECS
services
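
As a rough illustration of the Procfile idea, the sketch below parses a Heroku-style Procfile (`name: command` per line) and produces one service definition per process type. It is not Empire's code: the function names, the `app-process` naming scheme, the default scale of 1, the skipping of `#` comment lines, and the example app and image names are all assumptions made for this example.

```python
def parse_procfile(text):
    """Parse 'name: command' lines into a dict, skipping blanks and comments."""
    processes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # comment handling is an assumption
            continue
        # Split on the first colon only, so commands may contain colons.
        name, sep, command = line.partition(":")
        if not sep or not name.strip() or not command.strip():
            raise ValueError(f"invalid Procfile line: {line!r}")
        processes[name.strip()] = command.strip()
    return processes


def services_for(app, image, procfile_text):
    """Build one hypothetical service definition per Procfile process type."""
    return [
        {
            "service": f"{app}-{name}",  # naming scheme is an assumption
            "image": image,
            "command": command,
            "desired_count": 1,          # assumed default scale
        }
        for name, command in parse_procfile(procfile_text).items()
    ]


PROCFILE = """\
web: bundle exec puma -C config/puma.rb
worker: bundle exec sidekiq
"""

services = services_for("acme", "example/acme:latest", PROCFILE)
```

Here a single image yields two definitions, `acme-web` and `acme-worker`, each running a different command, which is the sense in which one Procfile can describe multiple services.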
23. Is it ready for production?
• Running ~15 production services within ECS
managed via Empire for a little over a month
• Empire is hands off after you’ve deployed. AWS
services take over
• Moving directly onto EC2 showed huge
performance improvements for services
25. What does Empire not do?
• No built-in logging or metrics; bring your own (soon?)
• It doesn’t handle building your Docker images
• Doesn’t handle the creation of attached
resources like databases