Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2lBEUKD.
Tom Faulhaber discusses the new container-based toolbox for building systems that are robust in the face of failures, how to recover from failure and how the tools can be used to best effect. Filmed at qconsf.com.
Tom Faulhaber is principal of Infolace, a San Francisco-based consultancy that helps clients from startups to global brands turn raw data into information and information into action. Throughout his career, Tom has developed systems for high-performance TCP/IP, large-scale scientific visualization, energy trading, and many more.
2. InfoQ.com: News & Community Site
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
architecture-failure-container
3. Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
40. The Rules
Decomposition
Decompose vertically
Separation of concerns
Constrain state
Battle-tested tools
High code churn, easy
restart
No start-up order!
Consider higher-order
failure
Orchestration and
Synchronization
Managing Stateful Apps
49. Even battle-tested state management is a headache.
(Source: http://blog.cloudera.com/blog/2014/03/zookeeper-resilience-at-pinterest/)
50. The Rules
Decomposition
Decompose vertically
Separation of concerns
Constrain state
Battle-tested tools
High code churn, easy
restart
No start-up order!
Consider higher-order
failure
Orchestration and
Synchronization
Use framework restarts
Create your own framework
Use synchronized state
Minimize synchronized state
Managing Stateful Apps
55. Option 1: External DB
Pros
• Somebody else’s problem!
• Can use a DB designed for
clustering directly
• Can use DB as a service
Cons
• Not really somebody else’s
problem!
• Higher latency/no reference
locality
• Can’t leverage orchestration,
etc.
56. Option 2: Run on Raw HW
HDFS
Mesos
Marathon
App
HDFS
Mesos
Marathon
App
HDFS
Mesos
Marathon
App
57. Option 2: Run on Raw HW
Pros
• Use existing recipes
• Have local data
• Manage a single cluster
Cons
• Orchestration doesn’t help with
failure
• Increased management
complexity
59. Option 3: In-memory DB
Pros
• No need for volume tracking
• Fast
• Have local data
• Manage a single cluster
Cons
• Bets all machines won’t go
down
• Bets on orchestration
framework
61. Option 4: Use Orchestration
Pros
• Orchestration manages
volumes
• One model for all programs
• Have local data
• Single cluster
Cons
• Currently the least mature
• Not well supported by vendors
62. Option 5: Roll Your Own
Mesos
Marathon
App
ImageMgr
Mesos
Master
Framework
Mesos
Marathon
App
ImageMgr
Mesos
Marathon
App
ImageMgr
63. Option 5: Roll Your Own
Pros
• Very precise control
• You decide whether to use
containers
• Have local data
• Can be system aware
Cons
• You’re on your own!
• Wedded to a single
orchestration platform
• Not battle tested
65. The Rules
Decomposition
Decompose vertically
Separation of concerns
Constrain state
Battle-tested tools
High code churn, easy
restart
No start-up order!
Consider higher-order
failure
Orchestration and
Synchronization
Use framework restarts
Create your own framework
Use synchronized state
Minimize synchronized state
Managing Stateful Apps
Battle-tested tools
Choose the DB architecture
Have replication
67. References
• Rich Hickey:
“Are We There Yet?” (https://www.infoq.com/presentations/Are-We-
There-Yet-Rich-Hickey)
“Simple Made Easy” (https://www.infoq.com/presentations/Simple-
Made-Easy-QCon-London-2012)
• David Greenberg, Building Applications on Mesos, O’Reilly, 2016
• Joe Johnston, et al., Docker in Production: Lessons from the
Trenches, Bleeding Edge Press, 2015
68. The Rules
Decomposition
Decompose vertically
Separation of concerns
Constrain state
Battle-tested tools
High code churn, easy
restart
No start-up order!
Consider higher-order
failure
Orchestration and
Synchronization
Use framework restarts
Create your own framework
Use synchronized state
Minimize synchronized state
Managing Stateful Apps
Battle-tested tools
Choose the DB architecture
Have replication
69. Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
architecture-failure-container