SlideShare une entreprise Scribd logo
1  sur  69
Architecting for Failure
in a Containerized World
Tom Faulhaber
Infolace
InfoQ.com: News & Community Site
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
architecture-failure-container
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
How can container tech help us
build robust systems?
Key takeaway: an architectural
toolkit for building robust
systems with containers
The Rules
Decomposition Orchestration and
Synchronization
Managing Stateful Apps
Simplicity
Simple means:
“Do one thing!”
The opposite of
simple is complex
Complexity exists
within
components
Complexity exists
between
components
Example: a counter
Counter
Service
1 2 3 4 50 …
Counter
Service
1 2 3 4 50 …x Counter
Service
1 2 3 4 50
1 2 3 4 50 1 2 3 4 50
Example: a counter
Counter
Service
1 2 3 4 50 …
Counter
Service
1 2 3 4 50 …
Load
Balancer
1 2 3 4 50 1 2 3 4 50
State + composition =
complexity
Part 1:
Decomposition
Rule:
Decompose vertically
App Server
Service
#1
Service
#2
Service
#3
App Server
Rule:
Separation of concerns
Example: Logging
App
Core
Code
Logging
Driver
Config
Logging Server
Example: Logging
Logger
App
Core
Code
Logging
Driver
Config
Logging Server
StdOut
Aspect-oriented programming
Rule:
Constrain state
Relational DB
Session Store
Rule:
Battle-tested tools
Redis
MySQL
Rule:
High code churn
→Easy restart
Rule:
No start-up order!
time
a
b
c
d
time
x
a
b
c
d
time
x
a
b
c
d
x
x
x
time
x
a
b
c
d
x
x
x
time
a
b
c
d
time
a
b
c
d
time
a
b
c
d
Rule:
Consider higher-order failure
The Rules
Decomposition
Decompose vertically
Separation of concerns
Constrain state
Battle-tested tools
High code churn, easy
restart
No start-up order!
Consider higher-order
failure
Orchestration and
Synchronization
Managing Stateful Apps
Part 2:
Orchestration and
Synchronization
Rule:
Use Framework Restarts
• Mesos: Marathon always restarts
• Kubernetes: RestartPolicy=Always
• Docker: Swarm always restarts
Rule:
Create your own framework
Mesos
Agent
Framework
Executor
Mesos
Master
Framework
Driver
Mesos
Agent
Framework
Executor
Mesos
Agent
Framework
Executor
Rule:
Use
Synchronized State
Synchronized State
Tools:
- zookeeper
- etcd
- consul
Patterns:
- leader election
- shared counters
- peer awareness
- work partitioning
Rule:
Minimize
Synchronized State
Even battle-tested state management is a headache.
(Source: http://blog.cloudera.com/blog/2014/03/zookeeper-resilience-at-pinterest/)
The Rules
Decomposition
Decompose vertically
Separation of concerns
Constrain state
Battle-tested tools
High code churn, easy
restart
No start-up order!
Consider higher-order
failure
Orchestration and
Synchronization
Use framework restarts
Create your own framework
Use synchronized state
Minimize synchronized state
Managing Stateful Apps
Part 3:
Managing Stateful Apps
Rule (repeat!):
Always use battle-tested tools!
(State is the weak point)
Rule:
Choose the DB architecture
Option 1: External DB
Execution cluster
Database cluster
Option 1: External DB
Pros
• Somebody else’s problem!
• Can use a DB designed for
clustering directly
• Can use DB as a service
Cons
• Not really somebody else’s
problem!
• Higher latency/no reference
locality
• Can’t leverage orchestration,
etc.
Option 2: Run on Raw HW
HDFS
Mesos
Marathon
App
HDFS
Mesos
Marathon
App
HDFS
Mesos
Marathon
App
Option 2: Run on Raw HW
Pros
• Use existing recipes
• Have local data
• Manage a single cluster
Cons
• Orchestration doesn’t help with
failure
• Increased management
complexity
Option 3: In-memory DB
Mesos
Marathon
App
MemSQL
Mesos
Marathon
App
MemSQL
Mesos
Marathon
App
MemSQL
Option 3: In-memory DB
Pros
• No need for volume tracking
• Fast
• Have local data
• Manage a single cluster
Cons
• Bets all machines won’t go
down
• Bets on orchestration
framework
Option 4: Use Orchestration
Mesos
Marathon
App
Cassandra
Mesos
Marathon
App
Cassandra
Mesos
Marathon
App
Cassandra
Option 4: Use Orchestration
Pros
• Orchestration manages
volumes
• One model for all programs
• Have local data
• Single cluster
Cons
• Currently the least mature
• Not well supported by vendors
Option 5: Roll Your Own
Mesos
Marathon
App
ImageMgr
Mesos
Master
Framework
Mesos
Marathon
App
ImageMgr
Mesos
Marathon
App
ImageMgr
Option 5: Roll Your Own
Pros
• Very precise control
• You decide whether to use
containers
• Have local data
• Can be system aware
Cons
• You’re on your own!
• Wedded to a single
orchestration platform
• Not battle tested
Rule:
Have replication
The Rules
Decomposition
Decompose vertically
Separation of concerns
Constrain state
Battle-tested tools
High code churn, easy
restart
No start-up order!
Consider higher-order
failure
Orchestration and
Synchronization
Use framework restarts
Create your own framework
Use synchronized state
Minimize synchronized state
Managing Stateful Apps
Battle-tested tools
Choose the DB architecture
Have replication
Fin
References
• Rich Hickey:

“Are We There Yet?” (https://www.infoq.com/presentations/Are-We-
There-Yet-Rich-Hickey)

“Simple Made Easy” (https://www.infoq.com/presentations/Simple-
Made-Easy-QCon-London-2012)
• David Greenberg, Building Applications on Mesos, O’Reilly, 2016
• Joe Johnston, et al., Docker in Production: Lessons from the
Trenches, Bleeding Edge Press, 2015
The Rules
Decomposition
Decompose vertically
Separation of concerns
Constrain state
Battle-tested tools
High code churn, easy
restart
No start-up order!
Consider higher-order
failure
Orchestration and
Synchronization
Use framework restarts
Create your own framework
Use synchronized state
Minimize synchronized state
Managing Stateful Apps
Battle-tested tools
Choose the DB architecture
Have replication
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
architecture-failure-container

Contenu connexe

Plus de C4Media

Plus de C4Media (20)

Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Dernier (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Architecting for Failure in a Containerized World