The Highs and Lows of Stateful
Containers
Presented by Alex Robinson / Member of the Technical Staff
@alexwritescode
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
kubernetes-stateful-containers
Presented at QCon San Francisco
www.qconsf.com
Almost all real applications rely on state
When storage systems go down, so do
the applications that use them
Containers are new and different
Change is risky
Great care is warranted when moving
stateful applications into containers
To succeed, you must:
1. Understand your stateful application
2. Understand your orchestration system
3. Plan for the worst
Let’s talk about stateful containers
• Why would you even want to run stateful applications in containers?
• What do stateful systems need to run reliably?
• What should you know about your orchestration system?
• What’s likely to go wrong and what can you do about it?
My experience with stateful containers
• Worked directly on Kubernetes and GKE from 2014-2016
○ Part of the original team that launched GKE
• Lead all container-related efforts for CockroachDB
○ Configurations for Kubernetes, DC/OS, Docker Swarm, even Cloud Foundry
○ AWS, GCP, Azure, On-Prem
○ From single availability zone deployments to multi-region
○ Help users deploy and troubleshoot their custom setups
Why even bother?
We’ve been running stateful services for decades
Traditional management of stateful services
1. Provision one or more beefy machines with
large/fast disks
2. Copy binaries and configuration onto machines
3. Run binaries with provided configuration
4. Never change anything unless absolutely
necessary
Traditional management of stateful services
• Pros
○ Stable, predictable, understandable
• Cons
○ Most management is manual, especially to scale or recover from hardware failures
■ And that manual intervention may not be very well practiced
Moving to containers
• Can you do the same thing with containers?
○ Sure!
○ ...But that’s not what you’ll get by default if you’re using any of the common
orchestration systems
So why move state into orchestrated containers?
• The same reasons you’d move stateless applications to containers
○ Automated deployment, placement, security, scalability, availability, failure recovery,
rolling upgrades
■ Less manual toil, less room for operator error
○ Resource isolation
• Avoid separate workflows for stateless vs stateful applications
Challenges of managing state
“Understand your stateful application”
What do stateful systems need?
• Process management
• Persistent storage
• If distributed, also:
○ Network connectivity
○ Consistent name/address
○ Peer discovery
Managing state in plain Docker containers
“Understand your orchestration system”
Stateful applications in Docker
• Not much to worry about here other than storage
○ Never store important data to a container’s filesystem
Stateful applications in Docker
1. Data in container 2. Data on host filesystem 3. Data in network storage
Stateful applications in Docker
• Don’t:
○ docker run cockroachdb/cockroach start
• Do:
○ docker run -v /mnt/data1:/data cockroachdb/cockroach start --store=/data
• And in most cases, you’ll actually want:
○ docker run -p 26257:26257 -p 8080:8080 -v /mnt/data1:/data
cockroachdb/cockroach start --store=/data
Stateful applications in Docker
• Hardly any different from running things the traditional way
• Automated - binary packaging/distribution, resource isolation
• Manual - everything else
Managing State on Kubernetes
“Understand your orchestration system”
Let’s skip over the basics
• Unless you want to manually pin pods to nodes (see previous section),
you should use either:
○ StatefulSet:
■ decouples replicas from nodes
■ persistent address for each replica, DNS-based peer discovery
■ network-attached storage instance associated with each replica
○ DaemonSet:
■ pin one replica to each node
■ use node’s disk(s)
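The "persistent address for each replica, DNS-based peer discovery" point can be made concrete with a small sketch. It assumes a StatefulSet named cockroachdb behind a headless Service of the same name in the default namespace, and the default cluster.local cluster domain; the helper name is hypothetical. Port 26257 and the idea of handing peers to the database at startup come from the Docker commands earlier in the deck.

```python
def stateful_set_peers(name, service, namespace, replicas, cluster_domain="cluster.local"):
    """Return the stable per-pod DNS names of a StatefulSet behind a headless Service.

    Pods are named <name>-0 .. <name>-(replicas-1), and each one keeps its DNS
    name across rescheduling, which is what makes peer discovery possible.
    """
    return [
        f"{name}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


# Assumed names: StatefulSet "cockroachdb", Service "cockroachdb", namespace "default".
peers = stateful_set_peers("cockroachdb", "cockroachdb", "default", 3)

# Every replica can be started with the same join list, whichever node it lands on.
join_flag = "--join=" + ",".join(f"{p}:26257" for p in peers)
print(join_flag)
```

Because the names do not depend on pod IPs or node names, the same startup flags keep working after a pod is moved to another node.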
Where do things go wrong?
Don’t trust the defaults!
• If you don’t specifically ask for persistent storage, you won’t get any
○ Always think about and specify where your data will live
1. Data in container 2. Data on host filesystem 3. Data in network storage
Ask for a dynamically provisioned PersistentVolume
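As a rough sketch of what "asking" looks like, the snippet below builds the volumeClaimTemplates entry of a StatefulSet as a Python dict and prints it as JSON, which kubectl also accepts. The claim name datadir, the 100Gi size, and the helper name are illustrative assumptions, not values from the talk.

```python
import json


def volume_claim_template(name="datadir", size="100Gi", storage_class=None):
    """Build one entry for a StatefulSet's spec.volumeClaimTemplates.

    Kubernetes creates one PersistentVolumeClaim per replica from this
    template, and dynamic provisioning creates the backing volume on demand.
    """
    spec = {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": size}},
    }
    if storage_class is not None:
        # Omitting storageClassName falls back to the cluster's default class.
        spec["storageClassName"] = storage_class
    return {"metadata": {"name": name}, "spec": spec}


claim = volume_claim_template()
print(json.dumps(claim, indent=2))
```

The container then mounts the claim by name through volumeMounts and points the database's store directory at the mount path, just as the -v and --store flags did in the plain Docker example.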
Don’t trust the defaults!
• Now your data is persistent
• But how’s performance?
Don’t trust the defaults!
• If you don’t create and request your own StorageClass, you’re
probably getting slow disks
○ Default on GCE is non-SSD (pd-standard)
○ Default on Azure is non-SSD (non-managed blob storage)
○ Default on AWS is gp2, which is backed by SSDs but offers fewer IOPS than io2
• This really affects database performance
Use a custom StorageClass
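A minimal sketch of such a StorageClass for GCE SSD persistent disks follows. The class name fast-ssd and the helper name are made up for illustration. The provisioner shown is the in-tree GCE one; newer clusters may expect a CSI driver name instead, so treat it as an assumption to check against your cluster.

```python
import json


def ssd_storage_class(name="fast-ssd"):
    """Build a StorageClass manifest that requests SSD-backed GCE disks."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # Assumption: in-tree GCE persistent disk provisioner.
        "provisioner": "kubernetes.io/gce-pd",
        # pd-ssd instead of the pd-standard default.
        "parameters": {"type": "pd-ssd"},
    }


storage_class = ssd_storage_class()
print(json.dumps(storage_class, indent=2))
```

The class only takes effect once a PersistentVolumeClaim names it in storageClassName; creating it does not change the cluster default.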
Performance problems
• There are a lot of other things you have to do to get performance
equivalent to what you’d get outside of Kubernetes
• For more detail, see
https://cockroachlabs.com/docs/kubernetes-performance.html
What other defaults are bad?
• If you:
○ Create a Kubernetes cluster with 3 nodes
○ Create a 3-replica StatefulSet running CockroachDB
• What happens if one of the nodes fails?
Don’t trust the defaults!
[Diagram: a three-node cluster (Node 1, Node 2, Node 3) running pods cockroachdb-0, cockroachdb-1 and cockroachdb-2, which hold replicas of Range 1, Range 2 and Range 3]
Don’t trust the defaults!
• If you don’t specifically ask for your StatefulSet replicas to be
scheduled on different nodes, they may not be (k8s issue #41130)
○ If the node with 2 replicas dies, Cockroach will be unavailable until they come back
• This is terrible for fault tolerance
○ What’s the point of running 2 database replicas on the same machine?
Configure pod anti-affinity
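One way to express this, sketched as a Python dict for the pod template's affinity field. It assumes the pods carry an app label (the label key and the helper name are illustrative). The preferred form lets the scheduler co-locate replicas as a last resort; the required form refuses to.

```python
def pod_anti_affinity(app_label, required=False):
    """Build the affinity block that spreads same-app pods across nodes."""
    term = {
        "labelSelector": {
            "matchExpressions": [
                {"key": "app", "operator": "In", "values": [app_label]}
            ]
        },
        # One pod per distinct hostname, i.e. per node.
        "topologyKey": "kubernetes.io/hostname",
    }
    if required:
        rules = {"requiredDuringSchedulingIgnoredDuringExecution": [term]}
    else:
        rules = {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": 100, "podAffinityTerm": term}
            ]
        }
    return {"podAntiAffinity": rules}


affinity = pod_anti_affinity("cockroachdb")
```

With the required form, a replica stays Pending when no separate node is available, which trades availability of that one replica for a guarantee that a single node failure cannot take out two.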
What can go wrong other than bad defaults?
What else can go wrong?
• In early tests, Cockroach pods would fail to get re-created if all of them
were brought down at once
• Kubernetes would create the first pod, but not any others
Know your app and your orchestration system
• StatefulSets (by default) only create one pod at a time
• They also wait for the current pod to pass readiness probes before
creating the next
• The Cockroach health check used at the time only returned healthy if
the node was connected to a majority partition of the cluster
Before the restart
• healthy? yes
If just one node were to fail
• healthy? yes
• Create missing pod
After all nodes fail
• healthy? no
• Kubernetes: wait for first pod to be healthy before adding second
• Cockroach: wait for connection to rest of cluster before saying I’m healthy
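The deadlock can be reproduced with a toy model. This is a simplified simulation, not Kubernetes or CockroachDB code: all function names are hypothetical, a pod is treated as either running or not, and the readiness check is reduced to a function of how many pods are running.

```python
def recreate_pods(replicas, ready, parallel=False):
    """Simulate StatefulSet pod creation after every pod has been lost.

    ready(running, replicas) says whether a pod reports ready given how many
    pods are currently running. Returns how many pods end up running.
    """
    if parallel:
        # Parallel pod management creates every pod without waiting.
        return replicas
    running = 0
    while running < replicas:
        running += 1  # create the next pod in ordinal order
        if running < replicas and not ready(running, replicas):
            break  # the controller waits indefinitely for this pod
    return running


def quorum_ready(running, replicas):
    """Old check: healthy only when connected to a majority of the cluster."""
    return running > replicas // 2


def process_ready(running, replicas):
    """Simplified new check: ready as soon as the process itself is serving."""
    return running >= 1


stuck = recreate_pods(3, quorum_ready)
recovered = recreate_pods(3, process_ready)
print(stuck, recovered)
```

With the majority-based check, the first pod can never report ready on its own, so the ordered controller never creates the second. Either a readiness check that does not need a quorum or parallel creation breaks the cycle.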
Solution to pod re-creation deadlock
• Keep basic liveness probe endpoint
○ Simply checks if process can respond to any HTTP request at all
• Create new readiness probe endpoint in Cockroach
○ Returns HTTP 200 if node is accepting SQL connections
• Now that it’s an option, tell the StatefulSet to create all pods in parallel
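Put together, the fix might look like the following StatefulSet fragment, written as a Python dict. The slides do not name the endpoints; the paths /health and /health?ready=1, the timing values, and the container name are assumptions for illustration, while port 8080 is the HTTP port used in the Docker example earlier.

```python
def http_probe(path, port=8080, **timing):
    """Build an HTTP GET probe; extra keyword arguments become timing fields."""
    return {"httpGet": {"path": path, "port": port}, **timing}


stateful_set_fragment = {
    "spec": {
        # Create and delete pods all at once instead of one at a time.
        "podManagementPolicy": "Parallel",
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "cockroachdb",
                        # Liveness: can the process answer any HTTP request at all?
                        "livenessProbe": http_probe(
                            "/health", initialDelaySeconds=30, periodSeconds=5
                        ),
                        # Readiness: is the node accepting SQL connections?
                        "readinessProbe": http_probe(
                            "/health?ready=1",
                            initialDelaySeconds=10,
                            periodSeconds=5,
                            failureThreshold=2,
                        ),
                    }
                ]
            }
        },
    }
}
```

Keeping the two probes separate matters: a failing liveness probe restarts the container, whereas a failing readiness probe only removes the pod from Service endpoints.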
Other potential issues to look out for
• Set resource requests/limits for proper isolation and to avoid evictions
• No PodDisruptionBudgets by default (#35318)
• If in the cloud, don’t depend on your nodes to live forever
○ Hosting services (I’m looking at you, GKE) tend to just delete and recreate node VMs in
order to upgrade node software
○ Be especially careful about using the nodes’ local disks because of this
• If on-prem, good luck getting fast, reliable network attached storage
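The first two items can be sketched as manifest fragments. The names, label key, and sizes below are illustrative assumptions. Setting requests equal to limits is one common way to reduce eviction risk, and a budget of one unavailable pod suits a three-replica database that tolerates a single failure.

```python
def guaranteed_resources(cpu, memory):
    """Build a container resources block with requests equal to limits."""
    amounts = {"cpu": cpu, "memory": memory}
    return {"requests": dict(amounts), "limits": dict(amounts)}


def pod_disruption_budget(name, app_label, max_unavailable=1):
    """Build a PodDisruptionBudget limiting voluntary evictions of matching pods."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "maxUnavailable": max_unavailable,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


resources = guaranteed_resources("2", "8Gi")
budget = pod_disruption_budget("cockroachdb-budget", "cockroachdb")
```

A PodDisruptionBudget only constrains voluntary disruptions such as node drains during upgrades; it does nothing for a node that crashes.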
Other potential issues to look out for
• If you issue TLS certificates for StatefulSet DNS addresses, don’t forget
to include the namespace-scoped addresses
○ “cockroachdb.default.svc.cluster.local” vs just “cockroachdb”
○ Needed for cross-namespace communication
○ Also don’t put pod IPs in node certs - it’ll work initially, but not after pod re-creation
• Multi-region stateful systems are really tough to make work
○ Both network connectivity and persistent addresses are hard to set up
○ Hopefully you went to yesterday’s Cilium and Istio talks
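For the certificate point, a small helper can enumerate the DNS names a node certificate would need to cover. This is a sketch under assumptions: the Service is called cockroachdb in the default namespace, the cluster domain is the default cluster.local, and wildcard entries are acceptable to your certificate tooling. The function name is hypothetical.

```python
def service_dns_names(service, namespace, cluster_domain="cluster.local"):
    """List short and namespace-scoped DNS names for a Service and its pods.

    The wildcard entries cover per-pod names such as cockroachdb-0.cockroachdb.
    No pod IPs are included, since those change when a pod is re-created.
    """
    suffixes = ["", f".{namespace}", f".{namespace}.svc.{cluster_domain}"]
    names = []
    for suffix in suffixes:
        names.append(f"{service}{suffix}")
        names.append(f"*.{service}{suffix}")
    return names


sans = service_dns_names("cockroachdb", "default")
for name in sans:
    print(name)
```

Clients in the same namespace can use the short names, while clients in other namespaces need the namespace-scoped forms to resolve and to pass hostname verification.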
How to get started
Isn’t this all a lot of work?
Getting things right is far from easy
What should you do if you aren’t an
expert on the systems you want to use?
How to get started
• You could take the time to build expertise
• But ideally someone has already done the hard work for you
Off-the-shelf configurations
• There are great configurations available for popular OSS projects
• They’ve usually been made by someone who knows that project well
• They’ve often already been proven in production by other users
Off-the-shelf configurations
• Kubernetes off-the-shelf configs are unfortunately quite limited
○ YAML forces the config writer to make decisions that would best be left to the user
○ No built-in method for parameter substitution
• How could a config writer possibly know your desired:
○ StorageClass
○ Disk size
○ CPU and memory requests/limits
○ Application-specific configuration options
○ etc.
Enter: package managers
• Additional formats have been defined to make parameterizing easier
• Package creator defines set of parameters that can be easily overridden
• User doesn’t have to understand or muck with YAML files
○ Just look through list of parameters and pick which need customizing
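The idea behind parameterized packages can be illustrated with a few lines of Python. This is not how Helm or any other package manager is implemented; it is a toy model in which the package author ships defaults and the user supplies only the values they want to change. All keys and values are invented for the example.

```python
import copy

# Defaults chosen by the (hypothetical) package author.
DEFAULTS = {
    "replicas": 3,
    "storage": {"size": "100Gi", "storageClass": None},
    "resources": {"requests": {"cpu": "1", "memory": "4Gi"}},
}


def merge_values(defaults, overrides):
    """Return defaults with overrides applied recursively, leaving inputs untouched."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# The user customizes two parameters and inherits everything else.
values = merge_values(DEFAULTS, {"replicas": 5, "storage": {"storageClass": "fast-ssd"}})
print(values)
```

The merged values would then be substituted into manifest templates, so the user never edits the underlying YAML directly.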
Orchestrator package managers
• Kubernetes: Helm
○ helm.sh
○ github.com/helm/charts/
• DC/OS: Universe
○ universe.dcos.io
• Cloud Foundry: Pivotal Services Marketplace
○ pivotal.io/platform/services-marketplace
• Docker: Application Packages (experimental)
○ CLI tool: docker-app
Summary
Go forth and manage persistent state
Don’t let configuration mistakes take down
your production services
1. Understand your stateful application
2. Understand your orchestration system
3. Plan for the worst
(or use a package manager)
Thank You!
For more info:
cockroachlabs.com
github.com/cockroachdb/cockroach
alex@cockroachlabs.com / @alexwritescode

Contenu connexe

Tendances

Caching 101: Caching on the JVM (and beyond)
Caching 101: Caching on the JVM (and beyond)Caching 101: Caching on the JVM (and beyond)
Caching 101: Caching on the JVM (and beyond)Louis Jacomet
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionDataStax Academy
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionDataStax Academy
 
Reactive Supply To Changing Demand
Reactive Supply To Changing DemandReactive Supply To Changing Demand
Reactive Supply To Changing DemandJonas Bonér
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraJon Haddad
 
A Year with Cinder and Ceph at TWC
A Year with Cinder and Ceph at TWCA Year with Cinder and Ceph at TWC
A Year with Cinder and Ceph at TWCbryanstillwell
 
Transforming IT Infrastructure
Transforming IT InfrastructureTransforming IT Infrastructure
Transforming IT Infrastructuretim_evdbt
 
QCon NYC: Distributed systems in practice, in theory
QCon NYC: Distributed systems in practice, in theoryQCon NYC: Distributed systems in practice, in theory
QCon NYC: Distributed systems in practice, in theoryAysylu Greenberg
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core ConceptsJon Haddad
 
Delphix for DBAs by Jonathan Lewis
Delphix for DBAs by Jonathan LewisDelphix for DBAs by Jonathan Lewis
Delphix for DBAs by Jonathan LewisKyle Hailey
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
All Your IOPS Are Belong To Us - A Pinteresting Case Study in MySQL Performan...
All Your IOPS Are Belong To Us - A Pinteresting Case Study in MySQL Performan...All Your IOPS Are Belong To Us - A Pinteresting Case Study in MySQL Performan...
All Your IOPS Are Belong To Us - A Pinteresting Case Study in MySQL Performan...Ernie Souhrada
 
Jonathan Lewis explains Delphix
Jonathan Lewis explains Delphix Jonathan Lewis explains Delphix
Jonathan Lewis explains Delphix Kyle Hailey
 
Cassandra summit 2013 how not to use cassandra
Cassandra summit 2013  how not to use cassandraCassandra summit 2013  how not to use cassandra
Cassandra summit 2013 how not to use cassandraAxel Liljencrantz
 
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Kyle Hailey
 
Webinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache CassandraWebinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache CassandraDataStax
 
Cassandra Summit 2014: Diagnosing Problems in Production
Cassandra Summit 2014: Diagnosing Problems in ProductionCassandra Summit 2014: Diagnosing Problems in Production
Cassandra Summit 2014: Diagnosing Problems in ProductionDataStax Academy
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsGo Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsJonas Bonér
 
NAT64/DNS64 experiments, warnings and one useful tool
NAT64/DNS64 experiments, warnings and one useful toolNAT64/DNS64 experiments, warnings and one useful tool
NAT64/DNS64 experiments, warnings and one useful toolAPNIC
 

Tendances (19)

Caching 101: Caching on the JVM (and beyond)
Caching 101: Caching on the JVM (and beyond)Caching 101: Caching on the JVM (and beyond)
Caching 101: Caching on the JVM (and beyond)
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in Production
 
Reactive Supply To Changing Demand
Reactive Supply To Changing DemandReactive Supply To Changing Demand
Reactive Supply To Changing Demand
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - Cassandra
 
A Year with Cinder and Ceph at TWC
A Year with Cinder and Ceph at TWCA Year with Cinder and Ceph at TWC
A Year with Cinder and Ceph at TWC
 
Transforming IT Infrastructure
Transforming IT InfrastructureTransforming IT Infrastructure
Transforming IT Infrastructure
 
QCon NYC: Distributed systems in practice, in theory
QCon NYC: Distributed systems in practice, in theoryQCon NYC: Distributed systems in practice, in theory
QCon NYC: Distributed systems in practice, in theory
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Delphix for DBAs by Jonathan Lewis
Delphix for DBAs by Jonathan LewisDelphix for DBAs by Jonathan Lewis
Delphix for DBAs by Jonathan Lewis
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
All Your IOPS Are Belong To Us - A Pinteresting Case Study in MySQL Performan...
All Your IOPS Are Belong To Us - A Pinteresting Case Study in MySQL Performan...All Your IOPS Are Belong To Us - A Pinteresting Case Study in MySQL Performan...
All Your IOPS Are Belong To Us - A Pinteresting Case Study in MySQL Performan...
 
Jonathan Lewis explains Delphix
Jonathan Lewis explains Delphix Jonathan Lewis explains Delphix
Jonathan Lewis explains Delphix
 
Cassandra summit 2013 how not to use cassandra
Cassandra summit 2013  how not to use cassandraCassandra summit 2013  how not to use cassandra
Cassandra summit 2013 how not to use cassandra
 
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
 
Webinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache CassandraWebinar: Getting Started with Apache Cassandra
Webinar: Getting Started with Apache Cassandra
 
Cassandra Summit 2014: Diagnosing Problems in Production
Cassandra Summit 2014: Diagnosing Problems in ProductionCassandra Summit 2014: Diagnosing Problems in Production
Cassandra Summit 2014: Diagnosing Problems in Production
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsGo Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
 
NAT64/DNS64 experiments, warnings and one useful tool
NAT64/DNS64 experiments, warnings and one useful toolNAT64/DNS64 experiments, warnings and one useful tool
NAT64/DNS64 experiments, warnings and one useful tool
 

Similaire à The Highs and Lows of Stateful Containers

Database as a Service (DBaaS) on Kubernetes
Database as a Service (DBaaS) on KubernetesDatabase as a Service (DBaaS) on Kubernetes
Database as a Service (DBaaS) on KubernetesObjectRocket
 
Disenchantment: Netflix Titus, Its Feisty Team, and Daemons
Disenchantment: Netflix Titus, Its Feisty Team, and DaemonsDisenchantment: Netflix Titus, Its Feisty Team, and Daemons
Disenchantment: Netflix Titus, Its Feisty Team, and DaemonsC4Media
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScyllaDB
 
Docker: Containers for Data Science
Docker: Containers for Data ScienceDocker: Containers for Data Science
Docker: Containers for Data ScienceAlessandro Adamo
 
Robust Applications in Mesos using External Storage
Robust Applications in Mesos using External StorageRobust Applications in Mesos using External Storage
Robust Applications in Mesos using External StorageDavid vonThenen
 
Cassandra Day NY 2014: From Proof of Concept to Production
Cassandra Day NY 2014: From Proof of Concept to ProductionCassandra Day NY 2014: From Proof of Concept to Production
Cassandra Day NY 2014: From Proof of Concept to ProductionDataStax Academy
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications OpenEBS
 
SQL Server Clustering for Dummies
SQL Server Clustering for DummiesSQL Server Clustering for Dummies
SQL Server Clustering for DummiesMark Broadbent
 
Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015
Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015
Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015Datadog
 
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah BardUsing Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah BardDocker, Inc.
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)Tibo Beijen
 
To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…Sergey Dzyuban
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Fwdays
 
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...Continuent
 
Methods of Sharding MySQL
Methods of Sharding MySQLMethods of Sharding MySQL
Methods of Sharding MySQLLaine Campbell
 
Common Challenges in DevOps Change Management
Common Challenges in DevOps Change ManagementCommon Challenges in DevOps Change Management
Common Challenges in DevOps Change ManagementMatt Ray
 
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017Alex Robinson
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introductionkanedafromparis
 
Training Webinar: Enterprise application performance with distributed caching
Training Webinar: Enterprise application performance with distributed cachingTraining Webinar: Enterprise application performance with distributed caching
Training Webinar: Enterprise application performance with distributed cachingOutSystems
 

Similaire à The Highs and Lows of Stateful Containers (20)

Database as a Service (DBaaS) on Kubernetes
Database as a Service (DBaaS) on KubernetesDatabase as a Service (DBaaS) on Kubernetes
Database as a Service (DBaaS) on Kubernetes
 
Disenchantment: Netflix Titus, Its Feisty Team, and Daemons
Disenchantment: Netflix Titus, Its Feisty Team, and DaemonsDisenchantment: Netflix Titus, Its Feisty Team, and Daemons
Disenchantment: Netflix Titus, Its Feisty Team, and Daemons
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the Database
 
Docker: Containers for Data Science
Docker: Containers for Data ScienceDocker: Containers for Data Science
Docker: Containers for Data Science
 
Robust Applications in Mesos using External Storage
Robust Applications in Mesos using External StorageRobust Applications in Mesos using External Storage
Robust Applications in Mesos using External Storage
 
Cassandra Day NY 2014: From Proof of Concept to Production
Cassandra Day NY 2014: From Proof of Concept to ProductionCassandra Day NY 2014: From Proof of Concept to Production
Cassandra Day NY 2014: From Proof of Concept to Production
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications
 
SQL Server Clustering for Dummies
SQL Server Clustering for DummiesSQL Server Clustering for Dummies
SQL Server Clustering for Dummies
 
Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015
Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015
Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015
 
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah BardUsing Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
 
To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…To Build My Own Cloud with Blackjack…
To Build My Own Cloud with Blackjack…
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
 
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
 
Methods of Sharding MySQL
Methods of Sharding MySQLMethods of Sharding MySQL
Methods of Sharding MySQL
 
Common Challenges in DevOps Change Management
Common Challenges in DevOps Change ManagementCommon Challenges in DevOps Change Management
Common Challenges in DevOps Change Management
 
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
 
Containers 101
Containers 101Containers 101
Containers 101
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introduction
 
Training Webinar: Enterprise application performance with distributed caching
Training Webinar: Enterprise application performance with distributed cachingTraining Webinar: Enterprise application performance with distributed caching
Training Webinar: Enterprise application performance with distributed caching
 

Plus de C4Media

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoC4Media
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileC4Media
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020C4Media
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
The Highs and Lows of Stateful Containers

  • 1. The Highs and Lows of Stateful Containers Presented by Alex Robinson / Member of the Technical Staff @alexwritescode
  • 2. InfoQ.com: News & Community Site • Over 1,000,000 software developers, architects and CTOs read the site worldwide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/kubernetes-stateful-containers
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  • 5. Almost all real applications rely on state When storage systems go down, so do the applications that use them
  • 6. Containers are new and different Change is risky
  • 7. Great care is warranted when moving stateful applications into containers
  • 9. To succeed, you must: 1. Understand your stateful application
  • 10. To succeed, you must: 1. Understand your stateful application 2. Understand your orchestration system
  • 11. To succeed, you must: 1. Understand your stateful application 2. Understand your orchestration system 3. Plan for the worst
  • 12. Let’s talk about stateful containers • Why would you even want to run stateful applications in containers? • What do stateful systems need to run reliably? • What should you know about your orchestration system? • What’s likely to go wrong and what can you do about it?
  • 13. My experience with stateful containers • Worked directly on Kubernetes and GKE from 2014-2016 ○ Part of the original team that launched GKE • Lead all container-related efforts for CockroachDB ○ Configurations for Kubernetes, DC/OS, Docker Swarm, even Cloud Foundry ○ AWS, GCP, Azure, On-Prem ○ From single availability zone deployments to multi-region ○ Help users deploy and troubleshoot their custom setups
  • 14. Why even bother? We’ve been running stateful services for decades
  • 15. Traditional management of stateful services 1. Provision one or more beefy machines with large/fast disks 2. Copy binaries and configuration onto machines 3. Run binaries with provided configuration 4. Never change anything unless absolutely necessary
  • 16. Traditional management of stateful services • Pros ○ Stable, predictable, understandable • Cons ○ Most management is manual, especially to scale or recover from hardware failures ■ And that manual intervention may not be very well practiced
  • 17. Moving to containers • Can you do the same thing with containers? ○ Sure! ○ ...But that’s not what you’ll get by default if you’re using any of the common orchestration systems
  • 18. So why move state into orchestrated containers? • The same reasons you’d move stateless applications to containers ○ Automated deployment, placement, security, scalability, availability, failure recovery, rolling upgrades ■ Less manual toil, less room for operator error ○ Resource isolation • Avoid separate workflows for stateless vs stateful applications
  • 19. Challenges of managing state “Understand your stateful application”
  • 20. What do stateful systems need?
  • 21. What do stateful systems need? • Process management • Persistent storage
  • 22. What do stateful systems need? • Process management • Persistent storage • If distributed, also: ○ Network connectivity ○ Consistent name/address ○ Peer discovery
  • 25. Managing state in plain Docker containers “Understand your orchestration system”
  • 26. Stateful applications in Docker • Not much to worry about here other than storage ○ Never store important data to a container’s filesystem
  • 27. Stateful applications in Docker 1. Data in container 2. Data on host filesystem 3. Data in network storage
  • 28. Stateful applications in Docker • Don’t: ○ docker run cockroachdb/cockroach start • Do: ○ docker run -v /mnt/data1:/data cockroachdb/cockroach start --store=/data
  • 29. Stateful applications in Docker • Don’t: ○ docker run cockroachdb/cockroach start • Do: ○ docker run -v /mnt/data1:/data cockroachdb/cockroach start --store=/data • And in most cases, you’ll actually want: ○ docker run -p 26257:26257 -p 8080:8080 -v /mnt/data1:/data cockroachdb/cockroach start --store=/data
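The "Do" command above can also be written declaratively. The following is a minimal Docker Compose sketch of the same idea, assuming the host directory /mnt/data1 already exists; the service name is illustrative, and newer CockroachDB releases may need additional flags (for example around security or joining a cluster) that the slide does not show.

```yaml
# Compose equivalent of the slide's `docker run` command (a sketch, not an official config).
services:
  cockroachdb:                      # illustrative service name
    image: cockroachdb/cockroach
    command: start --store=/data    # same arguments as on the slide
    ports:
      - "26257:26257"               # SQL / inter-node traffic
      - "8080:8080"                 # HTTP admin UI
    volumes:
      - /mnt/data1:/data            # data lives on the host, not in the container filesystem
```

The important line is the volume mapping: removing the container leaves /mnt/data1 untouched on the host.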
  • 30. Stateful applications in Docker • Hardly any different from running things the traditional way • Automated - binary packaging/distribution, resource isolation • Manual - everything else
  • 31. Managing State on Kubernetes “Understand your orchestration system”
  • 32. Let’s skip over the basics • Unless you want to manually pin pods to nodes (see previous section), you should use either: ○ StatefulSet: ■ decouples replicas from nodes ■ persistent address for each replica, DNS-based peer discovery ■ network-attached storage instance associated with each replica ○ DaemonSet: ■ pin one replica to each node ■ use node’s disk(s)
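The persistent addresses and DNS-based peer discovery mentioned for StatefulSets typically come from a headless Service. A minimal sketch follows; the name and label `cockroachdb` are assumptions chosen for illustration, and the port is the one used in the Docker example earlier in the talk.

```yaml
# Headless Service: gives each StatefulSet replica a stable DNS name,
# e.g. cockroachdb-0.cockroachdb (names here are illustrative).
apiVersion: v1
kind: Service
metadata:
  name: cockroachdb
  labels:
    app: cockroachdb
spec:
  clusterIP: None                   # "None" makes the Service headless
  publishNotReadyAddresses: true    # let peers resolve each other before readiness passes
  selector:
    app: cockroachdb
  ports:
    - name: grpc
      port: 26257
      targetPort: 26257
```

A StatefulSet refers to this Service through its `serviceName` field.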
  • 33. Where do things go wrong?
  • 35. Don’t trust the defaults! • If you don’t specifically ask for persistent storage, you won’t get any ○ Always think about and specify where your data will live
  • 36. Don’t trust the defaults! • If you don’t specifically ask for persistent storage, you won’t get any ○ Always think about and specify where your data will live 1. Data in container 2. Data on host filesystem 3. Data in network storage
  • 37. Ask for a dynamically provisioned PersistentVolume
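In a StatefulSet, a dynamically provisioned PersistentVolume is usually requested through `volumeClaimTemplates`. The sketch below is not the configuration shown in the talk; the names, the mount path, and the 100Gi size are assumptions for illustration.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cockroachdb
spec:
  serviceName: cockroachdb          # headless Service providing stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: cockroachdb
  template:
    metadata:
      labels:
        app: cockroachdb
    spec:
      containers:
        - name: cockroachdb
          image: cockroachdb/cockroach
          # command/args omitted; the store path should point at the mount below
          volumeMounts:
            - name: datadir
              mountPath: /data
  volumeClaimTemplates:             # one PersistentVolumeClaim is created per replica
    - metadata:
        name: datadir
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi          # illustrative size
```

Because no `storageClassName` is set here, the cluster's default StorageClass is used.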
  • 38. Don’t trust the defaults! • Now your data is persistent • But how’s performance?
  • 39. Don’t trust the defaults! • If you don’t create and request your own StorageClass, you’re probably getting slow disks ○ Default on GCE is non-SSD (pd-standard) ○ Default on Azure is non-SSD (non-managed blob storage) ○ Default on AWS is gp2, which is backed by SSDs but provides fewer IOPS than io2 • This really affects database performance
  • 40. Use a custom StorageClass
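A custom StorageClass for SSD-backed disks on GCE might look like the sketch below. The class name `fast-ssd` is hypothetical, and the provisioner shown is the in-tree GCE one; clusters that use CSI drivers need a different provisioner name.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                    # hypothetical name
provisioner: kubernetes.io/gce-pd   # in-tree GCE persistent disk provisioner
parameters:
  type: pd-ssd                      # SSD instead of the pd-standard default
```

A PersistentVolumeClaim (or a StatefulSet's volume claim template) opts in by setting `storageClassName: fast-ssd` in its spec.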
  • 41. Performance problems • There are a lot of other things you have to do to get performance equivalent to what you’d get outside of Kubernetes • For more detail, see https://cockroachlabs.com/docs/kubernetes-performance.html
  • 43. What other defaults are bad? • If you: ○ Create a Kubernetes cluster with 3 nodes ○ Create a 3-replica StatefulSet running CockroachDB • What happens if one of the nodes fails?
  • 44. Don’t trust the defaults! [Diagram: pods cockroachdb-0, cockroachdb-1, and cockroachdb-2 and Ranges 1, 2, and 3 placed across Node 1, Node 2, and Node 3]
  • 45. Don’t trust the defaults! • If you don’t specifically ask for your StatefulSet replicas to be scheduled on different nodes, they may not be (k8s issue #41130) ○ If the node with 2 replicas dies, Cockroach will be unavailable until they come back • This is terrible for fault tolerance ○ What’s the point of running 2 database replicas on the same machine?
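One way to ask for replicas to be scheduled on different nodes is a pod anti-affinity rule. This is a sketch of a fragment that would sit under `spec.template.spec` in the StatefulSet, assuming the pods carry the label `app: cockroachdb`.

```yaml
# Fragment of a StatefulSet's pod template (spec.template.spec).
affinity:
  podAntiAffinity:
    # Hard rule: never place two pods with this label on the same node.
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: cockroachdb        # assumed pod label
        topologyKey: kubernetes.io/hostname
```

With the "required" form, a pod stays Pending if no eligible node is free; `preferredDuringSchedulingIgnoredDuringExecution` is the softer alternative that only expresses a preference.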
  • 47. What can go wrong other than bad defaults?
  • 48. What else can go wrong? • In early tests, Cockroach pods would fail to get re-created if all of them were brought down at once • Kubernetes would create the first pod, but not any others
  • 49. What else can go wrong?
  • 50. Know your app and your orchestration system • StatefulSets (by default) only create one pod at a time • They also wait for the current pod to pass readiness probes before creating the next
  • 51. Know your app and your orchestration system • StatefulSets (by default) only create one pod at a time • They also wait for the current pod to pass readiness probes before creating the next • The Cockroach health check used at the time only returned healthy if the node was connected to a majority partition of the cluster
  • 53. If just one node were to fail: healthy? Yes.
  • 54. If just one node were to fail: healthy? Yes. Create the missing pod.
  • 55. After all nodes fail: healthy? No. The StatefulSet waits for the first pod to be healthy before adding the second, while the first pod waits for a connection to the rest of the cluster before saying it is healthy.
  • 56. Solution to pod re-creation deadlock • Keep basic liveness probe endpoint ○ Simply checks if process can respond to any HTTP request at all • Create new readiness probe endpoint in Cockroach ○ Returns HTTP 200 if node is accepting SQL connections
  • 57. Solution to pod re-creation deadlock • Keep basic liveness probe endpoint ○ Simply checks if process can respond to any HTTP request at all • Create new readiness probe endpoint in Cockroach ○ Returns HTTP 200 if node is accepting SQL connections • Now that it’s an option, tell the StatefulSet to create all pods in parallel
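The fix described above can be sketched as a StatefulSet fragment. The probe paths `/health` and `/health?ready=1` and the timing values are assumptions rather than details taken from the slides; check the documentation for your CockroachDB version before relying on them.

```yaml
# Fragment of a StatefulSet spec (a sketch; paths and timings are assumptions).
spec:
  podManagementPolicy: Parallel     # create all pods at once instead of one at a time
  template:
    spec:
      containers:
        - name: cockroachdb
          livenessProbe:            # basic check: can the process answer HTTP at all?
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 5
          readinessProbe:           # stricter check: is the node accepting SQL connections?
            httpGet:
              path: /health?ready=1
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 2
```

Separating the two probes means a node that cannot yet reach its peers is kept out of load balancing without being restarted, and parallel pod management removes the wait on the first pod's readiness.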
  • 58. Other potential issues to look out for • Set resource requests/limits for proper isolation and to avoid evictions • No PodDisruptionBudgets by default (#35318) • If in the cloud, don’t depend on your nodes to live forever ○ Hosting services (I’m looking at you, GKE) tend to just delete and recreate node VMs in order to upgrade node software ○ Be especially careful about using the nodes’ local disks because of this • If on-prem, good luck getting fast, reliable network attached storage
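The first two points can be sketched as follows. The label, the names, and the CPU and memory figures are illustrative assumptions; `policy/v1` applies to recent Kubernetes versions, while older clusters used `policy/v1beta1`.

```yaml
# PodDisruptionBudget: allow at most one replica to be taken down
# by voluntary disruptions (e.g. node drains) at a time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cockroachdb-budget          # hypothetical name
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: cockroachdb              # assumed pod label
---
# Fragment of a container spec: explicit requests and limits.
resources:
  requests:
    cpu: "2"                        # illustrative values
    memory: 8Gi
  limits:
    cpu: "2"
    memory: 8Gi
```

Setting requests equal to limits for every container places the pod in the Guaranteed QoS class, which makes it among the last candidates for eviction under node resource pressure.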
  • 59. Other potential issues to look out for • If you issue TLS certificates for StatefulSet DNS addresses, don’t forget to include the namespace-scoped addresses ○ “cockroachdb.default.svc.cluster.local” vs just “cockroachdb” ○ Needed for cross-namespace communication ○ Also don’t put pod IPs in node certs - it’ll work initially, but not after pod re-creation • Multi-region stateful systems are really tough to make work ○ Both network connectivity and persistent addresses are hard to set up ○ Hopefully you went to yesterday’s Cilium and Istio talks
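As an illustration of the certificate advice, the sketch below lists both short and namespace-scoped DNS names and no pod IPs. It uses a cert-manager `Certificate` resource, which is an assumption: the talk does not say how certificates are issued. The issuer, secret, and Service names are hypothetical, and `cluster.local` is the default cluster domain.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cockroachdb-node            # hypothetical name
  namespace: default
spec:
  secretName: cockroachdb-node-tls  # hypothetical Secret to hold the key pair
  issuerRef:
    name: example-ca-issuer         # hypothetical Issuer
    kind: Issuer
  dnsNames:
    - localhost
    # Short names, valid only inside the same namespace:
    - cockroachdb
    - "*.cockroachdb"
    # Namespace-scoped names, needed for cross-namespace communication:
    - cockroachdb.default
    - cockroachdb.default.svc.cluster.local
    - "*.cockroachdb.default"
    - "*.cockroachdb.default.svc.cluster.local"
  # Deliberately no ipAddresses: pod IPs change when pods are re-created.
```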
  • 60. How to get started Isn’t this all a lot of work?
  • 61. Getting things right is far from easy
  • 62. What should you do if you aren’t an expert on the systems you want to use?
  • 63. How to get started • You could take the time to build expertise
  • 64. How to get started • You could take the time to build expertise • But ideally someone has already done the hard work for you
  • 65. Off-the-shelf configurations • There are great configurations available for popular OSS projects • They’ve usually been made by someone who knows that project well • They’ve often already been proven in production by other users
  • 66. Off-the-shelf configurations • Kubernetes off-the-shelf configs are unfortunately quite limited ○ YAML forces the config writer to make decisions that would best be left to the user ○ No built-in method for parameter substitution • How could a config writer possibly know your desired: ○ StorageClass ○ Disk size ○ CPU and memory requests/limits ○ Application-specific configuration options ○ etc.
  • 67. Enter: package managers • Additional formats have been defined to make parameterizing easier • Package creator defines set of parameters that can be easily overridden • User doesn’t have to understand or muck with YAML files ○ Just look through list of parameters and pick which need customizing
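With Helm, for example, the user-facing parameters are usually overridden through a small values file. The keys below are hypothetical and do not correspond to any particular chart; a real chart documents its own parameter names in its `values.yaml`.

```yaml
# my-values.yaml: overrides for a hypothetical chart (all keys are illustrative).
storage:
  storageClass: fast-ssd            # the decisions a config writer cannot make for you
  size: 100Gi
resources:
  requests:
    cpu: "2"
    memory: 8Gi
  limits:
    cpu: "2"
    memory: 8Gi
replicas: 3
```

Such a file is typically passed to `helm install` with the `-f` (or `--values`) flag, so the chart's templates stay untouched.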
  • 71. Orchestrator package managers • Kubernetes: Helm ○ helm.sh ○ github.com/helm/charts/ • DC/OS: Universe ○ universe.dcos.io • Cloud Foundry: Pivotal Services Marketplace ○ pivotal.io/platform/services-marketplace • Docker: Application Packages (experimental) ○ CLI tool: docker-app
  • 72. Summary Go forth and manage persistent state
  • 73. Don’t let configuration mistakes take down your production services
  • 74. 1. Understand your stateful application 2. Understand your orchestration system 3. Plan for the worst
  • 75. 1. Understand your stateful application 2. Understand your orchestration system 3. Plan for the worst (or use a package manager)
  • 76. Thank You! For more info: cockroachlabs.com github.com/cockroachdb/cockroach alex@cockroachlabs.com / @alexwritescode