Spilo provides high availability for PostgreSQL databases running on AWS. It uses Patroni and etcd to handle replication, failover, and cluster state management. Teams at Zalando use Spilo to run more than 150 self-managed PostgreSQL databases on AWS, with each team responsible for its own databases. Spilo automates deploying, replicating, and failing over PostgreSQL clusters on AWS, which gives teams more agility than a managed database service.
17. Autonomous teams
• Team decides which product to build
• … and which technologies to use
• REST/OAuth2 mandatory
• Team is responsible for its infrastructure
18. Databases?
• Developers should take care of infrastructure
• … including production databases
• On AWS!
19. Isn’t it dangerous?
DBAs running with scissors, by Gavin M. Roy: https://www.flickr.com/photos/gavinmroy/4638958958
33. Demo: Deploying Spilo
• We use STUPS
• First we define a template
• We create a CloudFormation stack from this template
34. Patroni (პატრონი)
• Handles new replicas and failover
• Based on ideas and code of the Compose Governor
• Open-source
• Runs everywhere
35. Compose Governor idea
• Use etcd for failover decision
• Run etcd on every node
• Run 1 node with HAProxy + etcd
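The idea of letting etcd make the failover decision can be sketched as a leader key with a time-to-live: the node that holds the key is the master and must keep refreshing it, and any other node may take the key once it expires. This is a minimal illustration, not Governor's or Patroni's actual code. `FakeKV` is a hypothetical in-memory stand-in for etcd, and the key path `/service/demo/leader` and the 30-second TTL are assumptions.

```python
import time


class FakeKV:
    """Hypothetical in-memory stand-in for etcd; keys expire after a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)  # treat expired keys as absent
            return None
        return entry[0]

    def create_if_absent(self, key, value, ttl):
        """Set the key only if it does not exist (or has expired)."""
        if self.get(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True

    def refresh_if_equal(self, key, value, ttl):
        """Extend the TTL only if the key still holds `value`."""
        if self.get(key) != value:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True


def run_cycle(kv, node, key="/service/demo/leader", ttl=30):
    """One pass of the loop each node runs; returns the role for this pass."""
    if kv.refresh_if_equal(key, node, ttl):
        return "master"   # we already hold the leader key
    if kv.create_if_absent(key, node, ttl):
        return "master"   # key was free, we just took it
    return "replica"      # someone else holds the key
```

If the master stops refreshing the key (crash, network partition), the key expires and the next node to run its cycle takes over. A real setup would also have to consider which replica is the most up to date before promoting it.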
36. Distributed configuration systems
• Fault tolerant
• Reliably store small amounts of strongly-consistent data between distributed nodes
• Good for storing the PostgreSQL cluster state
47. Connection and API URL
[Diagram: two nodes, MASTER and REPLICA, each running Patroni, behind a MASTER LB and a REPLICA LB. Arrows are labeled "PostgreSQL connection", "API HealthCheck (master)", and "API HealthCheck (slave)".]
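The role-based health checks on this slide can be sketched as follows: each load balancer probes an API path on every node, and a node answers with success only if it currently has the role that the load balancer serves. The paths `/master` and `/replica`, the status codes, and the node names are assumptions for illustration, not taken from the slides.

```python
def health_status(node_role, check_path):
    """HTTP status a node would return for a role-specific health check."""
    wanted = {"/master": "master", "/replica": "replica"}.get(check_path)
    if wanted is None:
        return 404  # unknown check path
    # 200 keeps the node in rotation, 503 takes it out.
    return 200 if node_role == wanted else 503


def nodes_in_rotation(nodes, check_path):
    """Names of the nodes a load balancer using `check_path` would route to."""
    return [name for name, role in nodes.items()
            if health_status(role, check_path) == 200]


# Hypothetical two-node cluster.
cluster = {"pg-1": "master", "pg-2": "replica"}
master_lb_targets = nodes_in_rotation(cluster, "/master")
replica_lb_targets = nodes_in_rotation(cluster, "/replica")
```

With this scheme the load balancers need no knowledge of the cluster state: after a failover the roles change, the health checks flip, and connections follow the new master.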
48. Initialize key
$ etcdctl -C $ETCD get /service/dm/initialize
6219169399948550171
• PostgreSQL cluster system ID
• Created by the first node that joins the cluster
• Nodes with a different system ID are not allowed to join
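The join rule described above can be sketched as a small check: the first node writes its system ID under the initialize key, and every later node compares its own ID with the stored one. This is an illustration under assumptions, not Patroni's implementation. A plain dict stands in for etcd, and `register_or_verify` is a hypothetical helper name. The key and the system ID are the ones shown on the slide.

```python
INITIALIZE_KEY = "/service/dm/initialize"


def register_or_verify(store, system_id, key=INITIALIZE_KEY):
    """Return True if a node with `system_id` may join the cluster.

    `store` is a dict standing in for etcd. The first caller creates the
    key; later callers are admitted only if their system ID matches it.
    """
    existing = store.setdefault(key, system_id)
    return existing == system_id


store = {}
first_joined = register_or_verify(store, "6219169399948550171")
same_cluster_joined = register_or_verify(store, "6219169399948550171")
foreign_joined = register_or_verify(store, "1111111111111111111")
```

Here the first two calls are admitted and the third is rejected, because its system ID differs from the one recorded by the first node. Against a real etcd the create step would need to be an atomic create-if-absent so that two nodes starting at the same time cannot both initialize the cluster.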