Meet Spilo, Zalando’s HIGH-AVAILABLE POSTGRESQL CLUSTER - Feike Steenbergen

Spilo: the high-available PostgreSQL cluster
Feike Steenbergen
Zalando SE

Zalando
• 15 EU countries
• 3 fulfilment centers
• 15+ million active customers
• 2.2 billion € revenue 2014

Our databases
• > 150 production PostgreSQL databases
• > 13.5 TB data
• > 5 TB biggest DB
• 400-1000+ write tps
• > 2 DB failures/month

Infrastructure bottleneck
• ACID Team: create, alter, deploy, migrate, failover, upgrade
• 80+ teams

Cloud
• 2013: ZCloud
• 2014: project Pequod
• 2015: Let’s just use AWS…

Amazon abbreviations
• AWS - Amazon Web Services
• EC2 - Elastic Compute Cloud
• ELB - Elastic Load Balancer
• RDS - Relational DB Service
• CF - CloudFormation
• ASG - Auto Scaling Group

AWS
• One account per team
• Microservices
• REST/OAuth2
• Deployment with Docker

Autonomous teams on AWS
[Diagram labels: REST, Internet]

Autonomous teams
• Team decides which product to build
• … and which technologies to use
• REST/OAuth2 mandatory
• Team is responsible for its infrastructure

Databases?
• Developers should take care of infrastructure
• … including production databases
• On AWS!

Isn’t it dangerous?
DBAs running with scissors, by Gavin M. Roy: https://www.flickr.com/photos/gavinmroy/4638958958

ACID team provides PostgreSQL training

Autofailover tasks
• Detect the master failure
• Elect a new master
• Redirect clients

Autofailover issues
• Discarded writes
• Split-brain
• False positives

RDS?
• Support for PostgreSQL
• Automatic failover
• Most extensions
• Automatic backups

RDS?
• Vendor lock-in
• No superuser
• No untrusted languages
• No logical decoding plugins
• Costs

Spilo does
• Rapid deployment of PostgreSQL on AWS EC2 instances
• Streaming replication with auto-failover

Spilo on AWS
[Diagram: one Spilo master and two Spilo replicas. Labels: master connection, application DB request, etcd cluster status update.]

Failover
[Diagram: no master, two Spilo replicas. Labels: master connection, etcd cluster status update.]

Failover
[Diagram: one Spilo master, one Spilo replica, and a new Spilo starting. Labels: master connection, etcd cluster status update.]

Failover
[Diagram: one Spilo master and two Spilo replicas. Labels: master connection, etcd cluster status update.]

What is Spilo?
[Diagram: three nodes, each running Patroni (one master, two replicas), placed in auto-scaling groups.]

Demo: Deploying Spilo
• We use Stups
• First we define a template
• We create a CloudFormation stack from this template

Patroni (პატრონი)
• Handles new replicas and failover
• Based on ideas and code of the Compose Governor
• Open-source
• Runs everywhere

Compose Governor idea
• Use etcd for failover decision
• Run etcd on every node
• Run 1 node with HAProxy + etcd

Distributed configuration systems
• Fault tolerant
• Reliably store small amounts of strongly-consistent data between distributed nodes
• Good for storing the PostgreSQL cluster state

Distributed consensus
[Diagram labels: leader; client, client, client; write request]

Distributed consensus
[Diagram labels: leader; client, client, client; write request; leader]
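
etcd relies on the Raft consensus algorithm, where a write sent to the leader counts as committed only once a majority of the cluster has acknowledged it. The arithmetic behind that can be sketched as follows; the function names are made up for this illustration and are not part of etcd or Patroni.

```python
def quorum(cluster_size):
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def is_committed(acks, cluster_size):
    """A write counts as committed once a majority has acknowledged it."""
    return acks >= quorum(cluster_size)


for n in (3, 5):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {n - quorum(n)} failure(s)")
```

This is why such clusters are usually run with an odd number of nodes: going from 3 to 4 nodes raises the quorum without tolerating any additional failure.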

Cluster state in etcd
$ export ETCD=172.17.0.2:4001
$ etcdctl -C $ETCD ls /service/ --recursive
/service/dm
/service/dm/optime
/service/dm/optime/leader
/service/dm/members
/service/dm/members/postgresql_172_17_0_3
/service/dm/initialize
/service/dm/leader

Leader key
• Points to the member key
• Has a TTL, autoexpires
• Acts as an exclusive lock
• Only the leader can become the master
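
The leader key behaviour listed above (exclusive, expiring, renewable only by its holder) can be illustrated with a small in-memory stand-in for the DCS. This is a sketch under stated assumptions, not Patroni's code: `TTLStore` and `try_to_lead` are hypothetical names, and a real deployment would rely on etcd's atomic create and compare-and-swap operations instead of a Python dict.

```python
import time


class TTLStore:
    """In-memory stand-in for a DCS such as etcd (hypothetical, for illustration)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)  # expired keys vanish, like a TTL key
            return None
        return entry[0]

    def create(self, key, value, ttl):
        """Create the key only if it does not exist (or has expired)."""
        if self.get(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True

    def refresh(self, key, value, ttl):
        """Extend the TTL only if the key still holds the expected value."""
        if self.get(key) != value:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True


def try_to_lead(store, member, ttl=30, key="/service/dm/leader"):
    """Return True if `member` holds the leader key after this attempt."""
    return store.refresh(key, member, ttl) or store.create(key, member, ttl)
```

A master that keeps calling `try_to_lead` more often than the TTL stays leader; if it stops (crash, network partition), the key expires and the next caller wins the lock.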

Leader TTL
$ http http://$ETCD/v2/keys/service/dm/leader
...
{
"action": "get",
"node": {
"createdIndex": 8,
"expiration": "2015-11-20T09:56:43.59367
"key": "/service/dm/leader",
"modifiedIndex": 85,
"ttl": 22,
"value": "postgresql_172_17_0_3"
}
}
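
A client that wants to know who currently leads, and for how long the lock remains valid without a refresh, only needs the `value` and `ttl` fields of that response. A minimal sketch, assuming the etcd v2 response shape shown above; `parse_leader` is a hypothetical helper and the sample omits the `expiration` field.

```python
import json


def parse_leader(response_text):
    """Return (leader_name, ttl_seconds) from an etcd v2 key response."""
    node = json.loads(response_text)["node"]
    return node["value"], node.get("ttl")


sample = """
{
  "action": "get",
  "node": {
    "key": "/service/dm/leader",
    "createdIndex": 8,
    "modifiedIndex": 85,
    "ttl": 22,
    "value": "postgresql_172_17_0_3"
  }
}
"""

leader, ttl = parse_leader(sample)
print(f"{leader} holds the leader key for another {ttl}s unless it is refreshed")
```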

Member key
$ etcdctl -C $ETCD get /service/dm/members/postgresql_172_17_0_5
{
"conn_url": "postgres://un:pw@172.17.0.5:5432/po
"api_url": "http://172.17.0.5:8008/patroni",
"tags": {},
"state": "running",
"role": "replica",
"xlog_location": 67109176
}
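
Because every member publishes its state, tags and `xlog_location`, a failover candidate can in principle be chosen from these keys alone. The sketch below shows one plausible rule (most advanced running member that is not tagged `nofailover`); this rule is an assumption made for illustration, not a description of Patroni's actual election logic. The members ending in `_4` and `_6`, and all their values, are invented sample data.

```python
def best_candidate(members):
    """Pick the running member with the highest xlog_location,
    skipping members tagged nofailover (assumed rule, for illustration)."""
    eligible = {
        name: info
        for name, info in members.items()
        if info.get("state") == "running"
        and not info.get("tags", {}).get("nofailover", False)
    }
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name]["xlog_location"])


# Hypothetical member keys, shaped like the one shown above.
members = {
    "postgresql_172_17_0_4": {
        "state": "running", "role": "replica", "tags": {},
        "xlog_location": 67109000,
    },
    "postgresql_172_17_0_5": {
        "state": "running", "role": "replica", "tags": {},
        "xlog_location": 67109176,
    },
    "postgresql_172_17_0_6": {
        "state": "running", "role": "replica", "tags": {"nofailover": True},
        "xlog_location": 67109500,
    },
}

print(best_candidate(members))
```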

Connection and API URL
[Diagram: two Patroni nodes (master, replica) behind a master LB. Labels: PostgreSQL connection, API HealthCheck (master).]

Connection and API URL
[Diagram: two Patroni nodes (master, replica) behind a master LB and a replica LB. Labels: PostgreSQL connection, API HealthCheck (master), API HealthCheck (slave).]
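
The load balancers only need a yes/no answer from each node: "are you currently the role I route to?". A minimal sketch of such a health-check endpoint follows, using Python's standard `http.server`. The paths `/master` and `/replica`, the helper names and the default port are assumptions for this example; consult the Patroni documentation for the endpoints it actually exposes.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def status_for(path, role):
    """200 if the node's role matches the role asked for in the path, else 503."""
    wanted = path.strip("/")
    return 200 if wanted == role else 503


def make_handler(get_role):
    """Build a request handler that reports the node's current role."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            role = get_role()
            self.send_response(status_for(self.path, role))
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(role.encode() + b"\n")

    return HealthHandler


def serve(get_role, port=8008):
    """Blocks forever; call it from the process that knows the node's role."""
    HTTPServer(("", port), make_handler(get_role)).serve_forever()
```

With this shape, the master LB would probe `/master` and the replica LB `/replica`; a node that is demoted starts answering 503 on `/master` and is taken out of rotation without any client-side change.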

Initialize key
$ etcdctl -C $ETCD get /service/dm/initialize
6219169399948550171
• PostgreSQL cluster system ID
• Created by the first node that joins the cluster
• Nodes with a different system ID are not allowed to join
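
The join rule above amounts to "create if absent, otherwise compare". A minimal sketch, with a plain dict standing in for the DCS; `may_join` is a hypothetical name, and `dict.setdefault` here only imitates what would be an atomic create in etcd.

```python
def may_join(dcs, system_id, key="/service/dm/initialize"):
    """First node registers its system ID; later nodes must match it."""
    stored = dcs.setdefault(key, system_id)  # stand-in for an atomic create
    return stored == system_id


dcs = {}
print(may_join(dcs, "6219169399948550171"))  # first node initializes the cluster
print(may_join(dcs, "6219169399948550171"))  # replica cloned from the same cluster
print(may_join(dcs, "1111111111111111111"))  # unrelated cluster (made-up ID)
```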

Patroni modules
• Abstract DCS (etcd, ZooKeeper)
• PostgreSQL
• REST API
• High availability
• Asynchronous executor
• Callbacks
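
The high-availability module can be pictured as a loop that, on every tick, compares the leader key in the DCS with the local PostgreSQL role and picks one action. The sketch below is a simplified assumption of such a decision step, not Patroni's implementation; `ha_step`, its parameters and the action names are invented for illustration.

```python
def ha_step(member, leader, is_master, try_acquire):
    """Decide one step of a simplified HA loop.

    member      -- this node's name
    leader      -- current value of the leader key, or None if it expired
    is_master   -- whether the local PostgreSQL currently runs as master
    try_acquire -- callable(member) -> bool, attempts to take the leader key
    """
    if leader is None and try_acquire(member):
        leader = member  # we won the race for the expired lock
    if leader == member:
        return "keep leading" if is_master else "promote"
    if is_master:
        return "demote"  # someone else holds the lock: avoid split-brain
    return "follow" if leader else "wait"
```

The "demote" branch is what addresses the split-brain issue: a master that finds the lock held by another member steps down instead of continuing to accept writes.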

Demo time!
https://asciinema.org/a/2ttvu50yehjo2712s1w43udio

Patroni improvements
• Robust exception handling
• Run long-running tasks (e.g. base backup) in a separate thread
• etcd + ZooKeeper
• REST API

• Configurable replica imaging
• Support for pg_rewind
• patronictl
• Packaged: pip install patroni

• Manual failover
• Initialize from external cluster
• Attach to already running PostgreSQL nodes
• Tags (e.g. nofailover)

Thank you!
• Spilo: github.com/zalando/spilo, spilo.readthedocs.org
• Patroni: github.com/zalando/patroni, patroni.readthedocs.org
• Stups: github.com/zalando-stups/, stups.io
• Feedback: @ekief
