Spilo, a fault-tolerant PostgreSQL cluster / Oleksii Kliukin (Zalando SE)

Spilo, highly-available PostgreSQL cluster
Oleksii Kliukin
Zalando SE

Zalando
• 15 EU countries
• 3 fulfilment centers
• 15+ million active customers
• €2.2 billion revenue (2014)

Our databases
• >150 production PostgreSQL databases
• >13.5 TB data
• >5 TB biggest DB
• 400-1000+ write TPS
• >2 DB failures/month

Infrastructure bottleneck
• ACID Team: create, alter, deploy, migrate, failover, upgrade
• 80+ teams

Cloud
• 2013: ZCloud 
• 2014: project Pequod 
• 2015: Let’s just use AWS…

Amazon 3-letter words
• AWS - Amazon Web Services
• EC2 - Elastic Compute Cloud
• ELB - Elastic Load Balancing
• RDS - Relational Database Service

AWS
• One account per team
• Microservices
• REST/OAuth2
• Deployment with Docker

Autonomous teams on AWS
[Diagram labels: REST, Internet]

Autonomous teams
• Team decides which product to build
• … and which technologies to use
• REST/OAuth2 mandatory
• Team is responsible for its infrastructure

Databases?
• Developers should take care of infrastructure
• … including production databases
• On AWS!

Isn’t it dangerous?
DBAs running with scissors, by Gavin M. Roy: https://www.flickr.com/photos/gavinmroy/4638958958

ACID team provides PostgreSQL trainings

Autofailover tasks
• Detect the master failure
• Elect a new master
• Redirect clients

Autofailover issues
• Discarded writes
• Split-brain
• False positives

RDS?
• Support for PostgreSQL
• Automatic failover
• Most extensions
• Automatic backups

RDS?
• Vendor lock-in
• No superuser
• No untrusted languages
• No logical decoding plugins
• Rather expensive

EC2 + Linux HA
• Complex setup
• Lots of manual steps (e.g. new replica creation)

Spilo does
• Rapid deployment of PostgreSQL on AWS EC2 instances
• Streaming replication with auto-failover

Spilo on AWS
[Diagram: one Spilo master and two Spilo replicas. Labels: application DB request, master connection, etcd cluster status update]

Failover
[Diagram: two Spilo replicas, no master. Labels: master connection, etcd cluster status update]

Failover
[Diagram: one Spilo master and one Spilo replica; a new Spilo starts. Labels: master connection, etcd cluster status update]

Failover
[Diagram: one Spilo master and two Spilo replicas. Labels: master connection, etcd cluster status update]

What is Spilo?
[Diagram: three nodes running Patroni (one master, two replicas) in auto-scaling groups]

Patroni (პატრონი)
• Handles new replicas and failover
• Based on ideas and code of the Compose Governor
• Open source

Compose Governor idea
 
Core to our PostgreSQL HA system is the Governor application which uses etcd as its repository of truth to discover which database instance is leader.

Distributed configuration systems
• Fault tolerant
• Reliably store small amounts of strongly-consistent data between distributed nodes
• Good for storing the PostgreSQL cluster state

Distributed consensus
[Diagram: one leader and three clients]

Cluster state in etcd
$ etcdctl ls --recursive
/service
/service/batman
/service/batman/optime
/service/batman/optime/leader
/service/batman/members
/service/batman/members/postgresql0
/service/batman/members/postgresql1
/service/batman/initialize
/service/batman/leader

Leader key
$ etcdctl get /service/batman/leader
postgresql0
• Points to the member key
• Has a TTL, autoexpires
• Acts as an exclusive lock
• Only the leader can become the master
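
The locking behaviour of the leader key can be illustrated with a minimal sketch, assuming an in-memory stand-in for etcd. TTLStore, run_cycle and the 30-second TTL are hypothetical names and values chosen for illustration; this is not the etcd or Patroni API.

```python
import time


class TTLStore:
    """Toy in-memory stand-in for a DCS such as etcd (hypothetical, not the etcd API)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)  # expired keys disappear on access
            return None
        return entry[0]

    def create(self, key, value, ttl):
        """Create the key only if it does not exist (or has expired)."""
        if self.get(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True

    def refresh(self, key, value, ttl):
        """Extend the TTL only if the key still holds the expected value."""
        if self.get(key) != value:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True


LEADER_KEY = "/service/batman/leader"


def run_cycle(store, member, ttl=30):
    """Return the role this member should take after one loop iteration."""
    # The current leader keeps the lock by refreshing it; anyone else can
    # only take it once the key has expired.
    if store.refresh(LEADER_KEY, member, ttl) or store.create(LEADER_KEY, member, ttl):
        return "master"
    return "replica"
```

In this sketch a member that stops calling run_cycle loses the key after the TTL, and the next member to run a cycle takes over. A real DCS performs the create and refresh steps atomically on the server side.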

Leader TTL
$ http http://127.0.0.1:2379/v2/keys/service/batman/leader
…
{
"action": "get",
"node": {
"createdIndex": 48723,
"expiration": "2015-10-23T14:51:49.686506977Z",
"key": "/service/batman/leader",
"modifiedIndex": 49037,
"ttl": 27,
"value": "postgresql0"
}
}
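
As a rough illustration, the response above can be read programmatically to see who holds the leader key and how long it has left. The sketch below works on the JSON body exactly as shown on the slide; parse_expiration and leader_status are hypothetical helper names, and the "now" timestamp is passed in explicitly so the example does not depend on the system clock.

```python
import json
from datetime import datetime, timezone

# JSON body copied from the slide above.
RESPONSE = """
{
  "action": "get",
  "node": {
    "createdIndex": 48723,
    "expiration": "2015-10-23T14:51:49.686506977Z",
    "key": "/service/batman/leader",
    "modifiedIndex": 49037,
    "ttl": 27,
    "value": "postgresql0"
  }
}
"""


def parse_expiration(stamp):
    """Parse the UTC timestamp; nanoseconds are truncated to microseconds."""
    head, _, frac = stamp.rstrip("Z").partition(".")
    micros = int((frac + "000000")[:6])
    base = datetime.strptime(head, "%Y-%m-%dT%H:%M:%S")
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


def leader_status(body, now):
    """Return the leader name, the reported TTL and the seconds until expiry."""
    node = json.loads(body)["node"]
    remaining = (parse_expiration(node["expiration"]) - now).total_seconds()
    return {"leader": node["value"], "ttl": node["ttl"], "seconds_left": remaining}
```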

Member key
$ etcdctl get /service/batman/members/postgresql0
{"role":"master",
"state":"running",
"conn_url":"postgres://replicator:rep-pass@127.0.0.1:5432/postgres",
"api_url":"http://127.0.0.1:8008/patroni",
"xlog_location":67108960}

Connection and API URL
[Diagram: two Patroni nodes, a master and a replica. Labels: connection URL, master LB, replica LB, API URL (check health during promotion)]

Initialize key
$ etcdctl get /service/batman/initialize
6208852353820383446
• PostgreSQL cluster system ID
• Created by the first node that joins the cluster
• Nodes with different system ID are not allowed to join
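
The join rule above could be sketched roughly as follows, assuming the DCS is represented by a plain dict. may_join is a hypothetical helper, and the sketch ignores the atomic create that a real DCS would need to keep two first nodes from racing.

```python
INITIALIZE_KEY = "/service/batman/initialize"


def may_join(dcs, local_system_id):
    """Decide whether a node may join, given a dict-like view of the DCS."""
    stored = dcs.get(INITIALIZE_KEY)
    if stored is None:
        # First node to arrive: record its PostgreSQL system ID.
        dcs[INITIALIZE_KEY] = local_system_id
        return True
    # Later nodes must carry the same system ID as the recorded one.
    return stored == local_system_id
```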

Patroni modules
• Abstract DCS: etcd, ZooKeeper
• PostgreSQL
• REST API
• High availability
• Asynchronous executor
• Callbacks

Demo time!
https://asciinema.org/a/29087

From Governor to Patroni
Governor
Patroni

Location of etcd: original
[Diagram: three Governor nodes]

Replace etcd with proxy
[Diagram: three Governor nodes, each with a proxy]

Embed etcd client in Patroni
[Diagram: three Patroni nodes]

Patroni improvements
• Robust exception handling
• Run long-running tasks (e.g. base backup) in a separate thread
• etcd + ZooKeeper
• REST API

• Configurable replica imaging
• Support for pg_rewind

• Manual failover
• Initialize from external cluster
• Attach to already running PostgreSQL nodes
• Tags (e.g. nofailover)
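
To illustrate how a tag such as nofailover might influence failover, here is a minimal sketch that picks a promotion candidate from member keys. It assumes each member value is JSON with the role, state and xlog_location fields shown on the member key slide, and that tags appear under a "tags" object; that layout and the pick_candidate helper are assumptions for illustration, not Patroni's actual selection logic.

```python
import json


def pick_candidate(members):
    """Pick the running replica with the highest xlog_location, skipping nofailover."""
    best_name, best_pos = None, -1
    for name, raw in members.items():
        data = json.loads(raw)
        # Only running replicas are eligible.
        if data.get("state") != "running" or data.get("role") == "master":
            continue
        # Assumed layout: {"tags": {"nofailover": true}} excludes the node.
        if data.get("tags", {}).get("nofailover"):
            continue
        if data.get("xlog_location", -1) > best_pos:
            best_name, best_pos = name, data["xlog_location"]
    return best_name
```

Preferring the highest xlog_location keeps the number of discarded writes as small as possible among the eligible nodes.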

What you should monitor
• replication lag
• unhealthy member
• no leader
• etcd/ZooKeeper
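
The first three checks above could be derived from the cluster state stored in the DCS. The following is a minimal sketch, assuming the leader name, the leader's last known position (as in /service/batman/optime/leader) and the decoded member keys have already been fetched. check_cluster and the 16 MB lag threshold are hypothetical; monitoring etcd or ZooKeeper itself is not covered.

```python
def check_cluster(leader, leader_optime, members, max_lag_bytes=16 * 1024 * 1024):
    """Return a list of alert strings for the given cluster snapshot."""
    alerts = []
    if leader is None:
        alerts.append("no leader")
    for name, data in sorted(members.items()):
        if data.get("state") != "running":
            alerts.append(f"unhealthy member: {name}")
        elif name != leader and leader_optime is not None:
            # Lag in bytes between the leader position and this member.
            lag = leader_optime - data.get("xlog_location", 0)
            if lag > max_lag_bytes:
                alerts.append(f"replication lag: {name} is {lag} bytes behind")
    return alerts
```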

Thank you!
• Spilo: 
github.com/zalando/spilo 
spilo.readthedocs.org 
• Patroni: 
github.com/zalando/patroni 
patroni.readthedocs.org 
• Feedback: @alexeyklyukin
