Citus scales out PostgreSQL by using PostgreSQL's extension APIs. To do this, Citus shards and replicates data, performs distributed deadlock detection, and parallelizes queries across a cluster of machines.
This talk describes the distributed systems challenges we faced at Citus and how we addressed them. In particular, we'll talk about three problems we tackled when scaling out Postgres:
- Three strategies for high availability and disaster recovery in Postgres
- Using Postgres' extension framework to route and parallelize queries across a cluster of machines
- The relationship between distributed consistency and locks, and how to resolve distributed deadlocks
Finally, we'll talk about open challenges around scaling out Postgres. We'll then conclude the talk with Q&A.
3. I love Postgres, too
Ozgun Erdogan | QCon San Francisco 2017
Ozgun Erdogan
CTO of Citus Data
Distributed Systems
Distributed Databases
Formerly of Amazon
Love drinking margaritas
5. Our mission at Citus Data
Make it so SaaS businesses
never have to worry about
scaling their database again
6. What is the Citus database?
1. Scales out PostgreSQL
• Using sharding & replication
• Query engine parallelizes SQL queries across many nodes
2. Extension to PostgreSQL
• Using PostgreSQL extension APIs
3. Available in 3 ways
7. Citus, Packaged Three Ways
• Open Source (github.com/citusdata/citus)
• Enterprise Software
• Fully-Managed Database as a Service
9. 3 Challenges Distributing Postgres
1. PostgreSQL and High Availability
2. PostgreSQL is huge. How do you keep up with it?
3. Distributed transactions
11. Why is High Availability hard?
PostgreSQL replication uses one primary &
multiple secondary nodes. Two challenges:
1. Most Postgres clients aren’t smart. When the
primary fails, they retry the same IP.
2. Postgres replicates a node's entire state. This makes it
resource intensive to reconstruct new nodes from a
primary.
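To make the first challenge concrete: a smarter client keeps a list of candidate hosts and looks for the node that currently accepts writes, instead of retrying one IP. The following is a minimal sketch, not any driver's real API. The `probe` callable, the host addresses, and the role strings are all assumptions made for illustration. (libpq in PostgreSQL 10 and later offers a comparable facility through multi-host connection strings with `target_session_attrs=read-write`.)

```python
from typing import Callable, Iterable, Optional


def find_primary(hosts: Iterable[str], probe: Callable[[str], str]) -> Optional[str]:
    """Return the first host that reports itself as the primary, or None."""
    for host in hosts:
        try:
            role = probe(host)
        except ConnectionError:
            continue  # node is down: move on instead of retrying the same IP
        if role == "primary":
            return host
    return None


# Hypothetical cluster state after a failover: the old primary is down
# and the former standby has been promoted.
cluster = {"10.0.0.1": None, "10.0.0.2": "primary"}


def fake_probe(host: str) -> str:
    """Stand-in for a real connection attempt (assumption for this sketch)."""
    role = cluster[host]
    if role is None:
        raise ConnectionError(f"{host} is unreachable")
    return role


print(find_primary(["10.0.0.1", "10.0.0.2"], fake_probe))  # 10.0.0.2
```

A client that only knows 10.0.0.1 gets `None` back here, which is the "retry the same IP" failure mode described on the slide.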
13. Database Failures Shouldn’t Be a Big Deal
3 Methods for HA & Backups in Postgres
1. PostgreSQL streaming replication to replicate from
primary to secondary. Back up to S3.
2. Volume-level replication to replicate to the secondary's
volume. Back up to S3.
3. Incremental backups to S3. Reconstruct secondary
nodes from S3.
14. Postgres - Streaming Replication (1)
[Diagram: a primary PostgreSQL node (Table foo, Table bar, WAL logs) streams write-ahead logs to a secondary PostgreSQL node (Table foo, Table bar, WAL logs) through streaming replication. Monitoring agents handle streaming replication setup and auto failover. A backup process writes to S3 / Blob Storage (encrypted).]
15. Postgres – AWS RDS & Azure (2)
[Diagram: a Postgres primary and a Postgres standby, each with a persistent volume holding Table foo, Table bar, and WAL logs. Monitoring agents handle auto node failover. A backup process writes to S3 / Blob Storage (encrypted).]
16. Postgres – Reconstruct from WAL (3)
[Diagram: a Postgres primary and a Postgres secondary, each with a persistent volume holding Table foo, Table bar, and WAL logs. Monitoring agents handle auto node failover. Backup processes connect the nodes to S3 / Blob Storage (encrypted).]
17. How do these approaches compare?
Streaming Replication (local / ephemeral disk)
• Who does this? On-prem, manual EC2
• Primary benefits: Simple to set up. Direct I/O: high I/O & large storage
Disk Mirroring
• Who does this? RDS, Azure Preview
• Primary benefits: Works for MySQL and PostgreSQL. Data durability in cloud environments
Reconstruct from WAL
• Who does this? Heroku, Citus Cloud
• Primary benefits: Enables fork and PITR. Node reconstruction in background. (Data durability in cloud environments)
18. Summary
• In PostgreSQL, a database node’s state gets
replicated in its entirety. The replication can be set up
in three ways.
• Reconstructing a secondary node from S3 makes
bringing up or shooting down nodes easy.
• When you shard your database, the state you need to
replicate per node becomes smaller.
20. 3 ways to build a distributed database
1. Build a distributed database from scratch
2. Middleware sharding (mimic the parser)
3. Fork your favorite database (like PostgreSQL)
22. Postgres Features, Tools & Frameworks
• PostgreSQL manual (US Letter)
• Clients for different programming
languages
• ORMs, libraries, GUIs
• Tools (dump, restore, analyze)
• New features
23. At First, Forked PostgreSQL with Style
24. Two Stage Query Optimization
1. Plan to minimize network I/O
2. Nodes talk to each other using SQL over libpq
3. Learned to cooperate with planner / executor bit by bit
(Volcano style executor)
25. Citus Architecture (Simplified)
Query sent to the coordinator:
SELECT avg(revenue)
FROM sales
Coordinator (holds table metadata) sends one query per shard:
Worker node 1 (Table_1001, Table_1003)
SELECT sum(revenue), count(revenue)
FROM table_1001
SELECT sum … FROM table_1003
Worker node 2 (Table_1002, Table_1004)
SELECT sum … FROM table_1002
SELECT sum … FROM table_1004
...
Worker node N
Each node: PostgreSQL with Citus installed
1 shard = 1 PostgreSQL table
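The rewrite of `avg(revenue)` into per-shard `sum` and `count` queries can be sketched as follows. This is an illustration of the merge arithmetic only, under assumptions: the shard contents are made up, and the function names `worker_partial` and `coordinator_avg` are hypothetical, not Citus internals.

```python
from typing import Dict, List, Tuple

# Hypothetical shard contents: shard name -> revenue values stored in it.
shards: Dict[str, List[float]] = {
    "table_1001": [10.0, 20.0],
    "table_1002": [30.0],
    "table_1003": [40.0, 50.0, 60.0],
    "table_1004": [],
}


def worker_partial(rows: List[float]) -> Tuple[float, int]:
    """What each shard query returns: sum(revenue), count(revenue)."""
    return sum(rows), len(rows)


def coordinator_avg(partials: List[Tuple[float, int]]) -> float:
    """Merge step on the coordinator: total sum divided by total count."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("avg over zero rows")  # SQL would return NULL here
    return total / count


partials = [worker_partial(rows) for rows in shards.values()]
print(coordinator_avg(partials))  # 35.0
```

Shipping `sum` and `count` rather than per-shard averages matters: averaging the shard averages would weight a one-row shard the same as a three-row shard and give the wrong answer.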
26. Unfork Citus using Extension APIs
CREATE EXTENSION citus;
• System catalogs – Distributed metadata
• Planner hook – Insert, Update, Delete, Select
• Executor hook – Insert, Update, Delete, Select
• Utility hook – Alter Table, Create Index, Vacuum, etc.
• Transaction & resources handling – file descriptors, etc.
• Background worker process – Maintenance processes
(distributed deadlock detection, task tracker, etc.)
• Logical decoding – Online data migrations
29. Consistency in Distributed Databases
1. 2PC: All participating nodes need to be up
2. Paxos: Achieves consensus with quorum
3. Raft: More understandable alternative to
Paxos
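The 2PC constraint in item 1 can be illustrated with a toy coordinator. This is a sketch under assumptions: the `Participant` class, its `up` flag, and the state names are hypothetical and only model the voting rule, not recovery or durable logging.

```python
from typing import List


class Participant:
    """A hypothetical worker node taking part in a transaction."""

    def __init__(self, name: str, up: bool = True) -> None:
        self.name = name
        self.up = up
        self.state = "idle"

    def prepare(self) -> bool:
        if not self.up:
            return False  # an unreachable node cannot vote yes
        self.state = "prepared"
        return True

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        if self.up:
            self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Commit only if every participant votes yes in the prepare phase."""
    # Phase 1: ask everyone to prepare; stop at the first "no".
    if all(p.prepare() for p in participants):
        # Phase 2: all votes are in, so the decision is commit.
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


print(two_phase_commit([Participant("w1"), Participant("w2")]))            # True
print(two_phase_commit([Participant("w1"), Participant("w2", up=False)]))  # False
```

A single node that is down forces the whole transaction to abort, which is the availability cost that quorum-based protocols such as Paxos and Raft avoid.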
32. What is a Lock?
• Protects against concurrent modifications.
• Locks are released at the end of a transaction.
Deadlocks
33. Transactions Block on 1st Conflicting Lock
Session 1:
BEGIN;
UPDATE data SET y = 2 WHERE x = 1;
<obtained lock on rows with x = 1>
COMMIT;
<all locks released>
Session 2:
BEGIN;
UPDATE data SET y = 5 WHERE x = 1;
<waiting for lock on rows with x = 1>
<obtained lock on rows with x = 1>
COMMIT;
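The blocking behaviour in the two sessions above can be modelled with a toy lock table. This is a simplified sketch, not how PostgreSQL implements row locks: the `RowLocks` class and the transaction names "A" and "B" are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Dict, List


class RowLocks:
    """Toy exclusive row locks, held until the owning transaction ends."""

    def __init__(self) -> None:
        self.owner: Dict[int, str] = {}           # row key -> transaction
        self.waiters: Dict[int, Deque[str]] = {}  # row key -> queued transactions

    def update(self, txn: str, key: int) -> str:
        """Try to lock a row for an UPDATE; queue up if someone else holds it."""
        holder = self.owner.get(key)
        if holder is None or holder == txn:
            self.owner[key] = txn
            return "obtained"
        self.waiters.setdefault(key, deque()).append(txn)
        return "waiting"

    def commit(self, txn: str) -> List[str]:
        """Release all of txn's locks; return transactions that were unblocked."""
        unblocked = []
        for key in [k for k, t in self.owner.items() if t == txn]:
            queue = self.waiters.get(key)
            if queue:
                nxt = queue.popleft()
                self.owner[key] = nxt  # hand the lock to the first waiter
                unblocked.append(nxt)
            else:
                del self.owner[key]
        return unblocked


locks = RowLocks()
print(locks.update("A", 1))  # obtained
print(locks.update("B", 1))  # waiting
print(locks.commit("A"))     # ['B']
```

Session "B" only proceeds once "A" commits, because locks are released at the end of the transaction rather than at the end of the statement.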
34. Transactions and Concurrency
• Transactions that don’t modify the same row
can run concurrently.
• Transactions block on the 1st lock that conflicts.
Session 1:
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
COMMIT;
<all locks released>
Session 2:
BEGIN;
UPDATE data SET y = y + 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
<waiting for lock on rows with x = 1>
<obtained lock on rows with x = 1>
COMMIT;
35. (Distributed) deadlock!
But what if they start blocking each other?
Session 1:
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 1;
UPDATE data SET y = y + 1 WHERE x = 2;
Session 2:
BEGIN;
UPDATE data SET y = y - 1 WHERE x = 2;
UPDATE data SET y = y + 1 WHERE x = 1;
36. Deadlock detection in PostgreSQL
Deadlock detection builds a graph of processes that
are waiting for each other.
37. Deadlock detection in PostgreSQL
Transactions are cancelled until the cycle is gone.
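Deadlock detection on a wait-for graph can be sketched as a cycle search followed by cancellation. This is a minimal illustration, not PostgreSQL's actual detector: the transaction names are hypothetical, and the victim policy (`max` of the names on the cycle) is an arbitrary choice made for this sketch.

```python
from typing import Dict, List, Optional, Set

WaitGraph = Dict[str, Set[str]]  # transaction -> transactions it waits for


def find_cycle(graph: WaitGraph) -> Optional[List[str]]:
    """Return the transactions on one wait cycle, or None if there is none."""
    done: Set[str] = set()

    def visit(node: str, path: List[str]) -> Optional[List[str]]:
        if node in path:
            return path[path.index(node):]  # back edge: the cycle starts here
        if node in done:
            return None
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            cycle = visit(nxt, path)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


def resolve_deadlocks(graph: WaitGraph) -> List[str]:
    """Cancel one transaction per cycle until no cycle remains."""
    cancelled = []
    while (cycle := find_cycle(graph)) is not None:
        victim = max(cycle)  # arbitrary policy for this sketch
        cancelled.append(victim)
        graph.pop(victim, None)
        for waits_for in graph.values():
            waits_for.discard(victim)
    return cancelled


# T1 holds x = 1 and wants x = 2; T2 holds x = 2 and wants x = 1.
graph = {"T1": {"T2"}, "T2": {"T1"}}
print(resolve_deadlocks(graph))  # ['T2']
```

Cancelling a transaction releases its locks, which removes its edges from the graph and lets the remaining transaction continue.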
38. Deadlocks in Citus
Citus delegates transactions to nodes.
39. Deadlocks in Citus
PostgreSQL's deadlock detector still works when a deadlock is confined to a single node.
40. Deadlocks in Citus
When a deadlock spans multiple nodes, PostgreSQL cannot help us.
41. Deadlock detection in Citus 7
Citus 7 adds distributed deadlock detection.
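One way to picture distributed deadlock detection: collect each node's local wait edges, relabel the local processes with the distributed transaction they work for, and search the merged graph for a cycle. The sketch below makes several assumptions: the node names, backend process ids, and the `owner` mapping are hypothetical, and it is not claimed to match the Citus implementation in detail.

```python
from typing import Dict, List, Set, Tuple

# Per-node wait edges: (waiting backend, blocking backend), as local process ids.
local_edges: Dict[str, List[Tuple[int, int]]] = {
    "worker1": [(102, 101)],  # on worker1, backend 102 waits for backend 101
    "worker2": [(201, 202)],  # on worker2, backend 201 waits for backend 202
}

# Which distributed transaction each (node, backend) works for.
owner: Dict[Tuple[str, int], str] = {
    ("worker1", 101): "D1", ("worker1", 102): "D2",
    ("worker2", 201): "D1", ("worker2", 202): "D2",
}


def merge_graphs(edges, owner) -> Dict[str, Set[str]]:
    """Build one wait-for graph keyed by distributed transaction."""
    merged: Dict[str, Set[str]] = {}
    for node, pairs in edges.items():
        for waiter, blocker in pairs:
            merged.setdefault(owner[(node, waiter)], set()).add(owner[(node, blocker)])
    return merged


def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    """Depth-first search for a back edge, i.e. a chain of waits that loops."""
    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(node: str) -> bool:
        if node in visiting:
            return True
        if node in done:
            return False
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


merged = merge_graphs(local_edges, owner)
print(merged)             # {'D2': {'D1'}, 'D1': {'D2'}}
print(has_cycle(merged))  # True
```

Each worker on its own sees a single wait edge and no cycle, which is why the per-node PostgreSQL detector cannot catch this case; the cycle only appears once the graphs are merged.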
43. Distributed transactions are a complex topic
• Most articles on distributed transactions focus on data
consistency.
• Data consistency is only one side of the coin. If you’re
using a relational database, your application benefits
from another key feature: deadlock detection.
• https://www.citusdata.com/blog/2017/08/31/databases-and-distributed-deadlocks-a-faq
44. So now what? We talked about 3 challenges in distributing Postgres
1. PostgreSQL, replication, and high availability
2. Tradeoffs in building a distributed database, and how we chose PostgreSQL's extension APIs
3. Distributed deadlock detection & distributed transactions