The document discusses PostgreSQL high availability and scaling options. It covers horizontal scaling using load balancing and data partitioning across multiple servers, as well as high-availability techniques such as master-slave replication, warm standby servers with point-in-time recovery, and using a heartbeat to keep multiple servers from acting as the master. It recommends an initial architecture of two servers using warm standby with point-in-time recovery and a heartbeat for high availability, and suggests scaling the application servers horizontally later if more capacity is needed.
1. PostgreSQL High Availability & Scaling
John Paulett
October 26, 2009
2. Overview
Scaling Overview
– Horizontal & Vertical Options
High Availability Overview
Other Options
Suggested Architecture
Hardware Discussion
4. What are we trying to solve?
Survive server failure?
– Support an uptime SLA (e.g. 99.9999%)?
Application scaling?
– Support additional application demand
→ Many options, each optimized for different constraints
6. How To Scale
Horizontal Scaling
– “Google” approach
– Distribute load across multiple servers
– Requires appropriate application architecture
Vertical Scaling
– “Big Iron” approach
– Single, massive machine (lots of fast processors, RAM, & hard drives)
7. Horizontal DB Scaling
Load Balancing
– Distribute operations to multiple servers
Partitioning
– Cut up the data (horizontal) or tables (vertical) and put them on separate servers
– aka “sharding”
8. Basic Problem when Load Balancing
Difficult to maintain consistent state between servers (remember ACID), especially when dealing with writes
4 PostgreSQL Load Balancing Methods:
– Master-Slave Replication
– Statement-Based Replication Middleware
– Asynchronous Multimaster Replication
– Synchronous Multimaster Replication
9. Master-Slave Replication
Master handles writes, slaves handle reads (routing sketched below)
Asynchronous replication
– Possible data loss on master failure
Slony-I
– Does not automatically propagate schema changes
– Does not offer single connection point
– Requires separate solution for master failures
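A minimal sketch of the read/write split this implies, in Python with psycopg2 (the hostnames, database, and schema are assumptions; Slony-I replicates data but leaves query routing to the application):

    # Writes go to the master; reads go to a slave.  Slony-I does not
    # provide a single connection point, so the application routes.
    import psycopg2

    master = psycopg2.connect(host="db-master", dbname="app", user="app")
    slave = psycopg2.connect(host="db-slave1", dbname="app", user="app")

    def save_order(order_id, total):
        cur = master.cursor()
        cur.execute("INSERT INTO orders (id, total) VALUES (%s, %s)",
                    (order_id, total))
        master.commit()

    def get_order(order_id):
        cur = slave.cursor()
        cur.execute("SELECT id, total FROM orders WHERE id = %s",
                    (order_id,))
        return cur.fetchone()

Because the replication is asynchronous, a read issued right after a write may not see it yet; traffic that must read its own writes belongs on the master.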
10. Statement-Based Replication Middleware
Intercept SQL queries, send writes to all servers, reads to any server
Possible issues using random(), CURRENT_TIMESTAMP, & sequences (workaround sketched below)
pgpool-II
– Connection Pooling, Replication, Load Balancing, Parallel Queries, Failover
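The random()/CURRENT_TIMESTAMP issue arises because each server executes the intercepted statement independently and can compute a different value. A hedged workaround sketch, again Python/psycopg2 with an assumed pgpool-II host and table: compute such values once in the application and bind them as parameters.

    # Under statement-based replication every node evaluates the SQL
    # itself, so volatile expressions can diverge between nodes.
    import random
    from datetime import datetime

    import psycopg2

    conn = psycopg2.connect(host="pgpool-host", dbname="app", user="app")
    cur = conn.cursor()

    # Risky: each replica rolls its own dice and reads its own clock.
    #   INSERT INTO draws (value, drawn_at)
    #   VALUES (random(), CURRENT_TIMESTAMP)

    # Safer: the application picks the values, so every replica
    # stores the same row.
    cur.execute("INSERT INTO draws (value, drawn_at) VALUES (%s, %s)",
                (random.random(), datetime.utcnow()))
    conn.commit()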
12. Synchronous Multimaster Replication
Writes & reads on any server
Not implemented in PostgreSQL itself, but application code can mimic it via two-phase commit (sketched below)
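A minimal sketch of what mimicking it from application code could look like, using psycopg2's two-phase-commit API, which maps onto PostgreSQL's PREPARE TRANSACTION / COMMIT PREPARED (the hostnames and schema are assumptions, and both servers need max_prepared_transactions > 0):

    # Apply the same write on two masters atomically: prepare the
    # transaction on both nodes, then commit both.
    import psycopg2

    conns = [psycopg2.connect(host=h, dbname="app", user="app")
             for h in ("db-a", "db-b")]
    xids = [c.xid(0, "order-42", "node-%d" % i)
            for i, c in enumerate(conns)]

    try:
        for c, x in zip(conns, xids):
            c.tpc_begin(x)
            c.cursor().execute(
                "UPDATE accounts SET balance = balance - 100 WHERE id = 42")
        for c in conns:
            c.tpc_prepare()   # phase 1: PREPARE TRANSACTION
        for c in conns:
            c.tpc_commit()    # phase 2: COMMIT PREPARED
    except Exception:
        for c in conns:
            try:
                c.tpc_rollback()
            except psycopg2.Error:
                pass
        raise

The catch: if a node dies between prepare and commit, the transaction sits in-doubt until someone resolves it, which is why real multimaster setups need a transaction manager rather than just this pattern.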
13. Load Balancing Issue
Scaling writes breaks down at a certain point – with replication, every server still has to apply every write
14. Partitioning
Requires heavy application modification
Performing queries across partitions is problematic, and often not possible (see the sketch below)
PL/Proxy can help
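A hedged sketch of the application change partitioning implies, with two assumed shard hosts; PL/Proxy's contribution is to move this routing logic into database functions instead of application code:

    # Application-side sharding: a row's shard is derived from its key,
    # and every query for that key must go to that one server.
    import psycopg2

    SHARDS = [psycopg2.connect(host=h, dbname="app", user="app")
              for h in ("db-shard0", "db-shard1")]

    def shard_for(user_id):
        return SHARDS[user_id % len(SHARDS)]

    def get_user(user_id):
        cur = shard_for(user_id).cursor()
        cur.execute("SELECT id, name FROM users WHERE id = %s",
                    (user_id,))
        return cur.fetchone()

    def count_all_users():
        # A "global" query must visit every shard and merge the results
        # in the application -- this is the cross-partition pain above.
        total = 0
        for conn in SHARDS:
            cur = conn.cursor()
            cur.execute("SELECT count(*) FROM users")
            total += cur.fetchone()[0]
        return total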
15. Vertical DB Scaling
“Buying a bigger box is quick(ish). Redesigning software is not.”
● Cal Henderson, Flickr
37signals Basecamp upgraded to a 128 GB DB server: “don’t need to pay the complexity tax yet”
● David Heinemeier Hansson, Ruby on Rails
16. Sites Running on Single DB
StackOverflow
– MS SQL Server, 48GB RAM, RAID 1 for OS, RAID 10 for data
37signals Basecamp
– MySQL, 128GB RAM, Dell R710 or Dell 2950
18. High Availability
Application still up even after node failure
– (Also try to prevent failure with appropriate hardware)
PostgreSQL High Availability Options
– pgpool-II
– Shared Disk Failover
– File System Replication
– Warm Standby with Point-In-Time Recovery (PITR)
Often still need a heartbeat application
19. Shared Disk Failover
Use a single disk array to hold the database's data files.
– Network Attached Storage (NAS)
– Network File System (NFS)
Disk array is a single point of failure
Need heartbeat to bring 2nd server online
20. File System Replication
File system is mirrored to another computer
DRBD
– Linux block-device (disk-level) replication
Need heartbeat to bring 2nd server online
21. Point in Time Recovery
“Log shipping”
– Write Ahead Logs sent to and replayed on standby (config sketched below)
– Included in PostgreSQL 8.0+
– Asynchronous – potential data loss on failover
Warm Standby
– Standbys' hardware very similar to primary's
– Need heartbeat to bring 2nd server online
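A minimal configuration sketch of the moving parts on an 8.x-era setup (the archive path and trigger-file location are assumptions; pg_standby ships in contrib from 8.3):

    # postgresql.conf on the primary: ship each finished WAL segment
    # to storage the standby can read.
    archive_mode = on
    archive_command = 'cp %p /mnt/wal_archive/%f'

    # recovery.conf on the standby: replay segments as they arrive and
    # stay in recovery until a trigger file appears at failover time.
    restore_command = 'pg_standby -t /tmp/pgsql.trigger /mnt/wal_archive %f %p %r'

Failover then amounts to creating the trigger file on the standby, which is exactly the step a heartbeat (next slide) automates.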
22. Heartbeat
“STONITH” (Shoot The Other Node In The Head)
– Prevents multiple nodes from thinking they are the master (“split-brain”; illustrated below)
Linux-HA
– Creates cluster, takes nodes out when they fail
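In practice Linux-HA handles this, but here is a toy Python sketch of the rule STONITH enforces (nothing below is Linux-HA's actual API; fence is a stand-in for a power-control command):

    # A standby must not promote itself merely because the master is
    # unreachable -- the master may be alive on the other side of a
    # network split.  Fencing it first guarantees at most one master.
    import socket

    def peer_responding(host, port=5432, timeout=2.0):
        try:
            socket.create_connection((host, port), timeout).close()
            return True
        except OSError:
            return False

    def maybe_promote(peer_host, fence):
        if peer_responding(peer_host):
            return False        # master looks healthy; do nothing
        fence(peer_host)        # STONITH: make sure it is really down
        # ...only now is it safe to create the PITR trigger file...
        return True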
28. Suggested Architecture
2 nice machines
Point in Time Recovery with Heartbeat
Tune PostgreSQL (starting points sketched below)
Monitor & improve slow queries
Add in Ehcache as we touch code
→ Leave horizontal scaling for another day
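Hedged starting points for the tuning item on a dedicated 8.x-era box with ample RAM (the numbers are illustrative, not recommendations for any specific workload), including a setting that feeds the slow-query monitoring item:

    # postgresql.conf -- illustrative starting points only.
    shared_buffers = 2GB                # often ~25% of system RAM
    effective_cache_size = 6GB          # rough size of the OS file cache
    work_mem = 32MB                     # per sort/hash node, so keep modest
    checkpoint_segments = 16            # spread out checkpoint I/O (pre-9.5)
    log_min_duration_statement = 200    # log statements slower than 200 ms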
30. Future Architecture
Scale application servers out (horizontally) as needed
Improve DB Hardware
31. Hardware Options
PostgreSQL typically constrained by RAM & disk I/O, not processor
64-bit, as much memory as possible
Data Array
– RAID10 with 4 drives (not RAID 5), 15k RPM
Separate OS Drive / Array
32. Dell R710
Processor: Xeon
4x 15k HD in RAID10
24GB (3x 8GB) RAM (up to 6x 16GB)
= $6,905
33. Other Considerations
Test environment should mimic Production
– Same database setup
– Provides environment for experimentation
Can host multiple DBs on a single cluster