This presentation describes how Alibaba Cloud manages PostgreSQL as a cloud service. It covers the architecture of the management system, the link access architecture, and high availability, as well as the enhancements we have made to PostgreSQL, how we work with open source, and our future plans based on our customers' feedback.
6. After Created
HA
• Transparent
• Replication Lag limit
Monitoring
• Performance
• Exception and Alarm
• Auto-Repair
Backup And Restore
• Backup Scheduling
• Time Specified Restore
PostgreSQL
• OOM Handling
• IO Hung
• Privilege Management
• Transparent Switch-Over
• Transparent Connection Pool
7. High Availability
[Diagram: Meta Data of instances; two HA Master nodes; several HA Workers; PG Master / PG Slave pairs connected by replication]
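The slides name a "Replication Lag limit" under HA but do not spell out the switch-over rule. The sketch below shows one plausible way an HA Worker could decide what to do for a single PG Master / PG Slave pair. Everything in it is an assumption made for illustration: the `PairStatus` fields, measuring lag in bytes, the action names, and the rule that the lag limit gates promotion.

```python
from dataclasses import dataclass


@dataclass
class PairStatus:
    """Hypothetical health snapshot of one PG Master / PG Slave pair."""
    master_alive: bool
    slave_alive: bool
    replication_lag_bytes: int  # assumed unit; the slides do not specify one


def decide_action(status: PairStatus, lag_limit_bytes: int) -> str:
    """Return 'none', 'promote_slave', or 'alert' for one pair.

    Assumed rule: promote the slave only when the master is down, the
    slave is up, and the slave is within the replication lag limit.
    """
    if status.master_alive:
        return "none"
    if not status.slave_alive:
        return "alert"  # nothing to fail over to
    if status.replication_lag_bytes > lag_limit_bytes:
        return "alert"  # promoting now would lose too much data
    return "promote_slave"
```

For example, `decide_action(PairStatus(False, True, 1024), lag_limit_bytes=4096)` returns `"promote_slave"`, while the same call with a lag of 8192 bytes returns `"alert"`.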
10. PostgreSQL Kernel Enhancement
OOM Handling
• Standalone Handler
• Signal Process
IO Hung
• ext4
• IO sync
Privilege Management
• No super-user
• Normal user is not enough
Transparent Connection
• Connection Pool
• Connection Switch
11. OOM Handling
[Diagram: OOM Handler; Instance Resource Container with PG Process 1, PG Process 2, … PG Process N; Public Resource Container. An OOM Signal is directed at PG Process 2, labeled Most Memory Usage.]
12. OOM Handling
[Diagram: OOM Handler; Instance Resource Container; Public Resource Container; PG Process 1, PG Process 2, … PG Process N. PG Process 2 received cancel and reports OOM to the frontend.]
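The two OOM Handling slides can be read as: a standalone handler finds the PG process with the most memory usage and signals it, and that process cancels its work and reports the error to the client. Below is a minimal sketch of the selection-and-signal step only. It assumes the handler already has per-process memory figures (the `rss_by_pid` mapping and `limit_bytes` threshold are hypothetical inputs) and uses SIGINT, which a stock PostgreSQL backend treats as a query cancel. The "report OOM to frontend" behavior is a kernel enhancement described in the slides and is not reproduced here.

```python
import os
import signal
from typing import Callable, Dict, Optional


def pick_victim(rss_by_pid: Dict[int, int]) -> Optional[int]:
    """Return the PID using the most memory, or None if there are no processes."""
    if not rss_by_pid:
        return None
    return max(rss_by_pid, key=rss_by_pid.get)


def handle_oom(
    rss_by_pid: Dict[int, int],
    limit_bytes: int,
    send_signal: Callable[[int, int], None] = os.kill,
) -> Optional[int]:
    """Signal the largest process when total usage exceeds the limit.

    Returns the signaled PID, or None when nothing was done.
    `send_signal` is injectable so the logic can be exercised without
    touching real processes.
    """
    if sum(rss_by_pid.values()) <= limit_bytes:
        return None
    victim = pick_victim(rss_by_pid)
    if victim is not None:
        # SIGINT asks a PostgreSQL backend to cancel its current query.
        send_signal(victim, signal.SIGINT)
    return victim
```

Cancelling one query instead of letting the operating system kill a backend keeps the rest of the instance running, which appears to be the point of handling OOM inside the service.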
14. Privilege Management
• Super-user is dangerous
• Users need lots of privileges
• Compatibility

Attribute        Value
rolname          sanity
rolsuper         f
rolinherit       t
rolcreaterole    t
rolcreatedb      t
rolcatupdate     t
rolcanlogin      t
rolreplication   t
rolconnlimit     -1
rolpassword      ****
rolvaliduntil
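The role shown above is not a super-user but still carries most other attributes. As an illustration only, the sketch below maps those catalog columns to the options of a stock PostgreSQL `CREATE ROLE` statement. The slides do not say how the role was actually created, so the helper `create_role_sql` is hypothetical. `rolcatupdate` has no `CREATE ROLE` option and is skipped, and the password is left out because the slide masks it.

```python
from typing import Dict

# Catalog column -> (option when 't', option when 'f').
_FLAG_OPTIONS = {
    "rolsuper": ("SUPERUSER", "NOSUPERUSER"),
    "rolinherit": ("INHERIT", "NOINHERIT"),
    "rolcreaterole": ("CREATEROLE", "NOCREATEROLE"),
    "rolcreatedb": ("CREATEDB", "NOCREATEDB"),
    "rolcanlogin": ("LOGIN", "NOLOGIN"),
    "rolreplication": ("REPLICATION", "NOREPLICATION"),
}


def create_role_sql(attrs: Dict[str, str]) -> str:
    """Build a CREATE ROLE statement from pg_authid-style attributes."""
    # Quote the role name as an identifier, doubling embedded quotes.
    name = attrs["rolname"].replace('"', '""')
    parts = [f'CREATE ROLE "{name}"']
    for column, (on, off) in _FLAG_OPTIONS.items():
        if column in attrs:
            parts.append(on if attrs[column] == "t" else off)
    if "rolconnlimit" in attrs:
        parts.append(f"CONNECTION LIMIT {int(attrs['rolconnlimit'])}")
    return " ".join(parts) + ";"


slide_role = {
    "rolname": "sanity",
    "rolsuper": "f",
    "rolinherit": "t",
    "rolcreaterole": "t",
    "rolcreatedb": "t",
    "rolcanlogin": "t",
    "rolreplication": "t",
    "rolconnlimit": "-1",
}
print(create_role_sql(slide_role))
```

With the values from the slide, this prints `CREATE ROLE "sanity" NOSUPERUSER INHERIT CREATEROLE CREATEDB LOGIN REPLICATION CONNECTION LIMIT -1;`.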
15. PostgreSQL Kernel
OOM Handling
• Resource Limit
• Process Container Control
Privilege Management
• No super-user
• Normal user is not enough
16. IO Hung
The problem of fsync during checkpoint
• ext4 journal: lock, write, unlock
• Data Pages: could be a lot, and IO consuming
• The write BLOCKs the File System until unlock
• More Page Write under an IO limit: GET WORSE
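A rough back-of-the-envelope model shows why the combination above hurts. It assumes, as a simplification not stated in the slides, that every dirty data page must be flushed at the throttled IO rate before the file system is unblocked. The function name and the sample numbers are hypothetical.

```python
def fsync_stall_seconds(dirty_bytes: int, io_limit_bytes_per_s: int) -> float:
    """Estimate how long the file system stays blocked during the flush.

    Simplified model: stall time = dirty data / allowed IO throughput.
    """
    if io_limit_bytes_per_s <= 0:
        raise ValueError("io_limit_bytes_per_s must be positive")
    return dirty_bytes / io_limit_bytes_per_s


MiB = 1024 ** 2
GiB = 1024 ** 3

# Hypothetical instance: 2 GiB of dirty pages, IO limited to 50 MiB/s.
base = fsync_stall_seconds(2 * GiB, 50 * MiB)      # about 41 seconds
# More page writes before the checkpoint: the stall grows in proportion.
worse = fsync_stall_seconds(4 * GiB, 50 * MiB)     # about 82 seconds
```

In this model the stall scales linearly with the amount of dirty data and inversely with the IO limit, which matches the "GET WORSE" observation: the tighter the limit and the more pages written, the longer everything else waits.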
18. OOM Handling
• Resource Limit
• Process Container Control
IO Hung
• ext4
• IO sync
Privilege Management
• No super-user
• Normal user is not enough