The presentation "Mastering Aurora PostgreSQL Clusters for Disaster Recovery" by Bhuvanesh, Co-Founder & CTO of ShellKode, at the Mydbops OpenSource Database Meetup 14 covers advanced topics in managing Aurora PostgreSQL clusters for disaster recovery purposes.
Bhuvanesh discusses key features of Aurora, such as its decoupled storage and compute layers, auto scaling capabilities, and native replication, highlighting its benefits over traditional RDS instances. He also explores Aurora Global Databases, explaining how they enable replication of data across regions for geo-span applications with low latency.
The presentation includes architecture details, such as physical and log replication, and managed failover options for ensuring high availability. Bhuvanesh shares real-world experiences and best practices for managing Aurora clusters, including handling replication lag and TLS certificate management.
2. About Me
Co-Founder & CTO
bhuvanesh@shellkode.com
A data guy by job but a DBA by nature
Network Engineer
Cloud Architect
Database Administrator
Data Engineer
Data Architect
Social Media Handles
@BhuviTheDataGuy
https://TheDataGuy.in
/in/rbhuvanesh
3. About ShellKode
We are a born-in-the-cloud company specializing in Modernization, Security, Data, and AI/ML to empower businesses with cutting-edge technologies and drive transformative growth.
Achievements
• One of the fastest growing AWS partners
• Public Sector Badge
• Well Architected Program
• 50+ Happy Customers
• 55+ AWS Certified Architects
• 4 Service Delivery Centers: Bengaluru, Coimbatore, Hyderabad, Florida
Services
• AI/ML: Chatbot, Decision Making AI, Recommendation Engine
• Modernisation: Migration, Containerise, DevOps
• Data: Data Engineering, Data Analytics, DataOps
• GenAI: Multi Model, Large Language Model, Foundational Model
• Security
• Managed Services
4. Aurora – The differentiator
Features
• Storage and compute layers are decoupled and scale independently
• Data is kept as 2 copies per Availability Zone and 6 copies per Region
• Storage auto scales in 10 GB chunks
• Aurora native replication
• Read replicas auto scale
• A replica can be provisioned in a few minutes
• Higher throughput compared with native RDS instances
5. Aurora Global Databases
• Replicate your data globally, across Regions
• Best fit for geographically distributed applications
• Fully managed failover
• Guaranteed RPO
• Low latency replication
• Fail over to any secondary Region at any time
• Supports global write forwarding
6. Architecture
• Physical + Log Replication
• Asynchronous replication
• <1 sec replication lag
• Custom replication service
• Powered by AWS backbone networks
• Encrypted connections
• Supports up to 5 secondary regions
8. Managed Failover
Switchover
Formerly known as "managed planned failover", this method is ideal for controlled situations such as operational maintenance and other planned procedures. It synchronizes the secondary DB clusters with the primary before making any other changes, which guarantees an RPO of 0 (no data loss).
Failover
Use this method to respond to unplanned outages. It performs a cross-Region failover to one of the secondary DB clusters in your Aurora global database.
*New: failback is now possible with managed failover. After the failover, once the old primary is back, it is automatically rebuilt as a secondary cluster.
Switchover time: up to 7 mins. New primary promotion time: up to 1.5 mins.
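The two managed paths above correspond to two AWS CLI calls. The sketch below only prints the command it would run, so it can be reviewed before anyone executes it. The function name `global_db_command` and the identifiers passed to it are hypothetical, and using `--allow-data-loss` for the unplanned path is my reading of the CLI, so check it against the CLI version you have installed.

```shell
# Print (do not run) the AWS CLI call for a planned switchover or an unplanned failover.
# Args: mode (switchover|failover), global cluster identifier, target cluster ARN.
global_db_command() {
  case "$1" in
    switchover)
      # Planned path: secondaries are synchronized first, so RPO is 0.
      printf 'aws rds switchover-global-cluster --global-cluster-identifier %s --target-db-cluster-identifier %s\n' "$2" "$3"
      ;;
    failover)
      # Unplanned path: changes not yet replicated may be lost, hence the explicit flag.
      printf 'aws rds failover-global-cluster --global-cluster-identifier %s --target-db-cluster-identifier %s --allow-data-loss\n' "$2" "$3"
      ;;
    *)
      echo "usage: global_db_command switchover|failover GLOBAL_ID TARGET_ARN" >&2
      return 1
      ;;
  esac
}
```

For example, `global_db_command switchover my-global-db <target-cluster-arn>` prints the switchover call; paste it into a terminal once the target has been double-checked.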
10. Managed RPO
• global_db_rpo forces the clusters to stay in sync
• Min value = 20 seconds, max = 68 years
• Ensures that at least one secondary cluster stays within the RPO limit
• Pauses all transaction commits on the primary cluster until one of the replicas catches up on the lag
[Diagram: Replication Lag Detected; lag labels 25 secs and 35 secs]
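A minimal sketch of setting the managed RPO safely: validate the value against the limits on the slide before building the CLI call. Several things here are assumptions, not taken from the talk: the parameter is spelled `rds.global_db_rpo` in the cluster parameter group, 68 years corresponds to 2147483647 seconds, the change is applied with `ApplyMethod=immediate`, and the parameter group name is a placeholder. The function prints the command instead of running it.

```shell
# Return success only if the RPO is a whole number of seconds within the allowed range.
validate_rpo() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;          # empty or non-numeric
  esac
  [ "${#1}" -le 10 ] || return 1      # avoid overflow on absurdly long inputs
  [ "$1" -ge 20 ] && [ "$1" -le 2147483647 ]
}

# Print (do not run) the call that sets the RPO on a cluster parameter group.
# Args: cluster parameter group name (hypothetical), RPO in seconds.
rpo_command() {
  if ! validate_rpo "$2"; then
    echo "RPO must be between 20 and 2147483647 seconds" >&2
    return 1
  fi
  printf 'aws rds modify-db-cluster-parameter-group --db-cluster-parameter-group-name %s --parameters ParameterName=rds.global_db_rpo,ParameterValue=%s,ApplyMethod=immediate\n' "$1" "$2"
}
```

Keeping the check separate from the command makes it easy to reuse in a deployment script. A lower RPO means commits on the primary pause sooner when replicas lag, so the value is a trade-off rather than "smaller is better".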
12. The dark side of global_db_rpo parameter
It will enforce the transaction block in these cases:
• There is no secondary cluster
• Removing the primary and secondary
• A regional Aurora cluster
13. The dark side of global_db_rpo parameter
Regional failovers (within the same Region) can block transactions for up to 5 mins
14. TLS Certificate
Not every CA certificate is available in every Region.
# Requests rds-ca-2019 in ap-south-2, which that Region does not list (see below)
aws rds --region ap-south-2 create-db-instance \
  --db-instance-identifier bhuvi-secondary-cluster-2 \
  --db-cluster-identifier bhuvi-secondary-cluster \
  --db-instance-class db.r5.large \
  --db-parameter-group-name bhuvi-secondary-pg \
  --enable-performance-insights \
  --performance-insights-kms-key-id xxxx \
  --ca-certificate-identifier rds-ca-2019 \
  --engine aurora-postgresql
# CA certificates offered in ap-south-2
aws rds describe-certificates --region ap-south-2 | jq '.Certificates[].CertificateIdentifier'
"rds-ca-rsa2048-g1"
# CA certificates offered in ap-south-1
aws rds describe-certificates --region ap-south-1 | jq '.Certificates[].CertificateIdentifier'
"rds-ca-ecc384-g1"
"rds-ca-rsa4096-g1"
"rds-ca-rsa2048-g1"
"rds-ca-2019"
15. Solution for TLS Certificate
The global bundle certificate can be used to connect to RDS/Aurora instances from any Region. It works whether your instance uses rds-ca-2019 or rds-ca-rsa2048-g1.
But you will not get the option to choose every certificate in every Region.
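A small sketch of both halves of this workaround: first find a CA identifier that exists in both Regions, then connect with the global bundle as the trust root. The helper names `common_cas` and `pg_conninfo`, the database name, and the user are hypothetical. The certificate lists are the ones shown on the TLS slide; the bundle URL in the comments is the public RDS trust store location.

```shell
# Print the CA identifiers that appear in both whitespace-separated lists.
common_cas() {
  for ca in $1; do
    for other in $2; do
      if [ "$ca" = "$other" ]; then
        printf '%s\n' "$ca"
      fi
    done
  done
}

# Build a libpq connection string that verifies the server against the bundle.
# Args: host, database, user, optional path to the bundle file.
pg_conninfo() {
  printf 'host=%s port=5432 dbname=%s user=%s sslmode=verify-full sslrootcert=%s\n' \
    "$1" "$2" "$3" "${4:-global-bundle.pem}"
}

# Lists copied from the describe-certificates output for each Region.
# Live equivalent (not run here):
#   aws rds describe-certificates --region ap-south-1 \
#     --query 'Certificates[].CertificateIdentifier' --output text
primary_cas="rds-ca-ecc384-g1 rds-ca-rsa4096-g1 rds-ca-rsa2048-g1 rds-ca-2019"
secondary_cas="rds-ca-rsa2048-g1"

common_cas "$primary_cas" "$secondary_cas"   # prints rds-ca-rsa2048-g1

# Connecting with the bundle (not run here):
#   curl -sSO https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem
#   psql "$(pg_conninfo <cluster-endpoint> postgres app_user)"
```

Picking a CA from the common set for `--ca-certificate-identifier` avoids requesting one that the secondary Region does not offer.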
16. Quiz
1. Can we use different KMS keys for the primary and secondary clusters of a global cluster?
2. In a peering connection, the secondary cluster endpoints are not resolving in the primary Region, even though the VPCs and subnets have DNS resolution enabled. Why?
17. KMS key for Global Clusters
• The primary and secondary clusters use different storage volumes
• KMS keys can be the default key or a CMK
• You can have a different CMK for each cluster
• You can use a combination of default + CMK
Peering – DNS resolution
• Peered VPCs do not resolve RDS endpoints over the private network by default
• Enable DNS hostnames and DNS resolution in both the requester and accepter peering connection settings.
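The peering-side setting can be sketched with the AWS CLI. For an inter-Region peering connection the requester and accepter options are, as far as I know, each changed from their own Region, so the helper prints two calls. `peering_dns_commands` and the IDs passed to it are hypothetical; nothing is executed.

```shell
# Print (do not run) the calls that enable DNS resolution on both sides of a peering connection.
# Args: peering connection ID, requester Region, accepter Region.
peering_dns_commands() {
  printf 'aws ec2 modify-vpc-peering-connection-options --region %s --vpc-peering-connection-id %s --requester-peering-connection-options AllowDnsResolutionFromRemoteVpc=true\n' "$2" "$1"
  printf 'aws ec2 modify-vpc-peering-connection-options --region %s --vpc-peering-connection-id %s --accepter-peering-connection-options AllowDnsResolutionFromRemoteVpc=true\n' "$3" "$1"
}
```

Example: `peering_dns_commands pcx-0123456789abcdef0 ap-south-1 ap-south-2` prints one command per Region.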
Peering – Security Group
• In a peering connection you cannot whitelist a security group ID if the peer VPC is in a different Region.
• You can whitelist
  • A specific IP
  • The IP range of the subnet
  • The IP range of the VPC
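As a sketch of the CIDR-based rule described above, the helper below prints an ingress call for the cluster's security group and refuses a security group ID as the source. The function name, the IDs, and port 5432 (the PostgreSQL default) are assumptions for illustration.

```shell
# Print (do not run) an ingress rule that allows PostgreSQL traffic from a peer CIDR.
# Args: Region of the target security group, security group ID, source CIDR.
peer_ingress_command() {
  case "$3" in
    sg-*)
      # Security group IDs cannot be referenced across Regions.
      echo "use a CIDR, not a security group ID, for a cross-Region peer" >&2
      return 1
      ;;
  esac
  printf 'aws ec2 authorize-security-group-ingress --region %s --group-id %s --protocol tcp --port 5432 --cidr %s\n' "$1" "$2" "$3"
}
```

Example: `peer_ingress_command ap-south-2 sg-0abc1234 10.10.0.0/16` prints a rule that admits the whole peer VPC range; pass a /32 to admit a single IP.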