Planning for Disaster Recovery
with Galera Cluster
Colin Charles, colin.charles@galeracluster.com

29 October 2019

https://twitter.com/galeracluster | www.galeracluster.com 

Codership Webinar
Agenda
• Disasters happen

• Trade-offs

• A geo-distributed Galera Cluster

• Architecture

• A DR plan

• Is async the best solution for DR?

• Resources
Galera Cluster highlights
• We talk a lot about High Availability

• We talk a lot about multi-master replication

• Synchronous clusters that can ensure you’re always available

• Quorum based failure handling, optimistic concurrency control to commit

• Optimised for the cloud/Wide Area Networks (WANs)
However, how does all this work with Disaster Recovery?
• Galera Cluster does support being run in multiple data centres

• Effectively you can have a 9-node Galera Cluster across 3 data centres to
keep you highly available

• Galera Cluster supports geo-distributed database clusters

• https://galeracluster.com/2015/07/geo-distributed-database-clusters-with-galera/
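As a rough illustration of why a 9-node cluster spread over 3 data centres stays available when one site is lost, here is a minimal sketch. It assumes every node carries equal weight and that a partition keeps quorum only when it holds a strict majority of the nodes; the function name and the layout dictionaries are hypothetical, chosen only for this example.

```python
def survives_dc_loss(nodes_per_dc, lost_dc):
    """Return True if the nodes outside `lost_dc` still form a strict majority.

    Assumption: all nodes have equal weight (simple majority quorum).
    """
    total = sum(nodes_per_dc.values())
    remaining = total - nodes_per_dc[lost_dc]
    return remaining * 2 > total


# Hypothetical layouts: 3 DCs x 3 nodes, and 2 DCs x 3 nodes for comparison.
three_dcs = {"dc1": 3, "dc2": 3, "dc3": 3}
two_dcs = {"dc1": 3, "dc2": 3}

for name, layout in (("3 DCs", three_dcs), ("2 DCs", two_dcs)):
    for dc in layout:
        print(name, "lose", dc, "-> quorum kept:", survives_dc_loss(layout, dc))
```

With three sites, losing any one leaves 6 of 9 nodes, which is still a majority. With only two equally sized sites, losing either leaves exactly half, which is not.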
Benefits of a geo-distributed Galera Cluster
• Increased redundancy

• All database operations are local
(segmented)

• Network traffic is reduced across
DCs (with optimised bandwidth
consumption)

• Latency penalty kept as small as
possible (at COMMIT time you still
pay for the speed of light, etc.)

• Flow control fully configurable 

• No split brain issues

• Out of the box encryption

• Can also work with asynchronous
replication
So, architecture…
• If you’re doing 9 Galera Cluster nodes at the minimum, you also have to
have your application clusters in 3 DCs

• Sure, this is great for High Availability, but it gets costly after some time…

• You also have to ensure that your schema is planned sensibly: if you
have hot rows, deadlocks, and little tolerance for performance issues
during rollbacks, this may not be the best solution for a busy application
that does a lot of UPDATEs
We are here to talk Disaster Recovery (DR)
• It is the ability to run your business continuously without any interruptions irrespective of any damage occurring to
your infrastructure 

• DR is definitely not cheap, but can you afford to lose business transactions? It is this “backup cost” that you need
to think about

• We’ve seen things inside the Linux kernel that can help with DR too, e.g. DRBD 

• Basically a good DR plan is your Business Continuity Plan (BCP)

• Disaster Recovery and Business Continuity, 3rd Edition by Thejandra BS

• Recovery Time Objective (RTO): the time-scale (in hours or days) within which recovery must be achieved, that is, the
length of time the organisation can afford to cease operating its business.

• Recovery Point Objective (RPO): the point in time to which an organisation should be able to recover; for example, it could be
stated as ‘Data can be recovered as of 9 pm last night’. It defines the amount of data that the organisation can afford to lose.

• You’re building resilience in your infrastructure
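To make the RTO and RPO definitions concrete, here is a minimal sketch that checks one incident against both objectives. All timestamps and thresholds are made-up illustration values (the "9 pm last night" copy echoes the example above), and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def meets_objectives(last_good_copy, disaster_at, service_restored_at, rpo, rto):
    """Return (rpo_met, rto_met) for a single incident.

    The data-loss window is the time between the last recoverable copy and
    the disaster; the downtime is the time between the disaster and the
    moment service is restored.
    """
    data_loss_window = disaster_at - last_good_copy
    downtime = service_restored_at - disaster_at
    return data_loss_window <= rpo, downtime <= rto


# Illustrative values only: disaster at 9 am, last good copy at 9 pm the night before.
disaster = datetime(2019, 10, 29, 9, 0)
rpo_met, rto_met = meets_objectives(
    last_good_copy=datetime(2019, 10, 28, 21, 0),
    disaster_at=disaster,
    service_restored_at=disaster + timedelta(hours=3),
    rpo=timedelta(hours=24),
    rto=timedelta(hours=4),
)
print("RPO met:", rpo_met, "| RTO met:", rto_met)
```

A tighter RPO (say one hour) would fail with these numbers, since twelve hours of data sit between the last good copy and the disaster.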
Cloud people understand resilience
• Cloud instances tend to be of varying quality

• Sometimes you spin up a poor instance. Best to kill/restart, as long as you know baseline
benchmarks

• The Simian Army (by Netflix) can help make more resilient infrastructure 

• Includes Chaos Monkey, Latency Monkey, Chaos Gorilla (drops a whole AZ), etc.

• Even spawned a field, Chaos Engineering

• Chaos engineering is the discipline of experimenting on a software system in
production in order to build confidence in the system's capability to withstand turbulent
and unexpected conditions. (ref: https://en.wikipedia.org/wiki/Chaos_engineering)
What else do you need to think about?
• Keep track of the Mean Time to Recover (MTTR)

• Underrated is the Mean Time to Detect (MTTD): how long does it take until you know
a disaster has struck and can move your workloads?

• What is your SLA?
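These three numbers can be tracked with very little code. The sketch below assumes that MTTD is measured from the start of an incident to its detection and MTTR from the start of an incident to full recovery (definitions vary between organisations). The incident records and the 99.9% availability target are made-up illustration values.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (started, detected, recovered).
incidents = [
    (datetime(2019, 1, 5, 2, 0), datetime(2019, 1, 5, 2, 10), datetime(2019, 1, 5, 3, 0)),
    (datetime(2019, 6, 1, 14, 0), datetime(2019, 6, 1, 14, 20), datetime(2019, 6, 1, 14, 50)),
]


def mean_delta(deltas):
    """Average a non-empty list of timedelta values."""
    return sum(deltas, timedelta()) / len(deltas)


def allowed_downtime(availability, period):
    """Downtime budget implied by an availability target over a period."""
    return period * (1 - availability)


mttd = mean_delta([detected - started for started, detected, _ in incidents])
mttr = mean_delta([recovered - started for started, _, recovered in incidents])
budget = allowed_downtime(0.999, timedelta(days=30))

print("MTTD:", mttd)
print("MTTR:", mttr)
print("Downtime budget for 99.9% over 30 days:", budget)
```

Comparing MTTR against the downtime budget shows quickly whether a single incident can already consume the whole SLA allowance for the period.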
So what do you need in a plan?
• In terms of a Galera Cluster, you’re really thinking about ensuring you have
another data centre to take over

• You could already be running a 3-DC cluster… 

• But presumably, you’re planning for disaster recovery, likely via
asynchronous replication to another data centre (as it saves the cost of
having yet another DC)

• You also want to make sure all this is 100% fully automated…
You’ll have to think about your entire stack
• Beyond the database, you have to ensure that there will be quick DNS
switchover (so low TTL on your DNS)

• Application servers need to be running and ready to take on the load at
the other data centre

• If using a proxy, this too will have to be waiting at the other data centre

• So to mitigate a complete disaster AND have great performance, you
are going to want to create a replica of your setup at a remote site
Why async replication between data centres for DR?
• Async replication in MySQL 5.7/8 is really quite fast (same with MariaDB
10.3/10.4)

• The idea of “lagging slaves” should not be too much of an issue… this
can be tuned and configured

• You must ask — is fully synchronous replication right for your application? 

• Callaghan’s Law: [In a Galera cluster] a given row can’t be modified more
than once per RTT. 
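Callaghan's Law translates directly into an upper bound on how often a single hot row can be updated. The sketch below is only the arithmetic implied by the statement above (one modification per round trip); the function name and the RTT values are illustrative, not measurements.

```python
def max_updates_per_row_per_second(rtt_ms):
    """Upper bound on updates to one row per second if each needs one RTT."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return 1000.0 / rtt_ms


# Illustrative round-trip times: same rack, nearby DC, cross-continent links.
for rtt_ms in (0.5, 5, 50, 150):
    bound = max_updates_per_row_per_second(rtt_ms)
    print(f"RTT {rtt_ms:>5} ms -> at most {bound:.1f} updates/s on a single row")
```

The bound applies per row, so workloads that spread writes over many rows are far less affected than those with a few hot rows.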
A practical case study
• A more practical example, by Marco Tusa: https://www.percona.com/blog/2018/11/15/how-not-to-do-mysql-high-availability-geographic-node-distribution-with-galera-based-replication-misuse/ AND https://www.percona.com/blog/2018/11/15/mysql-high-availability-on-premises-a-geographically-distributed-scenario/
Simple reasons…
• A Galera Cluster across 3 DCs is pricier
than the previous solution, but it gives
you data consistency across all nodes.
You do, however, need to ensure that your
application can tolerate the commit-time
penalty and that you have a
high-performance link for replication…

• The other approach focuses on
“local commits” (just to your 3-node
cluster in one DC). You will see some
difference in data state because of async
replication, but you don’t need a great
replication link, DR works, and it
works better across geographies

• We tend to think that a latency of even 5ms
isn’t high, but it actually is!

• We have to remember a Galera writeset
can be as small as a 1 row INSERT but
large with many UPDATEs too

• We have to think about IP frames

• In Galera, flow control is driven by the
receive queue. There is a queue of events, and
the longer this queue is, the longer
certification takes too.
All this doesn’t absolve you from other things…
• Like some kind of “automatic failover framework” when you go the async
route for DR

• A good backup and restore solution 

• A good rule based solution for load balancing (ProxySQL, MariaDB
MaxScale)
The Galera Arbitrator Daemon (garbd)
• If you have access to a 3rd data centre, or can put a single garbd instance in your
DR site, you could also run a paired cluster across 2 DCs, thus bringing
your node count to a mere 7 nodes (instead of 9)

• When you have an even number of nodes, garbd functions as an odd
node to avoid split-brain situations. It can also request a consistent
application state snapshot, which helps with backups
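The role of garbd as the "odd node" can be shown with a small vote count. This is a minimal sketch assuming three data nodes in each of the two DCs (which is what the 7-node figure suggests), equal weight for every vote, and a strict-majority quorum rule; the function name is hypothetical.

```python
def has_quorum(votes_in_partition, total_votes):
    """Strict majority: a partition needs more than half of all votes."""
    return votes_in_partition * 2 > total_votes


# Scenario: the link between the two DCs fails.
# Without an arbitrator there are 6 votes, and each side holds 3 of them.
side_without_arbitrator = has_quorum(3, 6)

# With garbd there are 7 votes. The side that can still reach garbd holds
# 3 data nodes + 1 arbitrator vote; the other side holds only 3.
side_with_garbd = has_quorum(3 + 1, 7)
side_cut_off = has_quorum(3, 7)

print("No arbitrator, either side keeps quorum:", side_without_arbitrator)
print("Side that still reaches garbd keeps quorum:", side_with_garbd)
print("Side cut off from garbd keeps quorum:", side_cut_off)
```

Without the arbitrator neither half can continue, while with it exactly one half stays primary, which is the behaviour that avoids split brain.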
So what are your choices for ultimate DR?
• If you have the money, 3 data centres so you have synchronous clusters
with 9 Galera Cluster nodes… This is also in addition to your application
servers, proxies, etc.

• 2 data centres, 7 nodes, with the Galera Arbitrator is a possibility

• If you don’t have as much budget, consider the async replication option
between 2 DCs. Just remember all the “manual glue” you may need to go
with this!
• “The dread of a disaster makes everybody act in a way that increases the
disaster.” — Bertrand Russell
Some Galera Cluster specific resources
• https://galeracluster.com/library/documentation/managing-fc.html

• https://galeracluster.com/library/documentation/auto-eviction.html

• https://galeracluster.com/library/documentation/using-sr.html (Galera 4
new feature)

• https://galeracluster.com/library/documentation/backup-cluster.html

• https://galeracluster.com/library/training/tutorials/geo-distributed-clusters.html
Resources
• Disaster Recovery and Business Continuity, 3rd Edition by Thejandra BS

• Disaster Recovery, Crisis Response, and Business Continuity: A
Management Desk Reference by Jamie Watters

• Business Continuity and Disaster Recovery Planning for IT Professionals,
2nd Edition by Susan Snedaker

• Effective MySQL Backup and Recovery by Ronald Bradford
Questions?
Colin Charles, colin.charles@galeracluster.com

https://twitter.com/galeracluster | www.galeracluster.com
5 Signs You Need a Fashion PLM Software.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 

Planning for Disaster Recovery (DR) with Galera Cluster

  • 1. Planning for Disaster Recovery with Galera Cluster
      Colin Charles, colin.charles@galeracluster.com
      29 October 2019
      https://twitter.com/galeracluster | www.galeracluster.com
      Codership Webinar
  • 2. Agenda
      • Disasters happen
      • Trade-offs
      • A geo-distributed Galera Cluster
      • Architecture
      • A DR plan
      • Is async the best solution for DR?
      • Resources
  • 6. Galera Cluster highlights
      • We talk a lot about High Availability
      • We talk a lot about multi-master replication
      • Synchronous clusters that can ensure you’re always available
      • Quorum-based failure handling, optimistic concurrency control to commit
      • Optimised for the cloud/Wide Area Networks (WANs)
  • 8. However, how does all this work with Disaster Recovery?
      • Galera Cluster does support being run in multiple data centres
      • Effectively you can have a 9-node Galera Cluster across 3 data centres to keep you highly available
      • Galera Cluster supports geo-distributed database clusters
      • https://galeracluster.com/2015/07/geo-distributed-database-clusters-with-galera/
  • 10. Benefits of a geo-distributed Galera Cluster
      • Increased redundancy
      • All database operations are local (segmented)
      • Network traffic is reduced across DCs (with optimised bandwidth consumption)
      • Latency penalty kept as small as possible (when it is time to COMMIT, hello speed of light, et al.)
      • Flow control fully configurable
      • No split-brain issues
      • Out-of-the-box encryption
      • Can also work with asynchronous replication
  • 11. So, architecture…
      • If you’re doing 9 Galera Cluster nodes at the minimum, you also have to have your application clusters in 3 DCs
      • Sure, this is great for High Availability, but gets costly after some time…
      • You also have to ensure that your schema is planned sensibly. After all, if you have hot rows, deadlocks, and little tolerance for performance issues during rollbacks, this may not be the best solution for a busy application that does a lot of UPDATEs
  • 12. We are here to talk Disaster Recovery (DR)
      • It is the ability to run your business continuously, without interruption, irrespective of any damage to your infrastructure
      • DR is definitely not cheap, but can you afford to lose business transactions? It is this “backup cost” that you need to think about
      • We’ve seen things inside the Linux kernel that can help with DR too, e.g. DRBD
      • Basically a good DR plan is your Business Continuity Plan (BCP)
      • Disaster Recovery and Business Continuity, 3rd Edition by Thejandra BS
      • Recovery Time Objective (RTO): the timescale (in hours or days) within which recovery must be achieved, that is, the length of time an organisation can afford to cease operating its business.
      • Recovery Point Objective (RPO): the point in time to which an organisation should recover, for example, it could be stated as ‘Data can be recovered as of 9 pm last night’. It defines the amount of data that the organisation can afford to lose.
      • You’re building resilience in your infrastructure
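To make the RTO and RPO definitions above concrete, here is a minimal sketch that checks a single incident against both targets. The function name, the timestamps, and the 8-hour/4-hour targets are illustrative assumptions, not values from the webinar.

```python
from datetime import datetime, timedelta


def check_objectives(last_recoverable, disaster_at, restored_at, rpo, rto):
    """Compare one (hypothetical) incident against RPO and RTO targets."""
    # Work done between the last recoverable point and the disaster is lost.
    data_loss_window = disaster_at - last_recoverable
    # Time during which the business could not operate.
    downtime = restored_at - disaster_at
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Illustrative values only: data is recoverable "as of 9 pm last night".
report = check_objectives(
    last_recoverable=datetime(2019, 10, 28, 21, 0),
    disaster_at=datetime(2019, 10, 29, 3, 0),
    restored_at=datetime(2019, 10, 29, 5, 0),
    rpo=timedelta(hours=8),   # assumed target
    rto=timedelta(hours=4),   # assumed target
)
print(report)
```

With these assumed numbers, six hours of data are lost and the service is down for two hours, so both targets are met. Tightening the RPO below six hours would make the nightly recovery point insufficient.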
  • 13. Cloud people understand resilience
      • Cloud instances tend to be of varying quality
      • Sometimes you spin up a poor instance. Best to kill/restart, as long as you know baseline benchmarks
      • The Simian Army (by Netflix) can help make more resilient infrastructure
      • Includes Chaos Monkey, Latency Monkey, Chaos Gorilla (drops a whole AZ), etc.
      • Even spawned a field, Chaos Engineering
      • Chaos engineering is the discipline of experimenting on a software system in production in order to build confidence in the system's capability to withstand turbulent and unexpected conditions. (ref: https://en.wikipedia.org/wiki/Chaos_engineering)
  • 14. What else do you need to think about?
      • Keep track of the Mean Time to Recover (MTTR)
      • Underrated is the Mean Time to Detect (MTTD): how long does it take before you know a disaster has struck and can move your workloads?
      • What is your SLA?
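As a rough illustration of tracking MTTD and MTTR, the sketch below averages both over a small incident log. The incident timestamps are made up, and measuring MTTR from the start of the outage (rather than from detection) is an assumption; teams define it both ways.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident log: (outage started, outage detected, service recovered).
incidents = [
    (datetime(2019, 10, 1, 10, 0), datetime(2019, 10, 1, 10, 4), datetime(2019, 10, 1, 10, 34)),
    (datetime(2019, 10, 9, 2, 0), datetime(2019, 10, 9, 2, 16), datetime(2019, 10, 9, 3, 6)),
]


def minutes(delta):
    """Convert a timedelta to minutes."""
    return delta.total_seconds() / 60


# Mean Time to Detect: outage start until someone (or something) notices.
mttd = mean(minutes(detected - started) for started, detected, _ in incidents)
# Mean Time to Recover: outage start until service is restored (assumed definition).
mttr = mean(minutes(recovered - started) for started, _, recovered in incidents)

print(f"MTTD: {mttd:.1f} min, MTTR: {mttr:.1f} min")
```

In this made-up log, detection accounts for a fifth of the total recovery time, which is the kind of gap that better monitoring and automated failover are meant to close.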
  • 15. So what do you need in a plan?
      • In terms of a Galera Cluster, you’re really thinking about ensuring you have another data centre to take over
      • You could already be running a 3-DC cluster…
      • But presumably, you’re planning for disaster recovery, likely via asynchronous replication to another data centre (as it saves the cost of having yet another DC)
      • You also want to make sure all this is 100% fully automated…
  • 16. You’ll have to think about your entire stack
      • Beyond the database, you have to ensure that there will be quick DNS switchover (so low TTL on your DNS)
      • Application servers need to be running and ready to take on the load at the other data centre
      • If using a proxy, this too will have to be waiting at the other data centre
      • So to mitigate a complete disaster AND have great performance, you are going to want to create a replica of your setup at a remote site
  • 18. Why async replication between data centres for DR?
      • Async replication in MySQL 5.7/8 is really quite fast (same with MariaDB 10.3/10.4)
      • The idea of “lagging slaves” should not be too much of an issue… this can be tuned and configured
      • You must ask: is fully synchronous replication right for your application?
      • Callaghan’s Law: [In a Galera cluster] a given row can’t be modified more than once per RTT.
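Callaghan’s Law above implies a simple upper bound: if a row can be modified at most once per round trip, the update rate for any single hot row is capped at roughly 1/RTT. The sketch below works that out for a few round-trip times. The function name and the RTT values are illustrative assumptions, and real throughput will be lower once certification and apply time are included.

```python
def max_updates_per_second_per_row(rtt_ms: float) -> float:
    """Upper bound on updates to one row if each update costs one round trip."""
    if rtt_ms <= 0:
        raise ValueError("RTT must be positive")
    return 1000.0 / rtt_ms


# Assumed round-trip times: same DC, nearby DCs, cross-continent.
for rtt_ms in (0.5, 5, 100):
    ceiling = max_updates_per_second_per_row(rtt_ms)
    print(f"RTT {rtt_ms:>5} ms -> at most {ceiling:,.0f} updates/s on a single row")
```

Under this bound a 5 ms link caps a hot row at about 200 updates per second and a 100 ms link at about 10, which is why a counter-style workload that hammers one row is a poor fit for a cluster stretched across distant data centres.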
  • 19. A practical case study
      • A more practical example, by Marco Tusa:
      • https://www.percona.com/blog/2018/11/15/how-not-to-do-mysql-high-availability-geographic-node-distribution-with-galera-based-replication-misuse/
      • https://www.percona.com/blog/2018/11/15/mysql-high-availability-on-premises-a-geographically-distributed-scenario/
  • 20. Simple reasons…
      • A Galera Cluster across 3 DCs is pricier than the previous solution, and it gives you data consistency across all nodes. You do, however, need to ensure that your application can take the commit-time penalties and that you have a high-performance link for replication…
      • The other approach is more focused on “local commits” (just to your 3-node cluster in one DC): you’ll see some difference in data state thanks to async replication, you don’t need a great replication link, DR works, and this also works better across geographies
      • We tend to think that a latency of even 5 ms isn’t high, but it actually is!
      • We have to remember a Galera writeset can be as small as a 1-row INSERT but also large, with many UPDATEs
      • We have to think about IP frames
      • In Galera, flow control is driven by the receive queue. There is a queue of events, and the longer this queue is, the longer certification takes too.
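The point about writeset size and IP frames can be sketched with some back-of-the-envelope arithmetic. The numbers below are assumptions for illustration only: a 1500-byte MTU, 40 bytes of IP and TCP headers per packet, and made-up writeset sizes. Galera's own protocol overhead, compression, and any encryption overhead are ignored.

```python
MTU_BYTES = 1500          # assumed Ethernet MTU
HEADER_BYTES = 40         # assumed IPv4 (20) + TCP (20) headers, no options
PAYLOAD_BYTES = MTU_BYTES - HEADER_BYTES


def packets_needed(writeset_bytes: int) -> int:
    """Number of full-size packets needed to carry a writeset (ceiling division)."""
    if writeset_bytes <= 0:
        raise ValueError("writeset size must be positive")
    return (writeset_bytes + PAYLOAD_BYTES - 1) // PAYLOAD_BYTES


# Hypothetical writeset sizes: a 1-row INSERT versus a transaction with many UPDATEs.
for label, size in (("1-row INSERT", 200), ("bulk UPDATE", 1_000_000)):
    print(f"{label}: {size} bytes -> {packets_needed(size)} packet(s)")
```

Under these assumptions the small writeset fits in a single packet while the large one needs several hundred, so on a WAN link it is the big transactions that feel packet loss and limited bandwidth most.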
  • 21. All this doesn’t absolve you from other things…
      • Like some kind of “automatic failover framework” when you go the async route for DR
      • A good backup and restore solution
      • A good rule-based solution for load balancing (ProxySQL, MariaDB MaxScale)
  • 22. The Galera Arbitrator Daemon (garbd)
      • If you have access to a 3rd data centre, or put a one-node garbd in your DR site, you could also run a cluster paired across 2 DCs, bringing your node count to a mere 7 (instead of 9)
      • When you have an even number of nodes, garbd functions as an odd node, to avoid split-brain situations. It can also request a consistent application state snapshot, which helps with backups
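Why the arbitrator matters can be shown with a simplified quorum calculation. The sketch assumes every member (including garbd) carries one equal vote, that a whole site is lost at once, and that the survivors keep quorum only with strictly more than half of the votes. The site names and layouts are hypothetical, and Galera's real quorum logic (weights, graceful leaves) is more involved.

```python
def has_quorum(surviving_votes: int, total_votes: int) -> bool:
    """Survivors stay primary only with strictly more than half of all votes."""
    return 2 * surviving_votes > total_votes


def survives_loss_of(site: str, sites: dict) -> bool:
    """Does the rest of the cluster keep quorum if `site` disappears entirely?"""
    total = sum(sites.values())
    return has_quorum(total - sites[site], total)


# Hypothetical layouts: votes per site.
two_dcs = {"dc1": 3, "dc2": 3}
two_dcs_with_garbd = {"dc1": 3, "dc2": 3, "dr_site_garbd": 1}
three_dcs = {"dc1": 3, "dc2": 3, "dc3": 3}

for name, layout in (("6 nodes, 2 DCs", two_dcs),
                     ("7 members, 2 DCs + garbd", two_dcs_with_garbd),
                     ("9 nodes, 3 DCs", three_dcs)):
    print(f"{name}: survives losing dc1 -> {survives_loss_of('dc1', layout)}")
```

With six nodes split evenly, losing one data centre leaves exactly half of the votes and no quorum. The seventh vote from garbd at a third location is what tips the surviving side over the threshold.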
  • 23. So what are your choices for ultimate DR?
      • If you have the money, 3 data centres so you have synchronous clusters with 9 Galera Cluster nodes… This is also in addition to your application servers, proxies, etc.
      • 2 data centres, 7 nodes, with the Galera Arbitrator is a possibility
      • If you don’t have as much budget, consider the async replication option between 2 DCs. Just remember all the “manual glue” you may need to go with this!
  • 24. “The dread of a disaster makes everybody act in a way that increases the disaster.” (Bertrand Russell)
  • 25. Some Galera Cluster specific resources
      • https://galeracluster.com/library/documentation/managing-fc.html
      • https://galeracluster.com/library/documentation/auto-eviction.html
      • https://galeracluster.com/library/documentation/using-sr.html (Galera 4 new feature)
      • https://galeracluster.com/library/documentation/backup-cluster.html
      • https://galeracluster.com/library/training/tutorials/geo-distributed-clusters.html
  • 26. Resources
      • Disaster Recovery and Business Continuity, 3rd Edition by Thejandra BS
      • Disaster Recovery, Crisis Response, and Business Continuity: A Management Desk Reference by Jamie Watters
      • Business Continuity and Disaster Recovery Planning for IT Professionals, 2nd Edition by Susan Snedaker
      • Effective MySQL Backup and Recovery by Ronald Bradford