SlideShare une entreprise Scribd logo
1  sur  38
Télécharger pour lire hors ligne
Autopsy of a MySQL automation
disaster
by Jean-François Gagné
presented at SRECon Dublin (October 2019)
Senior Infrastructure Engineer / System and MySQL Expert
jeanfrancois AT messagebird DOT com / @jfg956 #SREcon
To err is human
To really foul things up requires a computer[1]
(or a script)
[1]: http://quoteinvestigator.com/2010/12/07/foul-computer/
2
Session Summary
1. MySQL replication
2. Automation disaster: external eye
3. Chain of events: analysis
4. Learning / takeaway
3
MySQL replication
● Typical MySQL replication deployment at Booking.com:
+---+
| M |
+---+
|
+------+-- ... --+---------------+-------- ...
| | | |
+---+ +---+ +---+ +---+
| S1| | S2| | Sn| | M1|
+---+ +---+ +---+ +---+
|
+-- ... --+
| |
+---+ +---+
| T1| | Tm|
+---+ +---+
4
MySQL replication’
● And they use(d) Orchestrator (more about Orchestrator in the next slide):
5
MySQL replication’’
● Orchestrator allows to:
● Visualize replication deployments
● Move slaves for planned maintenance of an intermediate master
● Automatically replace an intermediate master in case of its unexpected failure
(thanks to pseudo-GTIDs when we have not deployed GTIDs)
● Automatically replace a master in case of a failure (failing over to a slave)
● But Orchestrator cannot replace a master alone:
● Booking.com uses DNS for master discovery
● So Orchestrator calls a homemade script to repoint DNS (and to do other magic)
6
Intermediate Master Failure
7
Master Failure
8
MySQL Master High Availability [1 of 4]
Failing-over the master to a slave is my favorite HA method
• But it is not as easy as it sounds, and it is hard to automate well
• An example of complete failover solution in production:
https://github.blog/2018-06-20-mysql-high-availability-at-github/
The five considerations of master high availability:
(https://jfg-mysql.blogspot.com/2019/02/mysql-master-high-availability-and-failover-more-thoughts.html)
• Plan how you are doing master high availability
• Decide when you apply your plan (Failure Detection – FD)
• Tell the application about the change (Service Discovery – SD)
• Protect against the limit of FD and SD for avoiding split-brains (Fencing)
• Fix your data if something goes wrong
9
10
MySQL Master High Availability [2 of 4]
Failure detection (FD) is the 1st part (and 1st challenge) of failover
• It is a very hard problem: partial failure, unreliable network, partitions, …
• It is impossible to be 100% sure of a failure, and confidence needs time
à quick FD is unreliable, relatively reliable FD implies longer unavailability
Ø You need to accept that FD generates false positive (and/or false negative)
Repointing is the 2nd part of failover:
• Relatively easy with the right tools: MHA, GTID, Pseudo-GTID, Binlog Servers, …
• Complexity grows with the number of direct slaves of the master
(what if you cannot contact some of those slaves…)
• Some software for repointing:
• Orchestrator, Ripple Binlog Server,
Replication Manager, MHA, Cluster Control, MaxScale, …
11
MySQL Master High Availability [3 of 4]
In this configuration and when the master fails,
one of the slave needs to be repointed to the new master:
12
MySQL Master High Availability [4 of 4]
Service Discovery (SD) is the 3rd part (and 2nd challenge) of failover:
• If centralised, it is a SPOF; if distributed, impossible to update atomically
• SD will either introduce a bottleneck (including performance limits)
or will be unreliable in some way (pointing to the wrong master)
• Some ways to implement MySQL Master SD: DNS, ViP, Proxy, Zookeeper, …
http://code.openark.org/blog/mysql/mysql-master-discovery-methods-part-1-dns
Ø Unreliable FD and unreliable SD is a recipe for split-brains !
Protecting against split-brains (Fencing): Adv. Subject – not many solutions
(Proxies and semi-synchronous replication might help)
Fixing your data in case of a split-brain: only you can know how to do this !
(tip on this later in the talk)
MySQL Service Discovery @ MessageBird
MessageBird uses ProxySQL for MySQL Service Discovery
13
Orchestrator @ MessageBird
15
Failover War Story
Master failover does not always go as planned
We will now look at our War Story
Our subject database
● Simple replication deployment (in two data centers):
Master RWs +---+
happen here --> | A |
+---+
|
+------------------------+
| |
Reads +---+ +---+
happen here --> | B | | X |
+---+ +---+
|
+---+ And reads
| Y | <-- happen here
+---+
16
Incident: 1st event
● A and B (two servers in same data center) fail at the same time:
Master RWs +-/+
happen here --> | A |
but now failing +/-+
Reads +-/+ +---+
happen here --> | B | | X |
but now failing +/-+ +---+
|
+---+ Reads
| Y | <-- happen here
+---+
17
(I will cover how/why this happened later.)
Incident: 1st event’
● Orchestrator fixes things:
+-/+
| A |
+/-+
Reads +-/+ +---+ Now, Master RWs
happen here --> | B | | X | <-- happen here
but now failing +/-+ +---+
|
+---+ Reads
| Y | <-- happen here
+---+
18
Split-brain: disaster
● A few things happen in that day and night, and I wake-up to this:
+-/+
| A |
+/-+
Master RWs +---+ +---+
happen here --> | B | | X |
+---+ +---+
|
+---+
| Y |
+---+
19
Split-brain: disaster’
● And to make things worse, reads are still happening on Y:
+-/+
| A |
+/-+
Master RWs +---+ +---+
happen here --> | B | | X |
+---+ +---+
|
+---+ Reads
| Y | <-- happen here
+---+
20
Split-brain: disaster’’
● This is not good:
● When A and B failed, X was promoted as the new master
● Something made DNS point to B (we will see what later)
à writes are now happening on B
● But B is outdated: all writes to X (after the failure of A) did not reach B
● So we have data on X that cannot be read on B
● And we have new data on B
that is not read on Y
21
+-/+
| A |
+/-+
Master RWs +---+ +---+
happen here --> | B | | X |
+---+ +---+
|
+---+ Reads
| Y | <-- happen here
+---+
Split-brain: analysis
● Digging more in the chain of events, we find that:
● After the 1st failure of A, a 2nd one was detected and Orchestrator failed over to B
● So after their failures, A and B came back and formed an isolated replication chain
22
+-/+
| A |
+/-+
+-/+ +---+ Master RWs
| B | | X | <-- happen here
+/-+ +---+
|
+---+ Reads
| Y | <-- happen here
+---+
Split-brain: analysis
● Digging more in the chain of events, we find that:
● After the 1st failure of A, a 2nd one was detected and Orchestrator failed over to B
● So after their failures, A and B came back and formed an isolated replication chain
● And something caused a failure of A
23
+---+
| A |
+---+
|
+---+ +---+ Master RWs
| B | | X | <-- happen here
+---+ +---+
|
+---+ Reads
| Y | <-- happen here
+---+
Split-brain: analysis
● Digging more in the chain of events, we find that:
● After the 1st failure of A, a 2nd one was detected and Orchestrator failed over to B
● So after their failures, A and B came back and formed an isolated replication chain
● And something caused a failure of A
● But how did DNS end-up pointing to B ?
● The failover to B called the DNS repointing script
● The script stole the DNS entry
from X and pointed it to B
24
+-/+
| A |
+/-+
+---+ +---+ Master RWs
| B | | X | <-- happen here
+---+ +---+
|
+---+ Reads
| Y | <-- happen here
+---+
Split-brain: analysis
● Digging more in the chain of events, we find that:
● After the 1st failure of A, a 2nd one was detected and Orchestrator failed over to B
● So after their failures, A and B came back and formed an isolated replication chain
● And something caused a failure of A
● But how did DNS end-up pointing to B ?
● The failover to B called the DNS repointing script
● The script stole the DNS entry
from X and pointed it to B
● But is that all: what made A fail ?
25
+-/+
| A |
+/-+
Master RWs +---+ +---+
happen here --> | B | | X |
+---+ +---+
|
+---+ Reads
| Y | <-- happen here
+---+
Split-brain: analysis’
● What made A fail ?
● Once A and B came back up as a new replication chain, they had outdated data
● If B would have come back before A, it could have been re-slaved to X
26
+---+
| A |
+---+
|
+---+ +---+ Master RWs
| B | | X | <-- happen here
+---+ +---+
|
+---+ Reads
| Y | <-- happen here
+---+
Split-brain: analysis’
● What made A fail ?
● Once A and B came back up as a new replication chain, they had outdated data
● If B would have come back before A, it could have been re-slaved to X
● But because A came back before re-slaving, it injected heartbeat and p-GTID to B
27
+-/+
| A |
+/-+
+---+ +---+ Master RWs
| B | | X | <-- happen here
+---+ +---+
|
+---+ Reads
| Y | <-- happen here
+---+
Split-brain: analysis’
● What made A fail ?
● Once A and B came back up as a new replication chain, they had outdated data
● If B would have come back before A, it could have been re-slaved to X
● But because A came back before re-slaving, it injected heartbeat and p-GTID to B
● Then B could have been re-cloned without problems
28
+---+
| A |
+---+
|
+---+ +---+ Master RWs
| B | | X | <-- happen here
+---+ +---+
|
+---+ Reads
| Y | <-- happen here
+---+
Split-brain: analysis’
● What made A fail ?
● Once A and B came back up as a new replication chain, they had outdated data
● If B would have come back before A, it could have been re-slaved to X
● But because A came back before re-slaving, it injected heartbeat and p-GTID to B
● Then B could have been re-cloned without problems
● But A was re-cloned instead (human error #1)
29
+---+
| A |
+---+
+-/+ +---+ Master RWs
| B | | X | <-- happen here
+/-+ +---+
|
+---+ Reads
| Y | <-- happen here
+---+
Split-brain: analysis’
● What made A fail ?
● Once A and B came back up as a new replication chain, they had outdated data
● If B would have come back before A, it could have been re-slaved to X
● But because A came back before re-slaving, it injected heartbeat and p-GTID to B
● Then B could have been re-cloned without problems
● But A was re-cloned instead (human error #1)
● Why did Orchestrator not fail-over right away ?
● B was promoted hours after A was brought down…
● Because A was downtimed only for 4 hours (human error #2)
30
+-/+
| A |
+/-+
+---+ +---+ Master RWs
| B | | X | <-- happen here
+---+ +---+
|
+---+ Reads
| Y | <-- happen here
+---+
Orchestrator anti-flapping
● Orchestrator has a failover throttling/acknowledgment mechanism[1]:
● Automated recovery will happen
● for an instance in a cluster that has not recently been recovered
● unless such recent recoveries were acknowledged
● In our case:
● the recovery might have been acknowledged too early (human error #0 ?)
● with a too short “recently” timeout
● and maybe Orchestrator should not have failed over the second time
[1]: https://github.com/github/orchestrator/blob/master/docs/topology-recovery.md
#blocking-acknowledgments-anti-flapping
31
Split-brain: summary
● So in summary, this disaster was caused by:
1. A fancy failure: two servers failing at the same time
2. A debatable premature acknowledgment in Orchestrator
and probably too short a timeout for recent failover
3. Edge-case recovery: both servers forming a new replication topology
4. Restarting with the event-scheduler enabled (A injecting heartbeat and p-GTID)
5. Re-cloning wrong (A instead of B; should have created C and D and thrown away A
and B; too short downtime for the re-cloning)
6. Orchestrator failing over something that it should not have
(including taking an questionnable action when a downtime expired)
7. DNS repointing script not defensive enough
32
Fancy failure: more details
● Why did A and B fail at the same time ?
● Deployment error: the two servers in the same rack/failure domain ?
● And/or very unlucky ?
● Very unlucky because…
10 to 20 servers failed that day in the same data center
Because human operations and “sensitive” hardware
33
+-/+
| A |
+/-+
+-/+ +---+
| B | | X |
+/-+ +---+
|
+---+
| Y |
+---+
How to fix such situation ?
● Fixing “split-brain” data on B and X is hard
● Some solutions are:
● Kill B or X (and lose data)
● Replay writes from B on X or vice-versa (manually or with replication)
● But AUTO_INCREMENTs are in the way:
● up to i used on A before 1st failover
● i-n to j1 used on X after recovery
● i to j2 used on B after 2nd failover
34
+-/+
| A |
+/-+
Master RWs +---+ +---+
happen here --> | B | | X |
+---+ +---+
|
+---+ Reads
| Y | <-- happen here
+---+
Takeaway
● Twisted situations happen
● Automation (including failover) is not simple:
à code automation scripts defensively
● Be mindful for premature acknowledgment, downtime more than less,
shutdown slaves first à understand complex interactions of tools in details
● Try something else than AUTO-INCREMENTs for Primary Key
(monotonically increasing UUID[1] [2] ?)
[1]: https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/
[2]: http://mysql.rjweb.org/doc.php/uuid
35
36
Re Failover War Story
Master failover does not always go as planned
• because it is complicated
It is not a matter of “if” things go wrong
• but “when” things will go wrong
Please share your war stories
• so we can learn from each-others’ experience
• GitHub has a MySQL public Post-Mortem (great of them to share this):
https://blog.github.com/2018-10-30-oct21-post-incident-analysis/
MessageBird is hiring
messagebird.com/en/careers/
Thanks !
by Jean-François Gagné
presented at SRECon Dublin (October 2019)
Senior Infrastructure Engineer / System and MySQL Expert
jeanfrancois AT messagebird DOT com / @jfg956 #SREcon

Contenu connexe

Tendances

MySQL/MariaDB Proxy Software Test
MySQL/MariaDB Proxy Software TestMySQL/MariaDB Proxy Software Test
MySQL/MariaDB Proxy Software TestI Goo Lee
 
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)Jean-François Gagné
 
MySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete TutorialMySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete TutorialSveta Smirnova
 
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterMySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterKenny Gryp
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialJean-François Gagné
 
Architecture of exadata database machine – Part II
Architecture of exadata database machine – Part IIArchitecture of exadata database machine – Part II
Architecture of exadata database machine – Part IIParesh Nayak,OCP®,Prince2®
 
What is new in PostgreSQL 14?
What is new in PostgreSQL 14?What is new in PostgreSQL 14?
What is new in PostgreSQL 14?Mydbops
 
InnoDB Performance Optimisation
InnoDB Performance OptimisationInnoDB Performance Optimisation
InnoDB Performance OptimisationMydbops
 
MySQL Group Replication
MySQL Group ReplicationMySQL Group Replication
MySQL Group ReplicationKenny Gryp
 
Replication Troubleshooting in Classic VS GTID
Replication Troubleshooting in Classic VS GTIDReplication Troubleshooting in Classic VS GTID
Replication Troubleshooting in Classic VS GTIDMydbops
 
How to set up orchestrator to manage thousands of MySQL servers
How to set up orchestrator to manage thousands of MySQL serversHow to set up orchestrator to manage thousands of MySQL servers
How to set up orchestrator to manage thousands of MySQL serversSimon J Mudd
 
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...ScaleGrid.io
 
Optimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performanceOptimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performanceMariaDB plc
 
MySQL InnoDB Cluster - A complete High Availability solution for MySQL
MySQL InnoDB Cluster - A complete High Availability solution for MySQLMySQL InnoDB Cluster - A complete High Availability solution for MySQL
MySQL InnoDB Cluster - A complete High Availability solution for MySQLOlivier DASINI
 
MariaDB MaxScale
MariaDB MaxScaleMariaDB MaxScale
MariaDB MaxScaleMariaDB plc
 
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2Tanel Poder
 
Exploring Oracle Multitenant in Oracle Database 12c
Exploring Oracle Multitenant in Oracle Database 12cExploring Oracle Multitenant in Oracle Database 12c
Exploring Oracle Multitenant in Oracle Database 12cZohar Elkayam
 
Understanding oracle rac internals part 1 - slides
Understanding oracle rac internals   part 1 - slidesUnderstanding oracle rac internals   part 1 - slides
Understanding oracle rac internals part 1 - slidesMohamed Farouk
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsJean-François Gagné
 
MySQL GTID Concepts, Implementation and troubleshooting
MySQL GTID Concepts, Implementation and troubleshooting MySQL GTID Concepts, Implementation and troubleshooting
MySQL GTID Concepts, Implementation and troubleshooting Mydbops
 

Tendances (20)

MySQL/MariaDB Proxy Software Test
MySQL/MariaDB Proxy Software TestMySQL/MariaDB Proxy Software Test
MySQL/MariaDB Proxy Software Test
 
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
 
MySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete TutorialMySQL Performance Schema in Action: the Complete Tutorial
MySQL Performance Schema in Action: the Complete Tutorial
 
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterMySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication Tutorial
 
Architecture of exadata database machine – Part II
Architecture of exadata database machine – Part IIArchitecture of exadata database machine – Part II
Architecture of exadata database machine – Part II
 
What is new in PostgreSQL 14?
What is new in PostgreSQL 14?What is new in PostgreSQL 14?
What is new in PostgreSQL 14?
 
InnoDB Performance Optimisation
InnoDB Performance OptimisationInnoDB Performance Optimisation
InnoDB Performance Optimisation
 
MySQL Group Replication
MySQL Group ReplicationMySQL Group Replication
MySQL Group Replication
 
Replication Troubleshooting in Classic VS GTID
Replication Troubleshooting in Classic VS GTIDReplication Troubleshooting in Classic VS GTID
Replication Troubleshooting in Classic VS GTID
 
How to set up orchestrator to manage thousands of MySQL servers
How to set up orchestrator to manage thousands of MySQL serversHow to set up orchestrator to manage thousands of MySQL servers
How to set up orchestrator to manage thousands of MySQL servers
 
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
 
Optimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performanceOptimizing MariaDB for maximum performance
Optimizing MariaDB for maximum performance
 
MySQL InnoDB Cluster - A complete High Availability solution for MySQL
MySQL InnoDB Cluster - A complete High Availability solution for MySQLMySQL InnoDB Cluster - A complete High Availability solution for MySQL
MySQL InnoDB Cluster - A complete High Availability solution for MySQL
 
MariaDB MaxScale
MariaDB MaxScaleMariaDB MaxScale
MariaDB MaxScale
 
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
 
Exploring Oracle Multitenant in Oracle Database 12c
Exploring Oracle Multitenant in Oracle Database 12cExploring Oracle Multitenant in Oracle Database 12c
Exploring Oracle Multitenant in Oracle Database 12c
 
Understanding oracle rac internals part 1 - slides
Understanding oracle rac internals   part 1 - slidesUnderstanding oracle rac internals   part 1 - slides
Understanding oracle rac internals part 1 - slides
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitations
 
MySQL GTID Concepts, Implementation and troubleshooting
MySQL GTID Concepts, Implementation and troubleshooting MySQL GTID Concepts, Implementation and troubleshooting
MySQL GTID Concepts, Implementation and troubleshooting
 

Similaire à Autopsy of a MySQL Automation Disaster

The two little bugs that almost brought down Booking.com
The two little bugs that almost brought down Booking.comThe two little bugs that almost brought down Booking.com
The two little bugs that almost brought down Booking.comJean-François Gagné
 
MySQL 5.7 innodb_enhance_partii_20160527
MySQL 5.7 innodb_enhance_partii_20160527MySQL 5.7 innodb_enhance_partii_20160527
MySQL 5.7 innodb_enhance_partii_20160527Saewoong Lee
 
Percona Live '18 Tutorial: The Accidental DBA
Percona Live '18 Tutorial: The Accidental DBAPercona Live '18 Tutorial: The Accidental DBA
Percona Live '18 Tutorial: The Accidental DBAJenni Snyder
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Miscelaneous Debris
Miscelaneous DebrisMiscelaneous Debris
Miscelaneous Debrisfrewmbot
 
More on gdb for my sql db as (fosdem 2016)
More on gdb for my sql db as (fosdem 2016)More on gdb for my sql db as (fosdem 2016)
More on gdb for my sql db as (fosdem 2016)Valeriy Kravchuk
 
Cloudy with a Chance of Fireballs: Provisioning and Certificate Management in...
Cloudy with a Chance of Fireballs: Provisioning and Certificate Management in...Cloudy with a Chance of Fireballs: Provisioning and Certificate Management in...
Cloudy with a Chance of Fireballs: Provisioning and Certificate Management in...Puppet
 
32373 uploading-php-shell-through-sql-injection
32373 uploading-php-shell-through-sql-injection32373 uploading-php-shell-through-sql-injection
32373 uploading-php-shell-through-sql-injectionAttaporn Ninsuwan
 
Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayCosimo Streppone
 
How Scylla Manager Handles Backups
How Scylla Manager Handles BackupsHow Scylla Manager Handles Backups
How Scylla Manager Handles BackupsScyllaDB
 
Functional Database Strategies
Functional Database StrategiesFunctional Database Strategies
Functional Database StrategiesJason Swartz
 
Case Study: MySQL migration from latin1 to UTF-8
Case Study: MySQL migration from latin1 to UTF-8Case Study: MySQL migration from latin1 to UTF-8
Case Study: MySQL migration from latin1 to UTF-8Olivier DASINI
 
Build a DataWarehouse for your logs with Python, AWS Athena and Glue
Build a DataWarehouse for your logs with Python, AWS Athena and GlueBuild a DataWarehouse for your logs with Python, AWS Athena and Glue
Build a DataWarehouse for your logs with Python, AWS Athena and GlueMaxym Kharchenko
 
Brace yourselves, leap second is coming
Brace yourselves, leap second is comingBrace yourselves, leap second is coming
Brace yourselves, leap second is comingNati Cohen
 
causos da linha de frente - #rsonrails 2011
causos da linha de frente - #rsonrails 2011causos da linha de frente - #rsonrails 2011
causos da linha de frente - #rsonrails 2011bzanchet
 
An Introduction to Priam
An Introduction to PriamAn Introduction to Priam
An Introduction to PriamJason Brown
 
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015Dave Stokes
 

Similaire à Autopsy of a MySQL Automation Disaster (20)

Autopsy of an automation disaster
Autopsy of an automation disasterAutopsy of an automation disaster
Autopsy of an automation disaster
 
The two little bugs that almost brought down Booking.com
The two little bugs that almost brought down Booking.comThe two little bugs that almost brought down Booking.com
The two little bugs that almost brought down Booking.com
 
MySQLinsanity
MySQLinsanityMySQLinsanity
MySQLinsanity
 
MySQL 5.7 innodb_enhance_partii_20160527
MySQL 5.7 innodb_enhance_partii_20160527MySQL 5.7 innodb_enhance_partii_20160527
MySQL 5.7 innodb_enhance_partii_20160527
 
Percona Live '18 Tutorial: The Accidental DBA
Percona Live '18 Tutorial: The Accidental DBAPercona Live '18 Tutorial: The Accidental DBA
Percona Live '18 Tutorial: The Accidental DBA
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Miscelaneous Debris
Miscelaneous DebrisMiscelaneous Debris
Miscelaneous Debris
 
More on gdb for my sql db as (fosdem 2016)
More on gdb for my sql db as (fosdem 2016)More on gdb for my sql db as (fosdem 2016)
More on gdb for my sql db as (fosdem 2016)
 
Cloudy with a Chance of Fireballs: Provisioning and Certificate Management in...
Cloudy with a Chance of Fireballs: Provisioning and Certificate Management in...Cloudy with a Chance of Fireballs: Provisioning and Certificate Management in...
Cloudy with a Chance of Fireballs: Provisioning and Certificate Management in...
 
Curso de MySQL 5.7
Curso de MySQL 5.7Curso de MySQL 5.7
Curso de MySQL 5.7
 
32373 uploading-php-shell-through-sql-injection
32373 uploading-php-shell-through-sql-injection32373 uploading-php-shell-through-sql-injection
32373 uploading-php-shell-through-sql-injection
 
Velocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard WayVelocity 2012 - Learning WebOps the Hard Way
Velocity 2012 - Learning WebOps the Hard Way
 
How Scylla Manager Handles Backups
How Scylla Manager Handles BackupsHow Scylla Manager Handles Backups
How Scylla Manager Handles Backups
 
Functional Database Strategies
Functional Database StrategiesFunctional Database Strategies
Functional Database Strategies
 
Case Study: MySQL migration from latin1 to UTF-8
Case Study: MySQL migration from latin1 to UTF-8Case Study: MySQL migration from latin1 to UTF-8
Case Study: MySQL migration from latin1 to UTF-8
 
Build a DataWarehouse for your logs with Python, AWS Athena and Glue
Build a DataWarehouse for your logs with Python, AWS Athena and GlueBuild a DataWarehouse for your logs with Python, AWS Athena and Glue
Build a DataWarehouse for your logs with Python, AWS Athena and Glue
 
Brace yourselves, leap second is coming
Brace yourselves, leap second is comingBrace yourselves, leap second is coming
Brace yourselves, leap second is coming
 
causos da linha de frente - #rsonrails 2011
causos da linha de frente - #rsonrails 2011causos da linha de frente - #rsonrails 2011
causos da linha de frente - #rsonrails 2011
 
An Introduction to Priam
An Introduction to PriamAn Introduction to Priam
An Introduction to Priam
 
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
 

Plus de Jean-François Gagné

Demystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash SafetyDemystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash SafetyJean-François Gagné
 
MySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentMySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentJean-François Gagné
 
Demystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash SafetyDemystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash SafetyJean-François Gagné
 
MySQL Parallel Replication by Booking.com
MySQL Parallel Replication by Booking.comMySQL Parallel Replication by Booking.com
MySQL Parallel Replication by Booking.comJean-François Gagné
 
MySQL/MariaDB Parallel Replication: inventory, use-case and limitations
MySQL/MariaDB Parallel Replication: inventory, use-case and limitationsMySQL/MariaDB Parallel Replication: inventory, use-case and limitations
MySQL/MariaDB Parallel Replication: inventory, use-case and limitationsJean-François Gagné
 
How Booking.com avoids and deals with replication lag
How Booking.com avoids and deals with replication lagHow Booking.com avoids and deals with replication lag
How Booking.com avoids and deals with replication lagJean-François Gagné
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsJean-François Gagné
 
MySQL Parallel Replication: inventory, use-cases and limitations
MySQL Parallel Replication: inventory, use-cases and limitationsMySQL Parallel Replication: inventory, use-cases and limitations
MySQL Parallel Replication: inventory, use-cases and limitationsJean-François Gagné
 
Riding the Binlog: an in Deep Dissection of the Replication Stream
Riding the Binlog: an in Deep Dissection of the Replication StreamRiding the Binlog: an in Deep Dissection of the Replication Stream
Riding the Binlog: an in Deep Dissection of the Replication StreamJean-François Gagné
 

Plus de Jean-François Gagné (9)

Demystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash SafetyDemystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash Safety
 
MySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentMySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated Environment
 
Demystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash SafetyDemystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash Safety
 
MySQL Parallel Replication by Booking.com
MySQL Parallel Replication by Booking.comMySQL Parallel Replication by Booking.com
MySQL Parallel Replication by Booking.com
 
MySQL/MariaDB Parallel Replication: inventory, use-case and limitations
MySQL/MariaDB Parallel Replication: inventory, use-case and limitationsMySQL/MariaDB Parallel Replication: inventory, use-case and limitations
MySQL/MariaDB Parallel Replication: inventory, use-case and limitations
 
How Booking.com avoids and deals with replication lag
How Booking.com avoids and deals with replication lagHow Booking.com avoids and deals with replication lag
How Booking.com avoids and deals with replication lag
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitations
 
MySQL Parallel Replication: inventory, use-cases and limitations
MySQL Parallel Replication: inventory, use-cases and limitationsMySQL Parallel Replication: inventory, use-cases and limitations
MySQL Parallel Replication: inventory, use-cases and limitations
 
Riding the Binlog: an in Deep Dissection of the Replication Stream
Riding the Binlog: an in Deep Dissection of the Replication StreamRiding the Binlog: an in Deep Dissection of the Replication Stream
Riding the Binlog: an in Deep Dissection of the Replication Stream
 

Dernier

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Dernier (20)

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

Autopsy of a MySQL Automation Disaster

  • 1. Autopsy of a MySQL automation disaster by Jean-François Gagné presented at SRECon Dublin (October 2019) Senior Infrastructure Engineer / System and MySQL Expert jeanfrancois AT messagebird DOT com / @jfg956 #SREcon
  • 2. To err is human To really foul things up requires a computer[1] (or a script) [1]: http://quoteinvestigator.com/2010/12/07/foul-computer/ 2
  • 3. Session Summary 1. MySQL replication 2. Automation disaster: external eye 3. Chain of events: analysis 4. Learning / takeaway 3
  • 4. MySQL replication ● Typical MySQL replication deployment at Booking.com: +---+ | M | +---+ | +------+-- ... --+---------------+-------- ... | | | | +---+ +---+ +---+ +---+ | S1| | S2| | Sn| | M1| +---+ +---+ +---+ +---+ | +-- ... --+ | | +---+ +---+ | T1| | Tm| +---+ +---+ 4
  • 5. MySQL replication’ ● And they use(d) Orchestrator (more about Orchestrator in the next slide): 5
  • 6. MySQL replication’’ ● Orchestrator allows to: ● Visualize replication deployments ● Move slaves for planned maintenance of an intermediate master ● Automatically replace an intermediate master in case of its unexpected failure (thanks to pseudo-GTIDs when we have not deployed GTIDs) ● Automatically replace a master in case of a failure (failing over to a slave) ● But Orchestrator cannot replace a master alone: ● Booking.com uses DNS for master discovery ● So Orchestrator calls a homemade script to repoint DNS (and to do other magic) 6
  • 9. MySQL Master High Availability [1 of 4] Failing-over the master to a slave is my favorite HA method • But it is not as easy as it sounds, and it is hard to automate well • An example of complete failover solution in production: https://github.blog/2018-06-20-mysql-high-availability-at-github/ The five considerations of master high availability: (https://jfg-mysql.blogspot.com/2019/02/mysql-master-high-availability-and-failover-more-thoughts.html) • Plan how you are doing master high availability • Decide when you apply your plan (Failure Detection – FD) • Tell the application about the change (Service Discovery – SD) • Protect against the limit of FD and SD for avoiding split-brains (Fencing) • Fix your data if something goes wrong 9
  • 10. 10 MySQL Master High Availability [2 of 4] Failure detection (FD) is the 1st part (and 1st challenge) of failover • It is a very hard problem: partial failure, unreliable network, partitions, … • It is impossible to be 100% sure of a failure, and confidence needs time à quick FD is unreliable, relatively reliable FD implies longer unavailability Ø You need to accept that FD generates false positive (and/or false negative) Repointing is the 2nd part of failover: • Relatively easy with the right tools: MHA, GTID, Pseudo-GTID, Binlog Servers, … • Complexity grows with the number of direct slaves of the master (what if you cannot contact some of those slaves…) • Some software for repointing: • Orchestrator, Ripple Binlog Server, Replication Manager, MHA, Cluster Control, MaxScale, …
  • 11. 11 MySQL Master High Availability [3 of 4] In this configuration and when the master fails, one of the slave needs to be repointed to the new master:
  • 12. 12 MySQL Master High Availability [4 of 4] Service Discovery (SD) is the 3rd part (and 2nd challenge) of failover: • If centralised, it is a SPOF; if distributed, impossible to update atomically • SD will either introduce a bottleneck (including performance limits) or will be unreliable in some way (pointing to the wrong master) • Some ways to implement MySQL Master SD: DNS, ViP, Proxy, Zookeeper, … http://code.openark.org/blog/mysql/mysql-master-discovery-methods-part-1-dns Ø Unreliable FD and unreliable SD is a recipe for split-brains ! Protecting against split-brains (Fencing): Adv. Subject – not many solutions (Proxies and semi-synchronous replication might help) Fixing your data in case of a split-brain: only you can know how to do this ! (tip on this later in the talk)
  • 13. MySQL Service Discovery @ MessageBird MessageBird uses ProxySQL for MySQL Service Discovery 13
  • 15. 15 Failover War Story Master failover does not always go as planned We will now look at our War Story
  • 16. Our subject database ● Simple replication deployment (in two data centers): Master RWs +---+ happen here --> | A | +---+ | +------------------------+ | | Reads +---+ +---+ happen here --> | B | | X | +---+ +---+ | +---+ And reads | Y | <-- happen here +---+ 16
  • 17. Incident: 1st event ● A and B (two servers in same data center) fail at the same time: Master RWs +-/+ happen here --> | A | but now failing +/-+ Reads +-/+ +---+ happen here --> | B | | X | but now failing +/-+ +---+ | +---+ Reads | Y | <-- happen here +---+ 17 (I will cover how/why this happened later.)
  • 18. Incident: 1st event’ ● Orchestrator fixes things: +-/+ | A | +/-+ Reads +-/+ +---+ Now, Master RWs happen here --> | B | | X | <-- happen here but now failing +/-+ +---+ | +---+ Reads | Y | <-- happen here +---+ 18
  • 19. Split-brain: disaster ● A few things happen in that day and night, and I wake-up to this: +-/+ | A | +/-+ Master RWs +---+ +---+ happen here --> | B | | X | +---+ +---+ | +---+ | Y | +---+ 19
  • 20. Split-brain: disaster’ ● And to make things worse, reads are still happening on Y: +-/+ | A | +/-+ Master RWs +---+ +---+ happen here --> | B | | X | +---+ +---+ | +---+ Reads | Y | <-- happen here +---+ 20
  • 21. Split-brain: disaster’’ ● This is not good: ● When A and B failed, X was promoted as the new master ● Something made DNS point to B (we will see what later) à writes are now happening on B ● But B is outdated: all writes to X (after the failure of A) did not reach B ● So we have data on X that cannot be read on B ● And we have new data on B that is not read on Y 21 +-/+ | A | +/-+ Master RWs +---+ +---+ happen here --> | B | | X | +---+ +---+ | +---+ Reads | Y | <-- happen here +---+
  • 22. Split-brain: analysis ● Digging more in the chain of events, we find that: ● After the 1st failure of A, a 2nd one was detected and Orchestrator failed over to B ● So after their failures, A and B came back and formed an isolated replication chain 22 +-/+ | A | +/-+ +-/+ +---+ Master RWs | B | | X | <-- happen here +/-+ +---+ | +---+ Reads | Y | <-- happen here +---+
  • 23. Split-brain: analysis ● Digging more in the chain of events, we find that: ● After the 1st failure of A, a 2nd one was detected and Orchestrator failed over to B ● So after their failures, A and B came back and formed an isolated replication chain ● And something caused a failure of A 23 +---+ | A | +---+ | +---+ +---+ Master RWs | B | | X | <-- happen here +---+ +---+ | +---+ Reads | Y | <-- happen here +---+
  • 24. Split-brain: analysis ● Digging more in the chain of events, we find that: ● After the 1st failure of A, a 2nd one was detected and Orchestrator failed over to B ● So after their failures, A and B came back and formed an isolated replication chain ● And something caused a failure of A ● But how did DNS end-up pointing to B ? ● The failover to B called the DNS repointing script ● The script stole the DNS entry from X and pointed it to B 24 +-/+ | A | +/-+ +---+ +---+ Master RWs | B | | X | <-- happen here +---+ +---+ | +---+ Reads | Y | <-- happen here +---+
  • 25. Split-brain: analysis ● Digging more in the chain of events, we find that: ● After the 1st failure of A, a 2nd one was detected and Orchestrator failed over to B ● So after their failures, A and B came back and formed an isolated replication chain ● And something caused a failure of A ● But how did DNS end-up pointing to B ? ● The failover to B called the DNS repointing script ● The script stole the DNS entry from X and pointed it to B ● But is that all: what made A fail ? 25 +-/+ | A | +/-+ Master RWs +---+ +---+ happen here --> | B | | X | +---+ +---+ | +---+ Reads | Y | <-- happen here +---+
  • 26. Split-brain: analysis’ ● What made A fail ? ● Once A and B came back up as a new replication chain, they had outdated data ● If B would have come back before A, it could have been re-slaved to X 26 +---+ | A | +---+ | +---+ +---+ Master RWs | B | | X | <-- happen here +---+ +---+ | +---+ Reads | Y | <-- happen here +---+
  • 27. Split-brain: analysis’ ● What made A fail ? ● Once A and B came back up as a new replication chain, they had outdated data ● If B would have come back before A, it could have been re-slaved to X ● But because A came back before re-slaving, it injected heartbeat and p-GTID to B 27 +-/+ | A | +/-+ +---+ +---+ Master RWs | B | | X | <-- happen here +---+ +---+ | +---+ Reads | Y | <-- happen here +---+
  • 28. Split-brain: analysis’ ● What made A fail ? ● Once A and B came back up as a new replication chain, they had outdated data ● If B would have come back before A, it could have been re-slaved to X ● But because A came back before re-slaving, it injected heartbeat and p-GTID to B ● Then B could have been re-cloned without problems 28 +---+ | A | +---+ | +---+ +---+ Master RWs | B | | X | <-- happen here +---+ +---+ | +---+ Reads | Y | <-- happen here +---+
  • 29. Split-brain: analysis’ ● What made A fail ? ● Once A and B came back up as a new replication chain, they had outdated data ● If B would have come back before A, it could have been re-slaved to X ● But because A came back before re-slaving, it injected heartbeat and p-GTID to B ● Then B could have been re-cloned without problems ● But A was re-cloned instead (human error #1) 29 +---+ | A | +---+ +-/+ +---+ Master RWs | B | | X | <-- happen here +/-+ +---+ | +---+ Reads | Y | <-- happen here +---+
  • 30. Split-brain: analysis’ ● What made A fail ? ● Once A and B came back up as a new replication chain, they had outdated data ● If B would have come back before A, it could have been re-slaved to X ● But because A came back before re-slaving, it injected heartbeat and p-GTID to B ● Then B could have been re-cloned without problems ● But A was re-cloned instead (human error #1) ● Why did Orchestrator not fail-over right away ? ● B was promoted hours after A was brought down… ● Because A was downtimed only for 4 hours (human error #2) 30 +-/+ | A | +/-+ +---+ +---+ Master RWs | B | | X | <-- happen here +---+ +---+ | +---+ Reads | Y | <-- happen here +---+
  • 31. Orchestrator anti-flapping ● Orchestrator has a failover throttling/acknowledgment mechanism[1]: ● Automated recovery will happen ● for an instance in a cluster that has not recently been recovered ● unless such recent recoveries were acknowledged ● In our case: ● the recovery might have been acknowledged too early (human error #0 ?) ● with a too short “recently” timeout ● and maybe Orchestrator should not have failed over the second time [1]: https://github.com/github/orchestrator/blob/master/docs/topology-recovery.md #blocking-acknowledgments-anti-flapping 31
  • 32. Split-brain: summary ● So in summary, this disaster was caused by: 1. A fancy failure: two servers failing at the same time 2. A debatable premature acknowledgment in Orchestrator and probably too short a timeout for recent failover 3. Edge-case recovery: both servers forming a new replication topology 4. Restarting with the event-scheduler enabled (A injecting heartbeat and p-GTID) 5. Re-cloning wrong (A instead of B; should have created C and D and thrown away A and B; too short downtime for the re-cloning) 6. Orchestrator failing over something that it should not have (including taking an questionnable action when a downtime expired) 7. DNS repointing script not defensive enough 32
  • 33. Fancy failure: more details ● Why did A and B fail at the same time ? ● Deployment error: the two servers in the same rack/failure domain ? ● And/or very unlucky ? ● Very unlucky because… 10 to 20 servers failed that day in the same data center Because human operations and “sensitive” hardware 33 +-/+ | A | +/-+ +-/+ +---+ | B | | X | +/-+ +---+ | +---+ | Y | +---+
  • 34. How to fix such situation ? ● Fixing “split-brain” data on B and X is hard ● Some solutions are: ● Kill B or X (and lose data) ● Replay writes from B on X or vice-versa (manually or with replication) ● But AUTO_INCREMENTs are in the way: ● up to i used on A before 1st failover ● i-n to j1 used on X after recovery ● i to j2 used on B after 2nd failover 34 +-/+ | A | +/-+ Master RWs +---+ +---+ happen here --> | B | | X | +---+ +---+ | +---+ Reads | Y | <-- happen here +---+
  • 35. Takeaway ● Twisted situations happen ● Automation (including failover) is not simple: à code automation scripts defensively ● Be mindful for premature acknowledgment, downtime more than less, shutdown slaves first à understand complex interactions of tools in details ● Try something else than AUTO-INCREMENTs for Primary Key (monotonically increasing UUID[1] [2] ?) [1]: https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/ [2]: http://mysql.rjweb.org/doc.php/uuid 35
  • 36. 36 Re Failover War Story Master failover does not always go as planned • because it is complicated It is not a matter of “if” things go wrong • but “when” things will go wrong Please share your war stories • so we can learn from each-others’ experience • GitHub has a MySQL public Post-Mortem (great of them to share this): https://blog.github.com/2018-10-30-oct21-post-incident-analysis/
  • 38. Thanks ! by Jean-François Gagné presented at SRECon Dublin (October 2019) Senior Infrastructure Engineer / System and MySQL Expert jeanfrancois AT messagebird DOT com / @jfg956 #SREcon