SlideShare une entreprise Scribd logo
1  sur  62
Télécharger pour lire hors ligne
Parallel Replication in MySQL 5.7 and 8.0
by Booking.com
Presented at Pre-FOSDEM MySQL Day on Friday February 2nd, 2018
Eduardo Ortega (MySQL Database Engineer)
eduardo DTA ortega AT booking.com
Jean-François Gagné (System Engineer)
jeanfrancois DOT gagne AT booking.com
2
Booking.com
● Based in Amsterdam since 1996
● Online Hotel and Accommodation (Travel) Agent (OTA):
● +1.636.000 properties in 229 countries
● +1.555.000 room nights reserved daily
● +40 languages (website and customer service)
● +15.000 people working in 198 offices worldwide
● Part of the Priceline Group
● And we use MySQL:
● Thousands (1000s) of servers
3
Booking.com’
● And we are hiring !
● MySQL Engineer / DBA
● System Administrator
● System Engineer
● Site Reliability Engineer
● Developer / Designer
● Technical Team Lead
● Product Owner
● Data Scientist
● And many more…
● https://workingatbooking.com/ 4
Session Summary
1. Introducing Parallel Replication (// Replication)
2. MySQL 5.7: Logical Clock and Intervals
3. MySQL 5.7: Tuning Intervals
4. Write Set in MySQL 8.0
5. Benchmark results from Booking.com with MySQL 8.0
5
// Replication
● Relatively new because it is hard
● It is hard because of data consistency
● Running trx in // must give the same result on all slaves (= the master)
● Why is it important ?
● Computers have many Cores, using a single one for writes is a waste
● Some computer resources can give more throughput when used in parallel
(RAID1 has 2 disks  we can do 2 Read IOs in parallel)
(SSDs can serve many Read and/or Write IOs in parallel)
6
Reminder
● MySQL 5.6 has support for schema based parallel replication
● MySQL 5.7 adds support for logical clock parallel replication
● In early version, the logical clock is group commit based
● In current version, the logical clock is interval based
● MySQL 8.0 adds support for Write Set parallelism identification
● Write Set can also be found in MySQL 5.7 in Group replication
7
MySQL 5.7: LOGICAL CLOCK
● MySQL 5.7 has two slave_parallel_type:
● both need “SET GLOBAL slave_parallel_workers = N;” (with N > 1)
● DATABASE: the schema based // replication from 5.6 (not what we are talking about here)
● LOGICAL_CLOCK: “Transactions that are part of the same binary log group commit on a
master are applied in parallel on a slave.” (from the doc. but not exact: Bug#85977)
● the LOGICAL_CLOCK type is implemented by putting interval information in the binary logs
● LOGICAL_CLOCK is limited by the following:
● Problems with long/big transactions
● Problems with intermediate masters (IM)
● And it is optimized by slowing down the master to speedup the slave:
● binlog_group_commit_sync_delay
● binlog_group_commit_sync_no_delay_count
8
MySQL 5.7: LOGICAL CLOCK’
● Long transactions can block the parallel execution pipeline
● On the master: ---------------- Time --------------->
T1: B-------------------------C
T2: B--C
T3: B--C
● On the slaves: T1: B-------------------------C
T2: B-- . . . . . . . . . . . C
T3: B-- . . . . . . . . . . . C
 Try reducing as much as possible the number of big transactions:
• Easier said than done: 10 ms is big compared to 1 ms
 Avoid monster transactions (LOAD DATA, unbounded UPDATE or DELETE, …)
9
MySQL 5.7: LOGICAL CLOCK’’
● Replicating through intermediate masters (IM) shorten intervals
● Four transactions on X, Y and Z:
+---+
| X |
+---+
|
V
+---+
| Y |
+---+
|
V
+---+
| Z |
+---+
● To get maximum replication speed, replace IM by Binlog Servers (or use MySQL 8.0)
● More details at http://blog.booking.com/better_parallel_replication_for_mysql.html
10
On Y:
----Time---->
B---C
B---C
B-------C
B-------C
On Z:
----Time--------->
B---C
B---C
B-------C
B-------C
On X:
----Time---->
T1 B---C
T2 B---C
T3 B-------C
T4 B-------C
MySQL 5.7: LOGICAL CLOCK’’’
● By default, MySQL 5.7 in logical clock does out-of-order commit:
 There will be gaps (“START SLAVE UNTIL SQL_AFTER_MTS_GAPS;”)
● Not replication crash safe without GTIDs
http://jfg-mysql.blogspot.com/2016/01/replication-crash-safety-with-mts.html
● And also be careful about these:
binary logs content, SHOW SLAVE STATUS, skipping transactions, backups, …
● Using slave_preserve_commit_order = 1 does what you expect:
● This configuration does not generate gap
● But it needs log_slave_updates (feature request to remove this limitation: Bug#75396)
● Still it is not replication crash safe (surprising because no gap): Bug#80103 & Bug#81840
● And it can hang if slave_transaction_retries is too low: Bug#89247
11
MySQL // Replication Guts: Intervals
● In MySQL (5.7 and higher), each transaction is tagged with two (2) numbers:
● sequence_number: increasing id for each trx (not to confuse with GTID)
● last_committed: sequence_number of the latest trx on which this trx depends
(This can be understood as the “write view” of the current transaction)
● The last_committed / sequence_number pair is the parallelism interval
● Here an example of intervals for MySQL 5.7:
...
#170206 20:08:33 ... last_committed=6201 sequence_number=6203
#170206 20:08:33 ... last_committed=6203 sequence_number=6204
#170206 20:08:33 ... last_committed=6203 sequence_number=6205
#170206 20:08:33 ... last_committed=6203 sequence_number=6206
#170206 20:08:33 ... last_committed=6205 sequence_number=6207
... 12
MySQL 5.7 – Intervals Generation
MySQL 5.7 leverages parallelism on the master to generate intervals:
● sequence_number is an increasing id for each trx (not GTID)
(Reset to 1 at the beginning of each new binary log)
● last_committed is (in MySQL 5.7) the sequence number of the most recently
committed transaction when the current transaction gets its last lock
(Reset to 0 at the beginning of each new binary log)
...
#170206 20:08:33 ... last_committed=6201 sequence_number=6203
#170206 20:08:33 ... last_committed=6203 sequence_number=6204
#170206 20:08:33 ... last_committed=6203 sequence_number=6205
#170206 20:08:33 ... last_committed=6203 sequence_number=6206
#170206 20:08:33 ... last_committed=6205 sequence_number=6207
... 13
MySQL – Intervals Quality
● For measuring parallelism identification quality with MySQL,
we have a metric: the Average Modified Interval Length (AMIL)
● If we prefer to think in terms of group commit size, the AMIL can be mapped
to a pseudo-group commit size by multiplying the AMIL by 2 and subtracting one
● For a group commit of size n, the sum of the intervals length is n*(n+1) / 2
#170206 20:08:33 ... last_committed=6203 sequence_number=6204
#170206 20:08:33 ... last_committed=6203 sequence_number=6205
#170206 20:08:33 ... last_committed=6203 sequence_number=6206
14
MySQL – Intervals Quality
● For measuring parallelism identification quality with MySQL,
we have a metric: the Average Modified Interval Length (AMIL)
● If we prefer to think in terms of group commit size, the AMIL can be mapped
to a pseudo-group commit size by multiplying the AMIL by 2 and subtracting one
● For a group commit of size n, the sum of the intervals length is n*(n+1)/2
 AMIL = (n+1)/2 (after dividing by n), algebra gives us n = AMIL * 2 - 1
● This mapping could give a hint for slave_parallel_workers
(http://jfg-mysql.blogspot.com/2017/02/metric-for-tuning-parallel-replication-mysql-5-7.html)
15
MySQL – Intervals Quality’
● Why do we need to “modify” the interval length ?
● Because of a limitation in the current MTS applier which will only start trx 93136
once 93131 is completed  last_committed=93124 is modified to 93131
#170206 21:19:31 ... last_committed=93124 sequence_number=93131
#170206 21:19:31 ... last_committed=93131 sequence_number=93132
#170206 21:19:31 ... last_committed=93131 sequence_number=93133
#170206 21:19:31 ... last_committed=93131 sequence_number=93134
#170206 21:19:31 ... last_committed=93131 sequence_number=93135
#170206 21:19:31 ... last_committed=93124 sequence_number=93136
#170206 21:19:31 ... last_committed=93131 sequence_number=93137
#170206 21:19:31 ... last_committed=93131 sequence_number=93138
#170206 21:19:31 ... last_committed=93132 sequence_number=93139
#170206 21:19:31 ... last_committed=93138 sequence_number=93140
16
MySQL – Intervals Quality’’
● Script to compute the Average Modified Interval Length:
file=my_binlog_index_file;
echo _first_binlog_to_analyse_ > $file;
mysqlbinlog --stop-never -R --host 127.0.0.1 $(cat $file) |
grep "^#" | grep -e last_committed -e "Rotate to" |
awk -v file=$file -F "[ t]*|=" '$11 == "last_committed" {
if (length($2) == 7) {$2 = "0" $2;}
if ($12 < max) {$12 = max;} else {max = $12;}
print $1, $2, $14 - $12;}
$10 == "Rotate"{print $12 > file; close(file); max=0;}' |
awk -F " |:" '{my_h = $2 ":" $3 ":" $4;}
NR == 1 {d=$1; h=my_h; n=0; sum=0; sum2=0;}
d != $1 || h < my_h {print d, h, n, sum, sum2; d=$1; h=my_h;}
{n++; sum += $5; sum2 += $5 * $5;}'
(https://jfg-mysql.blogspot.com/2017/02/metric-for-tuning-parallel-replication-mysql-5-7.html)
17
MySQL – Intervals Quality’’’
● Computing the AMIL needs parsing the binary logs
● This is complicated and needs to handle many special cases
● Exposing counters for computing the AMIL would be better:
● Bug#85965: Expose, on the master, counters for monitoring // information quality.
● Bug#85966: Expose, on slaves, counters for monitoring // information quality.
(https://jfg-mysql.blogspot.com/2017/02/metric-for-tuning-parallel-replication-mysql-5-7.html)
18
MySQL 5.7 – Tuning
● AMIL without and with tuning (delay) on four (4) Booking.com masters:
(speed-up the slaves by increasing binlog_group_commit_sync_delay)
19
MySQL 8.0 – Write Set
● MySQL 8.0.1 introduced a new way to identify parallelism
● Instead of setting last_committed to “the seq. number of the most
recently committed transaction when the current trx gets its last lock”…
● MySQL 8.0.1 uses “the sequence number of the last transaction that
updated the same rows as the current transaction”
● To do that, MySQL 8.0 remembers which rows (tuples) are modified by each
transaction: this is the Write Set
● Write Set are not put in the binary logs, they allow to “widen” the intervals
20
MySQL 8.0 – Write Set’
● MySQL 8.0.1 introduces new global variables to control Write Set:
● transaction_write_set_extraction = [ OFF | XXHASH64 ]
● binlog_transaction_dependency_history_size (default to 25000)
● binlog_transaction_dependency_tracking = [ COMMIT_ORDER | WRITESET_SESSION | WRITESET ]
● WRITESET_SESSION: no two updates from the same session can be reordered
● WRITESET: any transactions which write different tuples can be parallelized
● WRITESET_SESSION will not work well for cnx recycling (Cnx Pools or Proxies):
● Recycling a connection with WRITESET_SESSION impedes parallelism identification
● Unless using the function reset_connection (with Bug#86063 fixed in 8.0.4)
21
MySQL 8.0 – Write Set’’
● To use Write Set on a Master:
● transaction_write_set_extraction = XXHASH64
● binlog_transaction_dependency_tracking = [ WRITESET_SESSION | WRITESET ]
(if WRITESET, slave_preserve_commit_order can avoid temporary inconsistencies)
● To use Write Set on an Intermediate Master (even single-threaded):
● transaction_write_set_extraction = XXHASH64
● binlog_transaction_dependency_tracking = WRITESET
(slave_preserve_commit_order can avoid temporary inconsistencies)
● To stop using Write Set:
● binlog_transaction_dependency_tracking = COMMIT_ORDER
● transaction_write_set_extraction = OFF
22
MySQL 8.0 – Write Set’’’
● Result for single-threaded Booking.com Intermediate Master (before and after):
#170409 3:37:13 [...] last_committed=6695 sequence_number=6696 [...]
#170409 3:37:14 [...] last_committed=6696 sequence_number=6697 [...]
#170409 3:37:14 [...] last_committed=6697 sequence_number=6698 [...]
#170409 3:37:14 [...] last_committed=6698 sequence_number=6699 [...]
#170409 3:37:14 [...] last_committed=6699 sequence_number=6700 [...]
#170409 3:37:14 [...] last_committed=6700 sequence_number=6701 [...]
#170409 3:37:14 [...] last_committed=6700 sequence_number=6702 [...]
#170409 3:37:14 [...] last_committed=6700 sequence_number=6703 [...]
#170409 3:37:14 [...] last_committed=6700 sequence_number=6704 [...]
#170409 3:37:14 [...] last_committed=6704 sequence_number=6705 [...]
#170409 3:37:14 [...] last_committed=6700 sequence_number=6706 [...]
23
MySQL 8.0 – Write Set’’’ ’
#170409 3:37:17 [...] last_committed=6700 sequence_number=6766 [...]
#170409 3:37:17 [...] last_committed=6752 sequence_number=6767 [...]
#170409 3:37:17 [...] last_committed=6753 sequence_number=6768 [...]
#170409 3:37:17 [...] last_committed=6700 sequence_number=6769 [...]
[...]
#170409 3:37:18 [...] last_committed=6700 sequence_number=6783 [...]
#170409 3:37:18 [...] last_committed=6768 sequence_number=6784 [...]
#170409 3:37:18 [...] last_committed=6784 sequence_number=6785 [...]
#170409 3:37:18 [...] last_committed=6785 sequence_number=6786 [...]
#170409 3:37:18 [...] last_committed=6785 sequence_number=6787 [...]
[...]
#170409 3:37:22 [...] last_committed=6785 sequence_number=6860 [...]
#170409 3:37:22 [...] last_committed=6842 sequence_number=6861 [...]
#170409 3:37:22 [...] last_committed=6843 sequence_number=6862 [...]
#170409 3:37:22 [...] last_committed=6785 sequence_number=6863
MySQL 8.0 – AMIL of Write Set
25
● AMIL on a single-threaded 8.0.1 Intermediate Master (IM) without/with Write Set:
MySQL 8.0 – Write Set vs Delay
● AMIL on Booking.com masters with delay vs Write Set on Intermediate Master:
26
MySQL 8.0 – Write Set’’’ ’’
● Write Set advantages:
● No need to slowdown the master (maybe not true in all cases)
● Will work even at low concurrency on the master
● Allows to test without upgrading the master (works on an intermediate master)
(however, this sacrifices session consistency, which might give optimistic results)
● Mitigate the problem of losing parallelism via intermediate masters
(only with binlog_transaction_dependency_tracking = WRITESET)
( the best solution is still Binlog Servers)
27
MySQL 8.0 – Write Set’’’ ’’’
● Write Set limitations:
● Needs Row-Based-Replication on the master (or intermediate master)
● Not working for trx updating tables without PK and trx updating tables having FK
(it will fall back to COMMIT_ORDER for those transactions)
● Barrier at each DDL (Bug#86060 for adding counters)
● Barrier at each binary log rotation: no transactions in different binlogs can be run in //
● With WRITESET_SESSION, does not play well with connection recycling
(Could use COM_RESET_CONNECTION if Bug#86063 is fixed)
● Write Set drawbacks:
● Slowdown the master ? Consume more RAM ?
● New technology: not fully mastered yet and there are bugs (still 1st DMR release)
28
MySQL 8.0 – Write Set @ B.com
● Tests on eight (8) real Booking.com environments (different workloads):
● A is MySQL 5.6 and 5.7 masters (1 and 7), some are SBR (4) some are RBR (4)
● B is MySQL 8.0.3 Intermediate Master with Write Set (RBR)
set global transaction_write_set_extraction = XXHASH64;
set global binlog_transaction_dependency_tracking = WRITESET;
● C is a slave with local SSD storage
+---+ +---+ +---+
| A | -------> | B | -------> | C |
+---+ +---+ +---+
● Run with 0, 2, 4, 8, 16, 32, 64, 128 and 256 workers,
with High Durability (HD - sync_binlog = 1 & trx commit = 1) and No Durability (ND – 0 & 2),
without and with slave_preserve_commit_order (NO and WO)
with and without log_slave_updates (IM and SB)
MySQL 8.0 – Write Set Speedups
30
E5 IM-HD Single-Threaded: 6138 seconds
E5 IM-ND Single-Threaded: 2238 seconds
MySQL 8.0 – Write Set Speedups’
31
MySQL 8.0 – Write Set Speedups’’
32
MySQL 8.0 – Write Set Speedups’’’
33
MySQL 8.0 – Speedup Summary
● No thrashing when too many threads !
● For the environments with High Durability:
● Two (2) “interesting” speedups: 1.6, 1.7
● One (1) good: 2.7
● Four (4) very good speedups: 4.4, 4.5, 5.6, and 5.8
● One (1) great speedups: 10.8 !
● For the environments without durability (ND):
● Three (3) good speedups: 1.3, 1.5 and 1.8
● Three (3) very good speedups: 2.4, 2.8 and 2.9
● Two (2) great speedups: 3.7 and 4.8 !
● All that without tuning MySQL or the application
MySQL 8.0 – Looking at low speedups
35
MySQL 8.0 – Looking at low speedups’
36
MySQL 8.0 – Looking at good speedups
37
MySQL 8.0 – Looking at good speedups’
38
MySQL 8.0 – Looking at great speedups
39
How do we tune slave_parallel_workers?
Goal of tuning:
find a value that gives good speedup
without wasting too much resources
We can use:
- Commit rate
- Replication catch-up speed
But these are workload - dependent, making optimization harder:
- Workload variations over time
- Hardware differences
Wouldn’t it be cool to have a workload - independent
metric to help optimize slave_parallel_workers?
Such a metric would:
● Have a good value when all of your applier
threads are doing work
● Be sensitive to having idle threads.
Sounds like a resource allocation fairness
problem...
In a fair world...
● slave_parallel_workers = n
● Each worker performs an equal amount of “work”
Say n = 5
Stuff that would not be good:
● No actual parallelism
In an unfair world...
● Poor parallelism
How do we define “work”?
Inspired by Percona blog post by Stephane Combaudon
(https://www.percona.com/blog/2016/02/10/estimating-potential-for-mysql-5-7-parallel-replication/)
● Percentage of commits applied per worker thread
Easy to measure from performance schema:
USE performance_schema;
-- SET UP PS instrumentation
UPDATE setup_consumers SET enabled = 'YES' WHERE NAME LIKE 'events_transactions%';
UPDATE setup_instruments SET enabled = 'YES', timed = 'YES' WHERE NAME = 'transaction';
-- Get information on commits per applier thread
SELECT T.thread_id, T.count_star AS COUNT_STAR
FROM events_transactions_summary_by_thread_by_event_name T
WHERE T.thread_id IN (SELECT thread_id FROM replication_applier_status_by_worker);
What does it give?
> SELECT [...];
+-----------+------------+
| thread_id | COUNT_STAR |
+-----------+------------+
| 63581 | 394 |
| 63582 | 292 |
| 63583 | 272 |
| 63584 | 260 |
| 63585 | 251 |
| 63586 | 237 |
[...]
| 63606 | 108 |
| 63607 | 106 |
| 63608 | 104 |
| 63609 | 100 |
| 63610 | 98 |
| 63611 | 94 |
| 63612 | 89 |
+-----------+------------+
32 rows in set (0.00 sec)
> SELECT [...];
+-----------+------------+
| thread_id | COUNT_STAR |
+-----------+------------+
| 63627 | 749 |
| 63628 | 504 |
| 63629 | 457 |
| 63630 | 433 |
| 63631 | 405 |
| 63632 | 384 |
[...]
| 63684 | 73 |
| 63685 | 73 |
| 63686 | 71 |
| 63687 | 68 |
| 63688 | 65 |
| 63689 | 65 |
| 63690 | 62 |
+-----------+------------+
64 rows in set (0.00 sec)
> SELECT [...];
+-----------+------------+
| thread_id | COUNT_STAR |
+-----------+------------+
| 63704 | 711 |
| 63705 | 468 |
| 63706 | 420 |
| 63707 | 378 |
| 63708 | 359 |
| 63709 | 340 |
[...]
| 63825 | 3 |
| 63826 | 3 |
| 63827 | 3 |
| 63828 | 3 |
| 63829 | 3 |
| 63830 | 2 |
| 63831 | 2 |
+-----------+------------+
128 rows in set (0.00 sec)
What does it give?
> SELECT [...];
+-----------+------------+
| thread_id | COUNT_STAR |
+-----------+------------+
| 63581 | 394 |
| 63582 | 292 |
| 63583 | 272 |
| 63584 | 260 |
| 63585 | 251 |
| 63586 | 237 |
[...]
| 63606 | 108 |
| 63607 | 106 |
| 63608 | 104 |
| 63609 | 100 |
| 63610 | 98 |
| 63611 | 94 |
| 63612 | 89 |
+-----------+------------+
32 rows in set (0.00 sec)
> SELECT [...];
+-----------+------------+
| thread_id | COUNT_STAR |
+-----------+------------+
| 63627 | 749 |
| 63628 | 504 |
| 63629 | 457 |
| 63630 | 433 |
| 63631 | 405 |
| 63632 | 384 |
[...]
| 63684 | 73 |
| 63685 | 73 |
| 63686 | 71 |
| 63687 | 68 |
| 63688 | 65 |
| 63689 | 65 |
| 63690 | 62 |
+-----------+------------+
64 rows in set (0.00 sec)
> SELECT [...];
+-----------+------------+
| thread_id | COUNT_STAR |
+-----------+------------+
| 63704 | 711 |
| 63705 | 468 |
| 63706 | 420 |
| 63707 | 378 |
| 63708 | 359 |
| 63709 | 340 |
[...]
| 63825 | 3 |
| 63826 | 3 |
| 63827 | 3 |
| 63828 | 3 |
| 63829 | 3 |
| 63830 | 2 |
| 63831 | 2 |
+-----------+------------+
128 rows in set (0.00 sec)
How can we understand those numbers: Jain’s index
Jain’s fairness index:
● 𝑛: number of applier threads.
● 𝑖: the i-th applier thread.
● 𝑥𝑖 :amount of work performed by thread
> SELECT [...];
+-----------+------------+
| thread_id | COUNT_STAR |
+-----------+------------+
| 63627 | 749 |
| 63628 | 504 |
| 63629 | 457 |
| 63630 | 433 |
| 63631 | 405 |
| 63632 | 384 |
[...]
| 63684 | 73 |
| 63685 | 73 |
| 63686 | 71 |
| 63687 | 68 |
| 63688 | 65 |
| 63689 | 65 |
| 63690 | 62 |
+-----------+------------+
64 rows in set (0.00 sec)
What does Jain’s index mean?
- 0 ≤ J ≤ 1
- For n = 1  J = 1 (but no parallelism)
- If n > 1 and J ≅ 1, all workers are doing similar work
- If n > 1 and J < 1, some workers are doing less work
- If n > 1 and J << 1, some workers are idle
Avoid J ≅ 1
And avoid J << 1
0
1
Putting the idea to the test
5.7 Master
5.7 replica
8.0 Intermediate
master
8.0 replicaGroup commit sync delay = 100 ms
Group commit sync no delay count=50
Writeset enabled
Replication stopped for 24 h
Applier threads: 1 ≤ n ≤ 128
Replication stopped for 24 h
Applier threads: 1 ≤ n ≤ 128
First, something familiar to compare with:
Optimal values for n:
• For 5.7, n = 8 or 16
• For 8.0, n = 32 or 64
Higher values of n would be
wasting resources on
applier threads that have no
work to do.
0
1000
2000
3000
4000
5000
6000
0 20 40 60 80 100 120 140
Commit rate vs n
C / s (5.7) C / s (8.0)
What does Jain´s index have to say?
0
0,2
0,4
0,6
0,8
1
1,2
0 20 40 60 80 100 120 140
J vs n
J (5.7) J (8.0)
Caveats...
● What happens when not catching up but just keeping up with current replication traffic?
Not enough work means idle threads
 Optimal number of workers depends on catching up or keeping up
Another way to look at the numbers?
> SELECT [...];
+-----------+------------+
| thread_id | COUNT_STAR |
+-----------+------------+
| 63581 | 394 |
| 63582 | 292 |
| 63583 | 272 |
| 63584 | 260 |
| 63585 | 251 |
| 63586 | 237 |
[...]
| 63606 | 108 |
| 63607 | 106 |
| 63608 | 104 |
| 63609 | 100 |
| 63610 | 98 |
| 63611 | 94 |
| 63612 | 89 |
+-----------+------------+
32 rows in set (0.00 sec)
> SELECT [...];
+-----------+------------+
| thread_id | COUNT_STAR |
+-----------+------------+
| 63627 | 749 |
| 63628 | 504 |
| 63629 | 457 |
| 63630 | 433 |
| 63631 | 405 |
| 63632 | 384 |
[...]
| 63684 | 73 |
| 63685 | 73 |
| 63686 | 71 |
| 63687 | 68 |
| 63688 | 65 |
| 63689 | 65 |
| 63690 | 62 |
+-----------+------------+
64 rows in set (0.00 sec)
> SELECT [...];
+-----------+------------+
| thread_id | COUNT_STAR |
+-----------+------------+
| 63704 | 711 |
| 63705 | 468 |
| 63706 | 420 |
| 63707 | 378 |
| 63708 | 359 |
| 63709 | 340 |
[...]
| 63825 | 3 |
| 63826 | 3 |
| 63827 | 3 |
| 63828 | 3 |
| 63829 | 3 |
| 63830 | 2 |
| 63831 | 2 |
+-----------+------------+
128 rows in set (0.00 sec)
MySQL 8.0 – What about commit order ?
55
MySQL 8.0 – What about commit order ?’
56
MySQL 5.7 and 8.0 // Repl. Summary
● Parallel replication in MySQL 5.7 is not simple:
● Need precise tuning
● Long transactions block the parallel replication pipeline
● Care about Intermediate masters
● Write Set in MySQL 8.0 gives very interesting results:
● No problem with Intermediate masters
● Allows to test with Intermediate Master
● Some great speedups and most of them very good
And please
test by yourself
and share results
// Replication: Links
● Replication crash safety with MTS in MySQL 5.6 and 5.7: reality or illusion?
https://jfg-mysql.blogspot.com/2016/01/replication-crash-safety-with-mts.html
● A Metric for Tuning Parallel Replication in MySQL 5.7
https://jfg-mysql.blogspot.com/2017/02/metric-for-tuning-parallel-replication-mysql-5-7.html
● Solving MySQL Replication Lag with LOGICAL_CLOCK and Calibrated Delay
https://www.vividcortex.com/blog/solving-mysql-replication-lag-with-logical_clock-and-calibrated-delay
● How to Fix a Lagging MySQL Replication
https://thoughts.t37.net/fixing-a-very-lagging-mysql-replication-db6eb5a6e15d
● Binlog Servers:
● http://blog.booking.com/mysql_slave_scaling_and_more.html
● Better Parallel Replication for MySQL: http://blog.booking.com/better_parallel_replication_for_mysql.html
● http://blog.booking.com/abstracting_binlog_servers_and_mysql_master_promotion_wo_reconfiguring_slaves.html
59
// Replication: Links’
● An update on Write Set (parallel replication) bug fix in MySQL 8.0
https://jfg-mysql.blogspot.com/2018/01/an-update-on-write-set-parallel-replication-bug-fix-in-mysql-8-0.html
● Write Set in MySQL 5.7: Group Replication
https://jfg-mysql.blogspot.com/2018/01/write-set-in-mysql-5-7-group-replication.html
● More Write Set in MySQL: Group Replication Certification
https://jfg-mysql.blogspot.com/2018/01/more-write-set-in-mysql-5-7-group-replication-certification.html
60
// Replication: Links’’
● Bugs/feature requests:
● The doc. of slave-parallel-type=LOGICAL_CLOCK wrongly reference Group Commit: Bug#85977
● Allow slave_preserve_commit_order without log-slave-updates: Bug#75396
● MTS with slave_preserve_commit_order not repl. crash safe: Bug#80103
● Automatic Repl. Recovery Does Not Handle Lost Relay Log Events: Bug#81840
● Expose, on the master/slave, counters for monitoring // info. quality: Bug#85965 & Bug#85966
● Expose counters for monitoring Write Set barriers: Bug#86060
● Deadlock with slave_preserve_commit_order=ON with Bug#86078: Bug#86079 & Bug#89247
● Fixed bugs:
● Message after MTS crash misleading: Bug#80102 (and Bug#77496)
● Replication position lost after crash on MTS configured slave: Bug#77496
● Full table scan bug in InnoDB: MDEV-10649, Bug#82968 and Bug#82969
● The function reset_connection does not reset Write Set in WRITESET_SESSION: Bug#86063
● Bad Write Set tracking with UNIQUE KEY on a DELETE followed by an INSERT: Bug#86078
Thanks
Eduardo Ortega (MySQL Database Engineer)
eduardo DOT ortega AT booking.com
Jean-François Gagné (System Engineer)
jeanfrancois DOT gagne AT booking.com

Contenu connexe

Tendances

ProxySQL for MySQL
ProxySQL for MySQLProxySQL for MySQL
ProxySQL for MySQLMydbops
 
Intro ProxySQL
Intro ProxySQLIntro ProxySQL
Intro ProxySQLI Goo Lee
 
MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바NeoClova
 
Maxscale_메뉴얼
Maxscale_메뉴얼Maxscale_메뉴얼
Maxscale_메뉴얼NeoClova
 
MySQL/MariaDB Proxy Software Test
MySQL/MariaDB Proxy Software TestMySQL/MariaDB Proxy Software Test
MySQL/MariaDB Proxy Software TestI Goo Lee
 
MySQL High Availability Solutions
MySQL High Availability SolutionsMySQL High Availability Solutions
MySQL High Availability SolutionsMydbops
 
Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기NeoClova
 
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterMySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterKenny Gryp
 
My sql failover test using orchestrator
My sql failover test  using orchestratorMy sql failover test  using orchestrator
My sql failover test using orchestratorYoungHeon (Roy) Kim
 
The consequences of sync_binlog != 1
The consequences of sync_binlog != 1The consequences of sync_binlog != 1
The consequences of sync_binlog != 1Jean-François Gagné
 
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)Altinity Ltd
 
Galera cluster for high availability
Galera cluster for high availability Galera cluster for high availability
Galera cluster for high availability Mydbops
 
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...Severalnines
 
MySQL GTID 시작하기
MySQL GTID 시작하기MySQL GTID 시작하기
MySQL GTID 시작하기I Goo Lee
 
MaxScale이해와활용-2023.11
MaxScale이해와활용-2023.11MaxScale이해와활용-2023.11
MaxScale이해와활용-2023.11NeoClova
 
MySQL InnoDB Clusterによる高可用性構成(DB Tech Showcase 2017)
MySQL InnoDB Clusterによる高可用性構成(DB Tech Showcase 2017)MySQL InnoDB Clusterによる高可用性構成(DB Tech Showcase 2017)
MySQL InnoDB Clusterによる高可用性構成(DB Tech Showcase 2017)Shinya Sugiyama
 
[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.
[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.
[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.NAVER D2
 
Maxscale 소개 1.1.1
Maxscale 소개 1.1.1Maxscale 소개 1.1.1
Maxscale 소개 1.1.1NeoClova
 
How to set up orchestrator to manage thousands of MySQL servers
How to set up orchestrator to manage thousands of MySQL serversHow to set up orchestrator to manage thousands of MySQL servers
How to set up orchestrator to manage thousands of MySQL serversSimon J Mudd
 
ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)YoungHeon (Roy) Kim
 

Tendances (20)

ProxySQL for MySQL
ProxySQL for MySQLProxySQL for MySQL
ProxySQL for MySQL
 
Intro ProxySQL
Intro ProxySQLIntro ProxySQL
Intro ProxySQL
 
MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바
 
Maxscale_메뉴얼
Maxscale_메뉴얼Maxscale_메뉴얼
Maxscale_메뉴얼
 
MySQL/MariaDB Proxy Software Test
MySQL/MariaDB Proxy Software TestMySQL/MariaDB Proxy Software Test
MySQL/MariaDB Proxy Software Test
 
MySQL High Availability Solutions
MySQL High Availability SolutionsMySQL High Availability Solutions
MySQL High Availability Solutions
 
Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기Maria db 이중화구성_고민하기
Maria db 이중화구성_고민하기
 
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterMySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
 
My sql failover test using orchestrator
My sql failover test  using orchestratorMy sql failover test  using orchestrator
My sql failover test using orchestrator
 
The consequences of sync_binlog != 1
The consequences of sync_binlog != 1The consequences of sync_binlog != 1
The consequences of sync_binlog != 1
 
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
New features in ProxySQL 2.0 (updated to 2.0.9) by Rene Cannao (ProxySQL)
 
Galera cluster for high availability
Galera cluster for high availability Galera cluster for high availability
Galera cluster for high availability
 
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
 
MySQL GTID 시작하기
MySQL GTID 시작하기MySQL GTID 시작하기
MySQL GTID 시작하기
 
MaxScale이해와활용-2023.11
MaxScale이해와활용-2023.11MaxScale이해와활용-2023.11
MaxScale이해와활용-2023.11
 
MySQL InnoDB Clusterによる高可用性構成(DB Tech Showcase 2017)
MySQL InnoDB Clusterによる高可用性構成(DB Tech Showcase 2017)MySQL InnoDB Clusterによる高可用性構成(DB Tech Showcase 2017)
MySQL InnoDB Clusterによる高可用性構成(DB Tech Showcase 2017)
 
[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.
[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.
[135] 오픈소스 데이터베이스, 은행 서비스에 첫발을 내밀다.
 
Maxscale 소개 1.1.1
Maxscale 소개 1.1.1Maxscale 소개 1.1.1
Maxscale 소개 1.1.1
 
How to set up orchestrator to manage thousands of MySQL servers
How to set up orchestrator to manage thousands of MySQL serversHow to set up orchestrator to manage thousands of MySQL servers
How to set up orchestrator to manage thousands of MySQL servers
 
ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)
 

Similaire à MySQL Parallel Replication by Booking.com

MySQL Parallel Replication (LOGICAL_CLOCK): all the 5.7 (and some of the 8.0)...
MySQL Parallel Replication (LOGICAL_CLOCK): all the 5.7 (and some of the 8.0)...MySQL Parallel Replication (LOGICAL_CLOCK): all the 5.7 (and some of the 8.0)...
MySQL Parallel Replication (LOGICAL_CLOCK): all the 5.7 (and some of the 8.0)...Jean-François Gagné
 
MySQL/MariaDB Parallel Replication: inventory, use-case and limitations
MySQL/MariaDB Parallel Replication: inventory, use-case and limitationsMySQL/MariaDB Parallel Replication: inventory, use-case and limitations
MySQL/MariaDB Parallel Replication: inventory, use-case and limitationsJean-François Gagné
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsJean-François Gagné
 
Evolution of MySQL Parallel Replication
Evolution of MySQL Parallel Replication Evolution of MySQL Parallel Replication
Evolution of MySQL Parallel Replication Mydbops
 
MySQL Parallel Replication: inventory, use-cases and limitations
MySQL Parallel Replication: inventory, use-cases and limitationsMySQL Parallel Replication: inventory, use-cases and limitations
MySQL Parallel Replication: inventory, use-cases and limitationsJean-François Gagné
 
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDBWebinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDBSeveralnines
 
MySQL Timeout Variables Explained
MySQL Timeout Variables Explained MySQL Timeout Variables Explained
MySQL Timeout Variables Explained Mydbops
 
RedisConf18 - Fail-Safe Starvation-Free Durable Priority Queues in Redis
RedisConf18 - Fail-Safe Starvation-Free Durable Priority Queues in RedisRedisConf18 - Fail-Safe Starvation-Free Durable Priority Queues in Redis
RedisConf18 - Fail-Safe Starvation-Free Durable Priority Queues in RedisRedis Labs
 
Webinar Slides: Migrating to Galera Cluster
Webinar Slides: Migrating to Galera ClusterWebinar Slides: Migrating to Galera Cluster
Webinar Slides: Migrating to Galera ClusterSeveralnines
 
More on gdb for my sql db as (fosdem 2016)
More on gdb for my sql db as (fosdem 2016)More on gdb for my sql db as (fosdem 2016)
More on gdb for my sql db as (fosdem 2016)Valeriy Kravchuk
 
Upgrade to MySQL 5.6 without downtime
Upgrade to MySQL 5.6 without downtimeUpgrade to MySQL 5.6 without downtime
Upgrade to MySQL 5.6 without downtimeOlivier DASINI
 
Riding the Binlog: an in Deep Dissection of the Replication Stream
Riding the Binlog: an in Deep Dissection of the Replication StreamRiding the Binlog: an in Deep Dissection of the Replication Stream
Riding the Binlog: an in Deep Dissection of the Replication StreamJean-François Gagné
 
PostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesPostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesInMobi Technology
 
MySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentMySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentJean-François Gagné
 
IT Tage 2019 MariaDB 10.4 New Features
IT Tage 2019 MariaDB 10.4 New FeaturesIT Tage 2019 MariaDB 10.4 New Features
IT Tage 2019 MariaDB 10.4 New FeaturesFromDual GmbH
 
MySQL User Group NL - MySQL 8
MySQL User Group NL - MySQL 8MySQL User Group NL - MySQL 8
MySQL User Group NL - MySQL 8Frederic Descamps
 
Lightweight Transactions at Lightning Speed
Lightweight Transactions at Lightning SpeedLightweight Transactions at Lightning Speed
Lightweight Transactions at Lightning SpeedScyllaDB
 
MySQL 5.7 in a Nutshell
MySQL 5.7 in a NutshellMySQL 5.7 in a Nutshell
MySQL 5.7 in a NutshellEmily Ikuta
 
MariaDB 10.2 New Features
MariaDB 10.2 New FeaturesMariaDB 10.2 New Features
MariaDB 10.2 New FeaturesFromDual GmbH
 
PL22 - Backup and Restore Performance.pptx
PL22 - Backup and Restore Performance.pptxPL22 - Backup and Restore Performance.pptx
PL22 - Backup and Restore Performance.pptxVinicius M Grippa
 

Similaire à MySQL Parallel Replication by Booking.com (20)

MySQL Parallel Replication (LOGICAL_CLOCK): all the 5.7 (and some of the 8.0)...
MySQL Parallel Replication (LOGICAL_CLOCK): all the 5.7 (and some of the 8.0)...MySQL Parallel Replication (LOGICAL_CLOCK): all the 5.7 (and some of the 8.0)...
MySQL Parallel Replication (LOGICAL_CLOCK): all the 5.7 (and some of the 8.0)...
 
MySQL/MariaDB Parallel Replication: inventory, use-case and limitations
MySQL/MariaDB Parallel Replication: inventory, use-case and limitationsMySQL/MariaDB Parallel Replication: inventory, use-case and limitations
MySQL/MariaDB Parallel Replication: inventory, use-case and limitations
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitations
 
Evolution of MySQL Parallel Replication
Evolution of MySQL Parallel Replication Evolution of MySQL Parallel Replication
Evolution of MySQL Parallel Replication
 
MySQL Parallel Replication: inventory, use-cases and limitations
MySQL Parallel Replication: inventory, use-cases and limitationsMySQL Parallel Replication: inventory, use-cases and limitations
MySQL Parallel Replication: inventory, use-cases and limitations
 
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDBWebinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDB
 
MySQL Timeout Variables Explained
MySQL Timeout Variables Explained MySQL Timeout Variables Explained
MySQL Timeout Variables Explained
 
RedisConf18 - Fail-Safe Starvation-Free Durable Priority Queues in Redis
RedisConf18 - Fail-Safe Starvation-Free Durable Priority Queues in RedisRedisConf18 - Fail-Safe Starvation-Free Durable Priority Queues in Redis
RedisConf18 - Fail-Safe Starvation-Free Durable Priority Queues in Redis
 
Webinar Slides: Migrating to Galera Cluster
Webinar Slides: Migrating to Galera ClusterWebinar Slides: Migrating to Galera Cluster
Webinar Slides: Migrating to Galera Cluster
 
More on gdb for my sql db as (fosdem 2016)
More on gdb for my sql db as (fosdem 2016)More on gdb for my sql db as (fosdem 2016)
More on gdb for my sql db as (fosdem 2016)
 
Upgrade to MySQL 5.6 without downtime
Upgrade to MySQL 5.6 without downtimeUpgrade to MySQL 5.6 without downtime
Upgrade to MySQL 5.6 without downtime
 
Riding the Binlog: an in Deep Dissection of the Replication Stream
Riding the Binlog: an in Deep Dissection of the Replication StreamRiding the Binlog: an in Deep Dissection of the Replication Stream
Riding the Binlog: an in Deep Dissection of the Replication Stream
 
PostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesPostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major Features
 
MySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentMySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated Environment
 
IT Tage 2019 MariaDB 10.4 New Features
IT Tage 2019 MariaDB 10.4 New FeaturesIT Tage 2019 MariaDB 10.4 New Features
IT Tage 2019 MariaDB 10.4 New Features
 
MySQL User Group NL - MySQL 8
MySQL User Group NL - MySQL 8MySQL User Group NL - MySQL 8
MySQL User Group NL - MySQL 8
 
Lightweight Transactions at Lightning Speed
Lightweight Transactions at Lightning SpeedLightweight Transactions at Lightning Speed
Lightweight Transactions at Lightning Speed
 
MySQL 5.7 in a Nutshell
MySQL 5.7 in a NutshellMySQL 5.7 in a Nutshell
MySQL 5.7 in a Nutshell
 
MariaDB 10.2 New Features
MariaDB 10.2 New FeaturesMariaDB 10.2 New Features
MariaDB 10.2 New Features
 
PL22 - Backup and Restore Performance.pptx
PL22 - Backup and Restore Performance.pptxPL22 - Backup and Restore Performance.pptx
PL22 - Backup and Restore Performance.pptx
 

Plus de Jean-François Gagné

Autopsy of a MySQL Automation Disaster
Autopsy of a MySQL Automation DisasterAutopsy of a MySQL Automation Disaster
Autopsy of a MySQL Automation DisasterJean-François Gagné
 
MySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentMySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentJean-François Gagné
 
Demystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash SafetyDemystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash SafetyJean-François Gagné
 
Demystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash SafetyDemystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash SafetyJean-François Gagné
 
The two little bugs that almost brought down Booking.com
The two little bugs that almost brought down Booking.comThe two little bugs that almost brought down Booking.com
The two little bugs that almost brought down Booking.comJean-François Gagné
 
How Booking.com avoids and deals with replication lag
How Booking.com avoids and deals with replication lagHow Booking.com avoids and deals with replication lag
How Booking.com avoids and deals with replication lagJean-François Gagné
 

Plus de Jean-François Gagné (7)

Autopsy of a MySQL Automation Disaster
Autopsy of a MySQL Automation DisasterAutopsy of a MySQL Automation Disaster
Autopsy of a MySQL Automation Disaster
 
MySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentMySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated Environment
 
Demystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash SafetyDemystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash Safety
 
Demystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash SafetyDemystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash Safety
 
The two little bugs that almost brought down Booking.com
The two little bugs that almost brought down Booking.comThe two little bugs that almost brought down Booking.com
The two little bugs that almost brought down Booking.com
 
Autopsy of an automation disaster
Autopsy of an automation disasterAutopsy of an automation disaster
Autopsy of an automation disaster
 
How Booking.com avoids and deals with replication lag
How Booking.com avoids and deals with replication lagHow Booking.com avoids and deals with replication lag
How Booking.com avoids and deals with replication lag
 

Dernier

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 

Dernier (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

MySQL Parallel Replication by Booking.com

  • 1. Parallel Replication in MySQL 5.7 and 8.0 by Booking.com Presented at Pre-FOSDEM MySQL Day on Friday February 2nd, 2018 Eduardo Ortega (MySQL Database Engineer) eduardo DTA ortega AT booking.com Jean-François Gagné (System Engineer) jeanfrancois DOT gagne AT booking.com
  • 2. 2
  • 3. Booking.com ● Based in Amsterdam since 1996 ● Online Hotel and Accommodation (Travel) Agent (OTA): ● +1.636.000 properties in 229 countries ● +1.555.000 room nights reserved daily ● +40 languages (website and customer service) ● +15.000 people working in 198 offices worldwide ● Part of the Priceline Group ● And we use MySQL: ● Thousands (1000s) of servers 3
  • 4. Booking.com’ ● And we are hiring ! ● MySQL Engineer / DBA ● System Administrator ● System Engineer ● Site Reliability Engineer ● Developer / Designer ● Technical Team Lead ● Product Owner ● Data Scientist ● And many more… ● https://workingatbooking.com/ 4
  • 5. Session Summary 1. Introducing Parallel Replication (// Replication) 2. MySQL 5.7: Logical Clock and Intervals 3. MySQL 5.7: Tuning Intervals 4. Write Set in MySQL 8.0 5. Benchmark results from Booking.com with MySQL 8.0 5
  • 6. // Replication ● Relatively new because it is hard ● It is hard because of data consistency ● Running trx in // must give the same result on all slaves (= the master) ● Why is it important ? ● Computers have many Cores, using a single one for writes is a waste ● Some computer resources can give more throughput when used in parallel (RAID1 has 2 disks  we can do 2 Read IOs in parallel) (SSDs can serve many Read and/or Write IOs in parallel) 6
  • 7. Reminder ● MySQL 5.6 has support for schema based parallel replication ● MySQL 5.7 adds support for logical clock parallel replication ● In early version, the logical clock is group commit based ● In current version, the logical clock is interval based ● MySQL 8.0 adds support for Write Set parallelism identification ● Write Set can also be found in MySQL 5.7 in Group replication 7
  • 8. MySQL 5.7: LOGICAL CLOCK ● MySQL 5.7 has two slave_parallel_type: ● both need “SET GLOBAL slave_parallel_workers = N;” (with N > 1) ● DATABASE: the schema based // replication from 5.6 (not what we are talking about here) ● LOGICAL_CLOCK: “Transactions that are part of the same binary log group commit on a master are applied in parallel on a slave.” (from the doc. but not exact: Bug#85977) ● the LOGICAL_CLOCK type is implemented by putting interval information in the binary logs ● LOGICAL_CLOCK is limited by the following: ● Problems with long/big transactions ● Problems with intermediate masters (IM) ● And it is optimized by slowing down the master to speedup the slave: ● binlog_group_commit_sync_delay ● binlog_group_commit_sync_no_delay_count 8
  • 9. MySQL 5.7: LOGICAL CLOCK’ ● Long transactions can block the parallel execution pipeline ● On the master: ---------------- Time ---------------> T1: B-------------------------C T2: B--C T3: B--C ● On the slaves: T1: B-------------------------C T2: B-- . . . . . . . . . . . C T3: B-- . . . . . . . . . . . C  Try reducing as much as possible the number of big transactions: • Easier said than done: 10 ms is big compared to 1 ms  Avoid monster transactions (LOAD DATA, unbounded UPDATE or DELETE, …) 9
  • 10. MySQL 5.7: LOGICAL CLOCK’’ ● Replicating through intermediate masters (IM) shorten intervals ● Four transactions on X, Y and Z: +---+ | X | +---+ | V +---+ | Y | +---+ | V +---+ | Z | +---+ ● To get maximum replication speed, replace IM by Binlog Servers (or use MySQL 8.0) ● More details at http://blog.booking.com/better_parallel_replication_for_mysql.html 10 On Y: ----Time----> B---C B---C B-------C B-------C On Z: ----Time---------> B---C B---C B-------C B-------C On X: ----Time----> T1 B---C T2 B---C T3 B-------C T4 B-------C
  • 11. MySQL 5.7: LOGICAL CLOCK’’’ ● By default, MySQL 5.7 in logical clock does out-of-order commit:  There will be gaps (“START SLAVE UNTIL SQL_AFTER_MTS_GAPS;”) ● Not replication crash safe without GTIDs http://jfg-mysql.blogspot.com/2016/01/replication-crash-safety-with-mts.html ● And also be careful about these: binary logs content, SHOW SLAVE STATUS, skipping transactions, backups, … ● Using slave_preserve_commit_order = 1 does what you expect: ● This configuration does not generate gap ● But it needs log_slave_updates (feature request to remove this limitation: Bug#75396) ● Still it is not replication crash safe (surprising because no gap): Bug#80103 & Bug#81840 ● And it can hang if slave_transaction_retries is too low: Bug#89247 11
  • 12. MySQL // Replication Guts: Intervals ● In MySQL (5.7 and higher), each transaction is tagged with two (2) numbers: ● sequence_number: increasing id for each trx (not to confuse with GTID) ● last_committed: sequence_number of the latest trx on which this trx depends (This can be understood as the “write view” of the current transaction) ● The last_committed / sequence_number pair is the parallelism interval ● Here an example of intervals for MySQL 5.7: ... #170206 20:08:33 ... last_committed=6201 sequence_number=6203 #170206 20:08:33 ... last_committed=6203 sequence_number=6204 #170206 20:08:33 ... last_committed=6203 sequence_number=6205 #170206 20:08:33 ... last_committed=6203 sequence_number=6206 #170206 20:08:33 ... last_committed=6205 sequence_number=6207 ... 12
  • 13. MySQL 5.7 – Intervals Generation MySQL 5.7 leverages parallelism on the master to generate intervals: ● sequence_number is an increasing id for each trx (not GTID) (Reset to 1 at the beginning of each new binary log) ● last_committed is (in MySQL 5.7) the sequence number of the most recently committed transaction when the current transaction gets its last lock (Reset to 0 at the beginning of each new binary log) ... #170206 20:08:33 ... last_committed=6201 sequence_number=6203 #170206 20:08:33 ... last_committed=6203 sequence_number=6204 #170206 20:08:33 ... last_committed=6203 sequence_number=6205 #170206 20:08:33 ... last_committed=6203 sequence_number=6206 #170206 20:08:33 ... last_committed=6205 sequence_number=6207 ... 13
  • 14. MySQL – Intervals Quality ● For measuring parallelism identification quality with MySQL, we have a metric: the Average Modified Interval Length (AMIL) ● If we prefer to think in terms of group commit size, the AMIL can be mapped to a pseudo-group commit size by multiplying the AMIL by 2 and subtracting one ● For a group commit of size n, the sum of the intervals length is n*(n+1) / 2 #170206 20:08:33 ... last_committed=6203 sequence_number=6204 #170206 20:08:33 ... last_committed=6203 sequence_number=6205 #170206 20:08:33 ... last_committed=6203 sequence_number=6206 14
  • 15. MySQL – Intervals Quality ● For measuring parallelism identification quality with MySQL, we have a metric: the Average Modified Interval Length (AMIL) ● If we prefer to think in terms of group commit size, the AMIL can be mapped to a pseudo-group commit size by multiplying the AMIL by 2 and subtracting one ● For a group commit of size n, the sum of the intervals length is n*(n+1)/2  AMIL = (n+1)/2 (after dividing by n), algebra gives us n = AMIL * 2 - 1 ● This mapping could give a hint for slave_parallel_workers (http://jfg-mysql.blogspot.com/2017/02/metric-for-tuning-parallel-replication-mysql-5-7.html) 15
  • 16. MySQL – Intervals Quality’ ● Why do we need to “modify” the interval length ? ● Because of a limitation in the current MTS applier which will only start trx 93136 once 93131 is completed  last_committed=93124 is modified to 93131 #170206 21:19:31 ... last_committed=93124 sequence_number=93131 #170206 21:19:31 ... last_committed=93131 sequence_number=93132 #170206 21:19:31 ... last_committed=93131 sequence_number=93133 #170206 21:19:31 ... last_committed=93131 sequence_number=93134 #170206 21:19:31 ... last_committed=93131 sequence_number=93135 #170206 21:19:31 ... last_committed=93124 sequence_number=93136 #170206 21:19:31 ... last_committed=93131 sequence_number=93137 #170206 21:19:31 ... last_committed=93131 sequence_number=93138 #170206 21:19:31 ... last_committed=93132 sequence_number=93139 #170206 21:19:31 ... last_committed=93138 sequence_number=93140 16
  • 17. MySQL – Intervals Quality’’ ● Script to compute the Average Modified Interval Length: file=my_binlog_index_file; echo _first_binlog_to_analyse_ > $file; mysqlbinlog --stop-never -R --host 127.0.0.1 $(cat $file) | grep "^#" | grep -e last_committed -e "Rotate to" | awk -v file=$file -F "[ t]*|=" '$11 == "last_committed" { if (length($2) == 7) {$2 = "0" $2;} if ($12 < max) {$12 = max;} else {max = $12;} print $1, $2, $14 - $12;} $10 == "Rotate"{print $12 > file; close(file); max=0;}' | awk -F " |:" '{my_h = $2 ":" $3 ":" $4;} NR == 1 {d=$1; h=my_h; n=0; sum=0; sum2=0;} d != $1 || h < my_h {print d, h, n, sum, sum2; d=$1; h=my_h;} {n++; sum += $5; sum2 += $5 * $5;}' (https://jfg-mysql.blogspot.com/2017/02/metric-for-tuning-parallel-replication-mysql-5-7.html) 17
  • 18. MySQL – Intervals Quality’’’ ● Computing the AMIL needs parsing the binary logs ● This is complicated and needs to handle many special cases ● Exposing counters for computing the AMIL would be better: ● Bug#85965: Expose, on the master, counters for monitoring // information quality. ● Bug#85966: Expose, on slaves, counters for monitoring // information quality. (https://jfg-mysql.blogspot.com/2017/02/metric-for-tuning-parallel-replication-mysql-5-7.html) 18
  • 19. MySQL 5.7 – Tuning ● AMIL without and with tuning (delay) on four (4) Booking.com masters: (speed-up the slaves by increasing binlog_group_commit_sync_delay) 19
  • 20. MySQL 8.0 – Write Set ● MySQL 8.0.1 introduced a new way to identify parallelism ● Instead of setting last_committed to “the seq. number of the most recently committed transaction when the current trx gets its last lock”… ● MySQL 8.0.1 uses “the sequence number of the last transaction that updated the same rows as the current transaction” ● To do that, MySQL 8.0 remembers which rows (tuples) are modified by each transaction: this is the Write Set ● Write Set are not put in the binary logs, they allow to “widen” the intervals 20
  • 21. MySQL 8.0 – Write Set’ ● MySQL 8.0.1 introduces new global variables to control Write Set: ● transaction_write_set_extraction = [ OFF | XXHASH64 ] ● binlog_transaction_dependency_history_size (default to 25000) ● binlog_transaction_dependency_tracking = [ COMMIT_ORDER | WRITESET_SESSION | WRITESET ] ● WRITESET_SESSION: no two updates from the same session can be reordered ● WRITESET: any transactions which write different tuples can be parallelized ● WRITESET_SESSION will not work well for cnx recycling (Cnx Pools or Proxies): ● Recycling a connection with WRITESET_SESSION impedes parallelism identification ● Unless using the function reset_connection (with Bug#86063 fixed in 8.0.4) 21
  • 22. MySQL 8.0 – Write Set’’ ● To use Write Set on a Master: ● transaction_write_set_extraction = XXHASH64 ● binlog_transaction_dependency_tracking = [ WRITESET_SESSION | WRITESET ] (if WRITESET, slave_preserve_commit_order can avoid temporary inconsistencies) ● To use Write Set on an Intermediate Master (even single-threaded): ● transaction_write_set_extraction = XXHASH64 ● binlog_transaction_dependency_tracking = WRITESET (slave_preserve_commit_order can avoid temporary inconsistencies) ● To stop using Write Set: ● binlog_transaction_dependency_tracking = COMMIT_ORDER ● transaction_write_set_extraction = OFF 22
  • 23. MySQL 8.0 – Write Set’’’ ● Result for single-threaded Booking.com Intermediate Master (before and after): #170409 3:37:13 [...] last_committed=6695 sequence_number=6696 [...] #170409 3:37:14 [...] last_committed=6696 sequence_number=6697 [...] #170409 3:37:14 [...] last_committed=6697 sequence_number=6698 [...] #170409 3:37:14 [...] last_committed=6698 sequence_number=6699 [...] #170409 3:37:14 [...] last_committed=6699 sequence_number=6700 [...] #170409 3:37:14 [...] last_committed=6700 sequence_number=6701 [...] #170409 3:37:14 [...] last_committed=6700 sequence_number=6702 [...] #170409 3:37:14 [...] last_committed=6700 sequence_number=6703 [...] #170409 3:37:14 [...] last_committed=6700 sequence_number=6704 [...] #170409 3:37:14 [...] last_committed=6704 sequence_number=6705 [...] #170409 3:37:14 [...] last_committed=6700 sequence_number=6706 [...] 23
  • 24. MySQL 8.0 – Write Set’’’ ’ #170409 3:37:17 [...] last_committed=6700 sequence_number=6766 [...] #170409 3:37:17 [...] last_committed=6752 sequence_number=6767 [...] #170409 3:37:17 [...] last_committed=6753 sequence_number=6768 [...] #170409 3:37:17 [...] last_committed=6700 sequence_number=6769 [...] [...] #170409 3:37:18 [...] last_committed=6700 sequence_number=6783 [...] #170409 3:37:18 [...] last_committed=6768 sequence_number=6784 [...] #170409 3:37:18 [...] last_committed=6784 sequence_number=6785 [...] #170409 3:37:18 [...] last_committed=6785 sequence_number=6786 [...] #170409 3:37:18 [...] last_committed=6785 sequence_number=6787 [...] [...] #170409 3:37:22 [...] last_committed=6785 sequence_number=6860 [...] #170409 3:37:22 [...] last_committed=6842 sequence_number=6861 [...] #170409 3:37:22 [...] last_committed=6843 sequence_number=6862 [...] #170409 3:37:22 [...] last_committed=6785 sequence_number=6863
  • 25. MySQL 8.0 – AMIL of Write Set 25 ● AMIL on a single-threaded 8.0.1 Intermediate Master (IM) without/with Write Set:
  • 26. MySQL 8.0 – Write Set vs Delay ● AMIL on Booking.com masters with delay vs Write Set on Intermediate Master: 26
  • 27. MySQL 8.0 – Write Set’’’ ’’ ● Write Set advantages: ● No need to slowdown the master (maybe not true in all cases) ● Will work even at low concurrency on the master ● Allows to test without upgrading the master (works on an intermediate master) (however, this sacrifices session consistency, which might give optimistic results) ● Mitigate the problem of losing parallelism via intermediate masters (only with binlog_transaction_dependency_tracking = WRITESET) ( the best solution is still Binlog Servers) 27
  • 28. MySQL 8.0 – Write Set’’’ ’’’ ● Write Set limitations: ● Needs Row-Based-Replication on the master (or intermediate master) ● Not working for trx updating tables without PK and trx updating tables having FK (it will fall back to COMMIT_ORDER for those transactions) ● Barrier at each DDL (Bug#86060 for adding counters) ● Barrier at each binary log rotation: no transactions in different binlogs can be run in // ● With WRITESET_SESSION, does not play well with connection recycling (Could use COM_RESET_CONNECTION if Bug#86063 is fixed) ● Write Set drawbacks: ● Slowdown the master ? Consume more RAM ? ● New technology: not fully mastered yet and there are bugs (still 1st DMR release) 28
  • 29. MySQL 8.0 – Write Set @ B.com ● Tests on eight (8) real Booking.com environments (different workloads): ● A is MySQL 5.6 and 5.7 masters (1 and 7), some are SBR (4) some are RBR (4) ● B is MySQL 8.0.3 Intermediate Master with Write Set (RBR) set global transaction_write_set_extraction = XXHASH64; set global binlog_transaction_dependency_tracking = WRITESET; ● C is a slave with local SSD storage +---+ +---+ +---+ | A | -------> | B | -------> | C | +---+ +---+ +---+ ● Run with 0, 2, 4, 8, 16, 32, 64, 128 and 256 workers, with High Durability (HD - sync_binlog = 1 & trx commit = 1) and No Durability (ND – 0 & 2), without and with slave_preserve_commit_order (NO and WO) with and without log_slave_updates (IM and SB)
  • 30. MySQL 8.0 – Write Set Speedups 30 E5 IM-HD Single-Threaded: 6138 seconds E5 IM-ND Single-Threaded: 2238 seconds
  • 31. MySQL 8.0 – Write Set Speedups’ 31
  • 32. MySQL 8.0 – Write Set Speedups’’ 32
  • 33. MySQL 8.0 – Write Set Speedups’’’ 33
  • 34. MySQL 8.0 – Speedup Summary ● No thrashing when too many threads ! ● For the environments with High Durability: ● Two (2) “interesting” speedups: 1.6, 1.7 ● One (1) good: 2.7 ● Four (4) very good speedups: 4.4, 4.5, 5.6, and 5.8 ● One (1) great speedups: 10.8 ! ● For the environments without durability (ND): ● Three (3) good speedups: 1.3, 1.5 and 1.8 ● Three (3) very good speedups: 2.4, 2.8 and 2.9 ● Two (2) great speedups: 3.7 and 4.8 ! ● All that without tuning MySQL or the application
  • 35. MySQL 8.0 – Looking at low speedups 35
  • 36. MySQL 8.0 – Looking at low speedups’ 36
  • 37. MySQL 8.0 – Looking at good speedups 37
  • 38. MySQL 8.0 – Looking at good speedups’ 38
  • 39. MySQL 8.0 – Looking at great speedups 39
  • 40. How do we tune slave_parallel_workers? Goal of tuning: find a value that gives good speedup without wasting too much resources We can use: - Commit rate - Replication catch-up speed But these are workload - dependent, making optimization harder: - Workload variations over time - Hardware differences
  • 41. Wouldn’t it be cool to have a workload - independent metric to help optimize slave_parallel_workers? Such a metric would: ● Have a good value when all of your applier threads are doing work ● Be sensitive to having idle threads. Sounds like a resource allocation fairness problem...
  • 42. In a fair world... ● slave_parallel_workers = n ● Each worker performs an equal amount of “work” Say n = 5
  • 43. Stuff that would not be good: ● No actual parallelism
  • 44. In an unfair world... ● Poor parallelism
  • 45. How do we define “work”? Inspired by Percona blog post by Stephane Combaudon (https://www.percona.com/blog/2016/02/10/estimating-potential-for-mysql-5-7-parallel-replication/) ● Percentage of commits applied per worker thread Easy to measure from performance schema: USE performance_schema; -- SET UP PS instrumentation UPDATE setup_consumers SET enabled = 'YES' WHERE NAME LIKE 'events_transactions%'; UPDATE setup_instruments SET enabled = 'YES', timed = 'YES' WHERE NAME = 'transaction'; -- Get information on commits per applier thread SELECT T.thread_id, T.count_star AS COUNT_STAR FROM events_transactions_summary_by_thread_by_event_name T WHERE T.thread_id IN (SELECT thread_id FROM replication_applier_status_by_worker);
  • 46. What does it give? > SELECT [...]; +-----------+------------+ | thread_id | COUNT_STAR | +-----------+------------+ | 63581 | 394 | | 63582 | 292 | | 63583 | 272 | | 63584 | 260 | | 63585 | 251 | | 63586 | 237 | [...] | 63606 | 108 | | 63607 | 106 | | 63608 | 104 | | 63609 | 100 | | 63610 | 98 | | 63611 | 94 | | 63612 | 89 | +-----------+------------+ 32 rows in set (0.00 sec) > SELECT [...]; +-----------+------------+ | thread_id | COUNT_STAR | +-----------+------------+ | 63627 | 749 | | 63628 | 504 | | 63629 | 457 | | 63630 | 433 | | 63631 | 405 | | 63632 | 384 | [...] | 63684 | 73 | | 63685 | 73 | | 63686 | 71 | | 63687 | 68 | | 63688 | 65 | | 63689 | 65 | | 63690 | 62 | +-----------+------------+ 64 rows in set (0.00 sec) > SELECT [...]; +-----------+------------+ | thread_id | COUNT_STAR | +-----------+------------+ | 63704 | 711 | | 63705 | 468 | | 63706 | 420 | | 63707 | 378 | | 63708 | 359 | | 63709 | 340 | [...] | 63825 | 3 | | 63826 | 3 | | 63827 | 3 | | 63828 | 3 | | 63829 | 3 | | 63830 | 2 | | 63831 | 2 | +-----------+------------+ 128 rows in set (0.00 sec)
  • 47. What does it give? > SELECT [...]; +-----------+------------+ | thread_id | COUNT_STAR | +-----------+------------+ | 63581 | 394 | | 63582 | 292 | | 63583 | 272 | | 63584 | 260 | | 63585 | 251 | | 63586 | 237 | [...] | 63606 | 108 | | 63607 | 106 | | 63608 | 104 | | 63609 | 100 | | 63610 | 98 | | 63611 | 94 | | 63612 | 89 | +-----------+------------+ 32 rows in set (0.00 sec) > SELECT [...]; +-----------+------------+ | thread_id | COUNT_STAR | +-----------+------------+ | 63627 | 749 | | 63628 | 504 | | 63629 | 457 | | 63630 | 433 | | 63631 | 405 | | 63632 | 384 | [...] | 63684 | 73 | | 63685 | 73 | | 63686 | 71 | | 63687 | 68 | | 63688 | 65 | | 63689 | 65 | | 63690 | 62 | +-----------+------------+ 64 rows in set (0.00 sec) > SELECT [...]; +-----------+------------+ | thread_id | COUNT_STAR | +-----------+------------+ | 63704 | 711 | | 63705 | 468 | | 63706 | 420 | | 63707 | 378 | | 63708 | 359 | | 63709 | 340 | [...] | 63825 | 3 | | 63826 | 3 | | 63827 | 3 | | 63828 | 3 | | 63829 | 3 | | 63830 | 2 | | 63831 | 2 | +-----------+------------+ 128 rows in set (0.00 sec)
  • 48. How can we understand those numbers: Jain’s index Jain’s fairness index: ● 𝑛: number of applier threads. ● 𝑖: the i-th applier thread. ● 𝑥𝑖 :amount of work performed by thread > SELECT [...]; +-----------+------------+ | thread_id | COUNT_STAR | +-----------+------------+ | 63627 | 749 | | 63628 | 504 | | 63629 | 457 | | 63630 | 433 | | 63631 | 405 | | 63632 | 384 | [...] | 63684 | 73 | | 63685 | 73 | | 63686 | 71 | | 63687 | 68 | | 63688 | 65 | | 63689 | 65 | | 63690 | 62 | +-----------+------------+ 64 rows in set (0.00 sec)
  • 49. What does Jain’s index mean? - 0 ≤ J ≤ 1 - For n = 1  J = 1 (but no parallelism) - If n > 1 and J ≅ 1, all workers are doing similar work - If n > 1 and J < 1, some workers are doing less work - If n > 1 and J << 1, some workers are idle Avoid J ≅ 1 And avoid J << 1 0 1
  • 50. Putting the idea to the test 5.7 Master 5.7 replica 8.0 Intermediate master 8.0 replicaGroup commit sync delay = 100 ms Group commit sync no delay count=50 Writeset enabled Replication stopped for 24 h Applier threads: 1 ≤ n ≤ 128 Replication stopped for 24 h Applier threads: 1 ≤ n ≤ 128
  • 51. First, something familiar to compare with: Optimal values for n: • For 5.7, n = 8 or 16 • For 8.0, n = 32 or 64 Higher values of n would be wasting resources on applier threads that have no work to do. 0 1000 2000 3000 4000 5000 6000 0 20 40 60 80 100 120 140 Commit rate vs n C / s (5.7) C / s (8.0)
  • 52. What does Jain´s index have to say? 0 0,2 0,4 0,6 0,8 1 1,2 0 20 40 60 80 100 120 140 J vs n J (5.7) J (8.0)
  • 53. Caveats... ● What happens when not catching up but just keeping up with current replication traffic? Not enough work means idle threads  Optimal number of workers depends on catching up or keeping up
  • 54. Another way to look at the numbers? > SELECT [...]; +-----------+------------+ | thread_id | COUNT_STAR | +-----------+------------+ | 63581 | 394 | | 63582 | 292 | | 63583 | 272 | | 63584 | 260 | | 63585 | 251 | | 63586 | 237 | [...] | 63606 | 108 | | 63607 | 106 | | 63608 | 104 | | 63609 | 100 | | 63610 | 98 | | 63611 | 94 | | 63612 | 89 | +-----------+------------+ 32 rows in set (0.00 sec) > SELECT [...]; +-----------+------------+ | thread_id | COUNT_STAR | +-----------+------------+ | 63627 | 749 | | 63628 | 504 | | 63629 | 457 | | 63630 | 433 | | 63631 | 405 | | 63632 | 384 | [...] | 63684 | 73 | | 63685 | 73 | | 63686 | 71 | | 63687 | 68 | | 63688 | 65 | | 63689 | 65 | | 63690 | 62 | +-----------+------------+ 64 rows in set (0.00 sec) > SELECT [...]; +-----------+------------+ | thread_id | COUNT_STAR | +-----------+------------+ | 63704 | 711 | | 63705 | 468 | | 63706 | 420 | | 63707 | 378 | | 63708 | 359 | | 63709 | 340 | [...] | 63825 | 3 | | 63826 | 3 | | 63827 | 3 | | 63828 | 3 | | 63829 | 3 | | 63830 | 2 | | 63831 | 2 | +-----------+------------+ 128 rows in set (0.00 sec)
  • 55. MySQL 8.0 – What about commit order ? 55
  • 56. MySQL 8.0 – What about commit order ?’ 56
  • 57. MySQL 5.7 and 8.0 // Repl. Summary ● Parallel replication in MySQL 5.7 is not simple: ● Need precise tuning ● Long transactions block the parallel replication pipeline ● Care about Intermediate masters ● Write Set in MySQL 8.0 gives very interesting results: ● No problem with Intermediate masters ● Allows to test with Intermediate Master ● Some great speedups and most of them very good
  • 58. And please test by yourself and share results
  • 59. // Replication: Links ● Replication crash safety with MTS in MySQL 5.6 and 5.7: reality or illusion? https://jfg-mysql.blogspot.com/2016/01/replication-crash-safety-with-mts.html ● A Metric for Tuning Parallel Replication in MySQL 5.7 https://jfg-mysql.blogspot.com/2017/02/metric-for-tuning-parallel-replication-mysql-5-7.html ● Solving MySQL Replication Lag with LOGICAL_CLOCK and Calibrated Delay https://www.vividcortex.com/blog/solving-mysql-replication-lag-with-logical_clock-and-calibrated-delay ● How to Fix a Lagging MySQL Replication https://thoughts.t37.net/fixing-a-very-lagging-mysql-replication-db6eb5a6e15d ● Binlog Servers: ● http://blog.booking.com/mysql_slave_scaling_and_more.html ● Better Parallel Replication for MySQL: http://blog.booking.com/better_parallel_replication_for_mysql.html ● http://blog.booking.com/abstracting_binlog_servers_and_mysql_master_promotion_wo_reconfiguring_slaves.html 59
  • 60. // Replication: Links’ ● An update on Write Set (parallel replication) bug fix in MySQL 8.0 https://jfg-mysql.blogspot.com/2018/01/an-update-on-write-set-parallel-replication-bug-fix-in-mysql-8-0.html ● Write Set in MySQL 5.7: Group Replication https://jfg-mysql.blogspot.com/2018/01/write-set-in-mysql-5-7-group-replication.html ● More Write Set in MySQL: Group Replication Certification https://jfg-mysql.blogspot.com/2018/01/more-write-set-in-mysql-5-7-group-replication-certification.html 60
  • 61. // Replication: Links’’ ● Bugs/feature requests: ● The doc. of slave-parallel-type=LOGICAL_CLOCK wrongly reference Group Commit: Bug#85977 ● Allow slave_preserve_commit_order without log-slave-updates: Bug#75396 ● MTS with slave_preserve_commit_order not repl. crash safe: Bug#80103 ● Automatic Repl. Recovery Does Not Handle Lost Relay Log Events: Bug#81840 ● Expose, on the master/slave, counters for monitoring // info. quality: Bug#85965 & Bug#85966 ● Expose counters for monitoring Write Set barriers: Bug#86060 ● Deadlock with slave_preserve_commit_order=ON with Bug#86078: Bug#86079 & Bug#89247 ● Fixed bugs: ● Message after MTS crash misleading: Bug#80102 (and Bug#77496) ● Replication position lost after crash on MTS configured slave: Bug#77496 ● Full table scan bug in InnoDB: MDEV-10649, Bug#82968 and Bug#82969 ● The function reset_connection does not reset Write Set in WRITESET_SESSION: Bug#86063 ● Bad Write Set tracking with UNIQUE KEY on a DELETE followed by an INSERT: Bug#86078
  • 62. Thanks Eduardo Ortega (MySQL Database Engineer) eduardo DOT ortega AT booking.com Jean-François Gagné (System Engineer) jeanfrancois DOT gagne AT booking.com