Alibaba Patches in MariaDB
Lixun Peng
Topic
• Time Machine / Flashback (in development)
• Double Sync Replication (to be contributed)
• Multi-Source Replication
• Thread Memory Monitor
What’s a Time Machine?
• Rolls back instances/databases/tables to a snapshot
• Implemented at the server level, so it supports all storage engines
• Works from full-image binary logs
• Currently a feature of the mysqlbinlog tool (the --flashback option)
Why Time Machine
• Everyone makes mistakes, including DBAs.
• After users mis-operate on their data, we can of course recover it from the last full backup set and the binary logs.
• But if the database is very large, this takes a long time. A mis-operation usually modifies only a little data, yet we have to recover the whole database.
How Time Machine Works
• If binlog_format is ROW (and binlog-row-image=FULL in 5.6 and later), the values of all columns are stored in the row event, so we can get the data as it was before the mis-operation.
• Then we just do the following:
  • Change the event type: INSERT -> DELETE, DELETE -> INSERT
  • For Update_Event, swap the SET part and the WHERE part
  • Apply those events in reverse order, from the last one back to the one where the mis-operation happened.
• All the data is recovered by the inverse operations of the mis-operations.
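
The inversion described on the "How Time Machine Works" slide can be sketched in a few lines. This is a minimal Python model, not the mysqlbinlog code. It assumes each row event carries full before/after images as dicts and that a table is a dict keyed by an `id` column. `RowEvent`, `invert`, `flashback` and `apply` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RowEvent:
    kind: str                      # "INSERT", "DELETE" or "UPDATE"
    before: Optional[dict] = None  # full before-image (the WHERE part of an update)
    after: Optional[dict] = None   # full after-image (the SET part of an update)

def invert(ev: RowEvent) -> RowEvent:
    """Return the event that undoes `ev`."""
    if ev.kind == "INSERT":
        return RowEvent("DELETE", before=ev.after)
    if ev.kind == "DELETE":
        return RowEvent("INSERT", after=ev.before)
    if ev.kind == "UPDATE":
        # Swap the SET and WHERE parts.
        return RowEvent("UPDATE", before=ev.after, after=ev.before)
    raise ValueError(f"unsupported event kind: {ev.kind}")

def flashback(events):
    """Invert every event and order them from the last one back to the first."""
    return [invert(ev) for ev in reversed(events)]

def apply(table, ev):
    """Apply one row event to a toy table: a dict from primary key 'id' to row."""
    if ev.kind in ("DELETE", "UPDATE"):
        del table[ev.before["id"]]
    if ev.kind in ("INSERT", "UPDATE"):
        table[ev.after["id"]] = dict(ev.after)

# Usage: three mistaken changes, then undo them.
table = {1: {"id": 1, "v": "a"}}
original = {k: dict(r) for k, r in table.items()}
mistakes = [
    RowEvent("UPDATE", before={"id": 1, "v": "a"}, after={"id": 1, "v": "b"}),
    RowEvent("INSERT", after={"id": 2, "v": "c"}),
    RowEvent("DELETE", before={"id": 1, "v": "b"}),
]
for ev in mistakes:
    apply(table, ev)
for ev in flashback(mistakes):
    apply(table, ev)
assert table == original
```

Reversing the order matters. The DELETE of row 1 must be undone before the UPDATE that preceded it, or the restored row would not exist when its update is reverted.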
Done List
• Full DML support
• Review table support
  • Users may want to check which part of the data has been flashed back.
• GTID support (MariaDB)
  • We added GTID event support for MariaDB 10.1
  • Support for MySQL 5.6 GTID events is still in progress
ToDo List
• Add DDL support
  • For ADD INDEX/COLUMN or CREATE TABLE queries, just drop the index, column or table when running Flashback.
  • For DROP INDEX/COLUMN or DROP TABLE queries, copy or rename the old table to a reserved database. When Flashback runs, drop the new table and rename the saved old one back to the original database.
  • For TRUNCATE TABLE, rename the old table to a reserved database and create a new empty table.
• Add a script for the time machine.
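
DDL support is still a ToDo item, so the following is only a sketch of the bookkeeping the list describes. For each DDL kind it returns what would happen when the DDL runs and what Flashback would later do. The reserved database name `flashback_reserved` and the function `ddl_undo_plan` are hypothetical. The returned strings are illustrative SQL, not statements the tool is known to emit.

```python
RESERVED_DB = "flashback_reserved"  # hypothetical name for the reserved database

def ddl_undo_plan(kind, db, table, obj=None):
    """Return (steps_when_ddl_runs, steps_when_flashback_runs) for one DDL statement."""
    src = f"{db}.{table}"
    saved = f"{RESERVED_DB}.{table}"
    if kind == "CREATE TABLE":
        return [], [f"DROP TABLE {src}"]
    if kind in ("ADD COLUMN", "ADD INDEX"):
        what = kind.split()[1]  # "COLUMN" or "INDEX"
        return [], [f"ALTER TABLE {src} DROP {what} {obj}"]
    if kind in ("DROP COLUMN", "DROP INDEX"):
        # Keep a full copy of the old table before the DDL changes it.
        return ([f"CREATE TABLE {saved} LIKE {src}",
                 f"INSERT INTO {saved} SELECT * FROM {src}"],
                [f"DROP TABLE {src}", f"RENAME TABLE {saved} TO {src}"])
    if kind == "DROP TABLE":
        # Rename instead of dropping, so the data survives.
        return [f"RENAME TABLE {src} TO {saved}"], [f"RENAME TABLE {saved} TO {src}"]
    if kind == "TRUNCATE TABLE":
        # Move the old table aside and leave an empty table in its place.
        return ([f"RENAME TABLE {src} TO {saved}", f"CREATE TABLE {src} LIKE {saved}"],
                [f"DROP TABLE {src}", f"RENAME TABLE {saved} TO {src}"])
    raise ValueError(f"unsupported DDL: {kind}")
```

This sketch uses a single fixed name per table in the reserved database. A fuller design would need unique names when the same table is dropped or truncated more than once.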
Flashback command
Double Sync Replication: Enhancing Data Security Guarantees
Lixun Peng @ Alibaba Cloud Compute
Problem of Async Replication
• The Master doesn’t need to wait for an ACK from the Slave.
• The Slave doesn’t know whether it has dumped the latest binary logs from the Master.
• When the Master crashes, the Slave cannot check by itself whether it is the same as the Master.
• So the main problem is that the Slave doesn’t know the status of the Master.
Semi-Sync Replication
Problem of SemiSync
• The Master needs to wait for an ACK from the Slave.
• The Slave downgrades to Async when a timeout happens.
  • If the timeout is too small, timeouts happen frequently.
  • If the timeout is too large, the Master is often blocked.
• After the network recovers, the Slave has to dump the binary logs generated during the timeout. During that time, the Slave is still Async.
• When the Master crashes, the Slave doesn’t know whether the Master was in Async or SemiSync mode.
• So the Slave still doesn’t know whether it is the same as the Master when the Master crashes.
• So SemiSync doesn’t solve the main problem of Async Replication.
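
The timeout trade-off and the downgrade behaviour can be seen in a toy model of the master-side wait. This is a hypothetical Python sketch built on a condition variable, not MariaDB's semi-sync plugin. `SemiSyncMaster`, `commit` and `on_ack` are illustrative names, and positions are plain integers rather than binlog file/offset pairs.

```python
import threading

class SemiSyncMaster:
    """Toy model of the master-side semi-sync wait; positions are plain integers."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.semi_sync_on = True
        self.acked_pos = 0   # highest position the slave has acknowledged
        self.last_pos = 0    # highest position the master has committed
        self._cond = threading.Condition()

    def commit(self, pos):
        """Block until the slave ACKs `pos` or the timeout expires.

        Returns True if the commit was acknowledged, False if it ran as async."""
        with self._cond:
            self.last_pos = max(self.last_pos, pos)
            if not self.semi_sync_on:
                return False  # already downgraded: do not wait at all
            acked = self._cond.wait_for(lambda: self.acked_pos >= pos,
                                        timeout=self.timeout_s)
            if not acked:
                self.semi_sync_on = False  # timeout: downgrade to async
            return acked

    def on_ack(self, pos):
        """Called when the slave reports it has received everything up to `pos`."""
        with self._cond:
            self.acked_pos = max(self.acked_pos, pos)
            # Switch back only after the slave has caught up on the logs
            # written while semi-sync was off; until then it is effectively async.
            if not self.semi_sync_on and self.acked_pos >= self.last_pos:
                self.semi_sync_on = True
            self._cond.notify_all()
```

A larger `timeout_s` makes each slow ACK block `commit` longer. A smaller one makes the `semi_sync_on = False` branch fire more often. These are the two failure modes listed on the slide.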
Problem of Async/SemiSync
Background & Target
• Background
  • SA guarantees server availability: 99.999%
  • NA guarantees network availability: 99.999%
  • So we can assume that when the Master crashes, the network will not time out at that same moment.
• Target
  • The Slave can know its own status (whether it is the same as the Master or not).
  • If the data isn’t the same as the Master's, notify app & dev to fix the data, and show the range of lost data.
• Key point: avoid the Slave's status being unknown!
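
The assumption that a Master crash and a network timeout do not coincide can be put into numbers. The slide does not state that server and network failures are independent; if we assume they are, the chance that both are unavailable at the same moment is roughly the product of the two unavailabilities:

```latex
P(\text{server down} \wedge \text{network down})
  \approx (1 - 0.99999)\,(1 - 0.99999)
  = 10^{-5} \cdot 10^{-5}
  = 10^{-10}
```

Correlated failures, such as a power event that affects both the server and its switch, would make the real figure higher.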
Solve the weak point of SemiSync
• Once SemiSync times out, even after the network recovers, the Slave still has to dump the binary logs generated during the timeout, in Async mode.
• What if, after a SemiSync timeout, we give up the binary logs generated during the timeout and the Master just sends the latest position & logs?
  • Then, after a network outage, the Slave will always know the latest position on the Master.
  • So the Slave can know whether its data is the same as the Master's.
• But if the Slave only dumps the latest data, how does it get the data from the period when the network was down?
  • Async replication can dump the continuous binary logs.
  • So we can use Async replication to apply the full log.
Combine Async and SemiSync
• Async Replication (Async_Channel)
  • Dumps continuous binary logs to guarantee that the Slave’s logs are continuous.
  • Applies logs immediately after receiving them.
• SemiSync Replication (Sync_Channel)
  • Dumps the latest binary logs to guarantee that the Slave knows the Master's latest position.
  • Does not apply logs after receiving them; it just saves the logs & position, and outdated logs are purged automatically.
• Analyzing consistency
  • Compare the positions of the logs received by these two channels.
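
The two channel roles can be modelled side by side. In this hypothetical Python sketch, events are numbered by a single increasing sequence number, like the seq_no of GTIDs in one domain. `AsyncChannel`, `SyncChannel` and `purge` are illustrative names. The sketch also assumes that "outdated" means events the Async Channel has already applied; the slide does not define the term.

```python
class AsyncChannel:
    """Receives a continuous stream and applies each event immediately."""
    def __init__(self):
        self.applied_seq = 0
        self.rows = []

    def receive(self, seq, event):
        if seq != self.applied_seq + 1:
            raise ValueError(f"gap in async stream: expected {self.applied_seq + 1}, got {seq}")
        self.rows.append(event)   # stand-in for applying the event
        self.applied_seq = seq

class SyncChannel:
    """Receives only the latest events and saves them without applying them."""
    def __init__(self):
        self.saved = {}  # seq -> event

    def receive(self, seq, event):
        self.saved[seq] = event

    def purge(self, applied_seq):
        """Drop saved events the Async Channel has already applied."""
        for seq in [s for s in self.saved if s <= applied_seq]:
            del self.saved[seq]

    def latest_seq(self):
        return max(self.saved, default=None)
```

Comparing `AsyncChannel.applied_seq` with `SyncChannel.latest_seq()` is the position comparison the slide refers to as analyzing consistency.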
Combine Async and SemiSync
How to create two channels (1)
• Multi-Source replication can create N channels in one Slave.
• Problem: when the Master receives two dump requests from servers with the same Server-ID, it disconnects the previous one.
• Solution: we give the Sync Channel a special Server-ID (0xFFFFFF).
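
The Server-ID conflict is easy to reproduce in a toy model of the master's dump connections. This is a hypothetical Python sketch: `Master` and `register_dump` are illustrative names. The only value taken from the slide is the special Server-ID 0xFFFFFF.

```python
SYNC_CHANNEL_SERVER_ID = 0xFFFFFF  # special Server-ID for the Sync Channel

class Master:
    """Toy model: the master keeps at most one binlog dump connection per Server-ID."""
    def __init__(self):
        self.dumps = {}          # server_id -> connection name
        self.disconnected = []   # connections dropped because of a newer dump request

    def register_dump(self, server_id, conn):
        old = self.dumps.get(server_id)
        if old is not None:
            self.disconnected.append(old)  # same Server-ID: the previous dump is dropped
        self.dumps[server_id] = conn

# Both channels using the slave's own Server-ID: the Async Channel gets kicked off.
clash = Master()
clash.register_dump(2, "async_channel")
clash.register_dump(2, "sync_channel")

# Sync Channel using the special Server-ID: both dumps stay connected.
ok = Master()
ok.register_dump(2, "async_channel")
ok.register_dump(SYNC_CHANNEL_SERVER_ID, "sync_channel")
```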
How to create two channels (2)
• Problem: one Slave has a SemiSync Channel and a non-SemiSync Channel, but the SemiSync settings are global.
• Solution: we moved the SemiSyncSlave class into Master_info.
Analyzing consistency
• Using the GTID
• Using Log_file_name and Log_file_pos
• To decide which case applies, see the following pictures.
Analyzing consistency
CASE 1: No fix needed
• The GTIDs of the Sync and Async Channels are the same.
CASE 2: Can’t fix
• There is a gap between the Sync and Async Channels.
CASE 3: Can repair
• Combined, the two channels’ logs are continuous.
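
The three cases can be expressed as a comparison of GTID sequence numbers. This hypothetical Python sketch assumes a single GTID domain with increasing seq_no values. It also assumes the Async Channel's log is continuous from the start and the Sync Channel holds one continuous window from `sync_first` to `sync_last`. `Case` and `classify` are illustrative names.

```python
from enum import Enum

class Case(Enum):
    NEEDNT_FIX = 1   # CASE 1: both channels end at the same GTID
    CANT_FIX = 2     # CASE 2: a gap separates the two channels
    CAN_REPAIR = 3   # CASE 3: the two logs together are continuous

def classify(async_last, sync_first, sync_last):
    """Classify the Slave's state after the Master crashes.

    async_last: last seq_no the Async Channel received.
    sync_first, sync_last: first and last seq_no saved by the Sync Channel.
    """
    if async_last >= sync_last:
        return Case.NEEDNT_FIX   # the Slave already has everything the Master was known to have
    if async_last + 1 < sync_first:
        return Case.CANT_FIX     # seq_nos async_last+1 .. sync_first-1 exist on neither channel
    return Case.CAN_REPAIR       # Async log followed by Sync log has no gap
```

In the CANT_FIX branch, the range `async_last + 1` to `sync_first - 1` is the "range of lost data" that the Target slide says should be reported to app & dev.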
How to Repair
• We wait until the Async Channel has applied all the logs it received, then start the SQL THREAD of the Sync Channel.
• GTID filtering skips the events already applied by the Async Channel.
• We provide the REPAIR SLAVE command to do this automatically.
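
The GTID filtering step of the repair can be sketched as a simple set lookup. REPAIR SLAVE is the patch's own command. This hypothetical Python function only mirrors the filtering: once the Async Channel has drained, it returns which saved Sync Channel events would still be applied.

```python
def repair(async_applied, sync_events):
    """Return the Sync Channel events that still need to be applied.

    async_applied: GTID seq_nos the Async Channel has applied, after we waited
    for it to finish its relay log.
    sync_events: (seq_no, event) pairs saved by the Sync Channel, in order.
    """
    applied = set(async_applied)
    pending = []
    for seq, event in sync_events:
        if seq in applied:
            continue  # GTID filtering: already applied via the Async Channel
        pending.append((seq, event))
        applied.add(seq)
    return pending
```

For example, if the Async Channel has applied seq_nos 1 to 3 and the Sync Channel saved 2 to 5, only 4 and 5 are applied by the Sync Channel's SQL thread.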
Multi-Source Replication: N Masters and 1 Slave
Lixun Peng @ Alibaba Cloud Compute
Why we need multi-source
• OLAP
  • Most users use MySQL for data sharding.
  • Multi-Source can help users combine the data from their sharded instances.
• If you use Master-Slave for backup, Multi-Source can help you back up many instances into one, which is easy to maintain.
How Multi-Source is implemented
What changes in the code
• Move Rpl_filter/skip_slave_counters into Master_info.
• Every channel creates a new Master_info.
• Every replication-related function uses that channel's own Master_info.
• We create a Master_info_index class to maintain all Master_info objects.
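
The shape of these changes can be modelled with a per-channel state object and an index that owns all of them. This is a hypothetical Python stand-in for the C++ `Master_info` and `Master_info_index` classes. The field names are illustrative, not the real members.

```python
from dataclasses import dataclass, field

@dataclass
class MasterInfo:
    """Per-channel replication state (toy stand-in for Master_info)."""
    connection_name: str
    master_host: str
    ignore_dbs: set = field(default_factory=set)  # stand-in for the per-channel Rpl_filter
    skip_counter: int = 0                          # per-channel skip counter

class MasterInfoIndex:
    """Holds every channel's MasterInfo, keyed by connection name."""
    def __init__(self):
        self._channels = {}

    def add(self, mi):
        if mi.connection_name in self._channels:
            raise ValueError(f"connection {mi.connection_name!r} already exists")
        self._channels[mi.connection_name] = mi

    def get(self, name):
        return self._channels[name]

    def channels(self):
        """Every channel, e.g. for statements that act on all slaves at once."""
        return list(self._channels.values())
```

Because filters and skip counters live in each `MasterInfo`, changing one channel's skip counter leaves the others untouched. Moving them out of global state is what the patch needed to achieve.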
The Syntax
• CHANGE MASTER ["connection_name"] ...
• FLUSH RELAY LOGS ["connection_name"]
• MASTER_POS_WAIT(....,["connection_name"])
• RESET SLAVE ["connection_name"]
• SHOW RELAYLOG ["connection_name"] EVENTS
• SHOW SLAVE ["connection_name"] STATUS
• SHOW ALL SLAVES STATUS
• START SLAVE ["connection_name"...]
• START ALL SLAVES ...
• STOP SLAVE ["connection_name"] ...
• STOP ALL SLAVES ...
The Syntax
• set @@default_master_connection='';
• show status like 'Slave_running';
• set @@default_master_connection='connection';
• show status like 'Slave_running';
How it runs
Thread Memory Monitor: Knowing How MySQL Uses Memory
Lixun Peng @ Alibaba Cloud Compute
Why we need TMM
• MySQL’s memory limits only work well for storage engines.
  • For example, in InnoDB: innodb_buffer_pool_size
• In the Server layer we can limit the memory of only some features, such as sort_buffer_size and join_buffer_size.
• But for a big query, most of the memory cost comes from MEM_ROOT, and there is no option to limit it.
• So when the mysqld process uses too much memory, we don’t know which thread is the cause.
• Then we don’t know which thread to kill to release the memory.
How to solve it
• Add a hack in my_malloc.
  • Record the malloc size and which thread requested the memory.
• Calculate the total memory size of all threads.
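
The per-thread accounting can be sketched as follows. The sketch is written in Python for brevity rather than as a C hook inside my_malloc. `ThreadMemoryMonitor` and its methods are hypothetical names, and integer handles stand in for the pointers a real allocator would return.

```python
import threading
from collections import defaultdict

class ThreadMemoryMonitor:
    """Toy model of a malloc hook that charges every allocation to the calling thread."""
    def __init__(self):
        self._lock = threading.Lock()
        self._per_thread = defaultdict(int)  # thread id -> bytes currently held
        self._owner = {}                     # handle -> (allocating thread id, size)
        self._next = 0

    def malloc(self, size):
        tid = threading.get_ident()
        with self._lock:
            self._next += 1
            handle = self._next
            self._owner[handle] = (tid, size)
            self._per_thread[tid] += size
            return handle

    def free(self, handle):
        with self._lock:
            # Credit the thread that allocated the block, even if another thread frees it.
            tid, size = self._owner.pop(handle)
            self._per_thread[tid] -= size

    def total(self):
        with self._lock:
            return sum(self._per_thread.values())

    def top_thread(self):
        """(thread id, bytes) of the heaviest thread: the candidate to kill."""
        with self._lock:
            return max(self._per_thread.items(), key=lambda kv: kv[1], default=None)
```

Recording the owner at allocation time is the key choice. It lets `free` subtract from the right thread, so the totals stay correct when memory is released by a different thread.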
THANKS!
