MariaDB Maxscale
Switchover, Failover and Rejoin
Wagner Bianchi
Remote DBA Team Lead @ MariaDB RDBA Team
Esa Korhonen
Software Engineer @ MariaDB Maxscale Engineering Team
Introduction to MariaDB MaxScale
● Intelligent database proxy:
○ Separates client application
from backend(s)
○ Understands authentication,
queries and backend roles
○ Typical use-cases: read-write
splitting, load-balancing
○ Many plugins: query filtering,
logging, caching
● Latest GA version: 2.2
[Diagram labels: CLIENT, DATABASE SERVERS]
Query processing stages
[Diagram labels: Client, Protocol, Filter, Router, Parser, Protocol, Backend; Monitor, Server State; arrows: updates, monitors, uses]
What is new in MariaDB-Monitor for MaxScale 2.2*
● Support for replication cluster manipulation: failover, switchover, rejoin
○ failover: replace a failed master with a slave
○ switchover: swap a slave with a live master
○ rejoin: bring a standalone server back to the cluster or redirect slaves replicating from the
wrong master
● Failover & rejoin can be set to activate automatically
● Reduces need for custom scripts or replication management tools
● Supported topologies: 1 Master, N slaves, 1-level depth
● Limited support for external masters
* Note: Renamed from previous mysqlmon
Switchover
● Controlled swap of master with a
designated slave
● Monitor user must have the SUPER privilege
● Depends on read_only to freeze the cluster
○ Users with SUPER bypass read_only
● Waits for all slaves to catch up with
master
○ no data should be lost, but can be slow
● Configuration settings:
○ replication_user & replication_password
○ switchover_timeout
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Slave, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
$./maxctrl call command mariadbmon switchover MariaDB-Monitor LocalSlave1
OK
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Slave, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
Failover
● Promote a slave to take the place of a failed
master
● Damage has already been done, so no
need to worry about the old master
● Chooses a new master based on the following
criteria (in order of importance):
○ not in exclusion-list
○ has latest event in relay log
○ has processed latest event
○ has log_slave_updates on
● Configuration:
○ failover_timeout
● May lose data with failed master
○ (semi)sync replication
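The selection criteria above can be sketched as a filter followed by an ordered comparison. This is only an illustration under stated assumptions, not the monitor's actual code: the `Slave` class, its field names, and the integer event positions are hypothetical stand-ins for what the monitor reads from each server.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    """Hypothetical snapshot of one slave, as a monitor might see it."""
    name: str
    excluded: bool            # listed in the exclusion list
    relay_log_seq: int        # newest event received into the relay log
    processed_seq: int        # newest event already applied
    log_slave_updates: bool   # log_slave_updates is on


def pick_new_master(slaves):
    """Pick a promotion candidate using the slide's order of importance."""
    # Criterion 1: servers in the exclusion list are never candidates.
    candidates = [s for s in slaves if not s.excluded]
    if not candidates:
        return None
    # Criteria 2-4: tuples compare element by element, so the relay log
    # position dominates, then the processed position, then log_slave_updates.
    return max(
        candidates,
        key=lambda s: (s.relay_log_seq, s.processed_seq, s.log_slave_updates),
    )


slaves = [
    Slave("LocalSlave1", False, 120, 118, True),
    Slave("LocalSlave2", False, 120, 120, False),
    Slave("LocalSlave3", True, 130, 130, True),
]
print(pick_new_master(slaves).name)
```

In this made-up data set LocalSlave3 is the most advanced server but is excluded, and the two remaining slaves tie on the relay log position, so the processed position decides.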
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Down │
├──────────────┼───────────┼──────┼─────────────┼────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Slave, Running │
├──────────────┼───────────┼──────┼─────────────┼────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴────────────────┘
$./maxctrl call command mariadbmon failover MariaDB-Monitor
OK
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Down │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
Automatic failover
● Trigger: master must be down for a
set amount of time
● Additional check by looking at slave
connections
● Configuration settings:
○ auto_failover
○ failcount & monitor_interval
○ verify_master_failure &
master_failure_timeout
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Slave, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
$docker stop maxscalebackends_testing1_master1_1
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Down │
├──────────────┼───────────┼──────┼─────────────┼────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Slave, Running │
├──────────────┼───────────┼──────┼─────────────┼────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴────────────────┘
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Down │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
Rejoin
● Directs the joining server to replicate from
the cluster master
○ redirect a slave replicating from the wrong master
○ start replication on a standalone server
● Looks at GTIDs to decide if the joining server can
replicate
● Manual/automatic mode (auto_rejoin=1)
● Typical use case: master goes down -> failover
-> old master comes back -> rejoined to cluster
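The slide does not spell out how the GTIDs are compared, so the following is a deliberately simplified sketch resting on an assumption: a joining server is treated as safe to rejoin when, in every replication domain, its position is not ahead of the cluster master's. MariaDB GTIDs have the form domain-server_id-sequence; the function names here are hypothetical and the real monitor's check may differ.

```python
def parse_gtid_list(text):
    """Map each domain id to its sequence number, e.g. "0-1-100" -> {0: 100}."""
    positions = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        domain, _server_id, sequence = (int(x) for x in item.split("-"))
        positions[domain] = sequence
    return positions


def can_rejoin(joining_gtid, master_gtid):
    """Assumed rule: the joining server has no events the master lacks."""
    joining = parse_gtid_list(joining_gtid)
    master = parse_gtid_list(master_gtid)
    return all(
        domain in master and sequence <= master[domain]
        for domain, sequence in joining.items()
    )


print(can_rejoin("0-1-100", "0-2-105"))  # old master is behind the new master
print(can_rejoin("0-1-110", "0-2-105"))  # old master has extra events
```

The second case models an old master that accepted writes the promoted slave never received, which is the situation where simply redirecting replication would not be safe.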
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Down │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
$docker start maxscalebackends_testing1_master1_1
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
$./maxctrl call command mariadbmon rejoin MariaDB-Monitor LocalMaster1
$./maxctrl list servers
┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
│ Server │ Address │ Port │ Connections │ State │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0 │ Slave, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave1 │ 127.0.0.1 │ 3002 │ 0 │ Master, Running │
├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
│ LocalSlave2 │ 127.0.0.1 │ 3003 │ 0 │ Slave, Running │
└──────────────┴───────────┴──────┴─────────────┴─────────────────┘
External master handling
[Two diagrams: DC A and DC B, labeled "replicating from"]
Switchover details
Starting checks:
1. Cluster has 1 master and >1 slaves
2. All servers use GTID replication and cluster
GTID-domain is known
3. Requested new master has binary log on
Prepare current master:
1. SET GLOBAL read_only=1;
2. FLUSH TABLES;
3. FLUSH LOGS;
4. update GTID-info
Wait until all slaves catch up to
master:
1. MASTER_GTID_WAIT()
Stop slave replication on new
master:
1. STOP SLAVE;
2. RESET SLAVE ALL;
3. SET GLOBAL read_only=0;
Redirect slaves & old master to
new master:
1. STOP SLAVE;
2. RESET SLAVE;
3. CHANGE MASTER TO …
4. START SLAVE;
Check that replication is working:
1. FLUSH TABLES;
2. Check that all slaves
receive new gtid
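The sequence above can be sketched as an ordered plan of (server, statement) pairs. This is a reading aid, not the monitor's implementation: the function name is hypothetical, the statements are copied from the slide as written (including the elided CHANGE MASTER TO … and the argument-less MASTER_GTID_WAIT()), and running the final FLUSH TABLES on the new master is an assumption.

```python
def switchover_plan(old_master, new_master, other_slaves):
    """Return (server, statement) pairs in the order listed on the slide."""
    plan = []
    # Prepare the current master: freeze writes and flush.
    for stmt in ("SET GLOBAL read_only=1;", "FLUSH TABLES;", "FLUSH LOGS;"):
        plan.append((old_master, stmt))
    # Wait until every slave has caught up with the master.
    for server in [new_master, *other_slaves]:
        plan.append((server, "MASTER_GTID_WAIT()"))
    # Stop replication on the new master and open it for writes.
    for stmt in ("STOP SLAVE;", "RESET SLAVE ALL;", "SET GLOBAL read_only=0;"):
        plan.append((new_master, stmt))
    # Redirect the remaining slaves and the old master to the new master.
    for server in [*other_slaves, old_master]:
        for stmt in ("STOP SLAVE;", "RESET SLAVE;", "CHANGE MASTER TO …", "START SLAVE;"):
            plan.append((server, stmt))
    # Generate a new event so replication can be verified (assumed to run
    # on the new master).
    plan.append((new_master, "FLUSH TABLES;"))
    return plan


for server, statement in switchover_plan("A", "B", ["C"]):
    print(f"{server}: {statement}")
```

With servers A (old master), B (new master) and C (remaining slave), the printout walks through the same five phases in order.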
DEMO TIME!!
Maxscale 2.2 New Features
● At this point you know that MariaDB MaxScale supports:
○ Automatic and manual failover;
○ Manual switchover;
○ Rejoining a crashed node as a slave of an existing cluster;
● These operations rely on the new MariaDBMon monitor;
● Hidden details when implementing and/or doing break/fix:
○ For switchover/failover/rejoin to work, the monitor user (MariaDBMon) needs access to all
the servers, or a separate user set through replication_user and replication_password needs
access to all the servers;
○ If the monitor user (MariaDBMon) has an encrypted password, replication_password
must be encrypted as well; otherwise, the CHANGE MASTER TO issued by these operations
won't be able to configure replication on the new server;
Maxscale 2.2 New Features
● Failover: replacing a failed master.
● For automatic failover, auto_failover must be set to true in the monitor
configuration;
○ auto_failover=true activates automatic failover;
● For manual failover, auto_failover must be set to false in the monitor
configuration;
● The master must be down for manual failover to work;
● Enable or disable auto_failover with the alter monitor command;
○ with auto_failover=false, failover can be triggered manually:
[root@box01 ~]# maxadmin call command mariadbmon failover replication-cluster-monitor
Maxscale 2.2 New Features
● Failover: replacing a failed master (automatic, auto_failover=true)
#: checking current configurations
[root@box01 ~]# grep auto_failover /var/lib/maxscale/maxscale.cnf.d/replication-cluster-monitor.cnf
auto_failover=true
#: shut down the current master (confirm the current topology with `maxadmin list servers` first)
[root@box02 ~]# systemctl stop mariadb.service
#: watching the actions on the log file
2018-02-10 13:51:02 error : Monitor was unable to connect to server [192.168.50.13]:3306 : "Can't connect to MySQL server on '192.168.50.13'"
2018-02-10 13:51:02 notice : [mariadbmon] Server [192.168.50.13]:3306 lost the master status.
2018-02-10 13:51:02 notice : Server changed state: box03[192.168.50.13:3306]: master_down. [Master, Running] -> [Down]
2018-02-10 13:51:02 warning: [mariadbmon] Master has failed. If master status does not change in 4 monitor passes, failover begins.
2018-02-10 13:51:06 notice : [mariadbmon] Performing automatic failover to replace failed master 'box03'.
2018-02-10 13:51:06 notice : [mariadbmon] Promoting server 'box02' to master.
2018-02-10 13:51:06 notice : [mariadbmon] Redirecting slaves to new master.
2018-02-10 13:51:07 warning: [mariadbmon] Setting standalone master, server 'box02' is now the master.
2018-02-10 13:51:07 notice : Server changed state: box02[192.168.50.12:3306]: new_master. [Slave, Running] -> [Master, Running]
Maxscale 2.2 New Features
● Failover: replacing a failed master (manual, auto_failover=false)
#: setting auto_failover=false
[root@box01 ~]# maxadmin alter monitor replication-cluster-monitor auto_failover=false
#: current master is down, automatic failover deactivated
2018-02-09 23:31:01 error : Monitor was unable to connect to server [192.168.50.12]:3306:"Can't connect to MySQL server on '192.168.50.12'"
2018-02-09 23:31:01 notice : [mariadbmon] Server [192.168.50.12]:3306 lost the master status.
2018-02-09 23:31:01 notice : Server changed state: box02[192.168.50.12:3306]: master_down. [Master, Running] -> [Down]
#: manual failover executed
[root@box01 ~]# maxadmin call command mariadbmon failover replication-cluster-monitor
#: let's check the logs
2018-02-09 23:32:30 info : (17) [cli] MaxAdmin: call command "mariadbmon" "failover" "replication-cluster-monitor"
2018-02-09 23:32:30 notice : (17) [mariadbmon] Stopped monitor replication-cluster-monitor for the duration of failover.
2018-02-09 23:32:30 notice : (17) [mariadbmon] Promoting server 'box03' to master.
2018-02-09 23:32:30 notice : (17) [mariadbmon] Redirecting slaves to new master.
2018-02-09 23:32:30 notice : (17) [mariadbmon] Failover performed.
2018-02-09 23:32:30 warning: [mariadbmon] Setting standalone master, server 'box03' is now the master.
2018-02-09 23:32:30 notice : Server changed state: box03[192.168.50.13:3306]: new_master. [Slave, Running] -> [Master, Running]
Maxscale 2.2 New Features
● Failover: replacing a failed master, additional details
● The time the passes take depends on the monitor's monitor_interval value;
○ With monitor_interval currently set to 1000 ms (1 second), failover is triggered after 4 seconds,
counting from the pass in which the monitor first reported the failure;
○ If failover does not complete within failover_timeout (90 seconds by default), the failover is
canceled and the feature is disabled;
○ To enable failover again (after investigating the problem), use the alter monitor command:
2018-02-10 13:51:02 warning: [mariadbmon] Master has failed. If master status does not change in 4 monitor passes, failover begins.
[root@box01 ~]# maxadmin alter monitor replication-cluster-monitor auto_failover=true
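The arithmetic behind the 4-second delay can be written out as a small sketch. The function and parameter names are hypothetical; the only inputs are the number of monitor passes quoted in the log message and the monitor_interval value in milliseconds.

```python
def failover_delay_seconds(remaining_passes: int, monitor_interval_ms: int) -> float:
    """Seconds between the first failure report and the start of failover."""
    return remaining_passes * monitor_interval_ms / 1000


# 4 passes at monitor_interval=1000 ms, as in the log excerpt above.
print(failover_delay_seconds(4, 1000))  # 4.0
```

This matches the timestamps in the earlier automatic failover log, where the master is reported down at 13:51:02 and failover starts at 13:51:06.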
Maxscale 2.2 New Features
● Switchover: swapping a slave with a running master.
● Switchover relies on the replication_user and
replication_password settings in the monitor configuration;
● The process is triggered manually and may take up to
switchover_timeout seconds to complete (default: 90 seconds);
● If the process fails, the failure is logged and auto_failover is
disabled if it was enabled;
[root@team01-box01 ~]# maxadmin call command mariadbmon switchover replication-cluster-monitor new_master master
Maxscale 2.2 New Features
#: checking the current server's list
[root@team01-box01 ~]# maxadmin list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server | Address | Port | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
box02 | 10.132.116.147 | 3306 | 0 | Slave, Running
box03 | 10.132.116.161 | 3306 | 0 | Master, Running
-------------------+-----------------+-------+-------------+--------------------
#: new_master=box02, current_master=box03
[root@team01-box01 ~]# maxadmin call command mariadbmon switchover replication-cluster-monitor box02 box03
#: checking logs
2018-02-14 16:44:46 info : (712) [cli] MaxAdmin: call command "mariadbmon" "switchover" "replication-cluster-monitor" "box02" "box03"
2018-02-14 16:44:46 notice : (712) [mariadbmon] Stopped the monitor replication-cluster-monitor for the duration of switchover.
2018-02-14 16:44:46 notice : (712) [mariadbmon] Demoting server 'box03'.
2018-02-14 16:44:46 notice : (712) [mariadbmon] Promoting server 'box02' to master.
2018-02-14 16:44:46 notice : (712) [mariadbmon] Old master 'box03' starting replication from 'box02'.
2018-02-14 16:44:46 notice : (712) [mariadbmon] Redirecting slaves to new master.
2018-02-14 16:44:47 notice : (712) [mariadbmon] Switchover box03 -> box02 performed.
2018-02-14 16:44:47 notice : Server changed state: box02[10.132.116.147:3306]: new_master. [Slave, Running] -> [Master, Slave, Running]
2018-02-14 16:44:47 notice : Server changed state: box03[10.132.116.161:3306]: new_slave. [Master, Running] -> [Slave, Running]
2018-02-14 16:44:48 notice : Server changed state: box02[10.132.116.147:3306]: new_master. [Master, Slave, Running] -> [Master, Running]
Maxscale 2.2 New Features
● Rejoin: joining a standalone server to the cluster.
● Enables automatic rejoining of a server to the cluster when a crashed
backend server comes back online;
● When auto_rejoin is enabled, the monitor attempts to direct
standalone servers and servers replicating from a relay master to the main
cluster master server;
● Test it as we did:
○ Check which server is the current master and shut down its MariaDB Server;
○ Failover happens if auto_failover is enabled;
○ Start MariaDB Server again on the server that was shut down;
○ List the servers again with MaxAdmin and watch the logs.
Maxscale 2.2 New Features
● Rejoin: joining a standalone server to the cluster.
#: current_master=box02
[root@team01-box02 ~]# mysqladmin shutdown
#: watching logs, the failover will happen as the master "crashed"
2018-02-14 18:44:36 error : Monitor was unable to connect to server [10.132.116.147]:3306 : "Can't connect to MySQL server on '10.132.116.147' (115)"
2018-02-14 18:44:36 notice : [mariadbmon] Server [10.132.116.147]:3306 lost the master status.
2018-02-14 18:44:36 notice : Server changed state: box02[10.132.116.147:3306]: master_down. [Master, Running] -> [Down]
2018-02-14 18:44:36 warning: [mariadbmon] Master has failed. If master status does not change in 4 monitor passes, failover begins.
2018-02-14 18:44:40 notice : [mariadbmon] Performing automatic failover to replace failed master 'box02'.
2018-02-14 18:44:40 notice : [mariadbmon] Promoting server 'box03' to master.
2018-02-14 18:44:40 notice : [mariadbmon] Redirecting slaves to new master.
2018-02-14 18:44:41 warning: [mariadbmon] Setting standalone master, server 'box03' is now the master.
2018-02-14 18:44:41 notice : Server changed state: box03[10.132.116.161:3306]: new_master. [Slave, Running] -> [Master, Running]
#: starting old master back
[root@team01-box02 ~]# systemctl start mariadb.service
#: watching logs
2018-02-14 18:47:27 notice : Server changed state: box02[10.132.116.147:3306]: server_up. [Down] -> [Running]
2018-02-14 18:47:27 notice : [mariadbmon] Directing standalone server 'box02' to replicate from 'box03'.
2018-02-14 18:47:27 notice : [mariadbmon] 1 server(s) redirected or rejoined the cluster.
2018-02-14 18:47:28 notice : Server changed state: box02[10.132.116.147:3306]: new_slave. [Running] -> [Slave, Running]
Thank you!
Time for questions
And answers

Maxscale switchover, failover, and auto rejoin

  • 1. MariaDB Maxscale: Switchover, Failover and Rejoin
    Wagner Bianchi, Remote DBA Team Lead @ MariaDB RDBA Team
    Esa Korhonen, Software Engineer @ MariaDB Maxscale Engineering Team
  • 2. Introduction to MariaDB MaxScale
    ● Intelligent database proxy:
      ○ Separates client application from backend(s)
      ○ Understands authentication, queries and backend roles
      ○ Typical use-cases: read-write splitting, load-balancing
      ○ Many plugins: query filtering, logging, caching
    ● Latest GA version: 2.2
    [Diagram labels: CLIENT, DATABASE SERVERS]
  • 3. Query processing stages
    [Diagram labels: Client, Protocol, Parser, Filter (three boxes), Router, Protocol, Backend, Server State, Monitor; arrows labeled "updates", "monitors" and "uses"]
  • 4. What is new in MariaDB-Monitor for MaxScale 2.2*
    ● Support for replication cluster manipulation: failover, switchover, rejoin
      ○ failover: replace a failed master with a slave
      ○ switchover: swap a slave with a live master
      ○ rejoin: bring a standalone server back to the cluster or redirect slaves replicating from the wrong master
    ● Failover & rejoin can be set to activate automatically
    ● Reduces need for custom scripts or replication management tools
    ● Supported topologies: 1 Master, N slaves, 1-level depth
    ● Limited support for external masters
    * Note: Renamed from previous mysqlmon
  • 5. Switchover
    ● Controlled swap of master with a designated slave
    ● Monitor user must have the SUPER privilege
    ● Depends on read_only to freeze the cluster
      ○ SUPER users bypass this
    ● Waits for all slaves to catch up with master
      ○ no data should be lost, but can be slow
    ● Configuration settings:
      ○ replication_user & replication_password
      ○ switchover_timeout
    $ ./maxctrl list servers
    ┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
    │ Server       │ Address   │ Port │ Connections │ State           │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0           │ Master, Running │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave1  │ 127.0.0.1 │ 3002 │ 0           │ Slave, Running  │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave2  │ 127.0.0.1 │ 3003 │ 0           │ Slave, Running  │
    └──────────────┴───────────┴──────┴─────────────┴─────────────────┘
    $ ./maxctrl call command mariadbmon switchover MariaDB-Monitor LocalSlave1
    OK
    $ ./maxctrl list servers
    ┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
    │ Server       │ Address   │ Port │ Connections │ State           │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0           │ Slave, Running  │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave1  │ 127.0.0.1 │ 3002 │ 0           │ Master, Running │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave2  │ 127.0.0.1 │ 3003 │ 0           │ Slave, Running  │
    └──────────────┴───────────┴──────┴─────────────┴─────────────────┘
  • 6. Failover
    ● Promote a slave to take the place of a failed master
    ● Damage has already been done, so no need to worry about the old master
    ● Chooses a new master based on the following criteria (in order of importance):
      ○ not in exclusion-list
      ○ has latest event in relay log
      ○ has processed latest event
      ○ has log_slave_updates on
    ● Configuration:
      ○ failover_timeout
    ● May lose data with failed master
      ○ (semi)sync replication
    $ ./maxctrl list servers
    ┌──────────────┬───────────┬──────┬─────────────┬────────────────┐
    │ Server       │ Address   │ Port │ Connections │ State          │
    ├──────────────┼───────────┼──────┼─────────────┼────────────────┤
    │ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0           │ Down           │
    ├──────────────┼───────────┼──────┼─────────────┼────────────────┤
    │ LocalSlave1  │ 127.0.0.1 │ 3002 │ 0           │ Slave, Running │
    ├──────────────┼───────────┼──────┼─────────────┼────────────────┤
    │ LocalSlave2  │ 127.0.0.1 │ 3003 │ 0           │ Slave, Running │
    └──────────────┴───────────┴──────┴─────────────┴────────────────┘
    $ ./maxctrl call command mariadbmon failover MariaDB-Monitor
    OK
    $ ./maxctrl list servers
    ┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
    │ Server       │ Address   │ Port │ Connections │ State           │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0           │ Down            │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave1  │ 127.0.0.1 │ 3002 │ 0           │ Master, Running │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave2  │ 127.0.0.1 │ 3003 │ 0           │ Slave, Running  │
    └──────────────┴───────────┴──────┴─────────────┴─────────────────┘
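The ordered selection criteria on the Failover slide can be illustrated with a small Python sketch. It is not the monitor's actual code: the `Slave` class, its field names and the integer event positions are hypothetical simplifications, and the only thing taken from the slide is the order of the criteria.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Slave:
    # Hypothetical, simplified view of a slave; the field names are illustrative.
    name: str
    relay_log_pos: int       # newest event received into the relay log
    processed_pos: int       # newest event actually applied
    log_slave_updates: bool  # whether log_slave_updates is on


def pick_new_master(slaves: Iterable[Slave], excluded: Set[str]) -> Optional[Slave]:
    """Pick a promotion candidate using the slide's criteria, in order."""
    # Criterion 1: servers on the exclusion list are never candidates.
    candidates = [s for s in slaves if s.name not in excluded]
    if not candidates:
        return None
    # Criteria 2-4: tuple comparison ranks by relay log position first,
    # then by processed position, then by log_slave_updates.
    return max(
        candidates,
        key=lambda s: (s.relay_log_pos, s.processed_pos, s.log_slave_updates),
    )


if __name__ == "__main__":
    cluster = [
        Slave("LocalSlave1", relay_log_pos=100, processed_pos=90, log_slave_updates=False),
        Slave("LocalSlave2", relay_log_pos=100, processed_pos=95, log_slave_updates=True),
    ]
    print(pick_new_master(cluster, excluded=set()).name)
```

Because the key is a tuple, a later criterion only matters when all earlier ones tie, which is one way to read "in order of importance".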
  • 7. Automatic failover
    ● Trigger: master must be down for a set amount of time
    ● Additional check by looking at slave connections
    ● Configuration settings:
      ○ auto_failover
      ○ failcount & monitor_interval
      ○ verify_master_failure & master_failure_timeout
    $ ./maxctrl list servers
    ┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
    │ Server       │ Address   │ Port │ Connections │ State           │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0           │ Master, Running │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave1  │ 127.0.0.1 │ 3002 │ 0           │ Slave, Running  │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave2  │ 127.0.0.1 │ 3003 │ 0           │ Slave, Running  │
    └──────────────┴───────────┴──────┴─────────────┴─────────────────┘
    $ docker stop maxscalebackends_testing1_master1_1
    $ ./maxctrl list servers
    ┌──────────────┬───────────┬──────┬─────────────┬────────────────┐
    │ Server       │ Address   │ Port │ Connections │ State          │
    ├──────────────┼───────────┼──────┼─────────────┼────────────────┤
    │ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0           │ Down           │
    ├──────────────┼───────────┼──────┼─────────────┼────────────────┤
    │ LocalSlave1  │ 127.0.0.1 │ 3002 │ 0           │ Slave, Running │
    ├──────────────┼───────────┼──────┼─────────────┼────────────────┤
    │ LocalSlave2  │ 127.0.0.1 │ 3003 │ 0           │ Slave, Running │
    └──────────────┴───────────┴──────┴─────────────┴────────────────┘
    $ ./maxctrl list servers
    ┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
    │ Server       │ Address   │ Port │ Connections │ State           │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0           │ Down            │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave1  │ 127.0.0.1 │ 3002 │ 0           │ Master, Running │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave2  │ 127.0.0.1 │ 3003 │ 0           │ Slave, Running  │
    └──────────────┴───────────┴──────┴─────────────┴─────────────────┘
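The trigger described on the Automatic failover slide combines a "down long enough" test with an optional check of the slave connections. The Python sketch below models that decision under stated assumptions: the function and its parameters are hypothetical, and the reading that `verify_master_failure` blocks failover while any slave has heard from the master within `master_failure_timeout` seconds is a simplified interpretation, not the monitor's code.

```python
from typing import Sequence


def should_auto_failover(
    auto_failover: bool,
    passes_down: int,
    failcount: int,
    verify_master_failure: bool,
    secs_since_last_event: Sequence[float],
    master_failure_timeout: float,
) -> bool:
    """Decide whether automatic failover would start (simplified model)."""
    if not auto_failover:
        return False
    # The master must have been seen down for at least `failcount` passes.
    if passes_down < failcount:
        return False
    if verify_master_failure:
        # Assumption: a slave that received an event from the master recently
        # suggests the master is still alive from the slave's point of view.
        if any(age < master_failure_timeout for age in secs_since_last_event):
            return False
    return True


if __name__ == "__main__":
    # Master down for 5 passes, both slaves silent for longer than the timeout.
    print(should_auto_failover(True, 5, 5, True, [30.0, 45.0], 10.0))
```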
  • 8. Rejoin
    ● Directs the joining server to replicate from the cluster master
      ○ redirect a slave replicating from the wrong master
      ○ start replication on a standalone server
    ● Looks at GTIDs to decide if the joining server can replicate
    ● Manual/automatic mode (auto_rejoin=1)
    ● Typical use case: master goes down -> failover -> old master comes back -> rejoined to cluster
    $ ./maxctrl list servers
    ┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
    │ Server       │ Address   │ Port │ Connections │ State           │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0           │ Down            │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave1  │ 127.0.0.1 │ 3002 │ 0           │ Master, Running │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave2  │ 127.0.0.1 │ 3003 │ 0           │ Slave, Running  │
    └──────────────┴───────────┴──────┴─────────────┴─────────────────┘
    $ docker start maxscalebackends_testing1_master1_1
    $ ./maxctrl list servers
    ┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
    │ Server       │ Address   │ Port │ Connections │ State           │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0           │ Running         │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave1  │ 127.0.0.1 │ 3002 │ 0           │ Master, Running │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave2  │ 127.0.0.1 │ 3003 │ 0           │ Slave, Running  │
    └──────────────┴───────────┴──────┴─────────────┴─────────────────┘
    $ ./maxctrl call command mariadbmon rejoin MariaDB-Monitor LocalMaster1
    $ ./maxctrl list servers
    ┌──────────────┬───────────┬──────┬─────────────┬─────────────────┐
    │ Server       │ Address   │ Port │ Connections │ State           │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalMaster1 │ 127.0.0.1 │ 3001 │ 0           │ Slave, Running  │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave1  │ 127.0.0.1 │ 3002 │ 0           │ Master, Running │
    ├──────────────┼───────────┼──────┼─────────────┼─────────────────┤
    │ LocalSlave2  │ 127.0.0.1 │ 3003 │ 0           │ Slave, Running  │
    └──────────────┴───────────┴──────┴─────────────┴─────────────────┘
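The Rejoin slide says the monitor looks at GTIDs to decide whether the joining server can replicate, without giving the rule. One plausible, deliberately simplified reading is sketched below in Python: a server may rejoin only if it holds no events the current master lacks. The helper names are hypothetical, the check only compares sequence numbers per replication domain in MariaDB's `domain-server_id-sequence` format, and the monitor's real logic may be stricter.

```python
from typing import Dict


def parse_gtid_list(text: str) -> Dict[int, int]:
    """Map each replication domain to its sequence number.

    Expects MariaDB-style GTIDs, 'domain-server_id-sequence', comma separated.
    """
    positions: Dict[int, int] = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        domain, _server_id, sequence = (int(part) for part in item.split("-"))
        positions[domain] = sequence
    return positions


def can_rejoin(joining_gtid: str, master_gtid: str) -> bool:
    """Simplified check: the joining server must not be ahead of the master."""
    joining = parse_gtid_list(joining_gtid)
    master = parse_gtid_list(master_gtid)
    return all(
        domain in master and sequence <= master[domain]
        for domain, sequence in joining.items()
    )


if __name__ == "__main__":
    # The old master stopped at sequence 100; the new master is at 150.
    print(can_rejoin("0-1-100", "0-2-150"))
    # The old master committed events that never reached the new master.
    print(can_rejoin("0-1-160", "0-2-150"))
```

The second case corresponds to the data-loss scenario mentioned on the Failover slide: an old master holding events the promoted slave never received cannot simply start replicating.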
  • 9. External master handling
    [Diagram labels: DC A, DC B, "replicating from" (shown twice)]
  • 10. Switchover details
    Starting checks:
      1. Cluster has 1 master and >1 slaves
      2. All servers use GTID replication and cluster GTID-domain is known
      3. Requested new master has binary log on
    Prepare current master:
      1. SET GLOBAL read_only=1;
      2. FLUSH TABLES;
      3. FLUSH LOGS;
      4. update GTID-info
    Wait until all slaves catch up to master:
      1. MASTER_GTID_WAIT()
    Stop slave replication on new master:
      1. STOP SLAVE;
      2. RESET SLAVE ALL;
      3. SET GLOBAL read_only=0
    Redirect slaves & old master to new master:
      1. STOP SLAVE;
      2. RESET SLAVE;
      3. CHANGE MASTER TO …
      4. START SLAVE;
    Check that replication is working:
      1. FLUSH TABLES;
      2. Check that all slaves receive new gtid
    [Diagram labels: servers A, B, C (before), A, B, C (during), B, A, C (after)]
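The statement sequence on the Switchover details slide can be laid out as an ordered plan. The Python sketch below only builds a list of (server, statement) pairs and executes nothing. The function name `switchover_plan` and the server labels are hypothetical, the `MASTER_GTID_WAIT()` arguments and the `CHANGE MASTER TO …` options stay elided as on the slide, and the starting checks, the GTID-info update and the final verification are left out.

```python
from typing import List, Tuple


def switchover_plan(
    old_master: str, new_master: str, other_slaves: List[str]
) -> List[Tuple[str, str]]:
    """Return (server, statement) pairs in the order given on the slide."""
    plan: List[Tuple[str, str]] = []

    # Prepare the current master: freeze writes and flush.
    for sql in ("SET GLOBAL read_only=1;", "FLUSH TABLES;", "FLUSH LOGS;"):
        plan.append((old_master, sql))

    # Wait until every slave has caught up (arguments elided, as on the slide).
    for slave in [new_master, *other_slaves]:
        plan.append((slave, "MASTER_GTID_WAIT()"))

    # Stop replication on the new master and open it for writes.
    for sql in ("STOP SLAVE;", "RESET SLAVE ALL;", "SET GLOBAL read_only=0;"):
        plan.append((new_master, sql))

    # Redirect the remaining slaves and the old master to the new master.
    for server in [*other_slaves, old_master]:
        for sql in ("STOP SLAVE;", "RESET SLAVE;", "CHANGE MASTER TO …;", "START SLAVE;"):
            plan.append((server, sql))

    return plan


if __name__ == "__main__":
    for server, statement in switchover_plan("A", "B", ["C"]):
        print(f"{server}: {statement}")
```

Keeping the plan as data makes the ordering constraint easy to inspect: the new master only becomes writable after the old master is read-only and the slaves have caught up.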
  • 12. Maxscale 2.2 New Features
    ● At this point you know that MariaDB Maxscale supports:
      ○ Automatic/manual failover;
      ○ Manual switchover;
      ○ Rejoining a crashed node as a slave of an existing cluster;
    ● These processes rely on the new MariaDBMon monitor;
    ● Hidden details when implementing and/or doing break/fix:
      ○ For switchover/failover/rejoin to work, either the monitor user (MariaDBMon) must have access to all the servers, or a separate user, configured with replication_user and replication_password, must have access to all the servers;
      ○ If the monitor user (MariaDBMon) has an encrypted password, replication_password must be encrypted as well; otherwise the CHANGE MASTER TO that these processes run cannot configure replication on the new server;
  • 13. Maxscale 2.2 New Features
    ● Failover: replacing a failed master.
    ● For automatic failover, auto_failover must be true in the monitor configuration;
      ○ auto_failover=true activates automatic failover;
    ● For manual failover, auto_failover should be false in the monitor configuration;
    ● The master must be down for manual failover to work;
      ○ with auto_failover=false, the failover can be activated manually:
        [root@box01 ~]# maxadmin call command mariadbmon failover replication-cluster-monitor
    ● Enable and disable auto_failover with the alter monitor command.
  • 14. Maxscale 2.2 New Features
    ● Failover: replacing a failed master (automatic, auto_failover=true)
    #: checking current configurations
    [root@box01 ~]# grep auto_failover /var/lib/maxscale/maxscale.cnf.d/replication-cluster-monitor.cnf
    auto_failover=true
    #: shut down the current master (confirm the current topology with `maxadmin list servers` first)
    [root@box02 ~]# systemctl stop mariadb.service
    #: watching the actions on the log file
    2018-02-10 13:51:02 error : Monitor was unable to connect to server [192.168.50.13]:3306 : "Can't connect to MySQL server on '192.168.50.13'"
    2018-02-10 13:51:02 notice : [mariadbmon] Server [192.168.50.13]:3306 lost the master status.
    2018-02-10 13:51:02 notice : Server changed state: box03[192.168.50.13:3306]: master_down. [Master, Running] -> [Down]
    2018-02-10 13:51:02 warning: [mariadbmon] Master has failed. If master status does not change in 4 monitor passes, failover begins.
    2018-02-10 13:51:06 notice : [mariadbmon] Performing automatic failover to replace failed master 'box03'.
    2018-02-10 13:51:06 notice : [mariadbmon] Promoting server 'box02' to master.
    2018-02-10 13:51:06 notice : [mariadbmon] Redirecting slaves to new master.
    2018-02-10 13:51:07 warning: [mariadbmon] Setting standalone master, server 'box02' is now the master.
    2018-02-10 13:51:07 notice : Server changed state: box02[192.168.50.12:3306]: new_master. [Slave, Running] -> [Master, Running]
  • 15. Maxscale 2.2 New Features
    ● Failover: replacing a failed master (manual, auto_failover=false)
    #: setting auto_failover=false
    [root@box01 ~]# maxadmin alter monitor replication-cluster-monitor auto_failover=false
    #: current master is down, automatic failover deactivated
    2018-02-09 23:31:01 error : Monitor was unable to connect to server [192.168.50.12]:3306:"Can't connect to MySQL server on '192.168.50.12'"
    2018-02-09 23:31:01 notice : [mariadbmon] Server [192.168.50.12]:3306 lost the master status.
    2018-02-09 23:31:01 notice : Server changed state: box02[192.168.50.12:3306]: master_down. [Master, Running] -> [Down]
    #: manual failover executed
    [root@box01 ~]# maxadmin call command mariadbmon failover replication-cluster-monitor
    #: let's check the logs
    2018-02-09 23:32:30 info : (17) [cli] MaxAdmin: call command "mariadbmon" "failover" "replication-cluster-monitor"
    2018-02-09 23:32:30 notice : (17) [mariadbmon] Stopped monitor replication-cluster-monitor for the duration of failover.
    2018-02-09 23:32:30 notice : (17) [mariadbmon] Promoting server 'box03' to master.
    2018-02-09 23:32:30 notice : (17) [mariadbmon] Redirecting slaves to new master.
    2018-02-09 23:32:30 notice : (17) [mariadbmon] Failover performed.
    2018-02-09 23:32:30 warning: [mariadbmon] Setting standalone master, server 'box03' is now the master.
    2018-02-09 23:32:30 notice : Server changed state: box03[192.168.50.13:3306]: new_master. [Slave, Running] -> [Master, Running]
  • 16. Maxscale 2.2 New Features
    ● Failover: replacing a failed master, additional details
    ● The length of a monitor pass is set by the monitor's monitor_interval value;
      ○ With monitor_interval set to 1000 ms (1 second), failover is triggered 4 seconds after the monitor reports the first message, because that first report counts as the first pass:
        2018-02-10 13:51:02 warning: [mariadbmon] Master has failed. If master status does not change in 4 monitor passes, failover begins.
      ○ If failover does not complete within failover_timeout (90 seconds by default), it is canceled and automatic failover is disabled;
      ○ To enable failover again (after checking for possible problems), use the alter monitor command:
        [root@box01 ~]# maxadmin alter monitor replication-cluster-monitor auto_failover=true
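The 4-second delay discussed on this slide follows from simple arithmetic on the monitor settings. The Python sketch below reproduces it under one assumption that the slides do not state: that `failcount` was 5 in the demo, which is consistent with the log message announcing 4 remaining passes after the first failed one. The function name is hypothetical.

```python
def seconds_until_failover(failcount: int, monitor_interval_ms: int) -> float:
    """Delay between the first 'Master has failed' warning and failover.

    The pass that first sees the master down counts as pass one, so only
    failcount - 1 further passes have to elapse.
    """
    remaining_passes = max(failcount - 1, 0)
    return remaining_passes * monitor_interval_ms / 1000.0


if __name__ == "__main__":
    # Assumed failcount of 5 with the slide's monitor_interval of 1000 ms.
    print(seconds_until_failover(failcount=5, monitor_interval_ms=1000))  # 4.0
```

This matches the timestamps in the automatic failover log, where the warning appears at 13:51:02 and the failover starts at 13:51:06.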
  • 17. Maxscale 2.2 New Features
    ● Switchover: swapping a slave with a running master.
    ● The switchover process relies on the replication_user and replication_password settings added to the monitor configuration;
    ● The process is triggered manually and may take up to switchover_timeout seconds to complete (default 90 seconds);
    ● If the process fails, the failure is logged and auto_failover, if enabled, is disabled;
    [root@team01-box01 ~]# maxadmin call command mariadbmon switchover replication-cluster-monitor new_master master
  • 18. Maxscale 2.2 New Features
    ● Switchover: swapping a slave with a running master.
    #: checking the current server's list
    [root@team01-box01 ~]# maxadmin list servers
    Servers.
    -------------------+-----------------+-------+-------------+--------------------
    Server             | Address         | Port  | Connections | Status
    -------------------+-----------------+-------+-------------+--------------------
    box02              | 10.132.116.147  | 3306  | 0           | Slave, Running
    box03              | 10.132.116.161  | 3306  | 0           | Master, Running
    -------------------+-----------------+-------+-------------+--------------------
    #: new_master=box02, current_master=box03
    [root@team01-box01 ~]# maxadmin call command mariadbmon switchover replication-cluster-monitor box02 box03
    #: checking logs
    2018-02-14 16:44:46 info : (712) [cli] MaxAdmin: call command "mariadbmon" "switchover" "replication-cluster-monitor" "box02" "box03"
    2018-02-14 16:44:46 notice : (712) [mariadbmon] Stopped the monitor replication-cluster-monitor for the duration of switchover.
    2018-02-14 16:44:46 notice : (712) [mariadbmon] Demoting server 'box03'.
    2018-02-14 16:44:46 notice : (712) [mariadbmon] Promoting server 'box02' to master.
    2018-02-14 16:44:46 notice : (712) [mariadbmon] Old master 'box03' starting replication from 'box02'.
    2018-02-14 16:44:46 notice : (712) [mariadbmon] Redirecting slaves to new master.
    2018-02-14 16:44:47 notice : (712) [mariadbmon] Switchover box03 -> box02 performed.
    2018-02-14 16:44:47 notice : Server changed state: box02[10.132.116.147:3306]: new_master. [Slave, Running] -> [Master, Slave, Running]
    2018-02-14 16:44:47 notice : Server changed state: box03[10.132.116.161:3306]: new_slave. [Master, Running] -> [Slave, Running]
    2018-02-14 16:44:48 notice : Server changed state: box02[10.132.116.147:3306]: new_master. [Master, Slave, Running] -> [Master, Running]
  • 19. Maxscale 2.2 New Features
    ● Rejoin: joining a standalone server to the cluster.
    ● Enables a crashed backend server to rejoin the cluster automatically when it comes back online;
    ● When auto_rejoin is enabled, the monitor will attempt to direct standalone servers and servers replicating from a relay master to the main cluster master server;
    ● Test it as we did:
      ○ Check which server is the current master and shut down its MariaDB Server;
      ○ Failover will happen if auto_failover is enabled;
      ○ Start the MariaDB Server that was shut down;
      ○ List the servers again with MaxAdmin and watch the logs.
  • 20. Maxscale 2.2 New Features
    ● Rejoin: joining a standalone server to the cluster.
    #: current_master=box02
    [root@team01-box02 ~]# mysqladmin shutdown
    #: watching logs, the failover will happen as the master "crashed"
    2018-02-14 18:44:36 error : Monitor was unable to connect to server [10.132.116.147]:3306 : "Can't connect to MySQL server on '10.132.116.147' (115)"
    2018-02-14 18:44:36 notice : [mariadbmon] Server [10.132.116.147]:3306 lost the master status.
    2018-02-14 18:44:36 notice : Server changed state: box02[10.132.116.147:3306]: master_down. [Master, Running] -> [Down]
    2018-02-14 18:44:36 warning: [mariadbmon] Master has failed. If master status does not change in 4 monitor passes, failover begins.
    2018-02-14 18:44:40 notice : [mariadbmon] Performing automatic failover to replace failed master 'box02'.
    2018-02-14 18:44:40 notice : [mariadbmon] Promoting server 'box03' to master.
    2018-02-14 18:44:40 notice : [mariadbmon] Redirecting slaves to new master.
    2018-02-14 18:44:41 warning: [mariadbmon] Setting standalone master, server 'box03' is now the master.
    2018-02-14 18:44:41 notice : Server changed state: box03[10.132.116.161:3306]: new_master. [Slave, Running] -> [Master, Running]
    #: starting old master back
    [root@team01-box02 ~]# systemctl start mariadb.service
    #: watching logs
    2018-02-14 18:47:27 notice : Server changed state: box02[10.132.116.147:3306]: server_up. [Down] -> [Running]
    2018-02-14 18:47:27 notice : [mariadbmon] Directing standalone server 'box02' to replicate from 'box03'.
    2018-02-14 18:47:27 notice : [mariadbmon] 1 server(s) redirected or rejoined the cluster.
    2018-02-14 18:47:28 notice : Server changed state: box02[10.132.116.147:3306]: new_slave. [Running] -> [Slave, Running]
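When watching the logs during these tests, the "Server changed state" lines carry the essential information. The Python sketch below extracts their fields with a regular expression. The pattern is derived only from the log lines shown in this deck, so it is an assumption that other MaxScale versions use the same layout, and the names `STATE_CHANGE` and `parse_state_change` are hypothetical.

```python
import re
from typing import Dict, Optional

# Matches e.g.:
#   Server changed state: box02[10.132.116.147:3306]: new_slave. [Running] -> [Slave, Running]
STATE_CHANGE = re.compile(
    r"Server changed state: (?P<server>\S+?)\[(?P<address>[^\]]+)\]: "
    r"(?P<event>\w+)\. \[(?P<old>[^\]]*)\] -> \[(?P<new>[^\]]*)\]"
)


def parse_state_change(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of a state-change log line, or None for other lines."""
    match = STATE_CHANGE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "2018-02-14 18:47:28 notice : Server changed state: "
        "box02[10.132.116.147:3306]: new_slave. [Running] -> [Slave, Running]"
    )
    print(parse_state_change(sample))
```

Filtering a log through this function gives a compact timeline of the failover and rejoin, for example `master_down`, `new_master`, `server_up`, `new_slave`.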
  • 21. Thank you! Time for questions and answers