MySQL Cluster Technology

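A MySQL cluster exists to solve three problems of a single-node MySQL deployment: performance bottlenecks (high concurrency), availability risk (single point of failure), and data reliability (data loss). Multiple servers work together so that data is distributed or replicated across nodes and requests are spread among them, which ultimately provides:

  • High availability (HA): the failure of a single node does not affect the overall service;
  • High scalability: processing capacity can be increased by adding nodes;
  • Data consistency: data within the cluster is kept in sync (the consistency level depends on the architecture).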

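Deploying MySQL on a Server

Most enterprise servers (roughly 90%) run Linux, and the most widely used MySQL versions are MySQL 5.7 and MySQL 8. Enterprises commonly install MySQL by compiling it from source (official site: http://www.mysql.com).

1 Deploying MySQL on Linux

Setting up the build environment on RHEL 9

# The RHEL 9 software repositories already provide the dependencies needed to build MySQL 8, so they can be installed directly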

[root@mysql_node1 ~]# dnf install cmake3 gcc git bison openssl-devel ncurses-devel systemd-devel rpcgen.x86_64 libtirpc-devel-1.3.3-9.el9.x86_64.rpm gcc-toolset-12-gcc gcc-toolset-12-gcc-c++ gcc-toolset-12-binutils gcc-toolset-12-annobin-annocheck gcc-toolset-12-annobin-plugin-gcc -y
[root@mysql_node1 ~]# cmake3 --version
cmake version 3.26.5
[root@mysql_node1 ~]# gcc -v
gcc version 11.5.0 20240719 (Red Hat 11.5.0-5) (GCC)

Download and unpack the source package

[root@mysql_node1 mnt]# wget https://downloads.mysql.com/archives/get/p/23/file/mysql-boost-8.3.0.tar.gz
[root@mysql_node1 mnt]# tar zxf mysql-boost-8.3.0.tar.gz
[root@mysql_node1 mnt]# cd mysql-8.3.0/

Compile and install MySQL from source

# Build parameters explained (reference only; the runnable command follows)
[root@mysql_node1 mysql-8.3.0]# mkdir build                # create the build directory
[root@mysql_node1 mysql-8.3.0]# cd build
-DCMAKE_INSTALL_PREFIX=/usr/local/mysql                    # installation path
-DMYSQL_DATADIR=/data/mysql                                # data directory
-DSYSTEMD=ON                                               # enable systemd support (key parameter)
-DSYSTEMD_SERVICE_DIR=/usr/lib/systemd/system              # where the systemd service file is installed
-DMYSQL_UNIX_ADDR=/data/mysql/mysql.sock                   # Unix socket file
-DWITH_INNOBASE_STORAGE_ENGINE=1                           # build the InnoDB storage engine (InnoDB is already the default engine in MySQL 8)
-DWITH_EXTRA_CHARSETS=all                                  # extra character sets
-DDEFAULT_CHARSET=utf8mb4                                  # default character set
-DDEFAULT_COLLATION=utf8mb4_unicode_ci                     # default collation
-DWITH_SSL=system                                          # use the SSL library installed on the system
-DWITH_BOOST=bundled                                       # use the Boost library bundled with the MySQL source package
-DWITH_DEBUG=OFF                                           # disable debug support

# Build commands
[root@mysql_node1 build]# cmake3 .. \
-DCMAKE_INSTALL_PREFIX=/usr/local/mysql \
-DMYSQL_DATADIR=/data/mysql \
-DMYSQL_UNIX_ADDR=/data/mysql/mysql.sock \
-DWITH_INNOBASE_STORAGE_ENGINE=1 \
-DWITH_EXTRA_CHARSETS=all \
-DDEFAULT_CHARSET=utf8mb4 \
-DDEFAULT_COLLATION=utf8mb4_unicode_ci \
-DWITH_BOOST=bundled \
-DWITH_SSL=system \
-DWITH_DEBUG=OFF \
-DSYSTEMD=ON \
-DSYSTEMD_SERVICE_DIR=/usr/lib/systemd/system
[root@mysql_node1 build]# make

Note: if cmake fails and you want it to re-run its checks, delete CMakeCache.txt in the build directory.

Deploy MySQL

[root@mysql_node1 build]# make install
[root@mysql-node1 build]# cd /usr/local/mysql/
[root@mysql-node1 mysql]# vim ~/.bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

# User specific environment and startup programs
export PATH=$PATH:/usr/local/mysql/bin    # add the MySQL binaries to PATH

[root@mysql-node1 mysql]# source ~/.bash_profile
[root@mysql-node1 mysql]# useradd -r -s /sbin/nologin -M mysql
[root@mysql-node1 mysql]# mkdir -p /data/mysql
[root@mysql-node1 mysql]# chown mysql:mysql /data/mysql/
[root@mysql-node1 ~]# vim /etc/my.cnf
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock

Initializing the MySQL data directory

[root@mysql-node1 ~]# mysqld --initialize --user=mysql

![](https://i-blog.csdnimg.cn/direct/7c5249bc83984189b25f2b82462e0631.png)

Start MySQL

[root@mysql-node1 ~]# dnf install initscripts-10.11.8-4.el9.x86_64 -y
[root@mysql-node1 ~]# cd /usr/local/mysql/support-files/
[root@mysql-node1 support-files]# cp -p mysql.server /etc/init.d/mysqld
[root@mysql-node1 support-files]# /etc/init.d/mysqld start
Starting MySQL.Logging to '/data/mysql/mysql-node1.err'.
. SUCCESS!

# Enable start at boot
[root@mysql-node1 support-files]# chkconfig --level 35 mysqld on

MySQL security initialization

[root@mysql-node1 ~]# mysql_secure_installation

Securing the MySQL server deployment.

Enter password for user root:

The existing password for the user account root has expired. Please set a new password.

New password:

Re-enter new password:

VALIDATE PASSWORD COMPONENT can be used to test passwords
and improve security. It checks the strength of password
and allows the users to set only those passwords which are
secure enough. Would you like to setup VALIDATE PASSWORD component?

Press y|Y for Yes, any other key for No: no
Using existing password for root.
Change the password for root ? ((Press y|Y for Yes, any other key for No) : no

Remove anonymous users? (Press y|Y for Yes, any other key for No) : y
Success.

Normally, root should only be allowed to connect from
'localhost'. This ensures that someone cannot guess at
the root password from the network.

Disallow root login remotely? (Press y|Y for Yes, any other key for No) : y
Success.

By default, MySQL comes with a database named 'test' that
anyone can access. This is also intended only for testing,
and should be removed before moving into a production
environment.

Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y
 - Dropping test database...
Success.

 - Removing privileges on test database...
Success.

Reloading the privilege tables will ensure that all changes
made so far will take effect immediately.

Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y
Success.

All done!

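MySQL Clustering in Practice: Master-Slave Replication

1 Configuring master-slave replication

1. Edit the main configuration file my.cnf

[root@mysql-node1 ~]# vim /etc/my.cnf
[mysqld]                       # settings for the MySQL server process
datadir=/data/mysql            # directory where MySQL stores its database files
socket=/data/mysql/mysql.sock  # path of the MySQL Unix socket file
symbolic-links=0               # for security, forbid access to database files through symbolic links

server-id=10                   # unique identifier of this server
log-bin=mysql-bin              # enable binary logging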


[root@mysql-node2 ~]# vim /etc/my.cnf
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
symbolic-links=0

server-id=20
log-bin=mysql-bin


[root@mysql-node3 ~]# vim /etc/my.cnf
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
symbolic-links=0

server-id=30
log-bin=mysql-bin


# Restart the database on all three hosts
[root@mysql-node1~3 ~]# /etc/init.d/mysqld restart
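
The three my.cnf files above differ only in server-id. As a minimal sketch, assuming the host names and IDs used in this walkthrough, the per-node files could be rendered from one template so that the IDs never collide. The helper names render_my_cnf and render_all are hypothetical and not part of MySQL.

```python
# Hypothetical helper: renders the per-node my.cnf shown above from one template.
NODES = {"mysql-node1": 10, "mysql-node2": 20, "mysql-node3": 30}

TEMPLATE = """[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
symbolic-links=0

server-id={server_id}
log-bin=mysql-bin
"""


def render_my_cnf(server_id: int) -> str:
    """Return the my.cnf text for one node."""
    if server_id <= 0:
        raise ValueError("server-id must be a positive integer")
    return TEMPLATE.format(server_id=server_id)


def render_all(nodes: dict) -> dict:
    """Render every node's file, refusing duplicate server-id values."""
    if len(set(nodes.values())) != len(nodes):
        raise ValueError("server-id values must be unique within the topology")
    return {host: render_my_cnf(sid) for host, sid in nodes.items()}


if __name__ == "__main__":
    for host, text in render_all(NODES).items():
        print(f"# {host}\n{text}")
```

Checking uniqueness up front matters because replication misbehaves when two servers in one topology share a server-id.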

2. Create the database account used for replication

[root@mysql-node1 ~]# mysql -uroot -plee
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 8.3.0 Source distribution

Copyright (c) 2000, 2024, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SHOW VARIABLES LIKE 'default_authentication_plugin';
+-------------------------------+-----------------------+
| Variable_name                 | Value                 |
+-------------------------------+-----------------------+
| default_authentication_plugin | mysql_native_password |
+-------------------------------+-----------------------+
1 row in set (0.06 sec)

mysql> create user lee@'%' identified with mysql_native_password by 'lee';    # create the user
Query OK, 0 rows affected (0.01 sec)

mysql> select User from mysql.user;
+------------------+
| User             |
+------------------+
| lee              |
| mysql.infoschema |
| mysql.session    |
| mysql.sys        |
| root             |
+------------------+
5 rows in set (0.00 sec)

mysql> GRANT replication slave ON *.* to lee@'%';        # grant the replication privilege
Query OK, 0 rows affected (0.01 sec)

mysql> SHOW GRANTS FOR lee@'%';
+---------------------------------------------+
| Grants for lee@%                            |
+---------------------------------------------+
| GRANT REPLICATION SLAVE ON *.* TO `lee`@`%` |
+---------------------------------------------+
1 row in set (0.00 sec)


# On another host, verify that the account can log in remotely
[root@mysql-node2 ~]# mysql -ulee -plee -h192.168.131.10
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 13
Server version: 8.3.0 Source distribution

Copyright (c) 2000, 2024, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

3. Configure one master and one slave

# On the master, look up the binary log file name and position
mysql> show master status;
+------------------+----------+--------------+------------------+------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000002 |     1254 |              |                  | af08b2f7-1bc5-11f1-bca0-000c29679c4e:1-5 |
+------------------+----------+--------------+------------------+------------------------------------------+
1 row in set, 1 warning (0.00 sec)

# On the slave host
[root@mysql-node2 ~]# mysql -uroot -plee    

mysql> change master to MASTER_HOST='192.168.131.10',MASTER_USER='lee',MASTER_PASSWORD='lee',MASTER_LOG_FILE='mysql-bin.000002',MASTER_LOG_POS=1254;
Query OK, 0 rows affected, 8 warnings (0.05 sec)

mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.02 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 192.168.131.10
                  Master_User: lee
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1254
               Relay_Log_File: mysql-node2-relay-bin.000002
                Relay_Log_Pos: 320
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes        # the I/O thread is receiving binlog events from the master
            Slave_SQL_Running: Yes        # the SQL thread is replaying the relay log successfully
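
Both threads must report Yes for replication to be healthy. The following sketch shows one way to check that automatically, assuming the vertical status output has been captured as text (for example from the mysql client). The function names parse_status and replication_healthy are illustrative only.

```python
# Sketch: decide from captured SHOW SLAVE STATUS output whether replication is healthy.
SAMPLE = """\
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 192.168.131.10
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1254
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
"""


def parse_status(text: str) -> dict:
    """Turn 'Key: value' lines into a dict, skipping the row banner."""
    fields = {}
    for line in text.splitlines():
        if line.lstrip().startswith("*"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def replication_healthy(fields: dict) -> bool:
    """Healthy means the I/O thread and the SQL thread are both running."""
    return (fields.get("Slave_IO_Running") == "Yes"
            and fields.get("Slave_SQL_Running") == "Yes")


if __name__ == "__main__":
    print(replication_healthy(parse_status(SAMPLE)))
```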

4. Add a new database node to the existing one-master, one-slave setup

[root@mysql-node1 ~]# mysql -uroot -plee			# create a database on the master
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
4 rows in set (0.00 sec)

mysql> create database timinglee;
Query OK, 1 row affected (0.00 sec)

mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| timinglee          |
+--------------------+
5 rows in set (0.00 sec)

# Simulate a one-master, one-slave setup that already contains data
[root@mysql-node1 ~]# mysql -uroot -plee
mysql> CREATE TABLE timinglee.userlist (
    -> name VARCHAR(10) not null,
    -> pass VARCHAR(50) not null);
Query OK, 0 rows affected (0.05 sec)

mysql> select * from timinglee.userlist;
Empty set (0.08 sec)

mysql> INSERT INTO timinglee.userlist values ('user1','123');
Query OK, 1 row affected (0.04 sec)

mysql> select * from timinglee.userlist;
+-------+------+
| name  | pass |
+-------+------+
| user1 | 123  |
+-------+------+
1 row in set (0.00 sec)


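# Before a new slave joins, its data must be brought level with the master manually
[root@mysql-node1 ~]# mysqldump -uroot -p timinglee > timinglee.sql
[root@mysql-node1 ~]# scp timinglee.sql root@192.168.131.30:/root/
timinglee.sql

# On the new slave, create the database and import the dump
# (a mysqldump of a single database does not include CREATE DATABASE)
[root@mysql-node3 ~]# mysql -uroot -plee -e 'create database timinglee;'
[root@mysql-node3 ~]# mysql -uroot -plee timinglee < timinglee.sql
[root@mysql-node3 ~]# mysql -uroot -plee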
mysql> select * from timinglee.userlist;
+-------+------+
| name  | pass |
+-------+------+
| user1 | 123  |
+-------+------+
1 row in set (0.00 sec)

5. Join the new slave to the master-slave topology

# On the master, check the current binlog position
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000002 |     1962 |              |                  | af08b2f7-1bc5-11f1-bca0-000c29679c4e:1-8 |
+------------------+----------+--------------+------------------+------------------------------------------+
1 row in set, 1 warning (0.00 sec)




# On the new slave (mysql-node3)
mysql> change master to MASTER_HOST='192.168.131.10',MASTER_USER='lee',MASTER_PASSWORD='lee',MASTER_LOG_FILE='mysql-bin.000002',MASTER_LOG_POS=1962;
Query OK, 0 rows affected, 8 warnings (0.02 sec)

mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.02 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 192.168.131.10
                  Master_User: lee
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1962
               Relay_Log_File: mysql-node3-relay-bin.000002
                Relay_Log_Pos: 1028
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
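
The File and Position columns read on the master are exactly the values that go into the change master statement on the slave. As a sketch of that hand-off, assuming the table output has been captured as text, the statement can be derived instead of retyped. parse_table and change_master_sql are hypothetical helper names, and the quoting check is deliberately strict rather than a full SQL escaper.

```python
# Sketch: derive the CHANGE MASTER TO statement from captured SHOW MASTER STATUS output.
STATUS = """\
+------------------+----------+--------------+------------------+------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000002 |     1962 |              |                  | af08b2f7-1bc5-11f1-bca0-000c29679c4e:1-8 |
+------------------+----------+--------------+------------------+------------------------------------------+
"""


def parse_table(text: str) -> list:
    """Parse the ASCII table printed by the mysql client into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def change_master_sql(host: str, user: str, password: str, row: dict) -> str:
    """Build the statement; refuse values that would need SQL escaping."""
    for value in (host, user, password, row["File"]):
        if "'" in value or "\\" in value:
            raise ValueError("value contains a quote or backslash")
    return (
        f"CHANGE MASTER TO MASTER_HOST='{host}',MASTER_USER='{user}',"
        f"MASTER_PASSWORD='{password}',MASTER_LOG_FILE='{row['File']}',"
        f"MASTER_LOG_POS={int(row['Position'])};"
    )


if __name__ == "__main__":
    print(change_master_sql("192.168.131.10", "lee", "lee", parse_table(STATUS)[0]))
```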

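6. Test one master with two slaves

# Insert data on the master

mysql> INSERT INTO timinglee.userlist values ('user2','123');

# Check the data on the newly added slave

2 Tips and optimizations for the MySQL master-slave architecture

1. Delayed replication

# On the slave that should replicate with a delay (this syntax is available from MySQL 8.0.23)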
mysql>  STOP REPLICA;
mysql>  CHANGE REPLICATION SOURCE TO SOURCE_DELAY=60;
mysql> START REPLICA;

[root@mysql-node2 ~]# mysql -uroot -plee -e "show slave status\G;" | grep SQL_Delay
mysql: [Warning] Using a password on the command line interface can be insecure.
                    SQL_Delay: 60

# Change data on the master
mysql> delete from timinglee.userlist where name='user1';
mysql> select * from timinglee.userlist;
+-------+------+
| name  | pass |
+-------+------+
| user2 | 123  |
+-------+------+
1 row in set (0.00 sec)


# On the slave without a delay, check whether the change has been replicated
mysql> select  * from timinglee.userlist;
+-------+------+
| name  | pass |
+-------+------+
| user2 | 123  |
+-------+------+
1 row in set (0.00 sec)


# On the slave configured with delayed replication, check whether the change has been replicated
mysql> select * from timinglee.userlist;
+-------+------+
| name  | pass |
+-------+------+
| user1 | 123  |
| user2 | 123  |
+-------+------+
2 rows in set (0.00 sec)

# Check again after the delay has elapsed
mysql> select * from timinglee.userlist;
+-------+------+
| name  | pass |
+-------+------+
| user2 | 123  |
+-------+------+
1 row in set (0.00 sec)
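
The behaviour above can be modelled in a few lines. This is a simplified sketch that assumes the relay log arrives immediately and only the apply step waits for the configured delay. The BinlogEvent type and the applied_by function are illustrative names, and the timestamps are made up for the example.

```python
# Sketch: which statements has a replica with SOURCE_DELAY applied at a given time?
from dataclasses import dataclass


@dataclass(frozen=True)
class BinlogEvent:
    committed_at: int  # commit time on the master, in seconds
    statement: str


def applied_by(events, now: int, source_delay: int) -> list:
    """Statements a replica with the given delay has executed by time `now`."""
    return [e.statement for e in events if e.committed_at + source_delay <= now]


EVENTS = [
    BinlogEvent(0, "INSERT user2"),
    BinlogEvent(100, "DELETE user1"),
]

if __name__ == "__main__":
    # 30 seconds after the DELETE: the undelayed replica has it, the delayed one does not.
    print(applied_by(EVENTS, now=130, source_delay=0))
    print(applied_by(EVENTS, now=130, source_delay=60))
```

The window between the two results is what makes a delayed replica useful for recovering from an accidental delete.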


2. Slow query log

# Check whether the slow query log is enabled
mysql> show variables like "slow%";
+---------------------+----------------------------------+
| Variable_name       | Value                            |
+---------------------+----------------------------------+
| slow_launch_time    | 2                                |
| slow_query_log      | OFF                              |
| slow_query_log_file | /data/mysql/mysql-node1-slow.log |
+---------------------+----------------------------------+
3 rows in set (0.00 sec)

# Enable the slow query log
mysql> set global slow_query_log=ON;
Query OK, 0 rows affected (0.01 sec)

mysql> show variables like "slow%";
+---------------------+----------------------------------+
| Variable_name       | Value                            |
+---------------------+----------------------------------+
| slow_launch_time    | 2                                |
| slow_query_log      | ON                               |
| slow_query_log_file | /data/mysql/mysql-node1-slow.log |
+---------------------+----------------------------------+
3 rows in set (0.00 sec)

# Test the slow query log
mysql> SET long_query_time=4;            # set the slow query threshold in seconds (current session)
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW VARIABLES like "long%";
+-----------------+----------+
| Variable_name   | Value    |
+-----------------+----------+
| long_query_time | 4.000000 |
+-----------------+----------+
1 row in set (0.00 sec)


mysql> select sleep(4);
+----------+
| sleep(4) |
+----------+
|        0 |
+----------+
1 row in set (4.00 sec)

[root@mysql-node1 ~]# cat /data/mysql/mysql-node1-slow.log
/usr/local/mysql/bin/mysqld, Version: 8.3.0 (Source distribution). started with:
Tcp port: 3306  Unix socket: /data/mysql/mysql.sock
Time                 Id Command    Argument
# Time: 2026-03-09T16:13:47.766648Z
# User@Host: root[root] @ localhost []  Id:    18
# Query_time: 4.001562  Lock_time: 0.000000 Rows_sent: 1  Rows_examined: 1
SET timestamp=1773072823;
select sleep(4);
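
Reading the log by eye stops scaling quickly, so a small parser can help. The sketch below assumes the entry layout shown above (a Query_time line followed by SET timestamp and the statement) and is not a complete slow-log parser; for instance, server banner lines that reappear after a restart would be attached to the preceding entry. parse_slow_log is a hypothetical name.

```python
# Sketch: extract query time and statement text from a slow query log.
import re

STATS = re.compile(
    r"# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: (?P<lock_time>[\d.]+)"
    r"\s+Rows_sent: (?P<rows_sent>\d+)\s+Rows_examined: (?P<rows_examined>\d+)"
)

SAMPLE = """\
/usr/local/mysql/bin/mysqld, Version: 8.3.0 (Source distribution). started with:
Tcp port: 3306  Unix socket: /data/mysql/mysql.sock
Time                 Id Command    Argument
# Time: 2026-03-09T16:13:47.766648Z
# User@Host: root[root] @ localhost []  Id:    18
# Query_time: 4.001562  Lock_time: 0.000000 Rows_sent: 1  Rows_examined: 1
SET timestamp=1773072823;
select sleep(4);
"""


def parse_slow_log(text: str) -> list:
    """Return one dict per logged statement."""
    entries, current = [], None
    for line in text.splitlines():
        match = STATS.match(line)
        if match:
            current = {
                "query_time": float(match["query_time"]),
                "rows_examined": int(match["rows_examined"]),
                "sql": "",
            }
            entries.append(current)
        elif (current is not None and line.strip()
              and not line.startswith("#")
              and not line.startswith("SET timestamp=")):
            current["sql"] = (current["sql"] + " " + line.strip()).strip()
    return entries


if __name__ == "__main__":
    for entry in sorted(parse_slow_log(SAMPLE), key=lambda e: -e["query_time"]):
        print(entry["query_time"], entry["sql"])
```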

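3. Multi-threaded replay

# Historically a slave replayed the relay log with a single thread by default; since MySQL 8.0.27 the default is 4 applier workers

# Enable multi-threaded log replay (on the slave host)
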
[root@mysql-node2 ~]# vim /etc/my.cnf
slave-parallel-type=LOGICAL_CLOCK
slave-parallel-workers=16
relay_log_recovery=ON

[root@mysql-node2 ~]# /etc/init.d/mysqld restart

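# Verify that the change took effect

4. How it works

The three threads

Master-slave synchronization is built on the binlog. Replication involves three threads: one on the master and two on the slave.

  • The binary log dump thread (Binlog dump thread) runs on the master. When a slave connects, the master uses this thread to send binary log events to it. While the thread reads an event it holds a lock on the binlog, and it releases the lock as soon as the read completes.

  • The slave I/O thread connects to the master and requests binlog updates. It reads the updates sent by the master's binlog dump thread and copies them into the local relay log.

  • The slave SQL thread reads the relay log and executes the events in it, bringing the slave's data in line with the master's.

The three replication steps

Step 1: The master records write operations in its binary log (binlog).

Step 2: The slave copies the master's binary log events into its relay log.

Step 3: The slave replays the events in the relay log, applying the changes to its own database. MySQL replication is asynchronous and serialized, and after a restart it resumes from the last recorded position.

Detailed flow

1. The slave is configured with the master's IP, user, binlog file, and binlog position, and uses this information to authenticate to the master and request data.

2. With binary logging enabled, the master starts a binlog dump thread when a slave connects.

3. The master's binlog dump thread sends the binary log updates to the slave.

4. The slave starts two threads, an I/O thread and an SQL thread:

  • The I/O thread receives the master's binary log events, writes them to the local relay log, and saves it to disk.

  • The SQL thread reads the local relay log and replays it.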
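
To make the three roles concrete, here is a toy in-memory simulation. It is a sketch only: the Master and Slave classes are hypothetical, the binlog is a plain Python list, and each "thread" is a method called by hand so that the asynchronous gap between steps is visible.

```python
# Toy model of asynchronous replication: binlog dump, I/O thread, SQL thread.
from collections import deque


class Master:
    def __init__(self):
        self.data = {}
        self.binlog = []  # ordered list of (key, value) write events

    def write(self, key, value):
        self.data[key] = value
        self.binlog.append((key, value))  # step 1: record the write in the binlog

    def dump(self, from_pos):
        """Binlog dump thread: hand over events starting at a position."""
        return self.binlog[from_pos:]


class Slave:
    def __init__(self, master, start_pos=0):
        self.master = master
        self.read_pos = start_pos  # plays the role of MASTER_LOG_POS
        self.relay_log = deque()
        self.data = {}

    def io_thread_step(self):
        """I/O thread: copy new binlog events into the relay log (step 2)."""
        events = self.master.dump(self.read_pos)
        self.relay_log.extend(events)
        self.read_pos += len(events)

    def sql_thread_step(self):
        """SQL thread: replay relay log events in order (step 3)."""
        while self.relay_log:
            key, value = self.relay_log.popleft()
            self.data[key] = value


if __name__ == "__main__":
    master = Master()
    slave = Slave(master)
    master.write("user1", "123")
    print(slave.data)        # still empty: the master did not wait for the slave
    slave.io_thread_step()
    slave.sql_thread_step()
    print(slave.data == master.data)
```

The moment after master.write and before io_thread_step is the window in which a master crash loses data on the slave side.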

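5. When do we need multiple slaves?

When read operations far outnumber writes, a one-master, multiple-slave architecture is used.

A load-balancing layer is placed in front of the databases and combined with a high-availability mechanism.

Architectural weaknesses

  • Master-slave replication is asynchronous.
  • After an update the master simply sends the binary log to the slaves; it does not check whether they actually stored the data.
  • The master writes the binary log to its own disk.
  • If the network between master and slave fails, or the master itself crashes, the binary log may never reach the slave.
  • If the master fails and a slave takes over, the data that had not yet been transferred is lost.
  • Because of this, strong consistency and zero data loss cannot be guaranteed.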

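Semi-synchronous mode

1 How semi-synchronous mode works

1. After the user thread finishes its write, the dump thread on the master pushes the log to the slave.

2. The I/O thread on the slave receives it and saves it to the relay log.

3. Once it is saved, the slave returns an ack to the master.

4. Until the slave's ack arrives, the master does not commit; it waits, and commits to the storage engine only after receiving the ack.

5. Version 5.6 used the after_commit mode, in which the master commits first, then waits for the ack, and only then returns OK to the client.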
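
The waiting rule can be sketched as a small function. This is a simplified model, assuming each slave either acks after some number of milliseconds or never acks (None). The parameter names mirror the real variables rpl_semi_sync_master_timeout and rpl_semi_sync_master_wait_for_slave_count, but the function semi_sync_commit itself is hypothetical.

```python
# Sketch: how long does the master wait, and does it stay in semi-sync mode?
from typing import Optional, Sequence, Tuple


def semi_sync_commit(
    ack_delays_ms: Sequence[Optional[int]],
    timeout_ms: int = 10000,
    wait_for_slave_count: int = 1,
) -> Tuple[int, str]:
    """Return (time waited in ms, resulting mode) for one transaction."""
    received = sorted(d for d in ack_delays_ms if d is not None)
    if len(received) >= wait_for_slave_count:
        needed = received[wait_for_slave_count - 1]
        if needed <= timeout_ms:
            return needed, "semi-sync"
    # Not enough acks in time: the master stops waiting and falls back to async.
    return timeout_ms, "async (fallback)"


if __name__ == "__main__":
    print(semi_sync_commit([5, None]))     # one slave acks quickly
    print(semi_sync_commit([None, None]))  # no slave acks: wait for the full timeout
```

With the default timeout of 10000 ms, a transaction issued while no slave can ack blocks for about ten seconds before the master falls back to asynchronous replication.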

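2 GTID mode

GTID mode is a MySQL replication feature built around the global transaction identifier (GTID).

Problems to consider when GTID is not enabled

Writes on the master come from many concurrent users, while a slave that replays the log with a single thread cannot keep pace, so the slave lags behind the master.

The lag can differ from slave to slave. When the master fails and a slave takes over, the slave whose log is closest to the master's is normally chosen as the new master.

The hosts that did not take over remain slaves and are repointed to the new master.

With the earlier configuration we would need the binlog position on the new master, but we cannot determine how far apart the new master and each slave are.

After GTID is enabled

When the master fails, slave2 holds the data closest to the master's and is promoted to the new master.

slave1 is pointed to the new master, but it does not need to look up the new master's binlog position; it simply continues from its own gtid_next.

3 A simple GTID workflow

  1. The master executes a transaction; at commit a unique GTID is generated automatically and written to the binlog.

  2. The slave reads the master's binlog and first records the GTID (marked as "received").

  3. The slave executes the transaction and then marks the GTID as "executed".

  4. During synchronization, the slave requests from the master only the transactions whose GTIDs it has not yet executed.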
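
Step 4 amounts to a set difference between two GTID sets. The sketch below assumes the classic untagged format (uuid:interval[:interval...], comma-separated) and does not handle tagged GTIDs. parse_gtid_set and missing_on_replica are illustrative names; the server performs this comparison internally.

```python
# Sketch: which transactions does a replica still need from its source?
def parse_gtid_set(text: str) -> dict:
    """'uuid:1-5:7,uuid2:1-3' -> {uuid: {1, 2, 3, 4, 5, 7}, uuid2: {1, 2, 3}}"""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ids = result.setdefault(uuid, set())
        for interval in intervals:
            low, _, high = interval.partition("-")
            ids.update(range(int(low), int(high or low) + 1))
    return result


def missing_on_replica(source_set: str, replica_set: str) -> dict:
    """Transaction numbers per server UUID that the replica has not executed."""
    source, replica = parse_gtid_set(source_set), parse_gtid_set(replica_set)
    missing = {}
    for uuid, ids in source.items():
        gap = ids - replica.get(uuid, set())
        if gap:
            missing[uuid] = sorted(gap)
    return missing


if __name__ == "__main__":
    uuid = "af08b2f7-1bc5-11f1-bca0-000c29679c4e"
    print(missing_on_replica(f"{uuid}:1-8", f"{uuid}:1-5"))
```

Because the comparison is by identifier rather than by file name and byte offset, a replica can be repointed to any server that holds the missing transactions.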

Enabling GTID

GTID mode is disabled by default on both the master and the slaves

mysql> show variables like '%gtid%';
+----------------------------------+-----------+
| Variable_name                    | Value     |
+----------------------------------+-----------+
| binlog_gtid_simple_recovery      | ON        |
| enforce_gtid_consistency         | OFF       |
| gtid_executed                    |           |
| gtid_executed_compression_period | 0         |
| gtid_mode                        | OFF       |
| gtid_next                        | AUTOMATIC |
| gtid_owned                       |           |
| gtid_purged                      |           |
| session_track_gtids              | OFF       |
+----------------------------------+-----------+
9 rows in set (0.01 sec)

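Add the following parameters on all hosts, then restart mysqld

[root@mysql-node1~3 ~]# vim /etc/my.cnf
gtid_mode=ON
enforce-gtid-consistency=ON

[root@mysql-node1~3 ~]# /etc/init.d/mysqld restart

On each of the three hosts, check that GTID mode is enabled
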
mysql> show variables like '%gtid%';
+----------------------------------+-----------+
| Variable_name                    | Value     |
+----------------------------------+-----------+
| binlog_gtid_simple_recovery      | ON        |
| enforce_gtid_consistency         | ON        |
| gtid_executed                    |           |
| gtid_executed_compression_period | 0         |
| gtid_mode                        | ON        |
| gtid_next                        | AUTOMATIC |
| gtid_owned                       |           |
| gtid_purged                      |           |
| session_track_gtids              | OFF       |
+----------------------------------+-----------+
9 rows in set (0.00 sec)

4 Enabling semi-synchronous mode

On the master host

[root@mysql-node1 ~]# vim /etc/my.cnf
rpl_semi_sync_master_enabled=1


mysql> install PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
Query OK, 0 rows affected, 1 warning (0.03 sec)

mysql> SET GLOBAL rpl_semi_sync_master_enabled = 1;
Query OK, 0 rows affected (0.00 sec)

On the slave host

[root@mysql-node2 ~]# vim /etc/my.cnf
rpl_semi_sync_slave_enabled=1


mysql>  INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
mysql> SET GLOBAL rpl_semi_sync_slave_enabled =1;
mysql> STOP SLAVE IO_THREAD;
mysql> START SLAVE IO_THREAD;

mysql> SELECT PLUGIN_NAME, PLUGIN_STATUS FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME LIKE '%semi%';
+---------------------+---------------+
| PLUGIN_NAME         | PLUGIN_STATUS |
+---------------------+---------------+
| rpl_semi_sync_slave | ACTIVE        |
+---------------------+---------------+
1 row in set (0.01 sec)


mysql> show variables like 'rpl_semi_sync%';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| rpl_semi_sync_slave_enabled     | ON    |
| rpl_semi_sync_slave_trace_level | 32    |
+---------------------------------+-------+
2 rows in set (0.00 sec)


mysql> show status like 'rpl_semi_sync%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Rpl_semi_sync_slave_status | ON    |
+----------------------------+-------+
1 row in set (0.01 sec)

Test:

# On the master
mysql> create database timinglee;
Query OK, 1 row affected (00.01 sec)			

mysql> SHOW VARIABLES LIKE 'rpl_semi_sync%';
+-------------------------------------------+------------+
| Variable_name                             | Value      |
+-------------------------------------------+------------+
| rpl_semi_sync_master_enabled              | ON         |
| rpl_semi_sync_master_timeout              | 10000      |
| rpl_semi_sync_master_trace_level          | 32         |
| rpl_semi_sync_master_wait_for_slave_count | 1          |
| rpl_semi_sync_master_wait_no_slave        | ON         |
| rpl_semi_sync_master_wait_point           | AFTER_SYNC |
+-------------------------------------------+------------+
6 rows in set (0.00 sec)

# Simulate an ack failure: on all slave hosts
mysql> STOP SLAVE IO_THREAD;
Query OK, 0 rows affected, 1 warning (0.00 sec)

# Write data on the master
mysql> insert into timinglee.userlist values ('user3','123');
Query OK, 1 row affected (10.01 sec)        # 10-second timeout spent waiting for an ack

mysql> SHOW STATUS LIKE 'Rpl_semi_sync%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 0     |
| Rpl_semi_sync_master_net_avg_wait_time     | 0     |
| Rpl_semi_sync_master_net_wait_time         | 0     |
| Rpl_semi_sync_master_net_waits             | 0     |
| Rpl_semi_sync_master_no_times              | 1     |
| Rpl_semi_sync_master_no_tx                 | 1     |
| Rpl_semi_sync_master_status                | OFF   |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 0     |
| Rpl_semi_sync_master_tx_wait_time          | 0     |
| Rpl_semi_sync_master_tx_waits              | 0     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 0     |
+--------------------------------------------+-------+
14 rows in set (0.01 sec)

# Recover from the failure: on all slave hosts
mysql> start SLAVE IO_THREAD;
Query OK, 0 rows affected, 1 warning (0.00 sec)

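MySQL Group Replication (MGR) for high availability

  • MySQL Group Replication (MGR) is a high-availability and high-scalability solution released by MySQL in December 2016.
  • Group replication was introduced in MySQL 5.7.17 and provides a highly available, highly scalable, and highly reliable MySQL cluster service.
  • Group replication supports single-primary and multi-primary modes, whereas traditional MySQL replication only solves data synchronization.
  • MGR automatically coordinates the servers that belong to the same group. For a transaction to commit, the group members must agree on its position in the global transaction order.
  • Each server commits or rolls back the transaction on its own, but all servers must reach the same decision.
  • If a network partition prevents the members from reaching agreement, the system does not proceed until the problem is resolved; this is a built-in, automatic split-brain protection mechanism.
  • MGR is backed by a Group Communication System (GCS) protocol.
  • That system provides failure detection, a group membership service, and safe, ordered message delivery.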

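1 Group replication flow

Several nodes first form a replication group. A read-write (RW) transaction must be approved by the consensus layer before it commits: it needs the agreement of a majority of the group's nodes, meaning more than half of them (at least ⌊N/2⌋+1 of N nodes), rather than being decided by the originating node alone. Read-only (RO) transactions need no group agreement and commit directly. A group can contain at most 9 nodes.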
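
A majority means more than half of the members, which determines how many node failures a group can survive. The arithmetic is sketched below; the function names majority and tolerated_failures are illustrative, and the upper bound of 9 is the group size limit stated above.

```python
# Sketch: majority size and failure tolerance for a replication group.
MAX_GROUP_SIZE = 9


def majority(n: int) -> int:
    """Smallest number of members that is more than half of n."""
    if not 1 <= n <= MAX_GROUP_SIZE:
        raise ValueError("a group has between 1 and 9 members")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Members that can fail while a majority is still reachable."""
    return n - majority(n)


if __name__ == "__main__":
    for size in (3, 4, 5, 7, 9):
        print(size, majority(size), tolerated_failures(size))
```

A group of 4 tolerates no more failures than a group of 3, which is why odd group sizes are the usual choice.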

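2 Single-primary and multi-primary modes

  • single-primary mode

In single-primary mode only one node in the group accepts both reads and writes; the other nodes are read-only. When the primary fails, a new primary is elected automatically.

  • multi-primary mode

Every machine in the group is a primary node and can serve reads and writes at the same time, and the data is eventually consistent.

3 Implementing MySQL group replication

1. Reset all MySQL nodes

Method 1: manual reset
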
[root@mysql-node1 ~]# /etc/init.d/mysqld stop
[root@mysql-node1 ~]# rm -rf /data/mysql/*
[root@mysql-node1 ~]# vim /etc/my.cnf
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
server-id=10                            # unique server ID
default_authentication_plugin=mysql_native_password
disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY" # disable the listed storage engines
gtid_mode=ON                            # enable global transaction identifiers
enforce_gtid_consistency=ON             # enforce GTID consistency
binlog_checksum=NONE                    # disable binary log checksums
log_slave_updates=ON                    # changes applied by the slave's SQL thread
                                        # are also written to the slave's own binlog
log_bin=binlog                          # set the binary log base name
binlog_format=ROW                       # use row-based logging
transaction_write_set_extraction=XXHASH64   # hash each transaction's write set with XXHASH64

[root@mysql-node1 ~]# mysqld --user=mysql --initialize  # initialize the data directory

Method 2: reset with Ansible

[root@mha ~]# vim /etc/yum.repos.d/epel.repo
[epel]
name = epel
baseurl = https://mirrors.aliyun.com/epel-archive/9.6/Everything/x86_64/
gpgcheck = 0

[root@mha ~]# ansible --version
ansible [core 2.14.18]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.9/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.9.21 (main, Feb 10 2025, 00:00:00) [GCC 11.5.0 20240719 (Red Hat 11.5.0-5)] (/usr/bin/python3)
  jinja version = 3.1.2
  libyaml = True

[root@mha ~]# useradd  devops
[root@mha ~]# echo lee | passwd --stdin devops
[root@mha ~]# su - devops

[devops@mha ~]$ mkdir  ansible
[devops@mha ~]$ cd ansible
[devops@mha ansible]$ vim ansible.cfg
[defaults]
inventory =./inventory
remote_user=devops
host_key_checking=false

[privilege_escalation]
become=True
become_ask_pass=False
become_method=sudo
become_user=root

[devops@mha ansible]$ vim inventory
[mysql]
192.168.131.10
192.168.131.20
192.168.131.30

[devops@mha ansible]$ ansible mysql -m user -a 'name=devops'
[devops@mha ansible]$  ansible mysql -m shell -a 'echo devops | passwd --stdin devops'
[devops@mha ansible]$ ansible mysql -m shell -a 'echo "devops   ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers'
192.168.131.20 | CHANGED | rc=0 >>

192.168.131.30 | CHANGED | rc=0 >>

192.168.131.10 | CHANGED | rc=0 >>

[devops@mha ansible]$ ansible all -m file -a 'path=/home/devops/.ssh owner=devops group=devops mode="0700" state=directory'
[devops@mha ansible]$ ansible all -m copy -a 'src=/home/devops/.ssh/authorized_keys dest=/home/devops/.ssh/authorized_keys owner=devops group=devops mode="0600"'

[devops@mha ansible]$ cat >ansible.cfg <<EOF
[defaults]
inventory=./inventory
remote_user=devops
host_key_checking=false

[privilege_escalation]
become=True
become_ask_pass=False
become_method=sudo
become_user=root
EOF

[devops@mha ansible]$ ansible all -m shell -a 'whoami'
192.168.131.20 | CHANGED | rc=0 >>
root
192.168.131.30 | CHANGED | rc=0 >>
root
192.168.131.10 | CHANGED | rc=0 >>
root

[devops@mha ansible]$ vim clear_mysql.yml
- name: reset mysql
  hosts: mysql
  tasks:
  - name: stop mysql
    shell: '/etc/init.d/mysqld stop'
    ignore_errors: yes

  - name: delete mysql data
    file:
      path: /data/mysql
      state: absent

  - name: create data directory
    file:
      path: /data/mysql
      state: directory
      owner: mysql
      group: mysql

  - name: initialize mysql
    shell: '/usr/local/mysql/bin/mysqld --initialize --user=mysql'


[devops@mha ansible]$ ansible-playbook  clear_mysql.yml  -vv | grep password

2. Deploy group replication

[root@mysql-node1 ~]# vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.131.10          mysql-node1
192.168.131.20          mysql-node2
192.168.131.30          mysql-node3


[root@mysql-node1 ~]# vim /etc/my.cnf
[mysqld]
datadir=/data/mysql
socket=/data/mysql/mysql.sock
#symbolic-link=0
server-id=10
log-bin=mysql-bin
gtid_mode=ON
enforce-gtid-consistency=ON

default_authentication_plugin=mysql_native_password
log_slave_updates=ON
binlog_format=ROW
binlog_checksum=NONE
disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"

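# Add the group replication parameters after initialization is complete
plugin_load_add='group_replication.so'        # load the group replication plugin
group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"    # name (a UUID) of the group to create or join
group_replication_start_on_boot=off            # do not start group replication automatically when the server starts
group_replication_local_address="192.168.131.10:33061"    # address and port on which this member accepts connections from other members
group_replication_group_seeds="192.168.131.10:33061,192.168.131.20:33061,192.168.131.30:33061"    # members contacted when joining the group
group_replication_bootstrap_group=off     # do not bootstrap the group automatically at startup
group_replication_single_primary_mode=OFF # use multi-primary mode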
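
Across the three nodes only server-id and group_replication_local_address change, while the seed list is identical everywhere. As a sketch under that assumption, the node-specific lines can be generated from the member list. group_settings is a hypothetical helper, and port 33061 is simply the one used in this walkthrough.

```python
# Sketch: render the node-specific group replication lines for my.cnf.
GROUP_PORT = 33061
MEMBERS = ["192.168.131.10", "192.168.131.20", "192.168.131.30"]
GROUP_NAME = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"


def group_settings(members, local_ip, group_name, port=GROUP_PORT):
    """Return the my.cnf lines that identify this member and its seed list."""
    if local_ip not in members:
        raise ValueError("local address must be one of the group members")
    seeds = ",".join(f"{ip}:{port}" for ip in members)
    return [
        f'group_replication_group_name="{group_name}"',
        f'group_replication_local_address="{local_ip}:{port}"',
        f'group_replication_group_seeds="{seeds}"',
    ]


if __name__ == "__main__":
    for ip in MEMBERS:
        print(f"# {ip}")
        print("\n".join(group_settings(MEMBERS, ip, GROUP_NAME)))
```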

# Configure group replication on the first host
[root@mysql-node1 ~]# vim /etc/my.cnf
[root@mysql-node1 ~]# /etc/init.d/mysqld start
[root@mysql-node1 ~]# mysql -uroot -p'1Q*yCdU!p%Eh'

mysql> alter user root@localhost identified   by 'lee';
Query OK, 0 rows affected (0.04 sec)

mysql> SET SQL_LOG_BIN=0;		
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE USER rpl_user@'%' IDENTIFIED BY 'lee';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT CONNECTION_ADMIN ON *.* TO rpl_user@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT BACKUP_ADMIN ON *.* TO rpl_user@'%';
Query OK, 0 rows affected (0.00 sec)

mysql>  GRANT GROUP_REPLICATION_STREAM ON *.* TO rpl_user@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

mysql> SET SQL_LOG_BIN=1;
Query OK, 0 rows affected (0.00 sec)

mysql> CHANGE REPLICATION SOURCE TO SOURCE_USER='rpl_user', SOURCE_PASSWORD='lee' FOR CHANNEL 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.01 sec)

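mysql> SHOW PLUGINS;     # check that the group replication plugin is active
| group_replication               | ACTIVE   | GROUP REPLICATION  | group_replication.so | GPL     |
mysql> SET GLOBAL group_replication_bootstrap_group=ON;
Query OK, 0 rows affected (0.00 sec) # Key step: this statement tells the current node to bootstrap a new replication group. Run it only once, when the first node of the cluster is started.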

mysql> START GROUP_REPLICATION USER='rpl_user', PASSWORD='lee';
Query OK, 0 rows affected (1.10 sec)

mysql> SET GLOBAL group_replication_bootstrap_group=OFF;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | ac3d6eaf-1a0a-11f1-9efa-000c29f4a60c | mysql-node1 |        3306 | ONLINE       | PRIMARY     | 8.3.0          | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
1 row in set (0.00 sec)


# Configure group replication on the remaining hosts
[root@mysql-node2 ~]# /etc/init.d/mysqld start
[root@mysql-node2 ~]# mysql -uroot -p'sAk58a2uem,1'
mysql> alter user root@localhost identified   by 'lee';
Query OK, 0 rows affected (0.00 sec)

mysql> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE USER rpl_user@'%' IDENTIFIED BY 'lee';
Query OK, 0 rows affected (0.00 sec)

mysql>  GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT CONNECTION_ADMIN ON *.* TO rpl_user@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT BACKUP_ADMIN ON *.* TO rpl_user@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT GROUP_REPLICATION_STREAM ON *.* TO rpl_user@'%';
Query OK, 0 rows affected (0.00 sec)

mysql>  SET SQL_LOG_BIN=1;
Query OK, 0 rows affected (0.00 sec)

mysql>  CHANGE REPLICATION SOURCE TO SOURCE_USER='rpl_user',SOURCE_PASSWORD='lee' FOR CHANNEL 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.00 sec)

mysql> START GROUP_REPLICATION USER='rpl_user', PASSWORD='lee';
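ERROR 3092 (HY000): The server is not configured properly to be an active member of the group. Please see more details on error log.			# if this error appears, reset the binary log and GTID state on this node

mysql> reset master;			# resolves the error above; it purges this node's binary logs and GTID history, so use it only on a freshly initialized node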
Query OK, 0 rows affected, 1 warning (0.04 sec)

mysql> START GROUP_REPLICATION USER='rpl_user', PASSWORD='lee';
Query OK, 0 rows affected (7.94 sec)

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | ac3d6eaf-1a0a-11f1-9efa-000c29f4a60c | mysql-node1 |        3306 | ONLINE       | PRIMARY     | 8.3.0          | XCom                       |
| group_replication_applier | e0b37b20-1a0b-11f1-a62c-000c29e84b64 | mysql-node2 |        3306 | ONLINE       | PRIMARY     | 8.3.0          | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
2 rows in set (0.00 sec)

# A MEMBER_STATE of ONLINE for the host means it joined the group successfully
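Checking MEMBER_STATE by eye is fine for two nodes, but the same check can be scripted. The following is a minimal sketch, assuming the table text printed by the mysql client has been captured into a string; the helper names `parse_mysql_table` and `all_members_online` are hypothetical, and the sample rows are shortened for illustration (the RECOVERING row is made up to show a failing case).

```python
def split_row(line):
    """Split one '| a | b |' line of mysql client output into stripped cells."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_mysql_table(text):
    """Turn the ASCII table printed by the mysql client into a list of dicts."""
    rows = [line for line in text.splitlines() if line.startswith("|")]
    if not rows:
        return []
    header = split_row(rows[0])
    return [dict(zip(header, split_row(line))) for line in rows[1:]]


def all_members_online(members):
    """True only if there is at least one member and every member is ONLINE."""
    return bool(members) and all(m["MEMBER_STATE"] == "ONLINE" for m in members)


# Illustrative sample: columns trimmed, member IDs shortened.
sample = """
+---------------------------+-----------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID | MEMBER_HOST | MEMBER_STATE |
+---------------------------+-----------+-------------+--------------+
| group_replication_applier | ac3d6eaf  | mysql-node1 | ONLINE       |
| group_replication_applier | e0b37b20  | mysql-node2 | RECOVERING   |
+---------------------------+-----------+-------------+--------------+
"""

members = parse_mysql_table(sample)
print([(m["MEMBER_HOST"], m["MEMBER_STATE"]) for m in members])
print(all_members_online(members))
```

A node that is still catching up reports RECOVERING, so the check above only passes once every row shows ONLINE.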

3. Test

# Check that every node can read and write and that the data stays in sync
# On node1
mysql> create database timinglee;
Query OK, 1 row affected (0.00 sec)

mysql> create table timinglee.userlist (
    -> username VARCHAR(10) PRIMARY KEY NOT NULL,
    -> password VARCHAR(50) NOT NULL
    -> );
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO timinglee.userlist VALUES ('user1','111');
Query OK, 1 row affected (0.01 sec)

# On node2, read the table and insert a new row
mysql> select * from timinglee.userlist;
+----------+----------+
| username | password |
+----------+----------+
| user1    | 111      |
+----------+----------+
1 row in set (0.00 sec)

mysql> insert into timinglee.userlist values ('user2','222');
Query OK, 1 row affected (0.01 sec)

mysql> select * from timinglee.userlist;
+----------+----------+
| username | password |
+----------+----------+
| user1    | 111      |
| user2    | 222      |
+----------+----------+
2 rows in set (0.01 sec)

# On node3, read the table and insert a row
mysql> select * from timinglee.userlist;
+----------+----------+
| username | password |
+----------+----------+
| user1    | 111      |
| user2    | 222      |
+----------+----------+
2 rows in set (0.00 sec)


mysql> insert into timinglee.userlist values ('user3','333');
Query OK, 1 row affected (0.01 sec)

mysql> select * from timinglee.userlist;
+----------+----------+
| username | password |
+----------+----------+
| user1    | 111      |
| user2    | 222      |
| user3    | 333      |
+----------+----------+
3 rows in set (0.00 sec)


# The same rows are also visible on node1 and node2

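MySQL MHA high-availability cluster

1 MHA overview

Why use MHA?

The master is a single point of failure.

What is MHA?

  • MHA (Master High Availability) is a mature toolkit for failover and master-slave replication management in MySQL high-availability environments.

  • MHA exists to remove MySQL's single point of failure.

  • When the master fails, MHA can complete the failover automatically within 0-30 seconds.

  • During failover MHA preserves data consistency as far as it can, which is what makes the setup highly available in practice.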

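MHA components

  • MHA has two parts: MHA Manager (the management node) and MHA Node (the database node).

  • MHA Manager can run on a dedicated machine and manage several master-slave clusters, or it can run on one of the slave nodes.

  • MHA Manager probes the cluster's master node at regular intervals.

  • When the master fails, MHA Manager automatically promotes the slave with the most recent data to be the new master, then repoints all other slaves to it.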

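MHA features

  • During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost.

  • Semi-synchronous replication greatly reduces the risk of data loss. Even if only one slave has received the latest binary log events, MHA can apply them to all other slaves and so keep every node consistent.

  • MHA currently supports a one-master, multiple-slave topology with at least three servers, that is, one master and two slaves.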

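Algorithm for choosing the candidate master during failover

1. Slaves are normally ranked by replication progress (position/GTID). If their data differs, the slave closest to the master becomes the candidate master.

2. If the slaves hold identical data, the candidate is chosen in the order the servers appear in the configuration file.

3. If a weight is set (candidate_master = 1), that slave is forced to be the candidate master.

  • By default, a slave that is 100 MB of relay logs behind the master is not chosen, even if the weight is set.
  • With check_repl_delay = 0, the slave is still forced to be the candidate even if it is many logs behind.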
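The selection rules above can be sketched in a few lines of Python. This is a simplified model for illustration, not MHA's actual implementation: the `Slave` class, the `lag_bytes` field and `pick_new_master` are hypothetical names, and the way the rules are combined (weighted candidates first, then least lag, then configuration order) is an assumption based on the list above. The `no_master` flag is the per-server option that excludes a host from promotion.

```python
from dataclasses import dataclass

RELAY_LAG_LIMIT = 100 * 1024 * 1024  # 100 MB, the default threshold described above


@dataclass
class Slave:
    name: str
    lag_bytes: int                  # hypothetical measure of how far behind the master it is
    candidate_master: bool = False  # candidate_master=1
    check_repl_delay: bool = True   # check_repl_delay=0 maps to False
    no_master: bool = False         # no_master=1: never promote this host


def pick_new_master(slaves):
    """Pick a candidate from slaves listed in configuration-file order."""
    eligible = [s for s in slaves if not s.no_master]
    # Rule 3: a weighted slave wins, unless it lags too far and the delay check is on.
    preferred = [
        s for s in eligible
        if s.candidate_master
        and (not s.check_repl_delay or s.lag_bytes < RELAY_LAG_LIMIT)
    ]
    pool = preferred or eligible
    if not pool:
        return None
    # Rules 1 and 2: least lag wins; min() keeps the first item on ties,
    # which corresponds to configuration-file order.
    return min(pool, key=lambda s: s.lag_bytes)


slaves = [
    Slave("192.168.131.20", lag_bytes=0, candidate_master=True, check_repl_delay=False),
    Slave("192.168.131.30", lag_bytes=0, no_master=True),
]
print(pick_new_master(slaves).name)
```

With `check_repl_delay` left at its default, a weighted slave that lags by more than the limit falls out of the preferred pool and the least-lagging eligible slave is picked instead.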

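How MHA works

  • MHA currently supports mainly a one-master, multiple-slave topology. A replication cluster managed by MHA needs at least three database servers: one acting as the master, one as the standby master, and one as a slave.

  • MHA Node runs on every MySQL server.

  • MHA Manager probes the cluster's master node at regular intervals.

  • When the master fails, MHA Manager automatically promotes the slave with the most recent data to be the new master.

  • It then repoints all other slaves to the new master, and the VIP moves to the new master automatically (provided a master_ip_failover_script is configured).

  • The whole failover is transparent to applications.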
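The periodic probing step can be pictured as a counter of consecutive failed checks. The sketch below is only a model of that idea: the function name `first_dead_probe` is hypothetical, and the threshold of three consecutive failures is an assumption used for illustration rather than something stated in this article.

```python
def first_dead_probe(probe_results, max_failures=3):
    """Return the index of the probe at which the master is declared dead.

    probe_results is an iterable of booleans (True = the master answered).
    max_failures is an assumed threshold of consecutive failed probes.
    Returns None if the threshold is never reached.
    """
    consecutive = 0
    for index, ok in enumerate(probe_results):
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= max_failures:
            return index
    return None


# One transient failure is ignored; three failures in a row trigger failover.
print(first_dead_probe([True, False, True, False, False, False]))
print(first_dead_probe([True, False, False, True]))
```

Requiring several failures in a row is what keeps a single dropped probe from triggering an unnecessary failover.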

2 Environment setup

Preparation: make sure the data is consistent (on all MySQL nodes)

Re-initialize the data

[root@mysql-node1 ~]# /etc/init.d/mysqld stop
[root@mysql-node1 ~]# rm -rf /data/mysql/*
[root@mysql-node1 ~]# mysqld --initialize --user mysql

[root@mysql-node1 ~]# /etc/init.d/mysqld start
Starting MySQL.Logging to '/data/mysql/mysql-node1.err'.
. SUCCESS!

[root@mysql-node1 ~]# mysql_secure_installation


[root@mysql-node1 ~]# mysql -uroot -plee -e "create user lee@'%' identified with mysql_native_password by 'lee';"

[root@mysql-node1 ~]# mysql -uroot -plee -e "GRANT replication slave ON *.* to lee@'%';"

[root@mysql-node1 ~]# mysql -uroot -plee -e "show master status;"
mysql: [Warning] Using a password on the command line interface can be insecure.
+------------------+----------+--------------+------------------+------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000002 |     1328 |              |                  | 65e12da3-1be5-11f1-9a03-000c29679c4e:1-5 |
+------------------+----------+--------------+------------------+------------------------------------------+

Reconfigure master-slave replication

On the slave hosts (node2 and node3)

mysql> stop slave;
Query OK, 0 rows affected, 1 warning (0.01 sec)

[root@mysql-node2 ~]# mysql -uroot -plee -e "CHANGE MASTER TO MASTER_HOST='192.168.131.10', MASTER_USER='lee', MASTER_PASSWORD='lee', MASTER_AUTO_POSITION=1;"
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@mysql-node2 ~]# mysql -uroot -plee -e " start slave;"
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@mysql-node2 ~]# mysql -uroot -plee -e " show slave status\G;"
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 192.168.131.10
                  Master_User: lee
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1328
               Relay_Log_File: mysql-node2-relay-bin.000002
                Relay_Log_Pos: 1537
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
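Both replication threads must report Yes before moving on. As a rough sketch, the vertical (`\G`) output can be parsed and checked like this; the helper names `parse_vertical_status` and `replication_healthy` are hypothetical and the sample is a shortened copy of the output above.

```python
def parse_vertical_status(text):
    """Parse 'Key: value' lines from SHOW SLAVE STATUS\\G output into a dict."""
    status = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip the '*** 1. row ***' separator and anything without a key.
        if stripped.startswith("*") or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        status[key.strip()] = value.strip()
    return status


def replication_healthy(status):
    """True when both the IO thread and the SQL thread are running."""
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
    )


sample = """\
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 192.168.131.10
                  Master_User: lee
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
"""

status = parse_vertical_status(sample)
print(status["Master_Host"], replication_healthy(status))
```

If either thread shows No or Connecting, the check fails and the replication setup should be fixed before installing MHA.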

1. Install the MHA software on all hosts

[root@mha ~]# unzip MHA-7.zip

[root@mha ~]# cd MHA-7/
[root@mha MHA-7]# dnf install perl perl-DBD-MySQL perl-CPAN
[root@mha MHA-7]# cpan
Loading internal logger. Log::Log4perl recommended for better logging

CPAN.pm requires configuration, but most of it can be done automatically.
If you answer 'no' below, you will enter an interactive dialog for each
configuration option instead.

Would you like to configure as much as possible automatically? [yes] yes
cpan[1]> install Config::Tiny
cpan[2]> install Log::Dispatch
cpan[3]> install Mail::Sender
Specify defaults for Mail::Sender? (y/N) y
Default encoding of message bodies (N)one, (Q)uoted-printable, (B)ase64: n
cpan[4]> install Parallel::ForkManager
cpan[5]> exit

Verify that the components were installed successfully

![](https://i-blog.csdnimg.cn/direct/5f5d5d8bc0224e19a311d4e17319e4ad.png)

On the mha node

```bash
[root@mha MHA-7]# rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm mha4mysql-node-0.58-0.el7.centos.noarch.rpm --nodeps
```

![](https://i-blog.csdnimg.cn/direct/d32ea99fbdd443e0ac85e0f5b8770d59.png)

On all MySQL nodes

[root@mysql-node1-3 ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm --nodeps

2. Install the node package on the MySQL hosts (the loop below covers 192.168.131.10, .20 and .30)

[root@mha MHA-7]# for i in 10 20 30
> do
> scp mha4mysql-node-0.58-0.el7.centos.noarch.rpm root@192.168.131.$i:/mnt
> ssh -l root 192.168.131.$i "rpm -ivh /mnt/mha4mysql-node-0.58-0.el7.centos.noarch.rpm --nodeps"
> done
Warning: Permanently added '192.168.131.10' (ED25519) to the list of known hosts.
mha4mysql-node-0.58-0.el7.centos.noarch.rpm                     100%   35KB  16.9MB/s   00:00
Verifying...                          ########################################
Preparing...                          ########################################
Updating / installing...
mha4mysql-node-0.58-0.el7.centos      ########################################
Warning: Permanently added '192.168.131.20' (ED25519) to the list of known hosts.
mha4mysql-node-0.58-0.el7.centos.noarch.rpm                     100%   35KB  28.2MB/s   00:00
Verifying...                          ########################################
Preparing...                          ########################################
Updating / installing...
mha4mysql-node-0.58-0.el7.centos      ########################################
Warning: Permanently added '192.168.131.30' (ED25519) to the list of known hosts.
mha4mysql-node-0.58-0.el7.centos.noarch.rpm                     100%   35KB  20.1MB/s   00:00
Verifying...                          ########################################
Preparing...                          ########################################
Updating / installing...
mha4mysql-node-0.58-0.el7.centos      ########################################
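Before running a loop like the one above against real hosts, it can help to print the commands first. The sketch below only builds and prints the scp and ssh command lines as a dry run and executes nothing; the function name `install_commands` is hypothetical, and the host list and paths are taken from the loop above.

```python
import shlex

RPM = "mha4mysql-node-0.58-0.el7.centos.noarch.rpm"


def install_commands(hosts, rpm=RPM, remote_dir="/mnt"):
    """Build the scp and ssh command lines for each host without running them."""
    commands = []
    for host in hosts:
        remote_path = f"{remote_dir}/{rpm}"
        commands.append(["scp", rpm, f"root@{host}:{remote_dir}"])
        commands.append(
            ["ssh", "-l", "root", host,
             f"rpm -ivh {shlex.quote(remote_path)} --nodeps"]
        )
    return commands


hosts = [f"192.168.131.{last}" for last in (10, 20, 30)]
for command in install_commands(hosts):
    # shlex.join renders the list the way it would be typed in a shell.
    print(shlex.join(command))
```

Once the printed commands look right, each list could be handed to `subprocess.run` to execute it.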

3. Patch the version-detection code in MHA Manager

# Original implementation, commented out: it passes every number in the version string to sprintf
#sub parse_mysql_major_version($) {
#  my $str = shift;
#  my $result = sprintf( '%03d%03d', $str =~ m/(\d+)/g );
#  return $result;
#}

# Replacement: format only the first two numbers (major and minor), defaulting to 0 when one is missing
sub parse_mysql_major_version($) {
  my $str = shift;
  my @nums = $str =~ m/(\d+)/g;
  my $result = sprintf( '%03d%03d', $nums[0]//0, $nums[1]//0);
  return $result;
}
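To see what the patched function returns, here is the same logic rendered in Python. It is purely an illustration of the Perl code above and is not part of MHA: the first two numbers in the version string are kept, zero-padded to three digits each, and anything after them is ignored.

```python
import re


def parse_mysql_major_version(version):
    """Return the major and minor version as a six-digit string, e.g. '008003'."""
    nums = re.findall(r"\d+", version)
    major = int(nums[0]) if len(nums) > 0 else 0
    minor = int(nums[1]) if len(nums) > 1 else 0
    return f"{major:03d}{minor:03d}"


print(parse_mysql_major_version("8.3.0"))       # 008003
print(parse_mysql_major_version("5.7.44-log"))  # 005007
```

Because the result is fixed-width, two versions can be compared as plain strings: "005007" sorts before "008003".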

4. Create a remote-login user for MHA

On the master host

mysql> create user root@'%' identified with mysql_native_password by 'lee';
Query OK, 0 rows affected (0.01 sec)

mysql> GRANT ALL ON *.* TO root@'%' ;
Query OK, 0 rows affected (0.00 sec)

5. Generate the MHA Manager configuration file template

[root@mha MHA-7]# tar zxf mha4mysql-manager-0.58.tar.gz
[root@mha MHA-7]# cd mha4mysql-manager-0.58
[root@mha mha4mysql-manager-0.58]# mkdir  /etc/masterha/ -p

[root@mha mha4mysql-manager-0.58]# cat samples/conf/masterha_default.cnf samples/conf/app1.cnf  > /etc/masterha/app1.cnf

6. Edit the configuration file

[root@mha ~]# vim /etc/masterha/app1.cnf
[server default]
user=root
password=lee
ssh_user=root
repl_user=lee
repl_password=lee
master_binlog_dir= /data/mysql
remote_workdir=/tmp
secondary_check_script= masterha_secondary_check -s 192.168.131.10 -s 192.168.131.2
ping_interval=3

# master_ip_failover_script= /script/masterha/master_ip_failover
# shutdown_script= /script/masterha/power_manager
# report_script= /script/masterha/send_report
# master_ip_online_change_script= /script/masterha/master_ip_online_change

[server default]
manager_workdir=/etc/masterha
manager_log=/etc/masterha/mha.log

[server1]
hostname=192.168.131.10
candidate_master=1
check_repl_delay=0

[server2]
hostname=192.168.131.20
candidate_master=1
check_repl_delay=0

[server3]
hostname=192.168.131.30
no_master=1
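A quick way to sanity-check a file like this before handing it to MHA is to read it back and list the role of each server. The sketch below uses Python's `configparser` for illustration only (MHA reads the file itself); `load_servers` is a hypothetical helper and the sample is a shortened copy of the file above. `strict=False` is needed because the file contains two `[server default]` sections, which are then merged.

```python
import configparser

SAMPLE = """
[server default]
user=root
ping_interval=3

[server default]
manager_workdir=/etc/masterha

[server1]
hostname=192.168.131.10
candidate_master=1
check_repl_delay=0

[server2]
hostname=192.168.131.20
candidate_master=1
check_repl_delay=0

[server3]
hostname=192.168.131.30
no_master=1
"""


def load_servers(text):
    """Return (defaults, servers) parsed from an app1.cnf-style string."""
    # strict=False merges the repeated [server default] sections instead of failing.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(text)
    defaults = dict(parser["server default"])
    servers = {
        name: dict(parser[name])
        for name in parser.sections()
        if name != "server default"
    }
    return defaults, servers


defaults, servers = load_servers(SAMPLE)
candidates = [s["hostname"] for s in servers.values() if s.get("candidate_master") == "1"]
excluded = [s["hostname"] for s in servers.values() if s.get("no_master") == "1"]
print("candidates:", candidates)
print("never promoted:", excluded)
```

With the values above, 192.168.131.10 and 192.168.131.20 are listed as candidates and 192.168.131.30 as never promoted, which matches the role lines MHA prints during a switchover.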

7. Check the environment

[root@mha ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf

![](https://i-blog.csdnimg.cn/direct/034d1818364e4c6386bf4bacf85cd165.png)

[root@mha ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf

![](https://i-blog.csdnimg.cn/direct/938a8f3cf53f4999a7479b2acd6c588b.png)

3 Cluster switchover operations

1. Manual switchover

Switchover while the master is healthy

# Perform the switchover and move the master role to 192.168.131.20
[root@mha masterha]# masterha_master_switch --conf=/etc/masterha/app1.cnf   --master_state=alive   --new_master_host=192.168.131.20   --new_master_port=3306   --orig_master_is_new_slave   --running_updates_limit=10000
Tue Mar 10 04:11:14 2026 - [info] MHA::MasterRotate version 0.58.
Tue Mar 10 04:11:14 2026 - [info] Starting online master switch..
Tue Mar 10 04:11:14 2026 - [info]
Tue Mar 10 04:11:14 2026 - [info] * Phase 1: Configuration Check Phase..
Tue Mar 10 04:11:14 2026 - [info]
Tue Mar 10 04:11:14 2026 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Mar 10 04:11:14 2026 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Mar 10 04:11:14 2026 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Mar 10 04:11:16 2026 - [info] GTID failover mode = 1
Tue Mar 10 04:11:16 2026 - [info] Current Alive Master: 192.168.131.10(192.168.131.10:3306)
Tue Mar 10 04:11:16 2026 - [info] Alive Slaves:
Tue Mar 10 04:11:16 2026 - [info]   192.168.131.20(192.168.131.20:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Tue Mar 10 04:11:16 2026 - [info]     GTID ON
Tue Mar 10 04:11:16 2026 - [info]     Replicating from 192.168.131.10(192.168.131.10:3306)
Tue Mar 10 04:11:16 2026 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Mar 10 04:11:16 2026 - [info]   192.168.131.30(192.168.131.30:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Tue Mar 10 04:11:16 2026 - [info]     GTID ON
Tue Mar 10 04:11:16 2026 - [info]     Replicating from 192.168.131.10(192.168.131.10:3306)
Tue Mar 10 04:11:16 2026 - [info]     Not candidate for the new Master (no_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.131.10(192.168.131.10:3306)? (YES/no): yes    # operator input
Tue Mar 10 04:11:17 2026 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Mar 10 04:11:17 2026 - [info]  ok.
Tue Mar 10 04:11:17 2026 - [info] Checking MHA is not monitoring or doing failover..
Tue Mar 10 04:11:17 2026 - [info] Checking replication health on 192.168.131.20..
Tue Mar 10 04:11:17 2026 - [info]  ok.
Tue Mar 10 04:11:17 2026 - [info] Checking replication health on 192.168.131.30..
Tue Mar 10 04:11:17 2026 - [info]  ok.
Tue Mar 10 04:11:17 2026 - [info] 192.168.131.20 can be new master.
Tue Mar 10 04:11:17 2026 - [info]
From:
192.168.131.10(192.168.131.10:3306) (current master)
 +--192.168.131.20(192.168.131.20:3306)
 +--192.168.131.30(192.168.131.30:3306)

To:
192.168.131.20(192.168.131.20:3306) (new master)
 +--192.168.131.30(192.168.131.30:3306)
 +--192.168.131.10(192.168.131.10:3306)

Starting master switch from 192.168.131.10(192.168.131.10:3306) to 192.168.131.20(192.168.131.20:3306)? (yes/NO): yes        # operator input
Tue Mar 10 04:11:18 2026 - [info] Checking whether 192.168.131.20(192.168.131.20:3306) is ok for the new master..
Tue Mar 10 04:11:18 2026 - [info]  ok.
Tue Mar 10 04:11:18 2026 - [info] 192.168.131.10(192.168.131.10:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Tue Mar 10 04:11:18 2026 - [info] 192.168.131.10(192.168.131.10:3306): Resetting slave pointing to the dummy host.
Tue Mar 10 04:11:18 2026 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Mar 10 04:11:18 2026 - [info]
Tue Mar 10 04:11:18 2026 - [info] * Phase 2: Rejecting updates Phase..
Tue Mar 10 04:11:18 2026 - [info]
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): ^CTue Mar 10 04:17:14 2026 - [info] Killing thread 52 on 192.168.131.10(192.168.131.10:3306)..
Tue Mar 10 04:17:14 2026 - [info] ok.
Tue Mar 10 04:17:14 2026 - [info] Killing thread 44 on 192.168.131.20(192.168.131.20:3306)..
Tue Mar 10 04:17:14 2026 - [info] ok.
Tue Mar 10 04:17:14 2026 - [info] Killing thread 41 on 192.168.131.30(192.168.131.30:3306)..
Tue Mar 10 04:17:14 2026 - [info] ok.
[root@mha masterha]# masterha_master_switch --conf=/etc/masterha/app1.cnf   --master_state=alive   --new_master_host=192.168.131.20   --new_master_port=3306   --orig_master_is_new_slave   --running_updates_limit=10000
Tue Mar 10 04:23:41 2026 - [info] MHA::MasterRotate version 0.58.
Tue Mar 10 04:23:41 2026 - [info] Starting online master switch..
Tue Mar 10 04:23:41 2026 - [info]
Tue Mar 10 04:23:41 2026 - [info] * Phase 1: Configuration Check Phase..
Tue Mar 10 04:23:41 2026 - [info]
Tue Mar 10 04:23:41 2026 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Mar 10 04:23:41 2026 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Mar 10 04:23:41 2026 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Mar 10 04:23:42 2026 - [info] GTID failover mode = 1
Tue Mar 10 04:23:42 2026 - [info] Current Alive Master: 192.168.131.10(192.168.131.10:3306)
Tue Mar 10 04:23:42 2026 - [info] Alive Slaves:
Tue Mar 10 04:23:42 2026 - [info]   192.168.131.20(192.168.131.20:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Tue Mar 10 04:23:42 2026 - [info]     GTID ON
Tue Mar 10 04:23:42 2026 - [info]     Replicating from 192.168.131.10(192.168.131.10:3306)
Tue Mar 10 04:23:42 2026 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Mar 10 04:23:42 2026 - [info]   192.168.131.30(192.168.131.30:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Tue Mar 10 04:23:42 2026 - [info]     GTID ON
Tue Mar 10 04:23:42 2026 - [info]     Replicating from 192.168.131.10(192.168.131.10:3306)
Tue Mar 10 04:23:42 2026 - [info]     Not candidate for the new Master (no_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.131.10(192.168.131.10:3306)? (YES/no): yes
Tue Mar 10 04:23:45 2026 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Mar 10 04:23:45 2026 - [info]  ok.
Tue Mar 10 04:23:45 2026 - [info] Checking MHA is not monitoring or doing failover..
Tue Mar 10 04:23:45 2026 - [info] Checking replication health on 192.168.131.20..
Tue Mar 10 04:23:45 2026 - [info]  ok.
Tue Mar 10 04:23:45 2026 - [info] Checking replication health on 192.168.131.30..
Tue Mar 10 04:23:45 2026 - [info]  ok.
Tue Mar 10 04:23:45 2026 - [info] 192.168.131.20 can be new master.
Tue Mar 10 04:23:45 2026 - [info]
From:
192.168.131.10(192.168.131.10:3306) (current master)
 +--192.168.131.20(192.168.131.20:3306)
 +--192.168.131.30(192.168.131.30:3306)

To:
192.168.131.20(192.168.131.20:3306) (new master)
 +--192.168.131.30(192.168.131.30:3306)
 +--192.168.131.10(192.168.131.10:3306)

Starting master switch from 192.168.131.10(192.168.131.10:3306) to 192.168.131.20(192.168.131.20:3306)? (yes/NO): yes
Tue Mar 10 04:23:48 2026 - [info] Checking whether 192.168.131.20(192.168.131.20:3306) is ok for the new master..
Tue Mar 10 04:23:48 2026 - [info]  ok.
Tue Mar 10 04:23:48 2026 - [info] 192.168.131.10(192.168.131.10:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Tue Mar 10 04:23:48 2026 - [info] 192.168.131.10(192.168.131.10:3306): Resetting slave pointing to the dummy host.
Tue Mar 10 04:23:48 2026 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Mar 10 04:23:48 2026 - [info]
Tue Mar 10 04:23:48 2026 - [info] * Phase 2: Rejecting updates Phase..
Tue Mar 10 04:23:48 2026 - [info]
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
Tue Mar 10 04:23:52 2026 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Tue Mar 10 04:23:52 2026 - [info] Executing FLUSH TABLES WITH READ LOCK..
Tue Mar 10 04:23:52 2026 - [info]  ok.
Tue Mar 10 04:23:52 2026 - [info] Orig master binlog:pos is mysql-bin.000002:1828.
Tue Mar 10 04:23:52 2026 - [info]  Waiting to execute all relay logs on 192.168.131.20(192.168.131.20:3306)..
Tue Mar 10 04:23:52 2026 - [info]  master_pos_wait(mysql-bin.000002:1828) completed on 192.168.131.20(192.168.131.20:3306). Executed 0 events.
Tue Mar 10 04:23:52 2026 - [info]   done.
Tue Mar 10 04:23:52 2026 - [info] Getting new master's binlog name and position..
Tue Mar 10 04:23:52 2026 - [info]  mysql-bin.000004:2661
Tue Mar 10 04:23:52 2026 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.131.20', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='lee', MASTER_PASSWORD='xxx';
Tue Mar 10 04:23:52 2026 - [info]
Tue Mar 10 04:23:52 2026 - [info] * Switching slaves in parallel..
Tue Mar 10 04:23:52 2026 - [info]
Tue Mar 10 04:23:52 2026 - [info] -- Slave switch on host 192.168.131.30(192.168.131.30:3306) started, pid: 2718
Tue Mar 10 04:23:52 2026 - [info]
Tue Mar 10 04:24:03 2026 - [info] Log messages from 192.168.131.30 ...
Tue Mar 10 04:24:03 2026 - [info]
Tue Mar 10 04:23:52 2026 - [info]  Waiting to execute all relay logs on 192.168.131.30(192.168.131.30:3306)..
Tue Mar 10 04:23:52 2026 - [info]  master_pos_wait(mysql-bin.000002:1828) completed on 192.168.131.30(192.168.131.30:3306). Executed 0 events.
Tue Mar 10 04:23:52 2026 - [info]   done.
Tue Mar 10 04:23:52 2026 - [info]  Resetting slave 192.168.131.30(192.168.131.30:3306) and starting replication from the new master 192.168.131.20(192.168.131.20:3306)..
Tue Mar 10 04:23:52 2026 - [info]  Executed CHANGE MASTER.
Tue Mar 10 04:24:02 2026 - [info]  Slave started.
Tue Mar 10 04:24:03 2026 - [info] End of log messages from 192.168.131.30 ...
Tue Mar 10 04:24:03 2026 - [info]
Tue Mar 10 04:24:03 2026 - [info] -- Slave switch on host 192.168.131.30(192.168.131.30:3306) succeeded.
Tue Mar 10 04:24:03 2026 - [info] Unlocking all tables on the orig master:
Tue Mar 10 04:24:03 2026 - [info] Executing UNLOCK TABLES..
Tue Mar 10 04:24:03 2026 - [info]  ok.
Tue Mar 10 04:24:03 2026 - [info] Starting orig master as a new slave..
Tue Mar 10 04:24:03 2026 - [info]  Resetting slave 192.168.131.10(192.168.131.10:3306) and starting replication from the new master 192.168.131.20(192.168.131.20:3306)..
Tue Mar 10 04:24:03 2026 - [info]  Executed CHANGE MASTER.
Tue Mar 10 04:24:04 2026 - [info]  Slave started.
Tue Mar 10 04:24:04 2026 - [info] All new slave servers switched successfully.
Tue Mar 10 04:24:04 2026 - [info]
Tue Mar 10 04:24:04 2026 - [info] * Phase 5: New master cleanup phase..
Tue Mar 10 04:24:04 2026 - [info]
Tue Mar 10 04:24:04 2026 - [info]  192.168.131.20: Resetting slave info succeeded.
Tue Mar 10 04:24:04 2026 - [info] Switching master to 192.168.131.20(192.168.131.20:3306) completed successfully.

Check the cluster status

[root@mysql-node1 ~]# mysql -uroot -plee
mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 192.168.131.20
                  Master_User: lee
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000004
          Read_Master_Log_Pos: 2661
               Relay_Log_File: mysql-node1-relay-bin.000004
                Relay_Log_Pos: 864
        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes


[root@mysql-node3 ~]# mysql -uroot -plee -e "show slave status\G;"  | head -n 15
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 192.168.131.20
                  Master_User: lee
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000004
          Read_Master_Log_Pos: 2661
               Relay_Log_File: mysql-node3-relay-bin.000004
                Relay_Log_Pos: 864
        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

Failover after the master has failed

# Simulate a master failure

root@mysql-node10 \~\]# /etc/init.d/mysqld stop ```bash #在MHA-master中做故障切换 [root@mha ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.131.10 --dead_master_port=3306 --new_master_host=192.168.131.20 --new_master_port=3306 --ignore_last_failover --dead_master_ip= is not set. Using 192.168.131.10. Tue Mar 10 17:38:03 2026 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Mar 10 17:38:03 2026 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Tue Mar 10 17:38:03 2026 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Tue Mar 10 17:38:03 2026 - [info] MHA::MasterFailover version 0.58. Tue Mar 10 17:38:03 2026 - [info] Starting master failover. Tue Mar 10 17:38:03 2026 - [info] Tue Mar 10 17:38:03 2026 - [info] * Phase 1: Configuration Check Phase.. Tue Mar 10 17:38:03 2026 - [info] Tue Mar 10 17:38:04 2026 - [info] GTID failover mode = 1 Tue Mar 10 17:38:04 2026 - [info] Dead Servers: Tue Mar 10 17:38:04 2026 - [info] 192.168.131.10(192.168.131.10:3306) Tue Mar 10 17:38:04 2026 - [info] Checking master reachability via MySQL(double check)... Tue Mar 10 17:38:04 2026 - [info] ok. 
Tue Mar 10 17:38:04 2026 - [info] Alive Servers: Tue Mar 10 17:38:04 2026 - [info] 192.168.131.20(192.168.131.20:3306) Tue Mar 10 17:38:04 2026 - [info] 192.168.131.30(192.168.131.30:3306) Tue Mar 10 17:38:04 2026 - [info] Alive Slaves: Tue Mar 10 17:38:04 2026 - [info] 192.168.131.20(192.168.131.20:3306) Version=8.3.0 (oldest major version between slaves) log-bin:enabled Tue Mar 10 17:38:04 2026 - [info] GTID ON Tue Mar 10 17:38:04 2026 - [info] Replicating from 192.168.131.10(192.168.131.10:3306) Tue Mar 10 17:38:04 2026 - [info] Primary candidate for the new Master (candidate_master is set) Tue Mar 10 17:38:04 2026 - [info] 192.168.131.30(192.168.131.30:3306) Version=8.3.0 (oldest major version between slaves) log-bin:enabled Tue Mar 10 17:38:04 2026 - [info] GTID ON Tue Mar 10 17:38:04 2026 - [info] Replicating from 192.168.131.10(192.168.131.10:3306) Tue Mar 10 17:38:04 2026 - [info] Not candidate for the new Master (no_master is set) Master 192.168.131.10(192.168.131.10:3306) is dead. Proceed? (yes/NO): yes Tue Mar 10 17:38:18 2026 - [info] Starting GTID based failover. Tue Mar 10 17:38:18 2026 - [info] Tue Mar 10 17:38:18 2026 - [info] ** Phase 1: Configuration Check Phase completed. Tue Mar 10 17:38:18 2026 - [info] Tue Mar 10 17:38:18 2026 - [info] * Phase 2: Dead Master Shutdown Phase.. Tue Mar 10 17:38:18 2026 - [info] Tue Mar 10 17:38:19 2026 - [info] HealthCheck: SSH to 192.168.131.10 is reachable. Tue Mar 10 17:38:19 2026 - [info] Forcing shutdown so that applications never connect to the current master.. Tue Mar 10 17:38:19 2026 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address. Tue Mar 10 17:38:19 2026 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Tue Mar 10 17:38:19 2026 - [info] * Phase 2: Dead Master Shutdown Phase completed. Tue Mar 10 17:38:19 2026 - [info] Tue Mar 10 17:38:19 2026 - [info] * Phase 3: Master Recovery Phase.. 
Tue Mar 10 17:38:19 2026 - [info]
Tue Mar 10 17:38:19 2026 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Mar 10 17:38:19 2026 - [info]
Tue Mar 10 17:38:19 2026 - [info] The latest binary log file/position on all slaves is mysql-bin.000006:238
Tue Mar 10 17:38:19 2026 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Mar 10 17:38:19 2026 - [info]   192.168.131.20(192.168.131.20:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Tue Mar 10 17:38:19 2026 - [info]     GTID ON
Tue Mar 10 17:38:19 2026 - [info]     Replicating from 192.168.131.10(192.168.131.10:3306)
Tue Mar 10 17:38:19 2026 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Mar 10 17:38:19 2026 - [info]   192.168.131.30(192.168.131.30:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Tue Mar 10 17:38:19 2026 - [info]     GTID ON
Tue Mar 10 17:38:19 2026 - [info]     Replicating from 192.168.131.10(192.168.131.10:3306)
Tue Mar 10 17:38:19 2026 - [info]     Not candidate for the new Master (no_master is set)
Tue Mar 10 17:38:19 2026 - [info] The oldest binary log file/position on all slaves is mysql-bin.000006:238
Tue Mar 10 17:38:19 2026 - [info] Oldest slaves:
Tue Mar 10 17:38:19 2026 - [info]   192.168.131.20(192.168.131.20:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Tue Mar 10 17:38:19 2026 - [info]     GTID ON
Tue Mar 10 17:38:19 2026 - [info]     Replicating from 192.168.131.10(192.168.131.10:3306)
Tue Mar 10 17:38:19 2026 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Mar 10 17:38:19 2026 - [info]   192.168.131.30(192.168.131.30:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Tue Mar 10 17:38:19 2026 - [info]     GTID ON
Tue Mar 10 17:38:19 2026 - [info]     Replicating from 192.168.131.10(192.168.131.10:3306)
Tue Mar 10 17:38:19 2026 - [info]     Not candidate for the new Master (no_master is set)
Tue Mar 10 17:38:19 2026 - [info]
Tue Mar 10 17:38:19 2026 - [info] * Phase 3.3: Determining New Master Phase..
Tue Mar 10 17:38:19 2026 - [info]
Tue Mar 10 17:38:19 2026 - [info] 192.168.131.20 can be new master.
Tue Mar 10 17:38:19 2026 - [info] New master is 192.168.131.20(192.168.131.20:3306)
Tue Mar 10 17:38:19 2026 - [info] Starting master failover..
Tue Mar 10 17:38:19 2026 - [info]
From:
192.168.131.10(192.168.131.10:3306) (current master)
 +--192.168.131.20(192.168.131.20:3306)
 +--192.168.131.30(192.168.131.30:3306)

To:
192.168.131.20(192.168.131.20:3306) (new master)
 +--192.168.131.30(192.168.131.30:3306)

Starting master switch from 192.168.131.10(192.168.131.10:3306) to 192.168.131.20(192.168.131.20:3306)? (yes/NO): yes
Tue Mar 10 17:38:31 2026 - [info] New master decided manually is 192.168.131.20(192.168.131.20:3306)
Tue Mar 10 17:38:31 2026 - [info]
Tue Mar 10 17:38:31 2026 - [info] * Phase 3.3: New Master Recovery Phase..
Tue Mar 10 17:38:31 2026 - [info]
Tue Mar 10 17:38:31 2026 - [info]  Waiting all logs to be applied..
Tue Mar 10 17:38:31 2026 - [info]   done.
Tue Mar 10 17:38:31 2026 - [info] Getting new master's binlog name and position..
Tue Mar 10 17:38:31 2026 - [info]  mysql-bin.000009:278
Tue Mar 10 17:38:31 2026 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.131.20', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='lee', MASTER_PASSWORD='xxx';
Tue Mar 10 17:38:31 2026 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000009, 278, 4df94f88-1bcb-11f1-9c87-000c2945485e:1-5, 65e12da3-1be5-11f1-9a03-000c29679c4e:1-7, af08b2f7-1bc5-11f1-bca0-000c29679c4e:6-11
Tue Mar 10 17:38:31 2026 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Tue Mar 10 17:38:31 2026 - [info] Setting read_only=0 on 192.168.131.20(192.168.131.20:3306)..
Tue Mar 10 17:38:31 2026 - [info]  ok.
Tue Mar 10 17:38:31 2026 - [info] ** Finished master recovery successfully.
Tue Mar 10 17:38:31 2026 - [info] * Phase 3: Master Recovery Phase completed.
Tue Mar 10 17:38:31 2026 - [info]
Tue Mar 10 17:38:31 2026 - [info] * Phase 4: Slaves Recovery Phase..
Tue Mar 10 17:38:31 2026 - [info]
Tue Mar 10 17:38:31 2026 - [info]
Tue Mar 10 17:38:31 2026 - [info] * Phase 4.1: Starting Slaves in parallel..
Tue Mar 10 17:38:31 2026 - [info]
Tue Mar 10 17:38:31 2026 - [info] -- Slave recovery on host 192.168.131.30(192.168.131.30:3306) started, pid: 1693. Check tmp log /etc/masterha/192.168.131.30_3306_20260310173803.log if it takes time..
Tue Mar 10 17:38:42 2026 - [info]
Tue Mar 10 17:38:42 2026 - [info] Log messages from 192.168.131.30 ...
Tue Mar 10 17:38:42 2026 - [info]
Tue Mar 10 17:38:31 2026 - [info]  Resetting slave 192.168.131.30(192.168.131.30:3306) and starting replication from the new master 192.168.131.20(192.168.131.20:3306)..
Tue Mar 10 17:38:31 2026 - [info]  Executed CHANGE MASTER.
Tue Mar 10 17:38:41 2026 - [info]  Slave started.
Tue Mar 10 17:38:41 2026 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln974] gtid_wait(4df94f88-1bcb-11f1-9c87-000c2945485e:1-5, 65e12da3-1be5-11f1-9a03-000c29679c4e:1-7, af08b2f7-1bc5-11f1-bca0-000c29679c4e:6-11) returned NULL on 192.168.131.30(192.168.131.30:3306). Maybe SQL thread was aborted?
Tue Mar 10 17:38:42 2026 - [info] End of log messages from 192.168.131.30.
Tue Mar 10 17:38:42 2026 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln2045] Master failover to 192.168.131.20(192.168.131.20:3306) done, but recovery on slave partially failed.
Tue Mar 10 17:38:42 2026 - [info]

----- Failover Report -----

app1: MySQL Master failover 192.168.131.10(192.168.131.10:3306) to 192.168.131.20(192.168.131.20:3306)

Master 192.168.131.10(192.168.131.10:3306) is down!

Check MHA Manager logs at mha for details.

Started manual(interactive) failover.
Selected 192.168.131.20(192.168.131.20:3306) as a new master.
192.168.131.20(192.168.131.20:3306): OK: Applying all logs succeeded.
192.168.131.30(192.168.131.30:3306): ERROR: Failed on waiting gtid exec set on master.
Master failover to 192.168.131.20(192.168.131.20:3306) done, but recovery on slave partially failed.
```

View the failover details

![](https://i-blog.csdnimg.cn/direct/a50d311f383345ecb1db94dfe0bc76d3.png)
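
The Failover Report at the end of the log lists one result line per server, so the slaves that still need manual attention can be picked out mechanically. The following is a minimal sketch, assuming the log text is fed on stdin and that result lines keep the `host(host:port): ERROR: ...` shape seen in the report; `failed_hosts` is a hypothetical helper name and is not part of MHA.

```bash
# failed_hosts: hypothetical helper, not shipped with MHA.
# Reads MHA log text on stdin and prints the host of every report
# line shaped like "host(host:port): ERROR: ...".
failed_hosts() {
    sed -n 's/^[[:space:]]*\([0-9.][0-9.]*\)([^)]*): ERROR: .*$/\1/p'
}
```

One possible use is `failed_hosts < /etc/masterha/mha.log`, which for the run shown here would be expected to name 192.168.131.30 as the slave whose recovery failed.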

Recover the failed MySQL node

[root@mysql-node1 ~]# /etc/init.d/mysqld start

#Run the following in the mysql client on both mysql-node2 and mysql-node3 to point them back at mysql-node1
mysql> stop slave;
mysql> change master to MASTER_HOST='192.168.131.10',MASTER_USER='lee',MASTER_PASSWORD='lee',MASTER_AUTO_POSITION=1;
mysql> start slave;

Verify that the one-master, two-slave topology is healthy

[root@mha ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Tue Mar 10 17:59:03 2026 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Mar 10 17:59:03 2026 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Mar 10 17:59:03 2026 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Mar 10 17:59:03 2026 - [info] MHA::MasterMonitor version 0.58.
Tue Mar 10 17:59:04 2026 - [info] GTID failover mode = 1
Tue Mar 10 17:59:04 2026 - [info] Dead Servers:
Tue Mar 10 17:59:04 2026 - [info] Alive Servers:
Tue Mar 10 17:59:04 2026 - [info]   192.168.131.10(192.168.131.10:3306)
Tue Mar 10 17:59:04 2026 - [info]   192.168.131.20(192.168.131.20:3306)
Tue Mar 10 17:59:04 2026 - [info]   192.168.131.30(192.168.131.30:3306)
Tue Mar 10 17:59:04 2026 - [info] Alive Slaves:
Tue Mar 10 17:59:04 2026 - [info]   192.168.131.20(192.168.131.20:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Tue Mar 10 17:59:04 2026 - [info]     GTID ON
Tue Mar 10 17:59:04 2026 - [info]     Replicating from 192.168.131.10(192.168.131.10:3306)
Tue Mar 10 17:59:04 2026 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Mar 10 17:59:04 2026 - [info]   192.168.131.30(192.168.131.30:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Tue Mar 10 17:59:04 2026 - [info]     GTID ON
Tue Mar 10 17:59:04 2026 - [info]     Replicating from 192.168.131.10(192.168.131.10:3306)
Tue Mar 10 17:59:04 2026 - [info]     Not candidate for the new Master (no_master is set)
Tue Mar 10 17:59:04 2026 - [info] Current Alive Master: 192.168.131.10(192.168.131.10:3306)
Tue Mar 10 17:59:04 2026 - [info] Checking slave configurations..
Tue Mar 10 17:59:04 2026 - [info]  read_only=1 is not set on slave 192.168.131.20(192.168.131.20:3306).
Tue Mar 10 17:59:04 2026 - [info]  read_only=1 is not set on slave 192.168.131.30(192.168.131.30:3306).
Tue Mar 10 17:59:04 2026 - [info] Checking replication filtering settings..
Tue Mar 10 17:59:04 2026 - [info]  binlog_do_db= , binlog_ignore_db=
Tue Mar 10 17:59:04 2026 - [info]  Replication filtering check ok.
Tue Mar 10 17:59:04 2026 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Tue Mar 10 17:59:04 2026 - [info] Checking SSH publickey authentication settings on the current master..
Tue Mar 10 17:59:04 2026 - [info] HealthCheck: SSH to 192.168.131.10 is reachable.
Tue Mar 10 17:59:04 2026 - [info]
192.168.131.10(192.168.131.10:3306) (current master)
 +--192.168.131.20(192.168.131.20:3306)
 +--192.168.131.30(192.168.131.30:3306)

Tue Mar 10 17:59:04 2026 - [info] Checking replication health on 192.168.131.20..
Tue Mar 10 17:59:04 2026 - [info]  ok.
Tue Mar 10 17:59:04 2026 - [info] Checking replication health on 192.168.131.30..
Tue Mar 10 17:59:04 2026 - [info]  ok.
Tue Mar 10 17:59:04 2026 - [warning] master_ip_failover_script is not defined.
Tue Mar 10 17:59:04 2026 - [warning] shutdown_script is not defined.
Tue Mar 10 17:59:04 2026 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
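
The last line of the check is the one that matters before moving on. As a minimal sketch, assuming the output of `masterha_check_repl` is piped in on stdin, a small wrapper can gate the next step on that line; `repl_health_ok` is a hypothetical helper name, not an MHA command.

```bash
# repl_health_ok: hypothetical helper, not part of MHA.
# Succeeds only if the text on stdin contains the success line that
# masterha_check_repl prints at the end of a healthy check.
repl_health_ok() {
    grep -q '^MySQL Replication Health is OK\.'
}
```

For example, `masterha_check_repl --conf=/etc/masterha/app1.cnf 2>&1 | repl_health_ok && echo "replication looks healthy"` would print the message only when the success line is present.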

Automatic failover

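[root@mha ~]# rm -rf /etc/masterha/app1.failover.complete        #Remove the failover lock file

#masterha_manager monitors the master through the given configuration file; when the master fails it performs the failover and then exits, so the same failover is not repeated

#Opening two shells makes this easier to observe
[root@mha ~]# truncate -s 0 /etc/masterha/*.log    #Empty the MHA log files (a plain "> /etc/masterha/*.log" fails with "ambiguous redirect" when several files match)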
[root@mha ~]# watch -n 1 cat /etc/masterha/mha.log

[root@mha ~]# masterha_manager --conf=/etc/masterha/app1.cnf  &
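
The lock-file cleanup above is easy to forget between test runs. The sketch below wraps it in a small function, assuming the lock file is named `<app>.failover.complete` inside the MHA working directory as in the `rm` command above; `clear_failover_lock` is a hypothetical helper name and the messages it prints are illustrative only.

```bash
# clear_failover_lock: hypothetical helper, not part of MHA.
# $1 = MHA working directory, $2 = application name (e.g. app1).
# Removes <workdir>/<app>.failover.complete if present and reports the result.
clear_failover_lock() {
    local workdir
    local app
    local lock
    workdir=$1
    app=$2
    lock="$workdir/$app.failover.complete"
    if [ -e "$lock" ]; then
        rm -f -- "$lock"
        echo "removed $lock"
    else
        echo "no lock file at $lock"
    fi
}
```

It could be called as `clear_failover_lock /etc/masterha app1` right before starting `masterha_manager`.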

4 VIP support and VIP switchover

[root@mha ~]# mkdir  /etc/masterha/scripts
[root@mha ~]# cp  MHA-7/master_ip_*  /etc/masterha/scripts
[root@mha ~]# chmod  +x /etc/masterha/scripts/master_ip_*

[root@mha ~]# vim /etc/masterha/app1.cnf
master_ip_failover_script= /etc/masterha/scripts/master_ip_failover
master_ip_online_change_script= /etc/masterha/scripts/master_ip_online_change

[root@mha ~]# vim /etc/masterha/scripts/master_ip_failover
my $vip = '192.168.131.100/24';

[root@mha ~]# vim /etc/masterha/scripts/master_ip_online_change
my $vip = '192.168.131.100/24';

[root@mysql-node1 ~]# ip a a 192.168.131.100/24 dev eth0

Simulate a failure:

[root@mysql-node1 ~]# /etc/init.d/mysqld stop			#Stop the MySQL service on the master node

Test:

After the manual switchover, check where the VIP has moved

[root@mysql-node2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:45:48:5e brd ff:ff:ff:ff:ff:ff
    altname enp3s0
    altname ens160
    inet 192.168.131.20/24 brd 192.168.131.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 192.168.131.100/24 scope global secondary eth0
    inet6 fe80::20c:29ff:fe45:485e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
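
Reading the full `ip a` listing by eye works for one node, but the same check can be reduced to a yes/no answer. A minimal sketch, assuming `ip a` output is supplied on stdin; `has_vip` is a hypothetical helper name introduced here for illustration.

```bash
# has_vip: hypothetical helper for illustration.
# $1 = IPv4 address without prefix length (e.g. 192.168.131.100).
# Succeeds if stdin (output of `ip a`) has an "inet <address>/" entry.
# The trailing slash keeps 192.168.131.10 from matching 192.168.131.100.
has_vip() {
    grep -qF "inet $1/"
}
```

On the new master, `ip a | has_vip 192.168.131.100 && echo "VIP is here"` should print the message once the switchover has moved the address.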

Recover the failed host

[root@mysql-node1 mysql]# /etc/init.d/mysqld start

mysql> change master to MASTER_HOST='192.168.131.10',MASTER_USER='lee',MASTER_PASSWORD='lee',MASTER_AUTO_POSITION=1;