Cloud Native (MySQL MHA High-Availability Cluster)

Lab Overview

I. Background and Core Goals

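MHA is an open-source high-availability solution for MySQL. Its core job is to monitor the master and, if the master fails, promote a slave to be the new master (automatically or manually). It also repoints the remaining slaves to the new master, which minimizes data loss and service downtime. The core goals of this lab: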

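  1. Build an MHA cluster (one MHA manager node plus several MySQL nodes);
  2. Verify connectivity and replication health in the MHA environment;
  3. Perform three kinds of switch: a manual switch with a healthy master, a manual switch after a master failure, and an automatic failover;
  4. Bring the failed node back into the cluster after recovery, restoring the full HA architecture.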

II. Overall Architecture

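  • Node plan
    • MHA manager node: runs MHA Manager, which monitors the cluster and makes switchover decisions;
    • MySQL nodes: at least 3 (1 master + 2 slaves; 172.25.254.10/20/30 in this lab). Each runs MHA Node, which provides the scripts and commands needed during a switch, and is configured for GTID-based master-slave replication.
  • Network and privileges: all nodes reach each other over passwordless SSH. In MySQL, create a replication user and an MHA admin user with high privileges.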

III. Core Steps

1. Environment preparation (base configuration)

The goal is consistent MySQL master-slave replication, which is the foundation MHA is built on:

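  • MySQL node initialization: wipe the MySQL data directory, initialize the instance, allow remote root access, and create the replication user (lee) with the REPLICATION SLAVE privilege;
  • Replication setup: on the slaves (node2/node3), configure GTID replication from the master (node1) so that master and slaves hold the same data;
  • MHA dependencies: install the Perl dependencies (Config::Tiny, Log::Dispatch, etc.) on all nodes. Then install mha4mysql-manager on the MHA manager node and mha4mysql-node on every MySQL node, since the node package is the core dependency for switching;
  • MHA code fix: patch MHA's Perl check module (NodeUtil.pm) so that its version-number parsing works with MySQL 8.x (fixes the compatibility problem with newer MySQL versions).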
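The replication step in the list above boils down to one statement per slave. The sketch below generates that statement from the source host and the replication credentials. It assumes MySQL 8.0.23 or later, where the `CHANGE REPLICATION SOURCE TO` syntax exists. The helper name `change_source_sql` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SQL that points a replica at a source
# using GTID auto-positioning (MySQL 8.0.23+ syntax).
# Usage: change_source_sql HOST USER PASSWORD [PORT]
change_source_sql() {
  local host="$1" user="$2" pass="$3" port="${4:-3306}"
  cat <<EOF
STOP REPLICA;
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST='${host}',
  SOURCE_PORT=${port},
  SOURCE_USER='${user}',
  SOURCE_PASSWORD='${pass}',
  SOURCE_AUTO_POSITION=1;
START REPLICA;
EOF
}
```

For example, `change_source_sql 172.25.254.10 lee lee | mysql -uroot -p` on node2 and node3. The helper only prints the SQL, so you can review it before piping it into the client.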
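2. MHA cluster configuration and verification
  • Configuration file: on the MHA manager node, create app1.cnf. It defines the cluster-wide parameters (MySQL user/password, SSH user, replication user), each node's role (master candidate or not), and the monitoring interval (ping_interval=3 seconds);
  • Environment checks
    • masterha_check_ssh: verifies passwordless SSH between all nodes, because MHA runs remote commands over SSH during a switch;
    • masterha_check_repl: verifies MySQL replication health (whether the slaves are online, replication is running, and privileges are sufficient).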
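3. Master switch experiments (the core part)
(1) Manual switch (master healthy)
  • Scenario: the master is running normally and is switched manually to a chosen slave (e.g. from node1 to node2);
  • Action: run masterha_master_switch with --master_state=alive (the master is alive) and the new master's IP;
  • Core flow
    1. Lock the original master (read-only) so that no new data is written during the switch;
    2. Wait for the new master to apply all of its relay logs so its data is current;
    3. Repoint the other slave (node3) to the new master (node2);
    4. Unlock the original master (node1) and repoint it to the new master, so it rejoins the cluster as a new slave;
  • Verification: run show slave status on every slave and confirm that the master is now node2 and replication is healthy.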
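The verification step above, reading `show slave status` on each slave, can be scripted. The sketch below assumes the vertical `\G` output format. It accepts both the old field names (`Master_Host`, `Slave_IO_Running`) and the newer names used by `SHOW REPLICA STATUS` (`Source_Host`, `Replica_IO_Running`). The function name `replica_ok` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read SHOW REPLICA STATUS\G (or SHOW SLAVE STATUS\G)
# output on stdin and succeed only if the replica points at the expected
# source and both the IO and SQL threads are running.
replica_ok() {
  local expected_source="$1"
  awk -v want="$expected_source" '
    {
      # Turn "   Key: value" into f[Key] = value (split on the first colon).
      line = $0
      sub(/^[ \t]+/, "", line)
      i = index(line, ":")
      if (i == 0) next
      key = substr(line, 1, i - 1)
      val = substr(line, i + 1)
      sub(/^[ \t]+/, "", val)
      sub(/[ \t]+$/, "", val)
      f[key] = val
    }
    END {
      host = ("Source_Host" in f) ? f["Source_Host"] : f["Master_Host"]
      io   = ("Replica_IO_Running" in f) ? f["Replica_IO_Running"] : f["Slave_IO_Running"]
      sql  = ("Replica_SQL_Running" in f) ? f["Replica_SQL_Running"] : f["Slave_SQL_Running"]
      exit !(host == want && io == "Yes" && sql == "Yes")
    }'
}
```

For example, after the switch: `mysql -uroot -p -e 'SHOW REPLICA STATUS\G' | replica_ok 172.25.254.20 && echo "replicating from 172.25.254.20"`.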
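(2) Manual failover (master failed)
  • Scenario: simulate a crash of the master (node1) by stopping its mysqld service, then trigger the switch manually;
  • Action: run masterha_master_switch with --master_state=dead (the master has failed), the failed master's IP, and the new master's IP;
  • Core flow
    1. Detect the master failure (SSH/MySQL connections fail);
    2. Take down the VIP (virtual IP, if configured) on the failed master so that clients stop connecting to it;
    3. Promote the slave with the most recent data (node2) to be the new master;
    4. Repoint the other slave (node3) to the new master;
    5. Recovery: delete the MHA failover lock file (app1.failover.complete), restart the failed master, and configure it as a slave of the new master;
  • Note: a failover creates a lock file, which must be deleted manually before another failover can be triggered.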
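The lock-file step above is easy to script. The sketch below assumes the manager_workdir (/etc/masterha) and the application name (app1) used in this lab; the function name is hypothetical. MHA also offers `masterha_manager --ignore_last_failover` to start despite the file, but deleting it explicitly makes the recovery step visible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove MHA's "<app>.failover.complete" lock file so
# that a later failover is not blocked. Defaults match this lab's app1.cnf.
# Usage: clear_failover_lock [WORKDIR] [APP]
clear_failover_lock() {
  local workdir="${1:-/etc/masterha}" app="${2:-app1}"
  local lock="${workdir}/${app}.failover.complete"
  if [ -e "$lock" ]; then
    rm -f -- "$lock"
    echo "removed ${lock}"
  else
    echo "no lock file at ${lock}"
  fi
}
```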
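(3) Automatic failover
  • Scenario: the MHA manager continuously monitors the master and triggers the switch automatically when it fails, with no human intervention;
  • Action
    1. Start masterha_manager (the MHA monitoring process) in the background, logging its output;
    2. Simulate a master failure (stop mysqld on node1);
  • Core flow: the monitor pings the master every ping_interval (3 seconds) and declares it dead after three missed intervals in a row. It then automatically runs "declare master dead → elect new master → repoint slaves → move VIP (if a failover script is configured)", with no manual steps;
  • Verification: follow the MHA log in real time to watch the switch complete on its own, and confirm that the new master is serving requests.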
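Watching the log during the automatic failover can be scripted as a polling loop. The sketch below has a few assumptions. The function name is hypothetical. The log path comes from manager_log in app1.cnf. It also assumes MHA's success line contains the text `completed successfully`, as the online-switch output in this post does. Only lines written after the function starts are considered, so an old success line from an earlier run does not count.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a new line matching PATTERN appears in
# LOG, or give up after TIMEOUT seconds.
# Usage: wait_for_mha_log LOG [TIMEOUT] [PATTERN]
wait_for_mha_log() {
  local log="$1" timeout="${2:-60}" pattern="${3:-completed successfully}"
  local start=0 waited=0 hit
  # Remember how many lines already exist; only newer lines are checked.
  [ -f "$log" ] && start=$(wc -l < "$log")
  while [ "$waited" -lt "$timeout" ]; do
    hit=$(tail -n +"$((start + 1))" "$log" 2>/dev/null | grep -- "$pattern" | tail -n 1)
    if [ -n "$hit" ]; then
      printf '%s\n' "$hit"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}
```

Usage: start `masterha_manager --conf=/etc/masterha/app1.cnf` in the background (for example with nohup), stop mysqld on node1, then run `wait_for_mha_log /etc/masterha/mha.log 120`.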

IV. Key Details and Caveats

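  1. GTID replication: the lab uses MySQL GTID replication (MASTER_AUTO_POSITION=1). Compared with classic binlog file/position replication, it makes precise switching easier and reduces data loss;
  2. Dependency compatibility: MHA 0.58 fails to parse MySQL 8.x version strings, so NodeUtil.pm must be patched;
  3. Lock file: the app1.failover.complete lock file created after a failover prevents repeated failovers and must be deleted manually during recovery;
  4. VIP failover: the lab refers to a VIP (172.25.254.100) that a script moves to the new master during a switch, so applications keep using the same IP (the key HA property). However, master_ip_failover_script is commented out in the app1.cnf shown later, so that configuration does not move the VIP;
  5. Data consistency: in the online (master healthy) switch, FLUSH TABLES WITH READ LOCK is run on the original master before switching, minimizing the risk of data loss.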
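Item 1 explains why GTID makes it easy to tell how far a server has progressed. Each server's Executed_Gtid_Set (for example `33832f38-19d5-11f1-ab15-000c29a83238:1-3` in the `show master status` output later in this post) lists exactly which transactions it has applied. The sketch below counts the transactions in such a set. It is a rough illustration only: it is not MHA's own selection logic, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the transactions in a GTID set string such as
# "uuid1:1-3:5,uuid2:1-10". Intervals ("1-3") and single numbers ("5") are
# counted; non-numeric parts (the UUID, MySQL 8.3+ tags) are skipped.
gtid_count() {
  printf '%s' "$1" | tr -d ' \n' | awk -v RS=',' '
    {
      n = split($0, parts, ":")
      for (i = 2; i <= n; i++) {
        if (parts[i] ~ /^[0-9]+-[0-9]+$/) {
          split(parts[i], r, "-")
          total += r[2] - r[1] + 1
        } else if (parts[i] ~ /^[0-9]+$/) {
          total += 1
        }
      }
    }
    END { print total + 0 }'
}
```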

V. Value of the Lab

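  1. Hands-on: covers the full MHA lifecycle from setup to switching and recovery, close to a production HA design;
  2. Core capabilities: verifies MHA's key value: switching within seconds, minimal data loss, and flexible automatic or manual switching;
  3. Compatibility: solves the MHA vs. MySQL 8.x compatibility problem, a useful reference for deploying MHA on newer MySQL;
  4. Recovery: documents how the original master is brought back after a failover, keeping the cluster available in the long run.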

MySQL MHA High-Availability Cluster: Environment Setup

Preparation: ensure data consistency (all MySQL nodes)

# Re-initialize the data
[root@mysql-node1 ~]# /etc/init.d/mysqld stop
 ERROR! MySQL server PID file could not be found!
[root@mysql-node1 ~]# rm -fr /data/mysql/*
[root@mysql-node1 ~]# vi /etc/my.cnf
[mysqld]
user = mysql
datadir = /data/mysql
pid-file = /data/mysql/mysql-node2.pid
socket = /tmp/mysql.sock
port = 3306
character-set-server = utf8mb4
# Comment out all replication/semi-sync/legacy parameters (disabled during initialization)
# rpl_semi_sync_master_enabled=1
# rpl_semi_sync_slave_enabled=1
# plugin-load-add=rpl_semi_sync_master.so
# plugin-load-add=rpl_semi_sync_slave.so
# slave_parallel_type=LOGICAL_CLOCK  
# slave_parallel_workers=4           
# log_bin=mysql-bin
# server_id=2
[root@mysql-node1 ~]# mkdir -p /data/mysql
[root@mysql-node1 ~]# chown -R mysql:mysql /data/mysql
[root@mysql-node1 ~]# chmod -R 700 /data/mysql
[root@mysql-node1 ~]# ls -la /data/mysql/
total 0
drwx------ 2 mysql mysql  6 Mar  7 11:20 .
drwxr-xr-x 3 root  root  19 Feb 26 19:23 ..
[root@mysql-node1 ~]# /usr/local/mysql/bin/mysqld --initialize --user=mysql --datadir=/data/mysql
2026-03-07T03:24:50.798059Z 0 [System] [MY-015017] [Server] MySQL Server Initialization - start.
2026-03-07T03:24:50.801819Z 0 [Warning] [MY-011070] [Server] 'Disabling symbolic links using --skip-symbolic-links (or equivalent) is the default. Consider not using this option as it' is deprecated and will be removed in a future release.
2026-03-07T03:24:50.802408Z 0 [System] [MY-013169] [Server] /usr/local/mysql/bin/mysqld (mysqld 8.3.0) initializing of server in progress as process 3540
2026-03-07T03:24:50.824830Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2026-03-07T03:24:51.126557Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2026-03-07T03:24:51.646339Z 6 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: w0in<:tyErdH
2026-03-07T03:24:52.630458Z 0 [System] [MY-015018] [Server] MySQL Server Initialization - end.
[root@mysql-node1 ~]# /etc/init.d/mysqld start
Starting MySQL.Logging to '/data/mysql/mysql-node3.err'.
. SUCCESS! 
[root@mysql-node3 ~]# mysql_secure_installation

Securing the MySQL server deployment.

Enter password for user root: 

The existing password for the user account root has expired. Please set a new password.

New password: 

Re-enter new password: 

VALIDATE PASSWORD COMPONENT can be used to test passwords
and improve security. It checks the strength of password
and allows the users to set only those passwords which are
secure enough. Would you like to setup VALIDATE PASSWORD component?

Press y|Y for Yes, any other key for No: no
Using existing password for root.
Change the password for root ? ((Press y|Y for Yes, any other key for No) : no

 ... skipping.
By default, a MySQL installation has an anonymous user,
allowing anyone to log into MySQL without having to have
a user account created for them. This is intended only for
testing, and to make the installation go a bit smoother.
You should remove them before moving into a production
environment.

Remove anonymous users? (Press y|Y for Yes, any other key for No) : 

 ... skipping.


Normally, root should only be allowed to connect from
'localhost'. This ensures that someone cannot guess at
the root password from the network.

Disallow root login remotely? (Press y|Y for Yes, any other key for No) : 

 ... skipping.
By default, MySQL comes with a database named 'test' that
anyone can access. This is also intended only for testing,
and should be removed before moving into a production
environment.


Remove test database and access to it? (Press y|Y for Yes, any other key for No) : 

 ... skipping.
Reloading the privilege tables will ensure that all changes
made so far will take effect immediately.

Reload privilege tables now? (Press y|Y for Yes, any other key for No) : 

 ... skipping.
All done! 
[root@mysql-node1 ~]# mysql -uroot -plee -e "create user lee@'%' identified with mysql_native_password by 'lee';"
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@mysql-node1 ~]# mysql -uroot -plee -e "GRANT replication slave ON *.* to lee@'%';"
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@mysql-node1 ~]# mysql -uroot -plee -e "show master status;"
mysql: [Warning] Using a password on the command line interface can be insecure.
+------------------+----------+--------------+------------------+------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000002 |      978 |              |                  | 33832f38-19d5-11f1-ab15-000c29a83238:1-3 |
+------------------+----------+--------------+------------------+------------------------------------------+

# Reconfigure replication: run on the slave hosts
[root@mysql-node2 ~]# mysql -uroot -plee -e "CHANGE MASTER TO MASTER_HOST='172.25.254.10', MASTER_USER='lee', MASTER_PASSWORD='lee', MASTER_AUTO_POSITION=1;"
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@mysql-node3 ~]# mysql -uroot -plee -e "CHANGE MASTER TO MASTER_HOST='172.25.254.10', MASTER_USER='lee', MASTER_PASSWORD='lee', MASTER_AUTO_POSITION=1;"
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@mysql-node2 ~]# vi /etc/my.cnf
rpl_semi_sync_master_enabled=1

[root@mysql-node2 ~]# mysql -uroot -p << EOF
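# Step 1: stop all replication threads first (required before reconfiguring)
STOP REPLICA;
# Step 2: clear the old replication configuration
RESET REPLICA ALL;
# Step 3: reconfigure replication (GTID auto-positioning)
CHANGE REPLICATION SOURCE TO
SOURCE_HOST='172.25.254.10',
SOURCE_USER='lee',
SOURCE_PASSWORD='lee',
SOURCE_AUTO_POSITION=1;
# Step 4: start replication
START REPLICA;
# Step 5: check replication status (\G already ends the statement, no ";" needed)
SHOW REPLICA STATUS\G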
EOF
Enter password: 
*************************** 1. row ***************************
             Replica_IO_State: Checking source version
                  Source_Host: 172.25.254.10
                  Source_User: lee
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: 
          Read_Source_Log_Pos: 4
               Relay_Log_File: mysql-node3-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Source_Log_File: 
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
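The initialization-time my.cnf above comments out log_bin and server_id. Yet the later `show master status` output (mysql-bin.000002 with a non-empty Executed_Gtid_Set) and MASTER_AUTO_POSITION=1 require binary logging and GTID mode. The post does not show where these are re-enabled. The sketch below is a guess at what that step likely looks like. It assumes that [mysqld] is the last section of /etc/my.cnf (as in the file above), that each node gets a unique server_id, and that mysqld is restarted afterwards. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append binlog + GTID settings to a my.cnf whose last
# section is [mysqld]. Run once per node with a unique server_id, then
# restart mysqld (e.g. /etc/init.d/mysqld restart).
# Usage: append_repl_settings CNF_FILE SERVER_ID
append_repl_settings() {
  local cnf="$1" server_id="$2"
  # Refuse to append twice.
  if grep -q '^gtid_mode' "$cnf"; then
    echo "gtid_mode already set in ${cnf}" >&2
    return 1
  fi
  cat >> "$cnf" <<EOF
log_bin=mysql-bin
server_id=${server_id}
gtid_mode=ON
enforce_gtid_consistency=ON
EOF
}
```

For example, `append_repl_settings /etc/my.cnf 1` on node1, then 2 and 3 on node2 and node3.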

Install the MHA packages on all hosts

[root@mha MHA-7]# unzip MHA-7.zip
Archive:  MHA-7.zip
   creating: MHA-7/
  inflating: MHA-7/master_ip_failover  
  inflating: MHA-7/master_ip_online_change  
  inflating: MHA-7/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm  
  inflating: MHA-7/mha4mysql-manager-0.58.tar.gz  
  inflating: MHA-7/mha4mysql-node-0.58-0.el7.centos.noarch.rpm  
  inflating: MHA-7/perl-Config-Tiny-2.14-7.el7.noarch.rpm  
  inflating: MHA-7/perl-Email-Date-Format-1.002-15.el7.noarch.rpm  
  inflating: MHA-7/perl-Log-Dispatch-2.41-1.el7.1.noarch.rpm  
  inflating: MHA-7/perl-Mail-Sender-0.8.23-1.el7.noarch.rpm  
  inflating: MHA-7/perl-Mail-Sendmail-0.79-21.el7.noarch.rpm  
  inflating: MHA-7/perl-MIME-Lite-3.030-1.el7.noarch.rpm  
  inflating: MHA-7/perl-MIME-Types-1.38-2.el7.noarch.rpm  
  inflating: MHA-7/perl-Net-Telnet-3.03-19.el7.noarch.rpm  
  inflating: MHA-7/perl-Parallel-ForkManager-1.18-2.el7.noarch.rpm  
[root@mha MHA-7]# cd MHA-7/
[root@mha MHA-7]# dnf install perl perl-DBD-MySQL perl-CPAN  -y
[root@mha MHA-7]# cpan
Loading internal logger. Log::Log4perl recommended for better logging

CPAN.pm requires configuration, but most of it can be done automatically.
If you answer 'no' below, you will enter an interactive dialog for each
configuration option instead.

Would you like to configure as much as possible automatically? [yes] yes

Perl site library directory "/usr/local/share/perl5/5.32" does not exist.
Perl site library directory "/usr/local/share/perl5/5.32" created.
Perl site library directory "/usr/local/lib64/perl5/5.32" does not exist.
Perl site library directory "/usr/local/lib64/perl5/5.32" created.
We initialized your 'urllist' to https://cpan.org/. Type 'o conf init urllist' to change it.

Autoconfiguration complete.

commit: wrote '/root/.local/share/.cpan/CPAN/MyConfig.pm'

You can re-run configuration any time with 'o conf init' in the CPAN shell
Terminal does not support AddHistory.

To fix that, maybe try>  install Term::ReadLine::Perl


cpan shell -- CPAN exploration and modules installation (v2.29)
Enter 'h' for help.

cpan[1]> install Config::Tiny
cpan[2]> install Log::Dispatch
cpan[3]> install Mail::Sender
Specify defaults for Mail::Sender? (y/N) y
Default encoding of message bodies (N)one, (Q)uoted-printable, (B)ase64: n

cpan[4]> install Parallel::ForkManager
cpan[5]>exit

#验证组件是否安装成功
[root@mha MHA-7]# perl -MConfig::Tiny -e 'print "OK\n"'
OK
[root@mha MHA-7]# perl -MLog::Dispatch -e 'print "OK\n"'
OK
[root@mha MHA-7]# perl -MMail::Sender -e 'print "OK\n"'
Mail::Sender is deprecated and you should look to Email::Sender instead at -e line 0.
OK
[root@mha MHA-7]# perl -MParallel::ForkManager -e 'print "OK\n"'
OK

# On the MHA node
[root@mha MHA-7]# rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm mha4mysql-node-0.58-0.el7.centos.noarch.rpm --nodeps
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:mha4mysql-node-0.58-0.el7.centos ################################# [ 50%]
   2:mha4mysql-manager-0.58-0.el7.cent################################# [100%]

# On all MySQL nodes
[root@mha MHA-7]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm --nodeps
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
        package mha4mysql-node-0.58-0.el7.centos.noarch is already installed

Install the node package on the MySQL hosts

[root@mha MHA-7]# for i in 10 20 30
> do
> scp mha4mysql-node-0.58-0.el7.centos.noarch.rpm root@172.25.254.$i:/mnt
> ssh -l root 172.25.254.$i "rpm -ivh /mnt/mha4mysql-node-0.58-0.el7.centos.noarch.rpm --nodeps"
> done
The authenticity of host '172.25.254.10 (172.25.254.10)' can't be established.
ED25519 key fingerprint is SHA256:ah09hEjruT9AKOlj3uEzM5XsHIjXwHf1HrNTY0MNW1o.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '172.25.254.10' (ED25519) to the list of known hosts.
root@172.25.254.10's password: 
mha4mysql-node-0.58-0.el7.centos.noarch.rpm         100%   35KB  21.6MB/s   00:00    
root@172.25.254.10's password: 
Verifying...                          ########################################
Preparing...                          ########################################
        package mha4mysql-node-0.58-0.el7.centos.noarch is already installed
The authenticity of host '172.25.254.20 (172.25.254.20)' can't be established.
ED25519 key fingerprint is SHA256:ah09hEjruT9AKOlj3uEzM5XsHIjXwHf1HrNTY0MNW1o.
This host key is known by the following other names/addresses:
    ~/.ssh/known_hosts:1: 172.25.254.10
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '172.25.254.20' (ED25519) to the list of known hosts.
root@172.25.254.20's password: 
mha4mysql-node-0.58-0.el7.centos.noarch.rpm         100%   35KB  22.2MB/s   00:00    
root@172.25.254.20's password: 
Verifying...                          ########################################
Preparing...                          ########################################
        package mha4mysql-node-0.58-0.el7.centos.noarch is already installed
The authenticity of host '172.25.254.30 (172.25.254.30)' can't be established.
ED25519 key fingerprint is SHA256:ah09hEjruT9AKOlj3uEzM5XsHIjXwHf1HrNTY0MNW1o.
This host key is known by the following other names/addresses:
    ~/.ssh/known_hosts:1: 172.25.254.10
    ~/.ssh/known_hosts:4: 172.25.254.20
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '172.25.254.30' (ED25519) to the list of known hosts.
root@172.25.254.30's password: 
mha4mysql-node-0.58-0.el7.centos.noarch.rpm         100%   35KB  18.6MB/s   00:00    
root@172.25.254.30's password: 
Verifying...                          ########################################
Preparing...                          ########################################
        package mha4mysql-node-0.58-0.el7.centos.noarch is already installed

Patch the version-check code used by MHA Manager

[root@mha MHA-7]# vim /usr/share/perl5/vendor_perl/MHA/NodeUtil.pm
# Around line 199: comment out the original function
#sub parse_mysql_major_version($) {
#  my $str = shift;
#  my $result = sprintf( '%03d%03d', $str =~ m/(\d+)/g );
#  return $result;
#}

# Replacement: format only the first two numeric fields (major, minor)
sub parse_mysql_major_version($) {
  my $str = shift;
  my @nums = $str =~ m/(\d+)/g;
  my $result = sprintf( '%03d%03d', $nums[0]//0, $nums[1]//0);
  return $result;
}
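The original function passed every number in the version string (three for "8.3.0") to a format with only two placeholders. On newer Perl (5.22+) this raises a "Redundant argument in sprintf" warning. A likely explanation for the failure here is that MHA's modules turn warnings into fatal errors. The post does not state this; it is an interpretation. To show what the patched function returns, the sketch below re-implements the same rule in shell. It is for illustration only and is not part of MHA; the name simply mirrors the Perl sub.

```shell
#!/usr/bin/env bash
# Illustration only: shell version of the patched parse_mysql_major_version.
# Keeps the first two numeric fields (major, minor), zero-padded to 3 digits.
parse_mysql_major_version() {
  local str="$1"
  local re='^[^0-9]*([0-9]+)([^0-9]+([0-9]+))?'
  if [[ $str =~ $re ]]; then
    # 10# forces base 10 so a field like "08" is not read as octal.
    printf '%03d%03d\n' "$((10#${BASH_REMATCH[1]}))" "$((10#${BASH_REMATCH[3]:-0}))"
  else
    printf '%03d%03d\n' 0 0
  fi
}
```

For the server in this lab, `parse_mysql_major_version 8.3.0` prints `008003`.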

Create a remote login user for MHA

# On the master host
[root@mysql-node1 ~]# mysql -uroot -plee
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 29
Server version: 8.3.0 Source distribution

Copyright (c) 2000, 2025, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create user root@'%' identified with mysql_native_password by 'lee';
Query OK, 0 rows affected (0.17 sec)

mysql> GRANT ALL ON *.* TO root@'%' ;
Query OK, 0 rows affected (0.01 sec)

Generate the MHA Manager configuration file template

[root@mha ~]# wget https://github.com/yoshinorim/mha4mysql-manager/archive/refs/tags/v0.58.tar.gz -O mha4mysql-manager-0.58.tar.gz
[root@mha MHA-7]# tar zxf mha4mysql-manager-0.58.tar.gz
[root@mha MHA-7]# cd mha4mysql-manager-0.58
[root@mha mha4mysql-manager-0.58]# mkdir  /etc/masterha/ -p
[root@mha mha4mysql-manager-0.58]# cat samples/conf/masterha_default.cnf samples/conf/app1.cnf  > /etc/masterha/app1.cnf

Edit the configuration file

[root@mha mha4mysql-manager-0.58]# vim /etc/masterha/app1.cnf
[server default]
user=root
password=lee
ssh_user=root
repl_user=lee
repl_password=lee
master_binlog_dir= /data/mysql
remote_workdir=/tmp
secondary_check_script= masterha_secondary_check -s 172.25.254.10 -s 172.25.254.2
ping_interval=3
# master_ip_failover_script= /script/masterha/master_ip_failover
# shutdown_script= /script/masterha/power_manager
# report_script= /script/masterha/send_report
# master_ip_online_change_script= /script/masterha/master_ip_online_change
manager_workdir=/etc/masterha
manager_log=/etc/masterha/mha.log

[server1]
hostname=172.25.254.10
candidate_master=1
check_repl_delay=0

[server2]
hostname=172.25.254.20
candidate_master=1
check_repl_delay=0

[server3]
hostname=172.25.254.30
no_master=1

Check the environment

[root@mha mha4mysql-manager-0.58]# masterha_check_ssh  --conf=/etc/masterha/app1.cnf
Sat Mar  7 13:21:57 2026 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar  7 13:21:57 2026 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat Mar  7 13:21:57 2026 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sat Mar  7 13:21:57 2026 - [info] Starting SSH connection tests..
Sat Mar  7 13:21:57 2026 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63] 
Sat Mar  7 13:21:57 2026 - [debug]  Connecting via SSH from root@172.25.254.10(172.25.254.10:22) to root@172.25.254.20(172.25.254.20:22)..
root@172.25.254.10: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Sat Mar  7 13:21:57 2026 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connection from root@172.25.254.10(172.25.254.10:22) to root@172.25.254.20(172.25.254.20:22) failed!
Sat Mar  7 13:21:58 2026 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63] 
Sat Mar  7 13:21:57 2026 - [debug]  Connecting via SSH from root@172.25.254.20(172.25.254.20:22) to root@172.25.254.10(172.25.254.10:22)..
root@172.25.254.20: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Sat Mar  7 13:21:57 2026 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connection from root@172.25.254.20(172.25.254.20:22) to root@172.25.254.10(172.25.254.10:22) failed!
Sat Mar  7 13:21:58 2026 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63] 
Sat Mar  7 13:21:58 2026 - [debug]  Connecting via SSH from root@172.25.254.30(172.25.254.30:22) to root@172.25.254.10(172.25.254.10:22)..
root@172.25.254.30: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Sat Mar  7 13:21:58 2026 - [error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln111] SSH connection from root@172.25.254.30(172.25.254.30:22) to root@172.25.254.10(172.25.254.10:22) failed!
SSH Configuration Check Failed!
 at /usr/bin/masterha_check_ssh line 44.
[root@mha mha4mysql-manager-0.58]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sat Mar  7 13:22:05 2026 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar  7 13:22:05 2026 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat Mar  7 13:22:05 2026 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sat Mar  7 13:22:05 2026 - [info] MHA::MasterMonitor version 0.58.
Sat Mar  7 13:22:14 2026 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln188] There is no alive server. We can't do failover
Sat Mar  7 13:22:14 2026 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Sat Mar  7 13:22:14 2026 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Sat Mar  7 13:22:14 2026 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!
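Both checks fail here. masterha_check_ssh reports `Permission denied` between the MySQL nodes, which means the passwordless SSH required in the architecture section has not been set up yet. One common way to build the mesh is to authorize a single key pair on every node and then copy that pair to each node, so the nodes can also log in to one another. The sketch below assumes root logins and the three MySQL hosts of this lab. The function names and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a passwordless-SSH mesh for MHA from the
# manager node. Assumes root logins and one shared key pair for all nodes.
HOSTS=(172.25.254.10 172.25.254.20 172.25.254.30)
DRY_RUN=${DRY_RUN:-0}

run() {
  # With DRY_RUN=1 only print the command; otherwise execute it.
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_ssh_mesh() {
  local h
  # 1) Create a key pair on the manager if there is none yet.
  [ -f /root/.ssh/id_rsa ] || run ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
  for h in "${HOSTS[@]}"; do
    # 2) Authorize the public key on each node (asks for the password once).
    run ssh-copy-id -i /root/.ssh/id_rsa.pub "root@$h"
    # 3) Copy the same key pair so the MySQL nodes can reach each other too.
    run scp /root/.ssh/id_rsa /root/.ssh/id_rsa.pub "root@$h:/root/.ssh/"
  done
}
```

Run `DRY_RUN=1` first to review the commands, then run `setup_ssh_mesh` for real and repeat `masterha_check_ssh --conf=/etc/masterha/app1.cnf`.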

Cluster switch operations

Manual switch

Online switch (master healthy)

# Initial state
[root@mysql-node2 MHA-7]# mysql -uroot -plee -e "show slave status\G;"  | head -n 10
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 172.25.254.10
                  Master_User: lee
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1479
               Relay_Log_File: mysql-node2-relay-bin.000002
                Relay_Log_Pos: 422
[root@mysql-node3 MHA-7]# mysql -uroot -plee -e "show slave status\G;"  | head -n 10
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 172.25.254.10
                  Master_User: lee
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1479
               Relay_Log_File: mysql-node2-relay-bin.000002
                Relay_Log_Pos: 422

# Run the switch: make 172.25.254.20 the new master
[root@mha ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=172.25.254.20 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Sat Mar  7 13:35:38 2026 - [info] MHA::MasterRotate version 0.58.
Sat Mar  7 13:35:38 2026 - [info] Starting online master switch..
Sat Mar  7 13:35:38 2026 - [info] 
Sat Mar  7 13:35:38 2026 - [info] * Phase 1: Configuration Check Phase..
Sat Mar  7 13:35:38 2026 - [info] 
Sat Mar  7 13:35:38 2026 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar  7 13:35:38 2026 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat Mar  7 13:35:38 2026 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sat Mar  7 13:35:38 2026 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln180] Got MySQL error when connecting 172.25.254.30(172.25.254.30:3306) :1045:Access denied for user 'root'@'172.25.254.40' (using password: YES), but this is not a MySQL crash. Check MySQL server settings.
Sat Mar  7 13:35:38 2026 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301]  at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297.
Sat Mar  7 13:35:38 2026 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln180] Got MySQL error when connecting 172.25.254.20(172.25.254.20:3306) :1045:Access denied for user 'root'@'172.25.254.40' (using password: YES), but this is not a MySQL crash. Check MySQL server settings.
Sat Mar  7 13:35:38 2026 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301]  at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297.
Sat Mar  7 13:35:39 2026 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln309] Got fatal error, stopping operations
Sat Mar  7 13:35:39 2026 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln178] Got ERROR:  at /usr/share/perl5/vendor_perl/MHA/MasterRotate.pm line 86.
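# Attempt 1 failed with MySQL error 1045: the MHA manager (connecting from 172.25.254.40)
# was denied access as root on 172.25.254.20 and 172.25.254.30. Attempt 2:
[root@mha ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=172.25.254.20 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000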
Sat Mar  7 13:39:05 2026 - [info] MHA::MasterRotate version 0.58.
Sat Mar  7 13:39:05 2026 - [info] Starting online master switch..
Sat Mar  7 13:39:05 2026 - [info] 
Sat Mar  7 13:39:05 2026 - [info] * Phase 1: Configuration Check Phase..
Sat Mar  7 13:39:05 2026 - [info] 
Sat Mar  7 13:39:05 2026 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar  7 13:39:05 2026 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat Mar  7 13:39:05 2026 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sat Mar  7 13:39:06 2026 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln939] SQL Thread is stopped(error) on 172.25.254.20(172.25.254.20:3306)! Errno:1396, Error:Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '73b2a16e-19d2-11f1-bbcc-000c295fc08e:2' at source log mysql-bin.000002, end_log_pos 760. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Sat Mar  7 13:39:06 2026 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln939] SQL Thread is stopped(error) on 172.25.254.30(172.25.254.30:3306)! Errno:1396, Error:Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '73b2a16e-19d2-11f1-bbcc-000c295fc08e:2' at source log mysql-bin.000002, end_log_pos 760. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Sat Mar  7 13:39:06 2026 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln193] There is no alive slave. We can't do failover
Sat Mar  7 13:39:06 2026 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln178] Got ERROR:  at /usr/share/perl5/vendor_perl/MHA/MasterRotate.pm line 86.
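# Attempt 2 failed: the SQL threads on both slaves had stopped with error 1396 (a failed
# CREATE USER/DROP USER-type statement) while applying a replicated transaction,
# so MHA found no alive slave. Attempt 3:
[root@mha ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=172.25.254.20 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000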
Sat Mar  7 13:48:33 2026 - [info] MHA::MasterRotate version 0.58.
Sat Mar  7 13:48:33 2026 - [info] Starting online master switch..
Sat Mar  7 13:48:33 2026 - [info] 
Sat Mar  7 13:48:33 2026 - [info] * Phase 1: Configuration Check Phase..
Sat Mar  7 13:48:33 2026 - [info] 
Sat Mar  7 13:48:33 2026 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar  7 13:48:33 2026 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat Mar  7 13:48:33 2026 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sat Mar  7 13:48:34 2026 - [info] GTID failover mode = 1
Sat Mar  7 13:48:34 2026 - [info] Current Alive Master: 172.25.254.10(172.25.254.10:3306)
Sat Mar  7 13:48:34 2026 - [info] Alive Slaves:
Sat Mar  7 13:48:34 2026 - [info]   172.25.254.20(172.25.254.20:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Sat Mar  7 13:48:34 2026 - [info]     GTID ON
Sat Mar  7 13:48:34 2026 - [info]     Replicating from 172.25.254.10(172.25.254.10:3306)
Sat Mar  7 13:48:34 2026 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Mar  7 13:48:34 2026 - [info]   172.25.254.30(172.25.254.30:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Sat Mar  7 13:48:34 2026 - [info]     GTID ON
Sat Mar  7 13:48:34 2026 - [info]     Replicating from 172.25.254.10(172.25.254.10:3306)
Sat Mar  7 13:48:34 2026 - [info]     Not candidate for the new Master (no_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 172.25.254.10(172.25.254.10:3306)? (YES/no): yes
Sat Mar  7 13:48:37 2026 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sat Mar  7 13:48:37 2026 - [info]  ok.
Sat Mar  7 13:48:37 2026 - [info] Checking MHA is not monitoring or doing failover..
Sat Mar  7 13:48:37 2026 - [info] Checking replication health on 172.25.254.20..
Sat Mar  7 13:48:37 2026 - [info]  ok.
Sat Mar  7 13:48:37 2026 - [info] Checking replication health on 172.25.254.30..
Sat Mar  7 13:48:37 2026 - [info]  ok.
Sat Mar  7 13:48:37 2026 - [info] 172.25.254.20 can be new master.
Sat Mar  7 13:48:37 2026 - [info] 
From:
172.25.254.10(172.25.254.10:3306) (current master)
 +--172.25.254.20(172.25.254.20:3306)
 +--172.25.254.30(172.25.254.30:3306)

To:
172.25.254.20(172.25.254.20:3306) (new master)
 +--172.25.254.30(172.25.254.30:3306)
 +--172.25.254.10(172.25.254.10:3306)

Starting master switch from 172.25.254.10(172.25.254.10:3306) to 172.25.254.20(172.25.254.20:3306)? (yes/NO): yes
Sat Mar  7 13:48:40 2026 - [info] Checking whether 172.25.254.20(172.25.254.20:3306) is ok for the new master..
Sat Mar  7 13:48:40 2026 - [info]  ok.
Sat Mar  7 13:48:40 2026 - [info] 172.25.254.10(172.25.254.10:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Sat Mar  7 13:48:40 2026 - [info] 172.25.254.10(172.25.254.10:3306): Resetting slave pointing to the dummy host.
Sat Mar  7 13:48:40 2026 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Mar  7 13:48:40 2026 - [info] 
Sat Mar  7 13:48:40 2026 - [info] * Phase 2: Rejecting updates Phase..
Sat Mar  7 13:48:40 2026 - [info] 
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
Sat Mar  7 13:48:42 2026 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Sat Mar  7 13:48:42 2026 - [info] Executing FLUSH TABLES WITH READ LOCK..
Sat Mar  7 13:48:42 2026 - [info]  ok.
Sat Mar  7 13:48:42 2026 - [info] Orig master binlog:pos is mysql-bin.000002:1479.
Sat Mar  7 13:48:42 2026 - [info]  Waiting to execute all relay logs on 172.25.254.20(172.25.254.20:3306)..
Sat Mar  7 13:48:42 2026 - [info]  master_pos_wait(mysql-bin.000002:1479) completed on 172.25.254.20(172.25.254.20:3306). Executed 0 events.
Sat Mar  7 13:48:42 2026 - [info]   done.
Sat Mar  7 13:48:42 2026 - [info] Getting new master's binlog name and position..
Sat Mar  7 13:48:42 2026 - [info]  mysql-bin.000001:1445
Sat Mar  7 13:48:42 2026 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.25.254.20', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='lee', MASTER_PASSWORD='xxx';
Sat Mar  7 13:48:42 2026 - [info] 
Sat Mar  7 13:48:42 2026 - [info] * Switching slaves in parallel..
Sat Mar  7 13:48:42 2026 - [info] 
Sat Mar  7 13:48:42 2026 - [info] -- Slave switch on host 172.25.254.30(172.25.254.30:3306) started, pid: 52968
Sat Mar  7 13:48:42 2026 - [info] 
Sat Mar  7 13:48:53 2026 - [info] Log messages from 172.25.254.30 ...
Sat Mar  7 13:48:53 2026 - [info] 
Sat Mar  7 13:48:42 2026 - [info]  Waiting to execute all relay logs on 172.25.254.30(172.25.254.30:3306)..
Sat Mar  7 13:48:42 2026 - [info]  master_pos_wait(mysql-bin.000002:1479) completed on 172.25.254.30(172.25.254.30:3306). Executed 0 events.
Sat Mar  7 13:48:42 2026 - [info]   done.
Sat Mar  7 13:48:42 2026 - [info]  Resetting slave 172.25.254.30(172.25.254.30:3306) and starting replication from the new master 172.25.254.20(172.25.254.20:3306)..
Sat Mar  7 13:48:42 2026 - [info]  Executed CHANGE MASTER.
Sat Mar  7 13:48:53 2026 - [info]  Slave started.
Sat Mar  7 13:48:53 2026 - [info] End of log messages from 172.25.254.30 ...
Sat Mar  7 13:48:53 2026 - [info] 
Sat Mar  7 13:48:53 2026 - [info] -- Slave switch on host 172.25.254.30(172.25.254.30:3306) succeeded.
Sat Mar  7 13:48:53 2026 - [info] Unlocking all tables on the orig master:
Sat Mar  7 13:48:53 2026 - [info] Executing UNLOCK TABLES..
Sat Mar  7 13:48:53 2026 - [info]  ok.
Sat Mar  7 13:48:53 2026 - [info] Starting orig master as a new slave..
Sat Mar  7 13:48:53 2026 - [info]  Resetting slave 172.25.254.10(172.25.254.10:3306) and starting replication from the new master 172.25.254.20(172.25.254.20:3306)..
Sat Mar  7 13:48:53 2026 - [info]  Executed CHANGE MASTER.
Sat Mar  7 13:49:04 2026 - [info]  Slave started.
Sat Mar  7 13:49:04 2026 - [info] All new slave servers switched successfully.
Sat Mar  7 13:49:04 2026 - [info] 
Sat Mar  7 13:49:04 2026 - [info] * Phase 5: New master cleanup phase..
Sat Mar  7 13:49:04 2026 - [info] 
Sat Mar  7 13:49:04 2026 - [info]  172.25.254.20: Resetting slave info succeeded.
Sat Mar  7 13:49:04 2026 - [info] Switching master to 172.25.254.20(172.25.254.20:3306) completed successfully.

#Check the cluster status
[root@mysql-node1 ~]# mysql -uroot -plee -e "show slave status\G;"  | head -n 15
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 172.25.254.20
                  Master_User: lee
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 1445
               Relay_Log_File: mysql-node1-relay-bin.000002
                Relay_Log_Pos: 422
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
[root@mysql-node3 ~]# mysql -uroot -plee -e "show slave status\G;"  | head -n 15
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 172.25.254.20
                  Master_User: lee
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 1445
               Relay_Log_File: mysql-node3-relay-bin.000002
                Relay_Log_Pos: 422
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
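Checking each replica by eye is fine for three nodes, but the same check can be scripted. The sketch below uses a hypothetical helper, `check_replica`, which is not part of MHA or MySQL. It reads `show slave status\G` output on stdin and confirms two things: `Master_Host` is the expected new master, and both `Slave_IO_Running` and `Slave_SQL_Running` are `Yes`.

```shell
# check_replica EXPECTED_MASTER_HOST  (hypothetical helper, reads stdin)
# Input: "show slave status\G" output, i.e. lines like "   Master_Host: 172.25.254.20".
# Prints "OK" and returns 0 when the replica follows EXPECTED_MASTER_HOST
# with both replication threads running; otherwise prints "FAIL: ..." and returns 1.
check_replica() {
  awk -v want="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)          # drop the right-aligned padding
      key = line
      sub(/:.*/, "", key)               # field name: text before the first colon
      val = line
      sub(/^[^:]*:[ \t]*/, "", val)     # field value: text after the first colon
      sub(/[ \t]+$/, "", val)
      f[key] = val
    }
    END {
      if (f["Master_Host"] != want) {
        print "FAIL: Master_Host=" f["Master_Host"]
        exit 1
      }
      if (f["Slave_IO_Running"] != "Yes" || f["Slave_SQL_Running"] != "Yes") {
        print "FAIL: IO=" f["Slave_IO_Running"] " SQL=" f["Slave_SQL_Running"]
        exit 1
      }
      print "OK"
    }'
}
```

For example, after the switch above, running `mysql -uroot -plee -e "show slave status\G" 2>/dev/null | check_replica 172.25.254.20` on node1 or node3 prints `OK` for the status shown.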

Switchover after a master failure

#Stop the master (node2)
[root@mysql-node2 ~]# systemctl stop mysqld
[root@mysql-node2 ~]# ps -ef | grep mysqld
root       53217    4656  0 13:59 pts/1    00:00:00 grep --color=auto mysqld
[root@mysql-node2 ~]# netstat -tlnp | grep 3306

#Switch the master to node1
[root@mha ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=172.25.254.20 --dead_master_port=3306 --new_master_host=172.25.254.10 --new_master_port=3306 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 172.25.254.20.
Sat Mar  7 14:01:14 2026 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar  7 14:01:14 2026 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sat Mar  7 14:01:14 2026 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sat Mar  7 14:01:14 2026 - [info] MHA::MasterFailover version 0.58.
Sat Mar  7 14:01:14 2026 - [info] Starting master failover.
Sat Mar  7 14:01:14 2026 - [info] 
Sat Mar  7 14:01:14 2026 - [info] * Phase 1: Configuration Check Phase..
Sat Mar  7 14:01:14 2026 - [info] 
Sat Mar  7 14:01:15 2026 - [info] GTID failover mode = 1
Sat Mar  7 14:01:15 2026 - [info] Dead Servers:
Sat Mar  7 14:01:15 2026 - [info]   172.25.254.20(172.25.254.20:3306)
Sat Mar  7 14:01:15 2026 - [info] Checking master reachability via MySQL(double check)...
Sat Mar  7 14:01:15 2026 - [info]  ok.
Sat Mar  7 14:01:15 2026 - [info] Alive Servers:
Sat Mar  7 14:01:15 2026 - [info]   172.25.254.10(172.25.254.10:3306)
Sat Mar  7 14:01:15 2026 - [info]   172.25.254.30(172.25.254.30:3306)
Sat Mar  7 14:01:15 2026 - [info] Alive Slaves:
Sat Mar  7 14:01:15 2026 - [info]   172.25.254.10(172.25.254.10:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Sat Mar  7 14:01:15 2026 - [info]     GTID ON
Sat Mar  7 14:01:15 2026 - [info]     Replicating from 172.25.254.20(172.25.254.20:3306)
Sat Mar  7 14:01:15 2026 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Mar  7 14:01:15 2026 - [info]   172.25.254.30(172.25.254.30:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Sat Mar  7 14:01:15 2026 - [info]     GTID ON
Sat Mar  7 14:01:15 2026 - [info]     Replicating from 172.25.254.20(172.25.254.20:3306)
Sat Mar  7 14:01:15 2026 - [info]     Not candidate for the new Master (no_master is set)
Master 172.25.254.20(172.25.254.20:3306) is dead. Proceed? (yes/NO): yes
Sat Mar  7 14:01:18 2026 - [info] Starting GTID based failover.
Sat Mar  7 14:01:18 2026 - [info] 
Sat Mar  7 14:01:18 2026 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Mar  7 14:01:18 2026 - [info] 
Sat Mar  7 14:01:18 2026 - [info] * Phase 2: Dead Master Shutdown Phase..
Sat Mar  7 14:01:18 2026 - [info] 
root@172.25.254.20: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Sat Mar  7 14:01:18 2026 - [warning] HealthCheck: SSH to 172.25.254.20 is NOT reachable.
Sat Mar  7 14:01:18 2026 - [info] Forcing shutdown so that applications never connect to the current master..
Sat Mar  7 14:01:18 2026 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Sat Mar  7 14:01:18 2026 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sat Mar  7 14:01:18 2026 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sat Mar  7 14:01:18 2026 - [info] 
Sat Mar  7 14:01:18 2026 - [info] * Phase 3: Master Recovery Phase..
Sat Mar  7 14:01:18 2026 - [info] 
Sat Mar  7 14:01:18 2026 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sat Mar  7 14:01:18 2026 - [info] 
Sat Mar  7 14:01:18 2026 - [info] The latest binary log file/position on all slaves is mysql-bin.000001:1445
Sat Mar  7 14:01:18 2026 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sat Mar  7 14:01:18 2026 - [info]   172.25.254.10(172.25.254.10:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Sat Mar  7 14:01:18 2026 - [info]     GTID ON
Sat Mar  7 14:01:18 2026 - [info]     Replicating from 172.25.254.20(172.25.254.20:3306)
Sat Mar  7 14:01:18 2026 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Mar  7 14:01:18 2026 - [info]   172.25.254.30(172.25.254.30:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Sat Mar  7 14:01:18 2026 - [info]     GTID ON
Sat Mar  7 14:01:18 2026 - [info]     Replicating from 172.25.254.20(172.25.254.20:3306)
Sat Mar  7 14:01:18 2026 - [info]     Not candidate for the new Master (no_master is set)
Sat Mar  7 14:01:18 2026 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:1445
Sat Mar  7 14:01:18 2026 - [info] Oldest slaves:
Sat Mar  7 14:01:18 2026 - [info]   172.25.254.10(172.25.254.10:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Sat Mar  7 14:01:18 2026 - [info]     GTID ON
Sat Mar  7 14:01:18 2026 - [info]     Replicating from 172.25.254.20(172.25.254.20:3306)
Sat Mar  7 14:01:18 2026 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Mar  7 14:01:18 2026 - [info]   172.25.254.30(172.25.254.30:3306)  Version=8.3.0 (oldest major version between slaves) log-bin:enabled
Sat Mar  7 14:01:18 2026 - [info]     GTID ON
Sat Mar  7 14:01:18 2026 - [info]     Replicating from 172.25.254.20(172.25.254.20:3306)
Sat Mar  7 14:01:18 2026 - [info]     Not candidate for the new Master (no_master is set)
Sat Mar  7 14:01:18 2026 - [info] 
Sat Mar  7 14:01:18 2026 - [info] * Phase 3.3: Determining New Master Phase..
Sat Mar  7 14:01:18 2026 - [info] 
Sat Mar  7 14:01:18 2026 - [info] 172.25.254.10 can be new master.
Sat Mar  7 14:01:18 2026 - [info] New master is 172.25.254.10(172.25.254.10:3306)
Sat Mar  7 14:01:18 2026 - [info] Starting master failover..
Sat Mar  7 14:01:18 2026 - [info] 
From:
172.25.254.20(172.25.254.20:3306) (current master)
 +--172.25.254.10(172.25.254.10:3306)
 +--172.25.254.30(172.25.254.30:3306)

To:
172.25.254.10(172.25.254.10:3306) (new master)
 +--172.25.254.30(172.25.254.30:3306)

Starting master switch from 172.25.254.20(172.25.254.20:3306) to 172.25.254.10(172.25.254.10:3306)? (yes/NO): yes
Sat Mar  7 14:01:20 2026 - [info] New master decided manually is 172.25.254.10(172.25.254.10:3306)
Sat Mar  7 14:01:20 2026 - [info] 
Sat Mar  7 14:01:20 2026 - [info] * Phase 3.3: New Master Recovery Phase..
Sat Mar  7 14:01:20 2026 - [info] 
Sat Mar  7 14:01:20 2026 - [info]  Waiting all logs to be applied.. 
Sat Mar  7 14:01:20 2026 - [info]   done.
Sat Mar  7 14:01:20 2026 - [info] Getting new master's binlog name and position..
Sat Mar  7 14:01:20 2026 - [info]  mysql-bin.000002:1479
Sat Mar  7 14:01:20 2026 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.25.254.10', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='lee', MASTER_PASSWORD='xxx';
Sat Mar  7 14:01:20 2026 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000002, 1479, 73b2a16e-19d2-11f1-bbcc-000c295fc08e:1-5
Sat Mar  7 14:01:20 2026 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Sat Mar  7 14:01:20 2026 - [info] Setting read_only=0 on 172.25.254.10(172.25.254.10:3306)..
Sat Mar  7 14:01:20 2026 - [info]  ok.
Sat Mar  7 14:01:20 2026 - [info] ** Finished master recovery successfully.
Sat Mar  7 14:01:20 2026 - [info] * Phase 3: Master Recovery Phase completed.
Sat Mar  7 14:01:20 2026 - [info] 
Sat Mar  7 14:01:20 2026 - [info] * Phase 4: Slaves Recovery Phase..
Sat Mar  7 14:01:20 2026 - [info] 
Sat Mar  7 14:01:20 2026 - [info] 
Sat Mar  7 14:01:20 2026 - [info] * Phase 4.1: Starting Slaves in parallel..
Sat Mar  7 14:01:20 2026 - [info] 
Sat Mar  7 14:01:20 2026 - [info] -- Slave recovery on host 172.25.254.30(172.25.254.30:3306) started, pid: 53076. Check tmp log /etc/masterha/172.25.254.30_3306_20260307140114.log if it takes time..
Sat Mar  7 14:01:21 2026 - [info] 
Sat Mar  7 14:01:21 2026 - [info] Log messages from 172.25.254.30 ...
Sat Mar  7 14:01:21 2026 - [info] 
Sat Mar  7 14:01:20 2026 - [info]  Resetting slave 172.25.254.30(172.25.254.30:3306) and starting replication from the new master 172.25.254.10(172.25.254.10:3306)..
Sat Mar  7 14:01:20 2026 - [info]  Executed CHANGE MASTER.
Sat Mar  7 14:01:20 2026 - [info]  Slave started.
Sat Mar  7 14:01:20 2026 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln974] gtid_wait(73b2a16e-19d2-11f1-bbcc-000c295fc08e:1-5) returned NULL on 172.25.254.30(172.25.254.30:3306). Maybe SQL thread was aborted?
Sat Mar  7 14:01:21 2026 - [info] End of log messages from 172.25.254.30.
Sat Mar  7 14:01:21 2026 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln2045] Master failover to 172.25.254.10(172.25.254.10:3306) done, but recovery on slave partially failed.
Sat Mar  7 14:01:21 2026 - [info] 

----- Failover Report -----

app1: MySQL Master failover 172.25.254.20(172.25.254.20:3306) to 172.25.254.10(172.25.254.10:3306)

Master 172.25.254.20(172.25.254.20:3306) is down!

Check MHA Manager logs at mha for details.

Started manual(interactive) failover.
Selected 172.25.254.10(172.25.254.10:3306) as a new master.
172.25.254.10(172.25.254.10:3306): OK: Applying all logs succeeded.
172.25.254.30(172.25.254.30:3306): ERROR: Failed on waiting gtid exec set on master.
Master failover to 172.25.254.10(172.25.254.10:3306) done, but recovery on slave partially failed.
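The report above marks node1 `OK` but node3 `ERROR`, so node3 has to be checked by hand even though the new master is already serving. When the report is long or several replicas are involved, the failed hosts can be extracted mechanically. The following is a minimal sketch. It assumes report lines keep the `host(host:port): ERROR: ...` shape shown above, and `failed_replicas` is a hypothetical helper name.

```shell
# failed_replicas  (hypothetical helper, reads an MHA failover report on stdin)
# Report lines look like:
#   172.25.254.30(172.25.254.30:3306): ERROR: Failed on waiting gtid exec set on master.
# Prints the "host:port" from inside the parentheses for every ERROR line.
failed_replicas() {
  sed -n 's/^[^(]*(\([^)]*\)): ERROR:.*/\1/p'
}
```

Suppose the switch output was saved with `masterha_master_switch ... 2>&1 | tee switch.log`, where the file name is only illustrative. Then `failed_replicas < switch.log` would print `172.25.254.30:3306` for the run above.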

#Check the switchover result
[root@mysql-node3 ~]# mysql -uroot -plee -e "show slave status\G;"  | head -n 15
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 172.25.254.20
                  Master_User: lee
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 1445
               Relay_Log_File: mysql-node3-relay-bin.000002
                Relay_Log_Pos: 422
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
[root@mysql-node3 ~]# mysql -uroot -plee -e "show slave status\G;"  | head -n 15
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 172.25.254.10
                  Master_User: lee
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1479
               Relay_Log_File: mysql-node3-relay-bin.000002
                Relay_Log_Pos: 422
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB:

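#Failure recovery
#After a failover, MHA leaves a lock file (app1.failover.complete) on the MHA host; while it exists, another failover cannot be run
[root@mha ~]# ls /etc/masterha/
app1.cnf  app1.failover.complete
#app1.failover.complete is the lock file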
[root@mysql-node2 ~]# systemctl start mysqld
[root@mha ~]# rm -fr /etc/masterha/app1.failover.complete
[root@mysql-node2 ~]# mysql -uroot -plee -e "reset slave;"



[root@mysql-node1 ~]# /etc/init.d/mysqld start
Starting MySQL. SUCCESS!
[root@mysql-node1 ~]# mysql -uroot -plee -e "CHANGE MASTER TO MASTER_HOST='172.25.254.20', MASTER_USER='lee', MASTER_PASSWORD='lee', MASTER_AUTO_POSITION=1;"

[root@mysql-node1 ~]# mysql -uroot -plee -e "start slave;" 
[root@mysql-node1 ~]# mysql -uroot -plee -e "show slave status\G;"  | head -n 15
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 172.25.254.20
                  Master_User: lee
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 2337
               Relay_Log_File: mysql-node1-relay-bin.000002
                Relay_Log_Pos: 422
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
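MHA prints the `CHANGE MASTER TO` statement that every remaining slave should run, with the password masked as `'xxx'`. When a recovered host rejoins as a slave of the current master, that statement can be reused instead of being retyped. The sketch below rests on three assumptions. First, the MHA output was saved to a file. Second, the password contains no `/`, `&` or `'`, since any of these would break the `sed` substitution. Third, `change_master_from_log` is a hypothetical helper name.

```shell
# change_master_from_log PASSWORD  (hypothetical helper, reads MHA output on stdin)
# Finds the "CHANGE MASTER TO ...;" statement MHA logged for the remaining slaves,
# keeps only the last one (it belongs to the most recent switch), and replaces
# the masked MASTER_PASSWORD='xxx' with the real replication password.
# Assumes PASSWORD contains no '/', '&' or single quote.
change_master_from_log() {
  grep -o "CHANGE MASTER TO .*;" | tail -n 1 |
    sed "s/MASTER_PASSWORD='xxx'/MASTER_PASSWORD='$1'/"
}
```

As an example, run the following on the recovered former master: `mysql -uroot -plee -e "$(change_master_from_log lee < switch.log) start slave;"`. Here `switch.log` is an illustrative name for the saved MHA output, and `lee` is the replication password used in this lab.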

Automatic failover

#For easier observation, open two shells: one watches the log, the other runs commands
[root@mha ~]# > /etc/masterha/*.log
[root@mha ~]# watch -n 1 cat /etc/masterha/mha.log

#Enable automatic failover (run the MHA monitor in the background)
[root@mha ~]# masterha_manager --conf=/etc/masterha/app1.cnf  &
[root@mha ~]# jobs
[1]+  Running                 masterha_manager --conf=/etc/masterha/app1.cnf &

#Simulate a master failure
[root@mysql-node1 ~]# /etc/init.d/mysqld stop
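The `watch` window shows the raw log, but a script can also decide whether the automatic failover finished cleanly. This is a hedged sketch. The partial-failure wording is copied from the manual failover report above. The full-success wording (`Master failover to ... completed successfully`) is an assumption based on MHA's report format. `failover_outcome` is a hypothetical name.

```shell
# failover_outcome  (hypothetical helper, reads an MHA manager log on stdin)
# Looks at the last "Master failover to ..." result line and prints
#   success  - failover completed successfully (assumed wording)
#   partial  - new master is up but some slave recovery failed
#   none     - no failover result found in the log yet
failover_outcome() {
  last=$(grep -E "Master failover to .* (completed successfully|done, but recovery on slave partially failed)" | tail -n 1)
  case "$last" in
    *"completed successfully"*) echo "success" ;;
    *"partially failed"*)       echo "partial" ;;
    *)                          echo "none" ;;
  esac
}
```

For example, `failover_outcome < /etc/masterha/mha.log` can be polled from the second shell instead of reading the whole log.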

VIP function and VIP failover

[root@mha ~]# ll MHA-7/master_ip_*
-rw-r--r-- 1 root root 2156 Jan 14  2021 MHA-7/master_ip_failover
-rw-r--r-- 1 root root 3813 Jan 14  2021 MHA-7/master_ip_online_change
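[root@mha ~]# mkdir  /etc/masterha/scripts
[root@mha ~]# cp  MHA-7/master_ip_*  /etc/masterha/scripts
#the copied scripts are mode 644; MHA executes them directly, so make them executable
[root@mha ~]# chmod +x /etc/masterha/scripts/master_ip_*
[root@mha ~]# vim /etc/masterha/app1.cnf
#add under [server default]:
master_ip_failover_script= /etc/masterha/scripts/master_ip_failover

master_ip_online_change_script= /etc/masterha/scripts/master_ip_online_change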
[root@mha ~]# vim /etc/masterha/scripts/master_ip_failover
my $vip = '172.25.254.100/24';
[root@mha ~]# vim /etc/masterha/scripts/master_ip_online_change
my $vip = '172.25.254.100/24';
#Bind the VIP to the current master (node1) by hand
[root@mysql-node1 ~]# ip a a 172.25.254.100/24 dev eth0
[root@mysql-node1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:5f:c0:8e brd ff:ff:ff:ff:ff:ff
    altname enp3s0
    altname ens160
    inet 172.25.254.10/24 brd 172.25.254.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 172.25.254.100/24 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe5f:c08e/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
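The contents of `master_ip_failover` from the MHA-7 package are not shown here. What follows is therefore only an assumption about what such a hook does during failover. It removes the VIP from the old master and adds it to the new one, the same `ip addr` operation done by hand on node1 above. The dry-run helper `vip_move_cmds` is hypothetical and only prints the commands.

```shell
# vip_move_cmds VIP DEV OLD_MASTER NEW_MASTER  (hypothetical dry-run helper)
# Prints, without executing, the two steps a VIP failover hook conceptually performs.
vip_move_cmds() {
  vip="$1"; dev="$2"; old_master="$3"; new_master="$4"
  # Step 1: drop the VIP on the old master (best effort; it may be unreachable).
  printf 'ssh root@%s "ip addr del %s dev %s"\n' "$old_master" "$vip" "$dev"
  # Step 2: bring the VIP up on the new master so clients follow it.
  printf 'ssh root@%s "ip addr add %s dev %s"\n' "$new_master" "$vip" "$dev"
}
```

For example, `vip_move_cmds 172.25.254.100/24 eth0 172.25.254.10 172.25.254.20` prints the commands for the test below. In a real hook, step 1 must tolerate failure. The dead master may not be reachable over SSH, as in the failover log above, where SSH to 172.25.254.20 was denied.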

#Test
[root@mha ~]# masterha_manager  --conf=/etc/masterha/app1.cnf &
[root@mha ~]# jobs
[1]+  Running                 masterha_manager --conf=/etc/masterha/app1.cnf &

#Stop the MySQL master (node1)
[root@mysql-node1 ~]# /etc/init.d/mysqld stop
Shutting down MySQL........... SUCCESS! 
#The VIP 172.25.254.100 has moved to the new master, node2
[root@mysql-node2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:8c:33:8f brd ff:ff:ff:ff:ff:ff
    altname enp3s0
    altname ens160
    inet 172.25.254.20/24 brd 172.25.254.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 172.25.254.100/24 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe8c:338f/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
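You can confirm where the VIP ended up without reading the whole `ip a` listing, by using a small filter that reports the interface holding it. This is a minimal sketch. `vip_dev` is a hypothetical helper that reads `ip a` output on stdin and exits non-zero when the VIP is absent.

```shell
# vip_dev VIP  (hypothetical helper, reads "ip a" output on stdin)
# Matches lines such as "inet 172.25.254.100/24 scope global secondary eth0",
# prints the interface name (last field), and returns 1 if the VIP is not present.
vip_dev() {
  awk -v vip="$1" '
    $1 == "inet" && $2 == vip { print $NF; found = 1 }
    END { if (!found) exit 1 }'
}
```

With the passwordless SSH that MHA already requires, the holder can be found across the cluster with a loop like this: `for h in 172.25.254.10 172.25.254.20 172.25.254.30; do ssh root@"$h" ip a | vip_dev 172.25.254.100/24 >/dev/null && echo "VIP on $h"; done`.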