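This guide targets Chinese domestic Linux distributions such as Kylin V10 (银河麒麟) and openEuler (欧拉) on x86_64 and arm64 (Kunpeng / Phytium) hardware. It builds a MySQL 8.0 high-availability cluster from semi-synchronous replication, Xenon Raft consensus, and VIP failover, and adds what a production deployment needs on top: monitoring and alerting (Prometheus + Grafana + Alertmanager) and scheduled full + incremental backups. It covers environment preparation, cluster setup, HA verification, monitoring and backup configuration, and troubleshooting. The commands are meant to be run as written once the example passwords, IP addresses, and interface names are replaced with your own.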
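I. Core Architecture and Principles
1.1 Overall Architecture (standard 3-node deployment)
[MySQL node 1 (192.168.1.100)] + [Xenon instance 1] + [monitoring/backup client]
| (replication / Raft traffic / metrics collection)
[MySQL node 2 (192.168.1.101)] + [Xenon instance 2] + [monitoring/backup client]
| (replication / Raft traffic / metrics collection)
[MySQL node 3 (192.168.1.102)] + [Xenon instance 3] + [monitoring/backup client]
|
[VIP (192.168.1.200)] → bound to the current primary; the single entry point for applications
[Prometheus + Grafana + Alertmanager] → cluster monitoring and alerting
[Backup storage] → local or remote storage for MySQL full and incremental backups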
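1.2 Component Responsibilities
- MySQL: GTID plus semi-synchronous replication for data consistency; the primary accepts writes and the replicas are read-only;
- Xenon: deployed locally on every MySQL node; handles node monitoring, Raft leader election, automatic failover, and VIP failover, completing a primary switch within seconds;
- VIP: a virtual IP that serves as the only entry point for applications and hides changes in the underlying node IPs;
- Prometheus + Grafana: collect key Xenon, MySQL, and host metrics and visualize them, with support for custom dashboards;
- Alertmanager: receives alerts from Prometheus and sends email / WeCom (企业微信) / DingTalk (钉钉) notifications for node outages, replication lag, backup failures, and similar faults;
- Backup tooling: percona-xtrabackup for the daily full backup and the hourly incremental backups, with the MySQL binlog retained for point-in-time recovery; crontab schedules the jobs, and the scripts cover verification and automatic cleanup.
1.3 Core Workflow
- After the cluster is initialized, Xenon elects the MySQL primary through Raft and binds the VIP to it; applications read and write through the VIP;
- Replicas follow the primary through semi-synchronous replication while Xenon continuously monitors every node (heartbeat, MySQL process, replication lag);
- When the primary fails, Xenon elects a new primary, moves the VIP, and repoints replication. The whole process takes about 3-5 seconds, so applications see only a brief interruption and must reconnect;
- Prometheus scrapes Xenon, MySQL, and host metrics on a schedule, Grafana visualizes them, and Alertmanager sends an alert when a metric crosses its threshold;
- A full backup runs daily and an incremental backup runs hourly; backups are written to a dedicated directory, expired backups are removed automatically, and backup usability is verified regularly.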
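II. Prerequisites (run on all 3 nodes)
2.1 Server Requirements
- OS: Kylin V10 (SP1/SP2), openEuler 20.03/22.03;
- Architecture: x86_64 / arm64 (Kunpeng / Phytium);
- Sizing: ≥2 GB RAM / 20 GB disk for testing; ≥4 GB RAM / 100 GB SSD for production;
- Network: all 3 nodes on the same LAN, with these ports reachable between them: 3306 (MySQL), 8080 (Xenon Raft), 9090 (Prometheus), 3000 (Grafana), 9093 (Alertmanager).
2.2 Common Base Configuration
Step 1: Disable the firewall and SELinux (opening only the required ports is the better practice in both test and production; they are simply switched off here for a test environment)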
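# Kylin / openEuler / CentOS
sudo systemctl stop firewalld && sudo systemctl disable firewalld
# Ubuntu
sudo ufw disable
# Disable SELinux (the config change is permanent; setenforce applies it to the running system)
sudo sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
sudo setenforce 0
Step 2: Set hostnames and hosts entries (avoids name-resolution problems when the VIP moves)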
# Node 1 (192.168.1.100)
sudo hostnamectl set-hostname mysql-xenon-01
# Node 2 (192.168.1.101)
sudo hostnamectl set-hostname mysql-xenon-02
# Node 3 (192.168.1.102)
sudo hostnamectl set-hostname mysql-xenon-03
# Add hosts entries on every node (append to the end of the file)
sudo vi /etc/hosts
192.168.1.100 mysql-xenon-01
192.168.1.101 mysql-xenon-02
192.168.1.102 mysql-xenon-03
192.168.1.200 mysql-vip # planned VIP; any unused address in the subnet
192.168.1.201 prom-grafana # monitoring node IP (a MySQL node can be reused)
# Verify
ping -c 2 mysql-xenon-01 && ping -c 2 prom-grafana
Step 3: Install base dependencies and set system parameters
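# Kylin, apt-based editions only (Kylin V10 server editions normally use dnf/yum; apt package names such as java-1.8.0-openjdk may differ)
sudo apt update && sudo apt install -y wget curl vim net-tools ipvsadm psmisc lrzsz java-1.8.0-openjdk
# openEuler / CentOS (dnf)
sudo dnf install -y wget curl vim net-tools ipvsadm psmisc lrzsz java-1.8.0-openjdk-devel
# Raise system limits (open files and processes for MySQL/Xenon)
sudo vi /etc/security/limits.conf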
* soft nofile 65535
* hard nofile 65535
* soft nproc 65535
* hard nproc 65535
mysql soft nofile 65535
mysql hard nofile 65535
# Kernel parameters (enable IP forwarding, tune performance)
sudo vi /etc/sysctl.conf
net.ipv4.ip_forward = 1
net.core.somaxconn = 65535
vm.swappiness = 0
fs.file-max = 655350
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_syncookies = 1
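# Apply the kernel parameters
sudo sysctl -p
# limits.conf is not a shell script and cannot be sourced; log out and back in, then check the new limit
ulimit -n
III. Installing MySQL 8.0 on 3 Nodes with GTID and Semi-Synchronous Replication
3.1 Install MySQL 8.0 on All Nodes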
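Step 1: Add the official MySQL repository (x86_64/arm64)
# Change to the repository directory
cd /etc/yum.repos.d/
# x86_64
sudo wget https://dev.mysql.com/get/mysql80-community-release-el8-3.noarch.rpm
# arm64/Kunpeng (confirm this URL exists first; the noarch release package above normally applies to aarch64 as well)
# sudo wget https://dev.mysql.com/get/mysql80-community-release-el8-aarch64-3.noarch.rpm
# Install the repository package
sudo rpm -ivh mysql80-community-release-el8-3.noarch.rpm
# Kylin (Debian-based editions): install directly
# sudo apt update && sudo apt install -y mysql-server-8.0
Step 2: Install and initialize MySQL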
# openEuler / CentOS / Kylin (RPM-based)
sudo dnf install -y mysql-community-server --nogpgcheck
# Start MySQL and enable it at boot
sudo systemctl start mysqld && sudo systemctl enable mysqld
# Look up the initial temporary password
sudo grep 'temporary password' /var/log/mysqld.log
# Log in with the temporary password and reset it (the new password must contain digits, letters, and a special character)
mysql -uroot -p
ALTER USER 'root'@'localhost' IDENTIFIED BY 'MySQL@8080';
# Allow remote root access (in production, create a dedicated account instead)
CREATE USER 'root'@'%' IDENTIFIED BY 'MySQL@8080';
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
EXIT;
# Verify remote access
mysql -uroot -h192.168.1.100 -pMySQL@8080
3.2 Configure Core MySQL Parameters on All Nodes (GTID + semi-synchronous replication)
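# Edit the configuration file, replacing its existing contents
sudo vi /etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
# In MySQL 8.0, lower_case_table_names can only be set when the data directory is initialized.
# The data directory was initialized in 3.1 with the default (0), so enabling the next line now
# prevents mysqld from starting. Set it before the very first start if you need it.
# lower_case_table_names=1
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
# GTID and replication (server-id must be unique per node: 100/101/102)
server-id=100 # node 1=100, node 2=101, node 3=102
gtid_mode=ON
enforce_gtid_consistency=ON
log_bin=mysql-bin
binlog_format=ROW
log_slave_updates=ON
binlog_expire_logs_seconds=259200 # keep binlogs for 3 days (replaces the deprecated expire_logs_days)
# Semi-synchronous replication (reduces the risk of data loss on failover)
# The plugins must be loaded, otherwise the rpl_semi_sync_* options are unknown and mysqld will not start
plugin-load-add=semisync_master.so
plugin-load-add=semisync_slave.so
rpl_semi_sync_master_enabled=1
rpl_semi_sync_slave_enabled=1
rpl_semi_sync_master_timeout=1000 # fall back to asynchronous replication after 1 second
# Performance
innodb_buffer_pool_size=1G # 50%-70% of RAM in production
innodb_flush_log_at_trx_commit=1
sync_binlog=1
max_connections=1000
slow_query_log=ON # slow query log, used by monitoring
slow_query_log_file=/var/log/mysql-slow.log
long_query_time=2 # queries longer than 2 seconds count as slow
[client]
socket=/var/lib/mysql/mysql.sock
default-character-set=utf8mb4
# Restart MySQL to apply
sudo systemctl restart mysqld
# Verify the settings
mysql -uroot -pMySQL@8080 -e "show variables like '%gtid%'; show variables like '%semi_sync%'"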
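The server-id values used above (100/101/102) match the last octet of each node's IP address. As a small sketch of that convention, the helper below derives the value from an address; the function name `server_id_from_ip` is hypothetical and not part of MySQL or Xenon, and the scheme only stays unique while all nodes share one /24 subnet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a MySQL server-id from the last octet of an IPv4 address.
# Assumes all cluster nodes live in the same /24, so the last octet is unique.
server_id_from_ip() {
    local ip="$1"
    # Remove everything up to and including the final dot
    echo "${ip##*.}"
}

for ip in 192.168.1.100 192.168.1.101 192.168.1.102; do
    echo "$ip -> server-id=$(server_id_from_ip "$ip")"
done
```

The same value is reused later as mysql.server_id in the Xenon configuration, so deriving it in one place helps keep the two files in agreement.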
3.3 Set Up Replication (192.168.1.100 as the initial primary)
Step 1: Create a dedicated replication account on the primary (100)
mysql -uroot -pMySQL@8080
CREATE USER 'repl'@'%' IDENTIFIED BY 'Repl@8080';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
FLUSH PRIVILEGES;
# Check the primary's status (nothing to record; GTID positions automatically)
SHOW MASTER STATUS;
EXIT;
Step 2: Configure replication on the replicas (101/102)
mysql -uroot -pMySQL@8080
# Clear any previous replication configuration
STOP SLAVE;
RESET SLAVE ALL;
# Configure GTID-based replication
CHANGE MASTER TO
MASTER_HOST='192.168.1.100',
MASTER_USER='repl',
MASTER_PASSWORD='Repl@8080',
MASTER_PORT=3306,
MASTER_AUTO_POSITION=1;
# Start replication
START SLAVE;
# Verify (Slave_IO_Running and Slave_SQL_Running must both be Yes); \G already terminates the statement
SHOW SLAVE STATUS\G
EXIT;
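The check on Slave_IO_Running and Slave_SQL_Running can also be scripted. The sketch below reads the text of SHOW SLAVE STATUS\G on standard input and succeeds only when both threads report Yes; the function name `replication_ok` is an illustrative assumption, not an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "SHOW SLAVE STATUS\G" output on stdin and
# return 0 only if both replication threads report "Yes".
replication_ok() {
    awk -F': *' '
        /Slave_IO_Running:/  { io  = $2 }
        /Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

One way to use it on a replica: `mysql -uroot -p -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication healthy"`. The trailing colon in each pattern keeps the Slave_SQL_Running_State field from being matched by mistake.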
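Step 3: Verify semi-synchronous replication and data consistency
# On the primary, check the number of semi-sync clients
mysql -uroot -pMySQL@8080 -e "show status like '%rpl_semi_sync_master_clients%'"
# A value of 2 means both replicas have semi-sync enabled
# On the primary, create a test database and table to verify replication
mysql -uroot -pMySQL@8080 -e "create database xenon_test; use xenon_test; create table t1(id int primary key, name varchar(20)); insert into t1 values(1, 'test1');"
# On a replica, query the data
mysql -uroot -pMySQL@8080 -e "select * from xenon_test.t1;"
# If the row is returned, replication works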
IV. Deploying Xenon and Configuring the HA Cluster (core)
4.1 Install Xenon on All Nodes (v1.1.4 stable)
# Create the installation directory
sudo mkdir -p /opt/xenon && cd /opt/xenon
# Download the package (x86_64/arm64)
# x86_64
sudo wget https://github.com/radondb/xenon/releases/download/v1.1.4/xenon-v1.1.4-linux-amd64.tar.gz
# arm64/Kunpeng
# sudo wget https://github.com/radondb/xenon/releases/download/v1.1.4/xenon-v1.1.4-linux-arm64.tar.gz
# Unpack and expose the binary as a global command
sudo tar -zxvf xenon-v1.1.4-linux-amd64.tar.gz
sudo mv xenon-v1.1.4-linux-amd64 xenon-bin
sudo chmod -R 775 /opt/xenon
sudo ln -s /opt/xenon/xenon-bin/xenon /usr/bin/xenon
# Verify the installation
xenon -v # prints xenon version v1.1.4 on success
# Create the config, log, and data directories
sudo mkdir -p /etc/xenon /var/log/xenon /var/lib/xenon
sudo chown -R root:root /etc/xenon /var/log/xenon /var/lib/xenon
sudo chmod -R 755 /etc/xenon /var/log/xenon /var/lib/xenon
4.2 Write the Xenon Configuration File (/etc/xenon/xenon.json)
Node 1 (192.168.1.100) configuration (nodes 2/3 change only node_id, hostname, and mysql.server_id). The "#" remarks below are annotations for the reader: JSON has no comment syntax, so remove them before saving the file.
sudo vi /etc/xenon/xenon.json
{
  "node_id": 1,                      # node 1=1, node 2=2, node 3=3
  "hostname": "192.168.1.100",       # this node's IP
  "rpc_port": 8080,                  # Xenon Raft port, same on all nodes
  "data_dir": "/var/lib/xenon",
  "log_dir": "/var/log/xenon",
  "log_level": "info",
  "metrics_port": 9090,              # metrics port scraped by Prometheus
  "log_rotate": {                    # log rotation, keeps files from growing too large
    "max_size": "100M",
    "max_age": 7,
    "max_backups": 10
  },
  # MySQL settings (must match the actual instance)
  "mysql": {
    "host": "127.0.0.1",
    "port": 3306,
    "user": "root",
    "password": "MySQL@8080",
    "basedir": "/usr",
    "datadir": "/var/lib/mysql",
    "socket": "/var/lib/mysql/mysql.sock",
    "server_id": 100                 # same as server-id in my.cnf
  },
  # High-availability settings
  "ha": {
    "enable": true,
    "election_timeout": 3000,
    "heartbeat_interval": 1000,
    "repl_user": "repl",
    "repl_password": "Repl@8080",
    "rpl_semi_sync_enabled": true,
    "rpl_semi_sync_timeout": 1000,
    "master_connect_retry": 10,
    "max_delay": 30,                 # replicas lagging more than 30 seconds are excluded from election
    # VIP failover settings
    "vip": {
      "enable": true,
      "address": "192.168.1.200/24", # VIP with prefix length, in the host's subnet
      "interface": "eth0"            # local NIC name (check with ip addr, e.g. ens33/eth0)
    },
    # Automatic failover
    "failover": {
      "enable": true,
      "max_retry": 3,
      "retry_interval": 5000
    }
  },
  # Raft cluster settings (peer_addresses must be identical on all nodes)
  "raft": {
    "peer_addresses": [
      "192.168.1.100:8080",
      "192.168.1.101:8080",
      "192.168.1.102:8080"
    ],
    "quorum": 2,                     # 3 nodes=2, 5 nodes=3
    "election_tick": 3,
    "heartbeat_tick": 1
  }
}
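Because JSON does not allow "#" comments, an annotated copy of the file has to be cleaned before Xenon can parse it. A minimal sketch of that step follows, assuming the annotated text is kept in a separate file; the function name `strip_hash_comments` is hypothetical, and the approach would corrupt any string value that itself contains "#" (none of the values in this configuration do).

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove "#" annotations from an annotated JSON file read on stdin.
# Caveat: a "#" inside a JSON string value would be cut off as well.
strip_hash_comments() {
    # First delete whole-line comments, then trim trailing "# ..." remarks
    sed -e '/^[[:space:]]*#/d' -e 's/[[:space:]]*#.*$//'
}
```

For example, `strip_hash_comments < xenon.annotated.json | python3 -m json.tool` both cleans the file and confirms that the result is valid JSON (the input file name is only an example).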
Changes for nodes 2/3
Only 3 fields change; everything else is identical to node 1:
- node_id: node 2=2, node 3=3;
- hostname: node 2=192.168.1.101, node 3=192.168.1.102;
- mysql.server_id: node 2=101, node 3=102.
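The quorum values quoted in the configuration (2 for 3 nodes, 3 for 5 nodes) follow the usual Raft majority rule, floor(n/2) + 1. A short sketch of that arithmetic, with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: majority quorum for a Raft cluster of N voting nodes.
raft_quorum() {
    local nodes="$1"
    # Integer division implements floor(n/2)
    echo $(( nodes / 2 + 1 ))
}

for n in 3 5 7; do
    echo "$n nodes -> quorum $(raft_quorum "$n")"
done
```

An even node count adds no fault tolerance under this rule: 4 nodes need a quorum of 3 and, like 3 nodes, tolerate only one failure.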
4.3 Start Xenon on All Nodes and Enable It at Boot
# Create a systemd unit so Xenon can be managed with systemctl
sudo vi /etc/systemd/system/xenon.service
[Unit]
Description=Xenon MySQL High Availability
After=network.target mysqld.service
Requires=mysqld.service
[Service]
Type=simple
ExecStart=/usr/bin/xenon -c /etc/xenon/xenon.json
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/kill -TERM $MAINPID
Restart=always
RestartSec=5
LimitNOFILE=65535
[Install]
WantedBy=multi-user.target
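# Reload systemd and start the service
sudo systemctl daemon-reload
sudo systemctl start xenon && sudo systemctl enable xenon
# Verify the status
sudo systemctl status xenon # active (running) means success
4.4 Verify the Xenon HA Cluster State
Step 1: Check the Xenon cluster node state (run on any node)
xenon cli cluster -H 127.0.0.1:8080
Expected output: all 3 nodes are online, exactly one is LEADER (the primary), and the others are FOLLOWER (replicas).
Step 2: Verify the VIP binding and MySQL primary/replica roles
# Check the VIP binding (run on the primary; success if the VIP is listed)
ip addr # 192.168.1.200/24 should appear under eth0
# Verify MySQL access through the VIP
mysql -uroot -h192.168.1.200 -pMySQL@8080
# Show the MySQL primary managed by Xenon
xenon cli master -H 127.0.0.1:8080
# Verify the primary is writable (replicas report read_only=ON)
mysql -uroot -h192.168.1.200 -pMySQL@8080 -e "show variables like 'read_only';"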
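Checking the VIP by eye in the `ip addr` output can be replaced by a small test that is easy to reuse in scripts. The sketch below reads the output of `ip -o -4 addr show` on standard input and succeeds when the given VIP is bound; `has_vip` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given VIP appears as a bound IPv4 address
# in "ip -o -4 addr show" output supplied on stdin.
has_vip() {
    local vip="$1"
    # Fixed-string match on "inet <vip>/" so the dots are not treated as wildcards
    grep -qF "inet ${vip}/"
}
```

On a node, `ip -o -4 addr show | has_vip 192.168.1.200 && echo "this node holds the VIP"` reports whether it currently owns the address.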
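V. Verifying the Core HA Capabilities
5.1 Automatic Failover (simulated primary outage)
# On the current primary (e.g. 100), stop MySQL and Xenon
sudo systemctl stop mysqld xenon
# On any replica, check the cluster state (a new primary is elected after 3-5 seconds)
xenon cli cluster -H 127.0.0.1:8080
# Verify VIP failover (run on the new primary; the VIP should be listed)
ip addr
# Verify the new primary accepts writes
mysql -uroot -h192.168.1.200 -pMySQL@8080 -e "insert into xenon_test.t1 values(2, 'failover_test');"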
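To put a number on the 3-5 second failover window, one option is to poll the VIP until it answers again and report the elapsed time. The sketch below is a generic retry loop under that assumption; `wait_until` is a hypothetical helper, not a Xenon or MySQL command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Retries COMMAND about once per second; on success prints the elapsed seconds
# and returns 0, otherwise returns 1 once the timeout has passed.
wait_until() {
    local timeout="$1"
    shift
    local start now
    start=$(date +%s)
    while :; do
        if "$@" >/dev/null 2>&1; then
            now=$(date +%s)
            echo $(( now - start ))
            return 0
        fi
        now=$(date +%s)
        [ $(( now - start )) -ge "$timeout" ] && return 1
        sleep 1
    done
}
```

Started right after stopping the primary, `wait_until 30 mysql -uroot -h192.168.1.200 -pMySQL@8080 -e "select 1"` prints roughly how long the VIP was unreachable.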
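5.2 Automatic Rejoin After the Failed Node Recovers
# On the failed node (100), start the services
sudo systemctl start mysqld xenon
# Verify the node state (it becomes FOLLOWER, online)
xenon cli cluster -H 127.0.0.1:8080
# Verify the data has caught up
mysql -uroot -pMySQL@8080 -e "select * from xenon_test.t1;"
5.3 Manual Primary Switch (for maintenance)
# On any node, switch the primary to 192.168.1.102
xenon cli switch-master -H 127.0.0.1:8080 -t 192.168.1.102:3306
# Verify the result
xenon cli master -H 127.0.0.1:8080
ip addr | grep 192.168.1.200 # the VIP is now bound on the new primary
VI. Production Essentials: Monitoring and Alerting (Prometheus + Grafana + Alertmanager)
192.168.1.201 is used as the monitoring node. A MySQL node can be reused instead, but Prometheus and Xenon's metrics_port would then both claim port 9090 on that host, so one of them has to move. The setup monitors Xenon, MySQL, and host metrics and sends email / WeCom alerts.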
6.1 Install Prometheus (2.50.0 stable)
# Create the installation directories
sudo mkdir -p /opt/prometheus /var/lib/prometheus /var/log/prometheus
cd /opt/prometheus
# Download the package (x86_64/arm64)
# x86_64
sudo wget https://github.com/prometheus/prometheus/releases/download/v2.50.0/prometheus-2.50.0.linux-amd64.tar.gz
# arm64/Kunpeng
# sudo wget https://github.com/prometheus/prometheus/releases/download/v2.50.0/prometheus-2.50.0.linux-arm64.tar.gz
# Unpack in place
sudo tar -zxvf prometheus-2.50.0.linux-amd64.tar.gz
sudo mv prometheus-2.50.0.linux-amd64/* .
sudo rm -rf prometheus-2.50.0.linux-amd64*
# Create the configuration file (scrapes Xenon, MySQL, and host metrics)
sudo vi /opt/prometheus/prometheus.yml
global:
  scrape_interval: 15s       # scrape interval
  evaluation_interval: 15s   # alert rule evaluation interval
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.1.201:9093   # Alertmanager address
rule_files:
  - "alert_rules.yml"        # alert rule file
scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['192.168.1.201:9090']
  # The 3 Xenon nodes (metrics exposed on metrics_port)
  - job_name: 'xenon'
    static_configs:
      - targets: ['192.168.1.100:9090', '192.168.1.101:9090', '192.168.1.102:9090']
  # The 3 MySQL nodes (via mysqld_exporter, installed below)
  - job_name: 'mysql'
    static_configs:
      - targets: ['192.168.1.100:9104', '192.168.1.101:9104', '192.168.1.102:9104']
  # The 3 hosts (via node_exporter, installed below)
  - job_name: 'node'
    static_configs:
      - targets: ['192.168.1.100:9100', '192.168.1.101:9100', '192.168.1.102:9100']
# Create the systemd unit
sudo vi /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
After=network.target
[Service]
Type=simple
User=root
# Prometheus has no --log.file flag; it logs to stderr, which systemd captures (journalctl -u prometheus)
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus --log.level=info
Restart=always
[Install]
WantedBy=multi-user.target
# Start Prometheus and enable it at boot
sudo systemctl daemon-reload
sudo systemctl start prometheus && sudo systemctl enable prometheus
# Verify (query port 9090 or check the service status)
sudo systemctl status prometheus
curl http://192.168.1.201:9090/metrics
6.2 Install node_exporter and mysqld_exporter on All MySQL Nodes
Step 1: Install node_exporter (host metrics: CPU / memory / disk / network)
cd /opt
# Download (x86_64/arm64)
# x86_64
sudo wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
# arm64
# sudo wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-arm64.tar.gz
sudo tar -zxvf node_exporter-1.8.2.linux-amd64.tar.gz
sudo mv node_exporter-1.8.2.linux-amd64/node_exporter /usr/bin/
sudo rm -rf node_exporter-1.8.2.linux-amd64*
# Create the systemd unit
sudo vi /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/node_exporter
Restart=always
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl start node_exporter && sudo systemctl enable node_exporter
Step 2: Install mysqld_exporter (MySQL metrics: replication / connections / slow queries)
cd /opt
# Download (x86_64/arm64)
# x86_64
sudo wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.15.1/mysqld_exporter-0.15.1.linux-amd64.tar.gz
# arm64
# sudo wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.15.1/mysqld_exporter-0.15.1.linux-arm64.tar.gz
sudo tar -zxvf mysqld_exporter-0.15.1.linux-amd64.tar.gz
sudo mv mysqld_exporter-0.15.1.linux-amd64/mysqld_exporter /usr/bin/
sudo rm -rf mysqld_exporter-0.15.1.linux-amd64*
# Create the MySQL connection file for the exporter
sudo vi /etc/mysqld_exporter.cnf
[client]
user=root
password=MySQL@8080
host=127.0.0.1
port=3306
# Create the systemd unit
sudo vi /etc/systemd/system/mysqld_exporter.service
[Unit]
Description=MySQL Exporter
After=network.target mysqld.service
[Service]
Type=simple
ExecStart=/usr/bin/mysqld_exporter --config.my-cnf=/etc/mysqld_exporter.cnf --web.listen-address=:9104
Restart=always
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl start mysqld_exporter && sudo systemctl enable mysqld_exporter
# Verify both exporters are running
sudo systemctl status node_exporter mysqld_exporter
6.3 Install Grafana on the Monitoring Node (visualization)
# Add the Grafana repository
sudo vi /etc/yum.repos.d/grafana.repo
[grafana]
name=Grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
# Install Grafana
sudo dnf install -y grafana --nogpgcheck
# Kylin (Debian-based editions)
# sudo apt install -y apt-transport-https
# sudo wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
# echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
# sudo apt update && sudo apt install -y grafana
# Start Grafana and enable it at boot
sudo systemctl start grafana-server && sudo systemctl enable grafana-server
# Verify
sudo systemctl status grafana-server
Grafana setup (open http://192.168.1.201:3000 in a browser; the default login is admin/admin)
- Change the default password after logging in;
- Go to "Configuration" → "Data Sources" → "Add data source" and choose "Prometheus";
- Enter the Prometheus URL http://192.168.1.201:9090 and click "Save & test";
- Import the published dashboards:
  - Host monitoring: import ID "1860" (Node Exporter Full);
  - MySQL monitoring: import ID "7362" (MySQL Overview);
- Optionally build a custom Xenon dashboard from the Xenon metrics (for example xenon_raft_role, xenon_mysql_status).
6.4 Install Alertmanager on the Monitoring Node (alert routing: email / WeCom)
Step 1: Install Alertmanager
cd /opt
# Download (x86_64/arm64)
# x86_64
sudo wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
# arm64
# sudo wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-arm64.tar.gz
sudo tar -zxvf alertmanager-0.27.0.linux-amd64.tar.gz
sudo mv alertmanager-0.27.0.linux-amd64/{alertmanager,amtool} /usr/bin/
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
# The tarball ships its sample config as alertmanager.yml; store it under the name used in this guide
sudo mv alertmanager-0.27.0.linux-amd64/alertmanager.yml /etc/alertmanager/config.yml
sudo rm -rf alertmanager-0.27.0.linux-amd64*
# Configure the receivers (email + WeCom) in /etc/alertmanager/config.yml
sudo vi /etc/alertmanager/config.yml
global:
  smtp_smarthost: 'smtp.163.com:25'     # SMTP server
  smtp_from: 'xxx@163.com'              # sender address
  smtp_auth_username: 'xxx@163.com'     # mailbox account
  smtp_auth_password: 'xxx'             # mailbox authorization code
  smtp_require_tls: false
  wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
  wechat_api_corp_id: 'your WeCom corp_id'
  wechat_api_secret: 'your WeCom app secret'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'wechat+email'
receivers:
  - name: 'wechat+email'
    wechat_configs:
      - send_resolved: true
        agent_id: 'your WeCom app agent_id'   # agent_id is a receiver-level field, not a global one
        to_user: '@all'                       # send to everyone
    email_configs:
      - send_resolved: true
        to: 'xxx@qq.com'                      # recipient address
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
# Create the systemd unit
sudo vi /etc/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
After=network.target prometheus.service
[Service]
Type=simple
ExecStart=/usr/bin/alertmanager --config.file=/etc/alertmanager/config.yml --storage.path=/var/lib/alertmanager
Restart=always
[Install]
WantedBy=multi-user.target
# Start Alertmanager and enable it at boot
sudo systemctl daemon-reload
sudo systemctl start alertmanager && sudo systemctl enable alertmanager
# Verify
sudo systemctl status alertmanager
curl http://192.168.1.201:9093/metrics
Step 2: Configure the Prometheus alert rules (/opt/prometheus/alert_rules.yml)
sudo vi /opt/prometheus/alert_rules.yml
groups:
  - name: host-alerts
    rules:
      - alert: HostDown
        expr: up{job="node"} == 0
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} host down"
          description: "{{ $labels.instance }} has been down for more than 10 seconds"
      - alert: HighCpuUsage
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} high CPU usage"
          description: "{{ $labels.instance }} CPU usage is above 80%, current value: {{ $value }}%"
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} high memory usage"
          description: "{{ $labels.instance }} memory usage is above 85%, current value: {{ $value }}%"
      - alert: HighDiskUsage
        expr: (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100 > 90
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} root filesystem nearly full"
          description: "{{ $labels.instance }} root filesystem usage is above 90%, current value: {{ $value }}%"
  - name: mysql-alerts
    rules:
      - alert: MySQLDown
        expr: up{job="mysql"} == 0
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} MySQL down"
          description: "{{ $labels.instance }} MySQL has been down for more than 10 seconds"
      - alert: MySQLReplicationStopped
        expr: mysql_slave_status_slave_io_running == 0 or mysql_slave_status_slave_sql_running == 0
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} MySQL replication stopped"
          description: "{{ $labels.instance }} replication IO/SQL thread is not running"
      - alert: MySQLReplicationLag
        expr: mysql_slave_seconds_behind_master > 30
        for: 10s
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} MySQL replication lag"
          description: "{{ $labels.instance }} replication lag is above 30 seconds, current value: {{ $value }}s"
      - alert: MySQLTooManyConnections
        # threads_connected is the current connection count; the Connections status counter only ever grows
        expr: mysql_global_status_threads_connected / mysql_global_variables_max_connections * 100 > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} MySQL connection usage high"
          description: "{{ $labels.instance }} connection usage is above 80%, current value: {{ $value }}%"
  - name: xenon-alerts
    rules:
      - alert: XenonNodeDown
        expr: up{job="xenon"} == 0
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} Xenon node down"
          description: "{{ $labels.instance }} Xenon has been down for more than 10 seconds"
      - alert: XenonNoLeader
        # count() over an empty result returns nothing, so "== 0" would never fire; absent() does
        expr: absent(xenon_raft_role{role="leader"})
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "Xenon cluster has no leader"
          description: "The Xenon cluster has not elected a leader; high availability is not in effect"
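Restart Prometheus so the alert rules take effect
sudo systemctl restart prometheus
Verify alerting: open Prometheus (http://192.168.1.201:9090) → "Alerts" to see all rules. Simulating a fault (for example stopping MySQL) should fire the alert and deliver an email / WeCom notification.
VII. Production Essentials: Scheduled Backups (full + incremental, same configuration on all nodes)
percona-xtrabackup 8.0 (supports MySQL 8.0; hot backup without locking InnoDB tables) takes a daily full backup and hourly incremental backups, and the MySQL binlog (retained for 3 days) remains available for point-in-time recovery in between. The scripts run automatically, write logs, remove expired backups, and verify the result. Backups are stored under /data/mysql_backup (a dedicated disk is recommended).
7.1 Install percona-xtrabackup 8.0
# Add the Percona repository
sudo vi /etc/yum.repos.d/percona.repo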
[percona-release-x86_64]
name=Percona Release Repository - x86_64
baseurl=https://repo.percona.com/yum/release/8.0/redhat/$releasever/$basearch
enabled=1
gpgcheck=1
gpgkey=https://repo.percona.com/yum/PERCONA-PACKAGING-KEY.gpg
# Install (x86_64/arm64)
sudo dnf install -y percona-xtrabackup-80
# Kylin (Debian-based editions)
# sudo apt update && sudo apt install -y percona-xtrabackup-80
# Verify the installation
xtrabackup --version # prints xtrabackup version 8.0.x on success
7.2 Create the Backup Directories and Environment Variables
# Create the backup directories (full / incremental / logs)
sudo mkdir -p /data/mysql_backup/full /data/mysql_backup/incr /data/mysql_backup/log
# Give ownership to the mysql user
sudo chown -R mysql:mysql /data/mysql_backup
# Environment variables used by the scripts
# Note: /etc/profile is readable by every local user, so the password below is exposed; use a protected option file in production
sudo vi /etc/profile
export MYSQL_USER=root
export MYSQL_PWD=MySQL@8080
export MYSQL_HOST=127.0.0.1
export MYSQL_PORT=3306
export BACKUP_DIR=/data/mysql_backup
export FULL_BACKUP_DIR=$BACKUP_DIR/full
export INCR_BACKUP_DIR=$BACKUP_DIR/incr
export LOG_DIR=$BACKUP_DIR/log
# Apply
source /etc/profile
7.3 Write the Backup Scripts (full + incremental + verification + cleanup)
Script 1: full backup (/data/mysql_backup/mysql_full_backup.sh)
sudo vi /data/mysql_backup/mysql_full_backup.sh
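#!/bin/bash
# Full backup script, run daily
source /etc/profile
# Backup name with a timestamp
BACKUP_NAME=mysql_full_$(date +%Y%m%d_%H%M%S)
FULL_BACKUP_PATH=$FULL_BACKUP_DIR/$BACKUP_NAME
LOG_FILE=$LOG_DIR/full_backup_$(date +%Y%m%d).log
# Logging helper
log() {
    echo "[$(date +%Y%m%d_%H%M%S)] $1" >> "$LOG_FILE"
}
log "===== Full backup started ====="
# Run the full backup
xtrabackup --user=$MYSQL_USER --password=$MYSQL_PWD --host=$MYSQL_HOST --port=$MYSQL_PORT --backup --target-dir="$FULL_BACKUP_PATH" --parallel=4 2>>"$LOG_FILE"
# Check whether the backup succeeded
if [ $? -eq 0 ]; then
    log "Full backup succeeded: $FULL_BACKUP_PATH"
    # Do not run --prepare here: a fully prepared base can no longer take incremental backups on top.
    # Preparing is left to restore time; only confirm that the checkpoint file was written.
    if [ -f "$FULL_BACKUP_PATH/xtrabackup_checkpoints" ]; then
        log "Checkpoint file present; backup can be used as an incremental base"
    else
        log "Checkpoint file missing; check the backup"
        exit 1
    fi
    # Remove full backups older than 7 days (top-level backup directories only)
    find "$FULL_BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +
    log "Removed full backups older than 7 days"
else
    log "Full backup failed; check the log"
    exit 1
fi
log "===== Full backup finished ====="
# Make the script executable
sudo chmod +x /data/mysql_backup/mysql_full_backup.sh
sudo chown mysql:mysql /data/mysql_backup/mysql_full_backup.sh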
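The retention step deserves a dry run before it is trusted with real backups. The sketch below wraps the cleanup in a function that only considers the top-level backup directories, so an old subdirectory inside a recent backup is never removed on its own; `prune_backups` is a hypothetical name, and the demonstration relies on GNU `touch -d`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete top-level backup directories older than DAYS days.
prune_backups() {
    local dir="$1" days="$2"
    # -mindepth/-maxdepth 1 restricts matches to the backup directories themselves
    find "$dir" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" -exec rm -rf {} +
}

# Demonstration in a throwaway directory
demo=$(mktemp -d)
mkdir -p "$demo/mysql_full_old" "$demo/mysql_full_new"
touch -d '10 days ago' "$demo/mysql_full_old"
prune_backups "$demo" 7
ls "$demo"
rm -rf "$demo"
```

Only mysql_full_new should remain in the listing, since the other directory carries a modification time ten days in the past.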
Script 2: incremental backup (/data/mysql_backup/mysql_incr_backup.sh)
sudo vi /data/mysql_backup/mysql_incr_backup.sh
#!/bin/bash
# Incremental backup script, run hourly, chained from the latest full backup
source /etc/profile
INCR_BACKUP_NAME=mysql_incr_$(date +%Y%m%d_%H%M%S)
INCR_BACKUP_PATH=$INCR_BACKUP_DIR/$INCR_BACKUP_NAME
LOG_FILE=$LOG_DIR/incr_backup_$(date +%Y%m%d).log
# Find the latest full backup directory
LATEST_FULL_BACKUP=$(ls -td "$FULL_BACKUP_DIR"/* 2>/dev/null | head -1)
# Logging helper
log() {
    echo "[$(date +%Y%m%d_%H%M%S)] $1" >> "$LOG_FILE"
}
if [ -z "$LATEST_FULL_BACKUP" ]; then
    log "No full backup found; cannot take an incremental backup"
    exit 1
fi
log "===== Incremental backup started ====="
log "Latest full backup: $LATEST_FULL_BACKUP"
# Base on the newest incremental taken after that full backup; fall back to the full backup itself.
# Incrementals older than the full backup belong to a previous chain and must not be used as a base.
LATEST_INCR_BACKUP=$(find "$INCR_BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -newer "$LATEST_FULL_BACKUP" | sort | tail -1)
BASE_DIR=${LATEST_INCR_BACKUP:-$LATEST_FULL_BACKUP}
log "Incremental base: $BASE_DIR"
xtrabackup --user=$MYSQL_USER --password=$MYSQL_PWD --host=$MYSQL_HOST --port=$MYSQL_PORT --backup --target-dir="$INCR_BACKUP_PATH" --incremental-basedir="$BASE_DIR" --parallel=4 2>>"$LOG_FILE"
# Check whether the incremental backup succeeded
if [ $? -eq 0 ]; then
    log "Incremental backup succeeded: $INCR_BACKUP_PATH"
    # Do not merge into the full backup here; merging happens once, in order, at restore time.
    if [ -f "$INCR_BACKUP_PATH/xtrabackup_checkpoints" ]; then
        log "Checkpoint file present"
    else
        log "Checkpoint file missing; check the backup"
        exit 1
    fi
    # Remove incremental backups older than 3 days (top-level backup directories only)
    find "$INCR_BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +3 -exec rm -rf {} +
    log "Removed incremental backups older than 3 days"
else
    log "Incremental backup failed; check the log"
    # Remove the partial directory so it is not picked up as the next base
    rm -rf "$INCR_BACKUP_PATH"
    exit 1
fi
log "===== Incremental backup finished ====="
# Make the script executable
sudo chmod +x /data/mysql_backup/mysql_incr_backup.sh
sudo chown mysql:mysql /data/mysql_backup/mysql_incr_backup.sh
Script 3: backup verification (/data/mysql_backup/mysql_backup_check.sh)
sudo vi /data/mysql_backup/mysql_backup_check.sh
#!/bin/bash
# Backup verification script, run after the daily full backup
source /etc/profile
LOG_FILE=$LOG_DIR/backup_check_$(date +%Y%m%d).log
LATEST_FULL_BACKUP=$(ls -td "$FULL_BACKUP_DIR"/* 2>/dev/null | head -1)
log() {
    echo "[$(date +%Y%m%d_%H%M%S)] $1" >> "$LOG_FILE"
}
log "===== Backup verification started ====="
if [ -z "$LATEST_FULL_BACKUP" ]; then
    log "No full backup found; verification failed"
    exit 1
fi
# Check that the key files exist in the backup directory
if [ -f "$LATEST_FULL_BACKUP/xtrabackup_info" ] && [ -f "$LATEST_FULL_BACKUP/backup-my.cnf" ]; then
    log "Full backup files verified: $LATEST_FULL_BACKUP"
    # Record the backup size (add a minimum-size check here if desired)
    BACKUP_SIZE=$(du -sh "$LATEST_FULL_BACKUP" | awk '{print $1}')
    log "Full backup size: $BACKUP_SIZE"
else
    log "Full backup files missing; verification failed: $LATEST_FULL_BACKUP"
    exit 1
fi
# Check the latest incremental backup, if any (the variable must be quoted, or -n is always true)
LATEST_INCR_BACKUP=$(ls -td "$INCR_BACKUP_DIR"/* 2>/dev/null | head -1)
if [ -n "$LATEST_INCR_BACKUP" ]; then
    if [ -f "$LATEST_INCR_BACKUP/xtrabackup_info" ]; then
        log "Incremental backup files verified: $LATEST_INCR_BACKUP"
    else
        log "Incremental backup files missing; verification failed: $LATEST_INCR_BACKUP"
    fi
fi
log "===== Backup verification finished ====="
# Make the script executable
sudo chmod +x /data/mysql_backup/mysql_backup_check.sh
sudo chown mysql:mysql /data/mysql_backup/mysql_backup_check.sh
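7.4 Configure the crontab Schedule (run on all nodes)
# Edit the mysql user's crontab as root (the mysql account usually has no login shell, so "su - mysql" may fail)
sudo crontab -u mysql -e
# Add the following entries (adjust the times as needed)
# Full backup every day at 02:00
0 2 * * * /data/mysql_backup/mysql_full_backup.sh
# Backup verification 10 minutes after the full backup starts
10 2 * * * /data/mysql_backup/mysql_backup_check.sh
# Incremental backup every hour (from 03:00 onward, to avoid overlapping the full backup)
0 3-23 * * * /data/mysql_backup/mysql_incr_backup.sh
# Save and exit, then confirm the entries were installed
sudo crontab -u mysql -l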
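7.5 Backup Test Run and Restore Example
Step 1: Run the backups by hand to confirm the scripts work
# Run a full backup
/data/mysql_backup/mysql_full_backup.sh
# Read the log
cat /data/mysql_backup/log/full_backup_$(date +%Y%m%d).log
# Run an incremental backup
/data/mysql_backup/mysql_incr_backup.sh
cat /data/mysql_backup/log/incr_backup_$(date +%Y%m%d).log
Step 2: Restore example (simulated data loss, restoring from full + incremental)
# Stop MySQL
sudo systemctl stop mysqld
# Empty the data directory
sudo rm -rf /var/lib/mysql/*
# Pick the latest full backup and list the incrementals taken after it, oldest first
# (build the list before preparing, because --prepare changes the full backup's timestamp)
LATEST_FULL=$(ls -td /data/mysql_backup/full/* | head -1)
INCRS=$(find /data/mysql_backup/incr -mindepth 1 -maxdepth 1 -type d -newer "$LATEST_FULL" | sort)
# Prepare the base with --apply-log-only so incrementals can still be applied
sudo xtrabackup --prepare --apply-log-only --target-dir="$LATEST_FULL"
# Apply every incremental in chronological order
for INCR in $INCRS; do
    sudo xtrabackup --prepare --apply-log-only --target-dir="$LATEST_FULL" --incremental-dir="$INCR"
done
# Final prepare without --apply-log-only rolls back uncommitted transactions
sudo xtrabackup --prepare --target-dir="$LATEST_FULL"
# Copy the data back
sudo xtrabackup --copy-back --target-dir="$LATEST_FULL" --datadir=/var/lib/mysql
# Restore ownership to the mysql user
sudo chown -R mysql:mysql /var/lib/mysql
# Start MySQL
sudo systemctl start mysqld
# Verify the data
mysql -uroot -pMySQL@8080 -e "select * from xenon_test.t1;"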
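The order of the prepare steps matters: every step except the last uses --apply-log-only, and incrementals are applied oldest first. To make that sequence easy to review before touching a real backup, the sketch below only prints the commands it would run; `print_prepare_plan` is a hypothetical helper and the directory names in the example are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print_prepare_plan FULL_DIR [INCR_DIR...]
# Prints (does not execute) the xtrabackup --prepare sequence for a full backup
# followed by its incrementals, which must be passed in chronological order.
print_prepare_plan() {
    local full="$1"
    shift
    local total=$# i=0 incr
    if [ "$total" -eq 0 ]; then
        # No incrementals: a single full prepare is enough
        echo "xtrabackup --prepare --target-dir=$full"
        return 0
    fi
    echo "xtrabackup --prepare --apply-log-only --target-dir=$full"
    for incr in "$@"; do
        i=$(( i + 1 ))
        if [ "$i" -lt "$total" ]; then
            echo "xtrabackup --prepare --apply-log-only --target-dir=$full --incremental-dir=$incr"
        else
            # Last incremental: omit --apply-log-only so uncommitted transactions are rolled back
            echo "xtrabackup --prepare --target-dir=$full --incremental-dir=$incr"
        fi
    done
}

print_prepare_plan /data/mysql_backup/full/mysql_full_A \
    /data/mysql_backup/incr/mysql_incr_1 /data/mysql_backup/incr/mysql_incr_2
```

This variant drops --apply-log-only on the last incremental rather than running a separate final prepare; both orderings end with the rollback phase applied exactly once.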
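VIII. Troubleshooting Common Failures (essential for production)
8.1 Xenon Failures
- Xenon fails to start: check the log /var/log/xenon/xenon.log. Common causes: MySQL connection failure (wrong password in the config file), port 8080 already in use (find the process with netstat -tulpn | grep 8080 and stop it), insufficient directory permissions (chmod -R 755 /var/lib/xenon /var/log/xenon);
- No Xenon leader: check that raft.peer_addresses is identical on all 3 nodes, that port 8080 is reachable between them, and that every node_id is unique, then restart Xenon on all nodes (sudo systemctl restart xenon);
- VIP failover fails: check the interface name (ip addr), that the VIP is in the host's subnet, and that ipvsadm is installed (dnf/apt install -y ipvsadm); bind the VIP manually if needed (xenon cli vip -H 127.0.0.1:8080 -a bind).
8.2 MySQL Replication Failures
- Replication thread not running (Slave_IO/SQL_Running=No):
  - Wrong replication account or password: re-run CHANGE MASTER TO;
  - GTID mismatch: run RESET SLAVE ALL, then repeat the full CHANGE MASTER TO statement from 3.3 (RESET SLAVE ALL also clears the host and credentials, so MASTER_AUTO_POSITION=1 alone is not enough), then START SLAVE;
  - Data conflict: sql_slave_skip_counter cannot be used while GTID mode is on. Skip the offending transaction by committing an empty transaction under its GTID instead (STOP SLAVE; SET GTID_NEXT='<source_uuid>:<transaction_id>'; BEGIN; COMMIT; SET GTID_NEXT='AUTOMATIC'; START SLAVE;), and use this with caution in production;
- High replication lag: check the primary's load and the network bandwidth, tune MySQL parameters, and give the replicas more resources.
8.3 Monitoring and Alerting Failures
- Prometheus collects no metrics: check that the exporters are running, that the ports are reachable, and that the target addresses in prometheus.yml are correct, then restart Prometheus;
- Alerts are not delivered: check the Alertmanager configuration (email / WeCom parameters), the syntax of the Prometheus alert rules, and network access to the SMTP server / WeCom API; read the Alertmanager log (journalctl -u alertmanager).
8.4 Backup Failures
- Full backup fails: check that the xtrabackup version matches MySQL, that MySQL is running, that the backup directory has enough free space, and that the MySQL account has sufficient privileges;
- Incremental backup fails: check that a recent full backup exists and that the incremental base directory is correct, and read the backup log.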
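IX. Production Best Practices
- Cluster size: use an odd number of nodes (3 or 5) so that a Raft majority always exists and split-brain is avoided; in production give each node at least 4 GB RAM and a 100 GB SSD;
- Access control: do not use root for Xenon, monitoring, or backups; create dedicated least-privilege accounts and set config files to mode 600 (chmod 600 /etc/xenon/xenon.json /etc/mysqld_exporter.cnf);
- Storage: put the backup directory on a dedicated disk or network storage (such as NFS) so that a local disk failure does not take the backups with it;
- Monitoring: add a backup-failure alert (based on the backup log or the files in the backup directory) and tune thresholds to the workload to avoid false alarms;
- Backup verification: run a manual restore from backup once a week to confirm the backups are usable; high availability is not a substitute for backups;
- Containerization: on domestic Linux platforms, MySQL + Xenon can be containerized with the Kylin container engine or Kubernetes on openEuler for faster scaling;
- Version management: stay on stable releases (MySQL 8.0 GA, Xenon v1.1.4 as installed in this guide, Prometheus 2.50.0) and avoid development builds.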
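One possible way to implement the backup-failure alert mentioned above is node_exporter's textfile collector: the backup script writes a small metrics file after each run, and node_exporter exposes it when started with --collector.textfile.directory pointing at that directory. The sketch below assumes this approach; the function name `write_backup_metric`, the metric names, and the job_type label are hypothetical, not existing exporter metrics.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write_backup_metric OUT_DIR JOB STATUS
# STATUS is 1 for success and 0 for failure. The file is written under a
# temporary name and renamed, so node_exporter never reads a partial file.
write_backup_metric() {
    local out_dir="$1" job="$2" status="$3"
    local tmp="$out_dir/mysql_backup_${job}.prom.$$"
    {
        echo "# HELP mysql_backup_last_success 1 if the last backup run succeeded, 0 otherwise."
        echo "# TYPE mysql_backup_last_success gauge"
        echo "mysql_backup_last_success{job_type=\"$job\"} $status"
        echo "# HELP mysql_backup_last_run_timestamp_seconds Unix time of the last backup run."
        echo "# TYPE mysql_backup_last_run_timestamp_seconds gauge"
        echo "mysql_backup_last_run_timestamp_seconds{job_type=\"$job\"} $(date +%s)"
    } > "$tmp"
    mv "$tmp" "$out_dir/mysql_backup_${job}.prom"
}
```

A backup script could call `write_backup_metric /var/lib/node_exporter/textfile full 1` on success and pass 0 on failure (the directory is an example). An alert rule on `mysql_backup_last_success == 0`, or on a last-run timestamp older than about a day, would then cover both failed and silently skipped backups.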
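X. Summary
The Xenon-based MySQL HA cluster relies on GTID plus semi-synchronous replication for data consistency and on Xenon Raft consensus plus VIP failover for automatic failover within seconds. Prometheus + Grafana + Alertmanager provide monitoring and alerting, and percona-xtrabackup with binlog retention and crontab provides scheduled full and incremental backups. Together they address the usual weaknesses of a plain MySQL primary/replica setup: split-brain, data loss, manual intervention, and missing monitoring and backups. The components are lightweight with few hard dependencies, run on Kylin, openEuler, and similar domestic Linux systems, and keep deployment and maintenance costs low, which makes the design suitable for small to medium-large MySQL workloads.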