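MySQL High-Availability Architecture: Design and Best Practices
MySQL is one of the most widely used relational databases in enterprise applications, so its high-availability (HA) design matters a great deal. This article walks through a production-tested MySQL HA architecture. It covers the implementation steps, component selection, configuration details and the failure-recovery procedure.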
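I. Architecture Goals
- Automatic failover
- Zero data loss on failover (semi-synchronous replication: each commit is acknowledged by at least one replica)
- Read/write splitting for better performance
- Horizontal scalability
- Simple operations and thorough monitoring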
II. Overall Architecture (Logical View)
```
                        +------------------+
                        |   Application    |
                        +--------+---------+
                                 |
              +------------------+------------------+
              |                                     |
     Read/Write Traffic                     Read-Only Traffic
              |                                     |
    +---------v----------+          +---------------v-----------------+
    | MySQL Primary Node |<-------->|    MySQL Replica Nodes (xN)     |
    |  (Active Master)   |          | (Async / Semi-Sync Replication) |
    +---------+----------+          +---------------------------------+
              |
   +----------v----------+
   | MHA or Orchestrator |
   |    (HA Manager)     |
   +----------+----------+
              |
    +---------v----------+     +--------------------------+
    |   VIP / ProxySQL   |<--->|  Monitoring & Alerting   |
    |  (Load Balancer)   |     |  (Prometheus + Grafana)  |
    +--------------------+     +--------------------------+
```
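III. Core Components
| Component | Role | Recommended Version |
|---|---|---|
| MySQL 8.0+ | Primary/replica replication, semi-sync, GTID, parallel replication | MySQL 8.0.36+ |
| MHA / Orchestrator | Automatic primary failover, failure detection | Orchestrator v3.2+ |
| ProxySQL | SQL routing, read/write splitting, connection pooling | ProxySQL 2.5+ |
| Keepalived | Virtual IP failover (optional) | Keepalived 2.2+ |
| Prometheus | Metrics collection | v2.47+ |
| Grafana | Visualization | v10.2+ |
✅ Recommended stack: Orchestrator + ProxySQL + MySQL 8.0 with GTID-based semi-synchronous replication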
IV. Deployment Steps
Step 1: Set up MySQL replication (GTID-based, semi-synchronous)
1.1 Edit my.cnf (all nodes)
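```ini
# Common my.cnf settings (every server)
[mysqld]
server-id = 101                    # must be unique per server, e.g. 101, 102, 103
gtid_mode = ON
enforce-gtid-consistency = ON
log-bin = mysql-bin
binlog_format = ROW
log-slave-updates = ON
# Deprecated since 8.0.23 (TABLE is already the default); can be omitted on 8.0.36+
master_info_repository = TABLE
relay_log_info_repository = TABLE
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
# Semi-synchronous replication (wait for at least one replica to acknowledge)
plugin-load-add = rpl_semi_sync_master=semisync_master.so
plugin-load-add = rpl_semi_sync_slave=semisync_slave.so
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_slave_enabled = 1
# After 5 s without an acknowledgement the primary falls back to async replication;
# transactions committed while degraded are not protected by semi-sync
rpl_semi_sync_master_timeout = 5000
```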
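⚠️ Note: server-id must be unique on every instance. On replicas, also enable read_only (or super_read_only) so that stray writes cannot land there.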
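Duplicate server-id values break replication in confusing ways, so it can help to check them before replication starts. The following is a minimal bash sketch. It assumes you have copied each node's my.cnf to the machine that runs it. The function name check_unique_server_ids and the file paths are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: verify that every my.cnf passed in declares a distinct server-id.
# Assumes one "server-id = N" (or server_id = N) line per file; paths are hypothetical.
check_unique_server_ids() {
  declare -A seen=()          # server-id -> file that first declared it
  local f id rc=0
  for f in "$@"; do
    # Take the first server-id/server_id assignment, ignoring any trailing comment.
    id=$(sed -nE 's/^[[:space:]]*server[-_]id[[:space:]]*=[[:space:]]*([0-9]+).*/\1/p' "$f" | head -n1)
    if [[ -z "$id" ]]; then
      echo "MISSING $f"
      rc=1
    elif [[ -n "${seen[$id]:-}" ]]; then
      echo "DUPLICATE $id in $f and ${seen[$id]}"
      rc=1
    else
      seen[$id]=$f
    fi
  done
  return $rc
}
```

For example, check_unique_server_ids node101/my.cnf node102/my.cnf node103/my.cnf (hypothetical paths) prints one line per problem. It returns non-zero if any server-id is missing or duplicated.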
1.2 Initialize the primary
```sql
-- Create the replication user
CREATE USER 'repl'@'192.168.%.%' IDENTIFIED BY 'StrongPass!123';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.%.%';
FLUSH PRIVILEGES;
```
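1.3 Point each replica at the primary
```sql
-- Run on each replica
-- (On 8.0.23+ the preferred spelling is CHANGE REPLICATION SOURCE TO ... with SOURCE_* options, then START REPLICA)
CHANGE MASTER TO
  MASTER_HOST='192.168.10.101',
  MASTER_USER='repl',
  MASTER_PASSWORD='StrongPass!123',
  MASTER_AUTO_POSITION = 1,
  MASTER_RETRY_COUNT = 3,
  -- Fetch the primary's RSA public key: required when 'repl' uses the default
  -- caching_sha2_password plugin over a connection without TLS
  GET_MASTER_PUBLIC_KEY = 1;
START SLAVE;
```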
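1.4 Verify replication status
```sql
SHOW SLAVE STATUS\G
-- (On 8.0.22+ SHOW REPLICA STATUS\G reports these fields as Replica_* / Source_*)
-- Confirm that:
--   Slave_IO_Running: Yes
--   Slave_SQL_Running: Yes
--   Retrieved_Gtid_Set is not empty, and Executed_Gtid_Set includes it
--   (the replica has applied every transaction it received)
```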
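The check in 1.4 can be scripted for cron or a health probe. This bash sketch reads the \G output on stdin. It accepts both the legacy Slave_* field names and the Replica_* / Seconds_Behind_Source names that SHOW REPLICA STATUS uses from MySQL 8.0.22 onward. check_replication is a hypothetical helper name. It checks only the two thread states and reports the lag; it does not compare GTID sets.

```bash
#!/usr/bin/env bash
# Sketch: parse `SHOW REPLICA STATUS\G` (or `SHOW SLAVE STATUS\G`) output from stdin.
# Prints "OK lag=N" and returns 0 if both replication threads run, otherwise "BROKEN ..." and 1.
check_replication() {
  local io="" sql="" lag=""
  local key value
  while IFS=':' read -r key value; do
    key="${key//[[:space:]]/}"                  # field names are right-aligned with spaces
    value="${value#"${value%%[![:space:]]*}"}"   # trim leading whitespace from the value
    case "$key" in
      Slave_IO_Running|Replica_IO_Running)         io="$value" ;;
      Slave_SQL_Running|Replica_SQL_Running)       sql="$value" ;;
      Seconds_Behind_Master|Seconds_Behind_Source) lag="$value" ;;
    esac
  done
  if [[ "$io" == "Yes" && "$sql" == "Yes" ]]; then
    echo "OK lag=${lag}"
    return 0
  fi
  echo "BROKEN io=${io:-?} sql=${sql:-?}"
  return 1
}
```

Usage on a replica: mysql -e 'SHOW REPLICA STATUS\G' | check_replication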
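Step 2: Deploy Orchestrator for automatic failover
2.1 Install Orchestrator
```bash
# Download the binary tarball (Linux AMD64 shown)
wget https://github.com/openark/orchestrator/releases/download/v3.2.6/orchestrator_3.2.6_linux_amd64.tar.gz
# The archive contains usr/local/orchestrator/..., so extract it relative to /
tar -xzf orchestrator_3.2.6_linux_amd64.tar.gz -C /
cd /usr/local/orchestrator
```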
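2.2 Configure orchestrator.conf.json
Place the file at /etc/orchestrator.conf.json on each Orchestrator node and set RaftBind to that node's own address. Orchestrator only recovers automatically for clusters matched by RecoverMasterClusterFilters. Without that key it detects failures but does not fail over.
```json
{
  "Debug": false,
  "ListenAddress": ":3000",
  "MySQLTopologyUser": "orc_client",
  "MySQLTopologyPassword": "OrcClientPass!456",
  "MySQLTopologyCredentialsConfigFile": "",
  "BackendDB": "sqlite",
  "SQLite3DataFile": "/var/lib/orchestrator/orchestrator.db",
  "ReplicationLagQuery": "",
  "RecoverMasterClusterFilters": ["*"],
  "RecoverIntermediateMasterClusterFilters": ["*"],
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "RaftEnabled": true,
  "RaftDataDir": "/var/lib/orchestrator/raft",
  "DefaultRaftCluster": "mysql-cluster",
  "RaftBind": "192.168.10.105:10008",
  "RaftNodes": [
    "192.168.10.105:10008",
    "192.168.10.106:10008",
    "192.168.10.107:10008"
  ]
}
```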
2.3 Create the Orchestrator user on all MySQL nodes
```sql
CREATE USER 'orc_client'@'192.168.%.%' IDENTIFIED BY 'OrcClientPass!456';
-- The Orchestrator docs also list REPLICATION SLAVE alongside these privileges
GRANT SUPER, PROCESS, REPLICATION SLAVE, REPLICATION CLIENT, RELOAD ON *.* TO 'orc_client'@'192.168.%.%';
GRANT SELECT ON mysql.slave_master_info TO 'orc_client'@'192.168.%.%';
FLUSH PRIVILEGES;
```
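2.4 Start Orchestrator (systemd recommended)
```ini
# /etc/systemd/system/orchestrator.service
[Unit]
Description=Orchestrator Service
After=network.target

[Service]
Type=simple
# The service user needs write access to /var/lib/orchestrator (SQLite file and Raft data)
User=mysql
# Orchestrator serves its web UI resources relative to its install directory
WorkingDirectory=/usr/local/orchestrator
ExecStart=/usr/local/orchestrator/orchestrator --config=/etc/orchestrator.conf.json http
Restart=always

[Install]
WantedBy=multi-user.target
```
```bash
systemctl daemon-reload
systemctl enable orchestrator
systemctl start orchestrator
```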
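2.5 Register the primary with Orchestrator
```bash
# The discover endpoint takes host and port as separate path segments
curl -s http://127.0.0.1:3000/api/discover/192.168.10.101/3306
```
Orchestrator then discovers the replicas in the topology automatically.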
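Discovering the primary is normally enough. When you register several nodes or clusters, a small loop avoids mistakes in the URL form. Below is a minimal bash sketch that assumes the Orchestrator HTTP API listens on 127.0.0.1:3000, as configured above. discover_nodes, ORC_URL and DRY_RUN are hypothetical names introduced here.

```bash
#!/usr/bin/env bash
# Sketch: register MySQL nodes with Orchestrator's HTTP API.
# The API expects /api/discover/<host>/<port>, not <host>:<port>.
ORC_URL="${ORC_URL:-http://127.0.0.1:3000}"

discover_nodes() {
  local node host port
  for node in "$@"; do
    host="${node%:*}"     # part before the last ':'
    port="${node##*:}"    # part after the last ':'
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      echo "GET ${ORC_URL}/api/discover/${host}/${port}"
    else
      curl -fsS "${ORC_URL}/api/discover/${host}/${port}" >/dev/null || {
        echo "discover failed: ${node}" >&2
        return 1
      }
    fi
  done
}
```

Running DRY_RUN=1 discover_nodes 192.168.10.101:3306 192.168.10.102:3306 prints the requests without sending them.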
Step 3: Deploy ProxySQL for read/write splitting and load balancing
3.1 Install ProxySQL
```bash
# Ubuntu/Debian
wget https://github.com/sysown/proxysql/releases/download/v2.5.4/proxysql_2.5.4-ubuntu20_amd64.deb
dpkg -i proxysql_2.5.4-ubuntu20_amd64.deb
```
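3.2 Log in to the ProxySQL admin interface
```bash
# admin/admin is the default admin login; change it (admin-admin_credentials) in production
mysql -u admin -padmin -h 127.0.0.1 -P 6032 --prompt='ProxySQL> '
```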
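3.3 Add the MySQL backends
```sql
-- Add the primary and replica nodes
INSERT INTO mysql_servers(hostgroup_id, hostname, port) VALUES
(10, '192.168.10.101', 3306),  -- writer hostgroup (primary)
(11, '192.168.10.102', 3306),  -- reader hostgroup (replicas)
(11, '192.168.10.103', 3306);
-- Let ProxySQL move servers between 10 and 11 according to each backend's read_only flag,
-- so the writer hostgroup follows the new primary after a failover
INSERT INTO mysql_replication_hostgroups(writer_hostgroup, reader_hostgroup, comment)
VALUES (10, 11, 'mysql-cluster');
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```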
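3.4 Configure read/write splitting rules
```sql
-- SELECT ... FOR UPDATE takes row locks, so send it to writer hostgroup 10
INSERT INTO mysql_query_rules(rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 10, 1);
-- All other SELECTs go to reader hostgroup 11
INSERT INTO mysql_query_rules(rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT', 11, 1);
-- Unmatched statements (INSERT/UPDATE/DELETE, DDL) use the user's default_hostgroup
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```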
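To make the rule order concrete, the bash function below imitates how ProxySQL evaluates the two rules above. Rules are checked in rule_id order, and the first match with apply=1 decides the hostgroup. A query that matches nothing goes to the user's default_hostgroup (10 for app_user in 3.5). This is an illustration only, not ProxySQL code. ProxySQL matches match_digest against the normalized query digest, while this sketch matches the raw text case-insensitively. RULES and route_query are hypothetical names.

```bash
#!/usr/bin/env bash
# Illustration: first-match-wins routing over rules listed in rule_id order.
# Each entry is "regex|destination_hostgroup", mirroring rule_id 1 and 2 above.
RULES=(
  '^SELECT.*FOR UPDATE|10'
  '^SELECT|11'
)

route_query() {
  local q="$1" default_hg="${2:-10}" rule re hg restore
  restore=$(shopt -p nocasematch || true)   # remember the caller's setting
  shopt -s nocasematch                      # ProxySQL rules are case-insensitive by default
  for rule in "${RULES[@]}"; do
    re="${rule%|*}"
    hg="${rule##*|}"
    if [[ "$q" =~ $re ]]; then
      eval "$restore"
      echo "$hg"
      return 0
    fi
  done
  eval "$restore"
  echo "$default_hg"                        # no rule matched: user's default_hostgroup
}
```

Because mysql_users.transaction_persistent defaults to 1, ProxySQL also keeps every statement of an open transaction on the hostgroup where the transaction started. This sketch does not model that behavior.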
3.5 Create the monitor and application accounts
```sql
-- Monitor account (reuses the orc_client user created on the MySQL nodes)
UPDATE global_variables SET variable_value='orc_client' WHERE variable_name='mysql-monitor_username';
UPDATE global_variables SET variable_value='OrcClientPass!456' WHERE variable_name='mysql-monitor_password';
-- Application account (must already exist in MySQL with the same password)
INSERT INTO mysql_users(username, password, default_hostgroup) VALUES
('app_user', 'AppUserPass!789', 10);
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;
```
3.6 Enable monitoring
```sql
-- Enable the monitor, then apply and persist the mysql-monitor_* variables (including those set in 3.5)
SET mysql-monitor_enabled='true';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
```
Step 4: Configure a virtual IP (VIP) or DNS switchover (optional)
Use Keepalived to float the VIP
```bash
# Install keepalived
apt install keepalived -y
```
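Configure /etc/keepalived/keepalived.conf:
```conf
vrrp_instance VI_1 {
    state MASTER
    # NIC that carries the 192.168.10.0/24 network on this host
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        # replace with your own shared secret
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.10.200/24
    }
}
```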
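For a multi-node deployment, set one node to MASTER and the other to BACKUP (priority 90). Run Keepalived on the ProxySQL hosts, because the VIP fronts ProxySQL.
Start the service:
```bash
systemctl enable keepalived
systemctl start keepalived
```
Applications connect to 192.168.10.200:6033 (the ProxySQL client port).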
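As configured above, Keepalived moves the VIP only when the whole host or Keepalived itself fails. It does not react when ProxySQL alone dies. A common addition is a vrrp_script health check that the instance references through track_script. The bash sketch below is one such check. It assumes ProxySQL listens on port 6033 locally. The file name /etc/keepalived/check_proxysql.sh and the function name check_proxysql are hypothetical. The check only tests that the TCP port accepts connections, using bash's /dev/tcp and the coreutils timeout command.

```bash
#!/usr/bin/env bash
# Sketch of a Keepalived health check (hypothetical file /etc/keepalived/check_proxysql.sh).
# Returns 0 if host:port accepts a TCP connection within 2 seconds, non-zero otherwise.
check_proxysql() {
  local host="${1:-127.0.0.1}" port="${2:-6033}"
  # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connect; a refused or
  # timed-out connection gives a non-zero exit status.
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
# When saved as a standalone script, end the file with:
#   check_proxysql 127.0.0.1 6033
```

Reference the script from a vrrp_script block (script, interval), then list that block under track_script inside vrrp_instance VI_1. If the check fails, this node gives up the VIP.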
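Step 5: Deploy monitoring and alerting
5.1 Prometheus + mysqld_exporter
```bash
# Download mysqld_exporter
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.15.0/mysqld_exporter-0.15.0.linux-amd64.tar.gz
tar -xzf mysqld_exporter-0.15.0.linux-amd64.tar.gz
cd mysqld_exporter-0.15.0.linux-amd64
# The MySQL account 'exporter' must already exist, e.g. with
#   GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
# Create the .my.cnf credentials file (quoted 'EOF' disables shell expansion inside)
cat > ~/.my.cnf << 'EOF'
[client]
user=exporter
password=ExportPass!000
EOF
chmod 600 ~/.my.cnf
# Start the exporter; use $HOME because the shell does not expand ~ after "=" in an option
./mysqld_exporter --config.my-cnf="$HOME/.my.cnf" &
```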
5.2 Prometheus configuration (prometheus.yml)
```yaml
scrape_configs:
  - job_name: 'mysql'
    static_configs:
      - targets: ['192.168.10.101:9104', '192.168.10.102:9104', '192.168.10.103:9104']
```
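5.3 Import Grafana dashboards
- Recommended: import dashboard ID 7362 (MySQL Overview) from grafana.com
- Key metrics to monitor:
  - Replication lag (Seconds_Behind_Master)
  - Connection count
  - QPS/TPS
  - InnoDB buffer pool hit ratio
  - Semi-sync status (Rpl_semi_sync_master_status)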
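You can spot-check the same indicators from the exporter's /metrics endpoint without Grafana. This bash sketch reads Prometheus text format on stdin. The metric names mysql_up, mysql_global_status_rpl_semi_sync_master_status and mysql_slave_status_seconds_behind_master are assumptions based on mysqld_exporter's default collectors, so confirm them against your own /metrics output. summarize_mysql_metrics is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: summarize a few mysqld_exporter metrics from Prometheus text format on stdin.
# Prints "up=.. semi_sync=.. lag=.." ("?" when a metric is absent); returns 0 only if mysql_up is 1.
summarize_mysql_metrics() {
  awk '
    $1 == "mysql_up" { up = $2 }
    $1 == "mysql_global_status_rpl_semi_sync_master_status" { semi = $2 }
    # This metric carries labels, e.g. mysql_slave_status_seconds_behind_master{channel_name="",...}
    $1 ~ /^mysql_slave_status_seconds_behind_master/ { lag = $2 }
    END {
      printf "up=%s semi_sync=%s lag=%s\n", (up == "" ? "?" : up), (semi == "" ? "?" : semi), (lag == "" ? "?" : lag)
      exit ((up + 0) == 1 ? 0 : 1)
    }'
}
```

Usage: curl -s http://192.168.10.101:9104/metrics | summarize_mysql_metrics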
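V. Failure Simulation and Recovery Test
Scenario: the primary goes down → automatic failover
- Stop MySQL on the primary manually: systemctl stop mysql
- Orchestrator detects that the primary is unreachable (within 30 seconds with the default settings).
- It automatically promotes a new primary, preferring the replica with the most up-to-date GTID set.
- ProxySQL's writer hostgroup is switched to the new primary. This happens through read_only detection in mysql_replication_hostgroups or through an Orchestrator post-failover hook (PostMasterFailoverProcesses).
- Clients switch over transparently through the VIP or ProxySQL routing.
- When the old primary comes back, rejoin it as a replica. Orchestrator does not reattach it automatically. Check it for errant GTIDs first, then point it at the new primary with MASTER_AUTO_POSITION = 1.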
🔄 The failover itself needs no manual intervention, and the switchover takes under 30 s on average.
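During a drill, it helps to measure how long the write path is actually unavailable instead of relying on the estimate. The bash sketch below retries a probe command once per second until the probe succeeds or a deadline passes. wait_for_recovery is a hypothetical helper. The probe in the usage line writes to a hypothetical drill.heartbeat table.

```bash
#!/usr/bin/env bash
# Sketch: run "$@" once per second until it succeeds or DEADLINE seconds pass.
# Prints the elapsed time; returns 0 on recovery, 1 on timeout.
wait_for_recovery() {
  local deadline="$1"; shift
  local start=$SECONDS
  while (( SECONDS - start < deadline )); do
    if "$@" >/dev/null 2>&1; then
      echo "recovered after $(( SECONDS - start ))s"
      return 0
    fi
    sleep 1
  done
  echo "not recovered within ${deadline}s"
  return 1
}
```

Start it right after stopping the primary, for example: wait_for_recovery 120 mysql -h 192.168.10.200 -P 6033 -u app_user -p'AppUserPass!789' -e 'INSERT INTO drill.heartbeat VALUES (NOW())'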
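VI. Best Practices Summary
| Item | Recommendation |
|---|---|
| GTID replication | Always enable; avoids binlog position mix-ups during failover |
| Semi-synchronous replication | Keep rpl_semi_sync_master_wait_for_slave_count at 1 or higher (1 is the default) and the default rpl_semi_sync_master_wait_point = AFTER_SYNC |
| Orchestrator Raft cluster | At least 3 nodes to prevent split-brain |
| ProxySQL rule tuning | Add finer-grained routing rules by SQL pattern as needed |
| Backup strategy | Daily full backup with Percona XtraBackup + keep binlogs for 7 days (binlog_expire_logs_seconds = 604800) |
| Security hardening | Enable TLS on replication links; restrict access by IP |
| Regular drills | Run a failover test every quarter |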
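You can turn the backup row into a nightly cron job. Below is a minimal bash sketch. It assumes Percona XtraBackup 8.0 is installed and that the backup account's password is stored in an option file (for example an [xtrabackup] group in ~/.my.cnf). BACKUP_ROOT, the backup user name, nightly_backup and the one-week retention are hypothetical placeholders. MySQL handles binlog retention itself: on 8.0, 7 days is binlog_expire_logs_seconds = 604800 (7 × 86400).

```bash
#!/usr/bin/env bash
# Sketch of a nightly full backup with Percona XtraBackup; set DRY_RUN=1 to print commands only.
BACKUP_ROOT="${BACKUP_ROOT:-/data/backup}"   # hypothetical backup location

run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then echo "$*"; else "$@"; fi
}

nightly_backup() {
  local target
  target="${BACKUP_ROOT}/full-$(date +%F)"
  # --backup copies the data files; --prepare applies the redo log so the copy is consistent.
  run xtrabackup --backup --user=backup --target-dir="$target" || return 1
  run xtrabackup --prepare --target-dir="$target" || return 1
  # Remove full backups not modified for more than 7 days (hypothetical retention).
  run find "$BACKUP_ROOT" -maxdepth 1 -name 'full-*' -mtime +7 -exec rm -rf {} +
}
```

Running DRY_RUN=1 nightly_backup prints the three commands without running them. This is a quick way to check paths before you add the job to cron.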
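VII. Troubleshooting
| Problem | Solution |
|---|---|
| High replica lag | Check the network, disk I/O and large transactions; enable parallel replication (replica_parallel_workers, or slave_parallel_workers before 8.0.26) |
| Orchestrator cannot connect to MySQL | Check firewall rules, user privileges and SSL settings |
| ProxySQL does not split queries | Check rule order and the apply flag, the user's transaction_persistent setting, and rule hit counts in stats_mysql_query_rules |
| Application errors after failover | Check that the connection pool reconnects and that DNS caches are refreshed |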
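✅ This setup runs stably in several systems with tens of millions of daily active users and has proven reliable and maintainable.
The complete automated deployment script (Ansible Playbook) is available on request by direct message.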