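I. Core Theory
1. High-Availability Cluster Basics
- Goal: eliminate single points of failure (SPOF) and improve system availability, the metric usually stated in an SLA.
- Formula: availability A = MTBF / (MTBF + MTTR). The key lever is reducing the mean time to repair (MTTR).
- Mechanism: failover is achieved through redundancy (active/passive or active/active) combined with heartbeat detection.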
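To make the formula concrete, the sketch below computes availability for two MTBF/MTTR pairs. The figures are illustrative assumptions, not measurements; the point is only that, with MTBF held fixed, shortening MTTR is what raises availability.

```bash
#!/usr/bin/env bash
# availability MTBF MTTR -> availability as a percentage
# (both arguments must use the same time unit, hours here)
availability() {
    awk -v mtbf="$1" -v mttr="$2" \
        'BEGIN { printf "%.4f\n", 100 * mtbf / (mtbf + mttr) }'
}

# Hypothetical figures: one failure every 1000 h on average
availability 1000 1      # 1 h to repair, e.g. manual recovery
availability 1000 0.1    # 6 min to repair, e.g. automatic failover
```

With these inputs the first call prints 99.9001 and the second 99.9900, roughly the step from "three nines" to "four nines".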
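2. VRRP (Virtual Router Redundancy Protocol)
VRRP is the protocol at the core of Keepalived; it removes the single point of failure of a statically configured gateway.
Key terms:
VRID: virtual router identifier (1-255); it must be identical on all nodes of the same group.
VIP: virtual IP, the address that clients connect to.
VMAC: virtual MAC address (IPv4 format: 00-00-5e-00-01-{VRID}, with the VRID written in hexadecimal).
Priority: configurable range 1-254; the node with the highest value becomes Master.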
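The relationship between VRID and VMAC can be sketched with a small helper. The function name vmac_for_vrid is hypothetical; it only formats the IPv4 VMAC pattern given above and does not touch any interface. Keepalived, as far as this note is aware, uses the VMAC only when use_vmac is configured; otherwise the VIP answers on the interface's physical MAC.

```bash
#!/usr/bin/env bash
# vmac_for_vrid VRID -> IPv4 VRRP virtual MAC for that VRID
vmac_for_vrid() {
    local vrid="$1"
    # Accept decimal digits only
    if ! [[ "$vrid" =~ ^[0-9]+$ ]]; then
        echo "VRID must be an integer in 1-255" >&2
        return 1
    fi
    vrid=$((10#$vrid))   # force base 10 so "08" is not read as octal
    if (( vrid < 1 || vrid > 255 )); then
        echo "VRID must be an integer in 1-255" >&2
        return 1
    fi
    # The last octet of the VMAC is the VRID in hexadecimal
    printf '00-00-5e-00-01-%02x\n' "$vrid"
}

vmac_for_vrid 51
```

For the VRID 51 used in the experiments of this document, the call prints 00-00-5e-00-01-33.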
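Operating modes:
Preemptive (preempt): the default. When a higher-priority node recovers, it immediately takes the Master role back, which can cause a brief network flap.
Non-preemptive (nopreempt): when a higher-priority node recovers, it does not take the Master role back and the current Master stays in place. Requirement: state must be set to BACKUP on every node.
Delayed preemption: when a higher-priority node recovers, it waits for the configured time (preempt_delay) before taking the Master role back.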
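The three modes differ by a single line in vrrp_instance. The sketch below renders that difference; render_preempt_instance is a hypothetical helper, the instance name, interface, VRID and VIP are borrowed from the lab in this document, and the 10-second delay is an arbitrary example. It also assumes, per the keepalived.conf man page, that preempt_delay (like nopreempt) only takes effect when the initial state is BACKUP.

```bash
#!/usr/bin/env bash
# render_preempt_instance STATE PRIORITY MODE [DELAY_SECONDS]
#   MODE is one of: preempt | nopreempt | delay
render_preempt_instance() {
    local state="$1" priority="$2" mode="$3" delay="${4:-60}"
    local extra=""
    case "$mode" in
        preempt)   ;;                                   # default, no extra keyword
        nopreempt) extra="nopreempt" ;;
        delay)     extra="preempt_delay ${delay}" ;;
        *) echo "unknown mode: ${mode}" >&2; return 1 ;;
    esac
    # Assumption: both keywords require the initial state to be BACKUP
    if [ -n "$extra" ] && [ "$state" != "BACKUP" ]; then
        echo "note: forcing state BACKUP for mode '${mode}'" >&2
        state="BACKUP"
    fi
    echo "vrrp_instance WEB_VIP {"
    echo "    state ${state}"
    echo "    interface eth0"
    echo "    virtual_router_id 51"
    echo "    priority ${priority}"
    echo "    advert_int 1"
    if [ -n "$extra" ]; then
        echo "    ${extra}"
    fi
    echo "    virtual_ipaddress {"
    echo "        172.25.254.100/24 dev eth0 label eth0:0"
    echo "    }"
    echo "}"
}

render_preempt_instance MASTER 100 preempt     # take the VIP back immediately
render_preempt_instance BACKUP 100 nopreempt   # never take the VIP back
render_preempt_instance BACKUP 100 delay 10    # take it back after 10 s
```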
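Communication:
Multicast: the default; advertisements are sent to the VRRP multicast address (224.0.0.18 by default).
Unicast: requires unicast_src_ip and unicast_peer; it keeps advertisements off the rest of the segment and reduces network congestion. Note: vrrp_strict must not be enabled when unicast is used.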
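A unicast setup mirrors the addresses on each node: the local address goes into unicast_src_ip and every other node into unicast_peer. The sketch below renders that for the two lab nodes (172.25.254.50 and 172.25.254.60, taken from the ifconfig output in this document); render_unicast_instance is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# render_unicast_instance STATE PRIORITY LOCAL_IP PEER_IP...
render_unicast_instance() {
    local state="$1" priority="$2" src="$3"
    shift 3
    echo "vrrp_instance WEB_VIP {"
    echo "    state ${state}"
    echo "    interface eth0"
    echo "    virtual_router_id 51"
    echo "    priority ${priority}"
    echo "    advert_int 1"
    echo "    unicast_src_ip ${src}       # this node's own address"
    echo "    unicast_peer {"
    local peer
    for peer in "$@"; do
        echo "        ${peer}"
    done
    echo "    }"
    echo "    virtual_ipaddress {"
    echo "        172.25.254.100/24 dev eth0 label eth0:0"
    echo "    }"
    echo "}"
}

render_unicast_instance MASTER 100 172.25.254.50 172.25.254.60   # for KA1
render_unicast_instance BACKUP 80  172.25.254.60 172.25.254.50   # for KA2
```

Remember that vrrp_strict has to be absent or commented out in global_defs for this instance to be accepted.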
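3. Keepalived Architecture and Features
Core components: VRRP Stack (heartbeat advertisements), Checkers (health checks), IPVS Wrapper (generates load-balancing rules), System Call (script invocation).
Main features:
VIP failover based on VRRP.
High availability for LVS (IPVS) clusters.
Health checking of back-end Real Servers (RS).
Custom scripts to monitor arbitrary services (such as Nginx or HAProxy).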
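II. Core Experiments and Deployment
1. Environment Preparation
Time synchronization: all nodes must be synchronized (ntp/chrony).
Network connectivity: in a lab, disable the firewall and SELinux so that the nodes can reach each other; otherwise, allow VRRP (IP protocol 112) between them.
Installation: dnf install keepalived -y.
Configuration file: /etc/keepalived/keepalived.conf.
global_defs: global settings (mail notification, router_id).
vrrp_instance: defines a virtual router (state, interface, priority, virtual_ipaddress).
virtual_server: defines an LVS cluster (optional).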
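2. Experiment Scenarios
Scenario A: Master/Backup (single-master) architecture
Key settings:
Master: state MASTER, priority 100.
Backup: state BACKUP, priority 80.
virtual_router_id and auth_pass must be identical on both nodes.
Test: stop the keepalived service on the Master and check whether the Backup acquires the VIP (ip addr); then restart the Master and check whether preemption occurs.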
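The settings for Scenario A can be sketched as one template rendered twice, so that only router_id, state and priority differ between the nodes. render_node_config is a hypothetical helper; the instance name, VRID 51, password 1111 and VIP 172.25.254.100 are reused from the Scenario B lab in this document and are assumptions for Scenario A.

```bash
#!/usr/bin/env bash
# render_node_config ROUTER_ID STATE PRIORITY
render_node_config() {
    local router_id="$1" state="$2" priority="$3"
    cat <<EOF
global_defs {
    router_id ${router_id}
}

vrrp_instance WEB_VIP {
    state ${state}
    interface eth0
    virtual_router_id 51
    priority ${priority}
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        172.25.254.100/24 dev eth0 label eth0:0
    }
}
EOF
}

render_node_config KA1 MASTER 100   # Master
render_node_config KA2 BACKUP 80    # Backup
```

Redirecting each call to /etc/keepalived/keepalived.conf on the matching node and restarting keepalived should leave the VIP on KA1; ip addr show eth0 on both nodes confirms where it sits.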


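Scenario B: Non-preemptive mode (nopreempt)
Purpose: prevent the VIP from moving back and forth.
Key settings:
Set state to BACKUP on all nodes.
Add the nopreempt parameter.
Comment out vrrp_strict (otherwise the service may fail to start).
Result: after the original Master recovers from a failure, it does not take the VIP back even though its priority is higher; the VIP only moves again when the current Master fails as well.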
bash
# On KA1
[root@KA1 ~]# vim /etc/keepalived/keepalived.conf
vrrp_instance WEB_VIP {
    state BACKUP            # non-preemptive mode: both nodes are BACKUP
    interface eth0
    virtual_router_id 51
    nopreempt               # enable non-preemptive mode
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        172.25.254.100/24 dev eth0 label eth0:0
    }
}
[root@KA1 ~]# systemctl stop keepalived.service
# On KA2
[root@KA2 ~]# vim /etc/keepalived/keepalived.conf
vrrp_instance WEB_VIP {
    state BACKUP
    interface eth0
    virtual_router_id 51
    nopreempt               # enable non-preemptive mode
    priority 80
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        172.25.254.100/24 dev eth0 label eth0:0
    }
}
[root@KA2 ~]# systemctl stop keepalived.service
# Test: start both nodes, then stop KA1 and watch the VIP move to KA2
[root@KA1 ~]# systemctl start keepalived.service
[root@KA2 ~]# systemctl start keepalived.service
[root@KA1 ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.25.254.50 netmask 255.255.255.0 broadcast 172.25.254.255
inet6 fe80::3901:aeea:786a:7227 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:26:33:d9 txqueuelen 1000 (Ethernet)
RX packets 18917 bytes 1546417 (1.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 34775 bytes 3349412 (3.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.25.254.100 netmask 255.255.255.0 broadcast 0.0.0.0
ether 00:0c:29:26:33:d9 txqueuelen 1000 (Ethernet)
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 162 bytes 9028 (8.8 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 162 bytes 9028 (8.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@KA1 ~]# systemctl stop keepalived.service
[root@KA2 ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.25.254.60 netmask 255.255.255.0 broadcast 172.25.254.255
inet6 fe80::26df:35e5:539:56bc prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:1e:fd:7a txqueuelen 1000 (Ethernet)
RX packets 22521 bytes 1553701 (1.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 18517 bytes 1535122 (1.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.25.254.100 netmask 255.255.255.0 broadcast 0.0.0.0
ether 00:0c:29:1e:fd:7a txqueuelen 1000 (Ethernet)
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 84 bytes 5128 (5.0 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 84 bytes 5128 (5.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
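# Start the service on KA1 again: the VIP stays on KA2 and is not preempted back to KA1
[root@KA1 ~]# systemctl start keepalived.service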
[root@KA1 ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.25.254.50 netmask 255.255.255.0 broadcast 172.25.254.255
inet6 fe80::3901:aeea:786a:7227 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:26:33:d9 txqueuelen 1000 (Ethernet)
RX packets 19102 bytes 1561277 (1.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 35034 bytes 3375682 (3.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 162 bytes 9028 (8.8 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 162 bytes 9028 (8.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
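Scenario C: Dual-master architecture (Master/Master)
Purpose: make full use of both servers while each backs up the other.
Approach:
Define two VRRP instances (for example VI_1 and VI_2), one per VIP (VIP1, VIP2).
Node 1: VI_1 is MASTER (higher priority), VI_2 is BACKUP (lower priority).
Node 2: VI_1 is BACKUP (lower priority), VI_2 is MASTER (higher priority).
Result: node 1 holds VIP1 and node 2 holds VIP2; if either node fails, the other takes over both VIPs.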
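The crossed roles can be sketched by rendering the same two instances on each node with state and priority swapped. The helper names are hypothetical, and the second VRID (52), the second VIP (172.25.254.200) and the eth0:1 label are assumptions chosen for illustration; the two instances need different virtual_router_id values because they share an interface.

```bash
#!/usr/bin/env bash
# render_instance NAME STATE VRID PRIORITY VIP LABEL
render_instance() {
    cat <<EOF
vrrp_instance $1 {
    state $2
    interface eth0
    virtual_router_id $3
    priority $4
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        $5/24 dev eth0 label $6
    }
}
EOF
}

# render_dual_master NODE   (NODE is 1 or 2)
render_dual_master() {
    case "$1" in
        1) render_instance VI_1 MASTER 51 100 172.25.254.100 eth0:0
           render_instance VI_2 BACKUP 52 80  172.25.254.200 eth0:1 ;;
        2) render_instance VI_1 BACKUP 51 80  172.25.254.100 eth0:0
           render_instance VI_2 MASTER 52 100 172.25.254.200 eth0:1 ;;
        *) echo "node must be 1 or 2" >&2; return 1 ;;
    esac
}

render_dual_master 1   # configuration for node 1
render_dual_master 2   # configuration for node 2
```

Clients then need to be spread across both VIPs (for example through DNS) for the second server to carry traffic in normal operation.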




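Scenario D: Load balancing with high availability using LVS (IPVS)
Configuration structure: the virtual_server block defines the VIP and port together with the back-end real_server entries.
Health checks:
TCP_CHECK: checks that the port accepts connections.
HTTP_GET: checks the status code returned for a URL (for example 200).
Workflow: Keepalived monitors the state of each RS and automatically removes failed nodes from the IPVS table; if all RS are down, traffic is redirected to the sorry_server.
LVS-DR preparation: each back-end RS must bind the VIP on its lo interface and adjust its ARP parameters (arp_ignore=1, arp_announce=2).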
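A virtual_server block following this structure might look like the output of the sketch below. render_virtual_server is a hypothetical helper; the real-server addresses (172.25.254.10 and .20), the sorry_server address, the rr scheduler and all timing values are assumptions. The retry keyword assumes a keepalived 2.x release; older releases used nb_get_retry inside HTTP_GET.

```bash
#!/usr/bin/env bash
# render_virtual_server VIP PORT SORRY_IP RS_IP...
render_virtual_server() {
    local vip="$1" port="$2" sorry="$3"
    shift 3
    echo "virtual_server ${vip} ${port} {"
    echo "    delay_loop 6          # seconds between health-check rounds"
    echo "    lb_algo rr            # round-robin scheduling"
    echo "    lb_kind DR            # LVS direct-routing mode"
    echo "    protocol TCP"
    echo "    sorry_server ${sorry} ${port}"
    local rs
    for rs in "$@"; do
        echo "    real_server ${rs} ${port} {"
        echo "        weight 1"
        echo "        HTTP_GET {"
        echo "            url {"
        echo "                path /"
        echo "                status_code 200"
        echo "            }"
        echo "            connect_timeout 3"
        echo "            retry 3"
        echo "            delay_before_retry 3"
        echo "        }"
        echo "    }"
    done
    echo "}"
}

render_virtual_server 172.25.254.100 80 172.25.254.50 172.25.254.10 172.25.254.20
```

For a service that does not speak HTTP, the HTTP_GET block can be swapped for a TCP_CHECK block containing only connect_timeout. The rules Keepalived actually installs can be inspected with ipvsadm -Ln.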
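Scenario E: Custom script monitoring (VRRP Script)
Purpose: monitor services that Keepalived has no built-in check for (such as the HAProxy or Nginx process) and adjust the priority dynamically based on the result.
Steps:
Define the script (vrrp_script): set the script path, the execution interval (interval), the priority adjustment (weight, usually negative), and the number of consecutive failures/successes required (fall/rise).
Reference the script (track_script): inside vrrp_instance, list the name of the defined script.
Logic: the script returns non-zero (failure) -> the node's priority drops (priority + weight) -> if it falls below the backup node's priority, the VIP moves.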
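The logic above can be sketched in three parts: a process check, the priority arithmetic, and the configuration that ties them together. All names here (check_process, effective_priority, render_track_config, check_nginx and the script path) are hypothetical, and effective_priority models only the negative-weight case described in this section. The priorities 100 and 80 are the Master and Backup values from Scenario A.

```bash
#!/usr/bin/env bash
# check_process NAME: exit 0 if a process with exactly this name is running
check_process() {
    pgrep -x "$1" >/dev/null 2>&1
}

# effective_priority BASE WEIGHT SCRIPT_EXIT_STATUS
# Negative weight only: a failing script lowers the priority by |WEIGHT|.
effective_priority() {
    local base="$1" weight="$2" status="$3"
    if [ "$status" -ne 0 ] && [ "$weight" -lt 0 ]; then
        echo $(( base + weight ))
    else
        echo "$base"
    fi
}

# Configuration fragment that wires the check into the VRRP instance
render_track_config() {
    cat <<'EOF'
vrrp_script check_nginx {
    script "/etc/keepalived/scripts/check_nginx.sh"   # hypothetical path
    interval 2      # run every 2 s
    weight -30      # subtract 30 from the priority while failing
    fall 2          # 2 consecutive failures mark the check as failed
    rise 2          # 2 consecutive successes mark it as recovered
}

vrrp_instance WEB_VIP {
    # ... state, interface, virtual_router_id, priority 100, VIP as before ...
    track_script {
        check_nginx
    }
}
EOF
}

render_track_config
check_process nginx; status=$?
echo "Master effective priority: $(effective_priority 100 -30 "$status") (Backup: 80)"
```

With a weight of -30, a failing check takes the Master from 100 to 70, below the Backup's 80, so the VIP moves. A weight of -10 would leave it at 90 and nothing would happen, which is why the weight must exceed the priority gap between the nodes.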
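3. Debugging and Troubleshooting
Viewing logs:
Edit /etc/sysconfig/keepalived and add -D -S 6 to KEEPALIVED_OPTIONS to enable detailed logging.
Configure rsyslog to write the local6 facility to /var/log/keepalived.log.
Packet capture: use tcpdump to capture VRRP multicast or unicast packets and inspect the priority and state in the Advertisement messages.
Example command: tcpdump -i eth0 -nn host 224.0.0.18 (multicast), or specify the unicast IP.
Configuration check: keepalived -t -f /etc/keepalived/keepalived.conf checks the syntax.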
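The logging and capture steps above come down to two configuration lines and one command, sketched below as small printing helpers so nothing on the system is modified. The helper names are hypothetical. The capture filter uses IP protocol 112 (VRRP) instead of a host address, an alternative that should match both multicast and unicast advertisements.

```bash
#!/usr/bin/env bash
# Line for /etc/sysconfig/keepalived:
#   -D = detailed log messages, -S 6 = syslog facility local6
render_sysconfig() {
    echo 'KEEPALIVED_OPTIONS="-D -S 6"'
}

# Line for /etc/rsyslog.conf (or a drop-in file under /etc/rsyslog.d/)
render_rsyslog_rule() {
    printf 'local6.*\t/var/log/keepalived.log\n'
}

# vrrp_capture_cmd [INTERFACE]: print a tcpdump command that matches VRRP
vrrp_capture_cmd() {
    echo "tcpdump -i ${1:-eth0} -nn 'ip proto 112'"
}

render_sysconfig
render_rsyslog_rule
vrrp_capture_cmd eth0
# After editing both files: systemctl restart rsyslog keepalived
```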
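This summary covers the core hands-on material, from VRRP fundamentals to enterprise-style dual-master setups, LVS integration, and script-based monitoring.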