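Series index:
- Part 1: "Nginx Introduction and Installation: Building a High-Performance Web Server from Scratch"
- Part 2: "Nginx Basic Configuration: Core nginx.conf Settings and Virtual Hosts in Practice"
- Part 3: "Nginx Proxy Configuration: A Complete Guide to Forward and Reverse Proxies"
- Part 4: "Nginx Performance Tuning and Security: Building a High-Performance Web Server"
- Part 5: "Nginx Load Balancing: Load-Balancing Strategies in Practice"
- Part 6: "Nginx High Availability in Practice: Keepalived with a Two-Node Hot Standby"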
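Preface
In production, a single point of failure is the enemy of any architecture. However fast an Nginx server is, the whole service goes down with it when it fails. To keep the business running we need a high-availability architecture that detects failures and moves traffic to a healthy node automatically.
This article shows how to build a highly available Nginx pair with Keepalived in a two-node hot-standby setup. It walks through a complete example, from architecture design to deployment and configuration, so that you can build this kind of setup yourself.
I. High-Availability Basics
1.1 What Is High Availability
High availability (HA) is a system's ability to stay in service because it is designed to minimize downtime. It is usually expressed as a number of "nines":
Availability levels:
- 99%: about 3.65 days of downtime per year
- 99.9%: about 8.76 hours of downtime per year
- 99.99%: about 52.6 minutes of downtime per year
- 99.999%: about 5.26 minutes of downtime per year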
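The figures above follow directly from the availability percentage. Here is a minimal sketch of the arithmetic, assuming a 365-day year; `downtime_minutes` is a helper name made up for this illustration.

```shell
#!/bin/bash
# downtime_minutes <availability-percent>
# Prints the downtime budget per 365-day year, in minutes.
downtime_minutes() {
    awk -v a="$1" 'BEGIN { printf "%.1f\n", (100 - a) / 100 * 365 * 24 * 60 }'
}

for level in 99 99.9 99.99 99.999; do
    printf '%s%% availability: %s minutes of downtime per year\n' \
        "$level" "$(downtime_minutes "$level")"
done
```

Dividing the result by 60 or 1440 gives the hours and days quoted in the list.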
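Core elements of high availability:
- Redundancy: deploy critical components more than once
- Failure detection: monitor system state continuously
- Failover: switch to a standby system automatically
- Data consistency: keep data synchronized and consistent across nodes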
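1.2 High-Availability Architecture Patterns
Active-Standby
Client → Virtual IP → Primary server (Active)
                           ↓
                      Backup server (Standby)
Characteristics:
- One primary server handles all traffic
- One or more backup servers stay on standby
- Traffic switches to a backup automatically when the primary fails
- Resource utilization is low, but the setup is simple
Active-Active
Client → Load balancer → Server 1 (Active)
                              ↓
                         Server 2 (Active)
Characteristics:
- Several servers handle traffic at the same time
- A load balancer distributes the requests
- A failed server is removed from rotation automatically
- Resource utilization is high, but configuration is more complex
Cluster
Client → Cluster manager → Node 1
                              ↓
                           Node 2
                              ↓
                           Node 3
Characteristics:
- Multiple nodes form one cluster
- The cluster is managed centrally
- Nodes can be added and removed dynamically
- Suited to large-scale systems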
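1.3 Introduction to Keepalived
Keepalived overview
Keepalived is a high-availability solution built on the VRRP protocol.
Core functions:
- Health checking: monitors server state
- Failure detection: notices failures promptly
- Failover: switches to the backup server automatically
- Virtual IP management: adds and removes the virtual IP address
How it works:
- It implements VRRP (Virtual Router Redundancy Protocol)
- The nodes elect a master over multicast
- The master sends heartbeat (advertisement) packets at a fixed interval
- The backups listen for those heartbeats
- When the heartbeats time out, a backup takes over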
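How long a backup waits before it declares the master dead is defined by VRRPv2 (RFC 3768): the master-down interval is three advertisement intervals plus a skew time of (256 - priority) / 256 seconds. The sketch below only illustrates that formula; `master_down_interval` is a hypothetical helper and not part of Keepalived.

```shell
#!/bin/bash
# master_down_interval <advert_int-seconds> <backup-priority>
# VRRPv2: Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time,
# where Skew_Time = (256 - Priority) / 256 seconds.
master_down_interval() {
    awk -v adv="$1" -v prio="$2" 'BEGIN { printf "%.3f\n", 3 * adv + (256 - prio) / 256 }'
}

# With a 1-second advertisement interval and a backup priority of 90:
echo "Backup takes over about $(master_down_interval 1 90) seconds after the last advertisement"
```

A higher-priority backup has a smaller skew, so it times out first and wins the election when several backups are present.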
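Keepalived components
Core components:
- vrrpd: the VRRP protocol implementation
- checkers: the health-check module
- IPVS wrapper: programs the kernel IPVS tables
- Netlink Reflector: tracks network interface state
Configuration files:
- /etc/keepalived/keepalived.conf: main configuration file
- /etc/sysconfig/keepalived: startup options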
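II. Installing and Configuring Keepalived
2.1 Preparing the Environment
Server plan
Role | Hostname | IP address | Virtual IP | OS |
---|---|---|---|---|
Primary server | nginx-master | 192.168.1.10 | 192.168.1.100 | CentOS 7 |
Backup server | nginx-backup | 192.168.1.11 | 192.168.1.100 | CentOS 7 |
System initialization
Primary server initialization: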
bash
#!/bin/bash
# Initialization script for the primary server
# Set the hostname
hostnamectl set-hostname nginx-master
# Disable the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
# Add host entries
cat >> /etc/hosts << EOF
192.168.1.10 nginx-master
192.168.1.11 nginx-backup
192.168.1.100 nginx-vip
EOF
# Synchronize the clock (the ntpd service comes from the ntp package, not from ntpdate)
yum install -y ntp ntpdate
ntpdate -u cn.pool.ntp.org
systemctl enable ntpd
systemctl start ntpd
# Install basic tools
yum install -y wget curl vim net-tools telnet
# Kernel parameters
cat >> /etc/sysctl.conf << EOF
# Allow binding to non-local IP addresses
net.ipv4.ip_nonlocal_bind = 1
# Enable IP forwarding
net.ipv4.ip_forward = 1
# Do not send ICMP redirects
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.eth0.send_redirects = 0
# Restrict ARP replies (arp_ignore) and ARP announcements (arp_announce)
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
EOF
# Apply the kernel parameters
sysctl -p
echo "Primary server initialization complete"
Backup server initialization:
bash
#!/bin/bash
# Initialization script for the backup server
# Set the hostname
hostnamectl set-hostname nginx-backup
# Disable the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
# Add host entries
cat >> /etc/hosts << EOF
192.168.1.10 nginx-master
192.168.1.11 nginx-backup
192.168.1.100 nginx-vip
EOF
# Synchronize the clock (the ntpd service comes from the ntp package, not from ntpdate)
yum install -y ntp ntpdate
ntpdate -u cn.pool.ntp.org
systemctl enable ntpd
systemctl start ntpd
# Install basic tools
yum install -y wget curl vim net-tools telnet
# Kernel parameters
cat >> /etc/sysctl.conf << EOF
# Allow binding to non-local IP addresses
net.ipv4.ip_nonlocal_bind = 1
# Enable IP forwarding
net.ipv4.ip_forward = 1
# Do not send ICMP redirects
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.eth0.send_redirects = 0
# Restrict ARP replies (arp_ignore) and ARP announcements (arp_announce)
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
EOF
# Apply the kernel parameters
sysctl -p
echo "Backup server initialization complete"
2.2 Installing and Configuring Nginx
Installing Nginx on the primary server
bash
#!/bin/bash
# Nginx installation script for the primary server
# Install build dependencies
yum install -y gcc gcc-c++ make cmake autoconf automake pcre pcre-devel zlib zlib-devel openssl openssl-devel
# Create the user and group
groupadd nginx
useradd -r -g nginx -s /sbin/nologin nginx
# Download the Nginx source
cd /usr/local/src
wget http://nginx.org/download/nginx-1.24.0.tar.gz
tar -zxvf nginx-1.24.0.tar.gz
cd nginx-1.24.0
# Configure, build and install
./configure \
--prefix=/usr/local/nginx \
--user=nginx \
--group=nginx \
--with-http_ssl_module \
--with-http_stub_status_module \
--with-http_realip_module \
--with-http_gzip_static_module \
--with-pcre \
--pid-path=/var/run/nginx.pid \
--with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' \
--with-ld-opt='-Wl,-z,relro -Wl,-z,now -pie'
make && make install
# Create the required directories
mkdir -p /var/log/nginx
mkdir -p /var/cache/nginx
chown -R nginx:nginx /var/log/nginx
chown -R nginx:nginx /var/cache/nginx
# Create the systemd unit file
cat > /usr/lib/systemd/system/nginx.service << 'EOF'
[Unit]
Description=The nginx HTTP and reverse proxy server
After=network.target remote-fs.target nss-lookup.target
[Service]
Type=forking
PIDFile=/var/run/nginx.pid
ExecStartPre=/usr/local/nginx/sbin/nginx -t
ExecStart=/usr/local/nginx/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true
[Install]
WantedBy=multi-user.target
EOF
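# Reload systemd
systemctl daemon-reload
# Start Nginx
systemctl start nginx
systemctl enable nginx
# Verify the installation (the binary is not on PATH, so call it by its full path)
/usr/local/nginx/sbin/nginx -v
systemctl status nginx
echo "Nginx installation on the primary server complete"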
Installing Nginx on the backup server
bash
#!/bin/bash
# Nginx installation script for the backup server
# Install build dependencies
yum install -y gcc gcc-c++ make cmake autoconf automake pcre pcre-devel zlib zlib-devel openssl openssl-devel
# Create the user and group
groupadd nginx
useradd -r -g nginx -s /sbin/nologin nginx
# Download the Nginx source
cd /usr/local/src
wget http://nginx.org/download/nginx-1.24.0.tar.gz
tar -zxvf nginx-1.24.0.tar.gz
cd nginx-1.24.0
# Configure, build and install
./configure \
--prefix=/usr/local/nginx \
--user=nginx \
--group=nginx \
--with-http_ssl_module \
--with-http_stub_status_module \
--with-http_realip_module \
--with-http_gzip_static_module \
--with-pcre \
--pid-path=/var/run/nginx.pid \
--with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' \
--with-ld-opt='-Wl,-z,relro -Wl,-z,now -pie'
make && make install
# Create the required directories
mkdir -p /var/log/nginx
mkdir -p /var/cache/nginx
chown -R nginx:nginx /var/log/nginx
chown -R nginx:nginx /var/cache/nginx
# Create the systemd unit file
cat > /usr/lib/systemd/system/nginx.service << 'EOF'
[Unit]
Description=The nginx HTTP and reverse proxy server
After=network.target remote-fs.target nss-lookup.target
[Service]
Type=forking
PIDFile=/var/run/nginx.pid
ExecStartPre=/usr/local/nginx/sbin/nginx -t
ExecStart=/usr/local/nginx/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true
[Install]
WantedBy=multi-user.target
EOF
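# Reload systemd
systemctl daemon-reload
# Start Nginx
systemctl start nginx
systemctl enable nginx
# Verify the installation (the binary is not on PATH, so call it by its full path)
/usr/local/nginx/sbin/nginx -v
systemctl status nginx
echo "Nginx installation on the backup server complete"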
Keeping the Nginx configuration in sync
Nginx configuration on the primary server:
nginx
# /usr/local/nginx/conf/nginx.conf
user nginx nginx;
worker_processes auto;
worker_cpu_affinity auto;
worker_rlimit_nofile 65535;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
worker_connections 65535;
use epoll;
multi_accept on;
}
http {
include /usr/local/nginx/conf/mime.types;
default_type application/octet-stream;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
keepalive_requests 1000;
gzip on;
gzip_min_length 1k;
gzip_buffers 4 16k;
gzip_http_version 1.1;
gzip_comp_level 6;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
gzip_vary on;
server {
listen 80;
server_name localhost;
location / {
root /usr/local/nginx/html;
index index.html index.htm;
}
location /status {
stub_status on;
access_log off;
allow 127.0.0.1;
allow 192.168.1.0/24;
deny all;
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root /usr/local/nginx/html;
}
}
}
Copy the configuration to the backup server:
bash
# Run on the primary server
scp /usr/local/nginx/conf/nginx.conf root@192.168.1.11:/usr/local/nginx/conf/nginx.conf
# Then reload the configuration on the backup server
systemctl reload nginx
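Because failover only helps when both nodes serve the same configuration, it can be worth confirming that the copy actually matches. The following is a minimal sketch that compares SHA-256 digests; `same_content` and the `/tmp/backup-nginx.conf` path are assumptions made for this example.

```shell
#!/bin/bash
# same_content <file-a> <file-b>
# Succeeds when both files have the same SHA-256 digest; returns 2 if a file is unreadable.
same_content() {
    local a b
    a=$(sha256sum < "$1") || return 2
    b=$(sha256sum < "$2") || return 2
    [ "$a" = "$b" ]
}

# Usage sketch (commented out because it needs SSH access to the backup node):
# ssh root@192.168.1.11 cat /usr/local/nginx/conf/nginx.conf > /tmp/backup-nginx.conf
# same_content /usr/local/nginx/conf/nginx.conf /tmp/backup-nginx.conf \
#     && echo "configuration in sync" || echo "configuration differs"
```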
2.3 Installing and Configuring Keepalived
Installing Keepalived
On the primary server:
bash
#!/bin/bash
# Keepalived installation script for the primary server
# Install Keepalived
yum install -y keepalived
# Back up the original configuration file
cp /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
# Create the directory for the health-check scripts
mkdir -p /usr/local/keepalived/scripts
chown -R root:root /usr/local/keepalived
chmod -R 755 /usr/local/keepalived
echo "Keepalived installation on the primary server complete"
On the backup server:
bash
#!/bin/bash
# Keepalived installation script for the backup server
# Install Keepalived
yum install -y keepalived
# Back up the original configuration file
cp /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
# Create the directory for the health-check scripts
mkdir -p /usr/local/keepalived/scripts
chown -R root:root /usr/local/keepalived
chmod -R 755 /usr/local/keepalived
echo "Keepalived installation on the backup server complete"
Health-check scripts
Nginx health-check script:
bash
#!/bin/bash
# /usr/local/keepalived/scripts/check_nginx.sh
# Check that an Nginx process exists
if ! pgrep -x nginx > /dev/null; then
    echo "Nginx process not found"
    exit 1
fi
# Check that something listens on port 80 (the trailing space keeps :8080 from matching)
if ! netstat -tlnp | grep -q ":80 "; then
    echo "Nginx port 80 not listening"
    exit 1
fi
# Check that the status page responds (1-second limit so the check cannot hang)
if ! curl -f -s -m 1 http://localhost/status > /dev/null 2>&1; then
    echo "Nginx status page not responding"
    exit 1
fi
# Check the Nginx configuration syntax
if ! /usr/local/nginx/sbin/nginx -t > /dev/null 2>&1; then
    echo "Nginx configuration test failed"
    exit 1
fi
echo "Nginx health check passed"
exit 0
Notification script:
bash
#!/bin/bash
# /usr/local/keepalived/scripts/notify.sh
# Keepalived notification script
# Usage: notify.sh <MASTER|BACKUP|FAULT> <priority>
TYPE=$1
PRIORITY=$2
# Send an e-mail notification
send_notification() {
    local subject="Keepalived Alert: $TYPE on $(hostname)"
    local message="Keepalived state changed to $TYPE with priority $PRIORITY on $(hostname) at $(date)"
    echo "$message" | mail -s "$subject" admin@example.com
}
# Write a log entry
log_event() {
    local message="Keepalived state changed to $TYPE with priority $PRIORITY"
    logger -t keepalived "$message"
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $message" >> /var/log/keepalived/notify.log
}
# Act on the new state
case $TYPE in
    "MASTER")
        log_event
        send_notification
        # Start Nginx if it is not running
        systemctl start nginx
        # Add any other actions for a node that becomes master here
        ;;
    "BACKUP")
        log_event
        send_notification
        # Add any actions for a node that becomes backup here
        ;;
    "FAULT")
        log_event
        send_notification
        # Stop Nginx (if required)
        systemctl stop nginx
        # Add any other fault handling here
        ;;
    *)
        echo "Unknown state: $TYPE"
        exit 1
        ;;
esac
exit 0
Set the script permissions:
bash
# Run on both servers
chmod +x /usr/local/keepalived/scripts/check_nginx.sh
chmod +x /usr/local/keepalived/scripts/notify.sh
# Create the log directory
mkdir -p /var/log/keepalived
chown root:root /var/log/keepalived
chmod 755 /var/log/keepalived
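III. Keepalived Configuration in Detail
3.1 Keepalived Configuration on the Primary Server
Configuration file for the primary server:
nginx
# /etc/keepalived/keepalived.conf
! Configuration File for keepalived
# Global settings
global_defs {
    # Router identifier, usually the hostname
    router_id nginx-master
    # Recipients of notification e-mails
    notification_email {
        admin@example.com
        ops@example.com
    }
    # Sender address
    notification_email_from keepalived@example.com
    # SMTP server
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    # User that runs the scripts
    script_user root
    enable_script_security
    # Default VRRP protocol version (a global option, not a vrrp_instance option)
    vrrp_version 2
    # Strict mode and its iptables integration are global options too. They stay
    # disabled here: strict mode does not accept the PASS authentication used below
    # and can block traffic to the VIP.
    # vrrp_strict
    # vrrp_iptables
    # Multicast group for VRRP advertisements
    # vrrp_mcast_group4 224.0.0.18
}
# Health-check script; it must be defined before the vrrp_instance that tracks it
vrrp_script check_nginx {
    # Path of the check script
    script "/usr/local/keepalived/scripts/check_nginx.sh"
    # Check interval in seconds
    interval 2
    # Timeout in seconds
    timeout 2
    # Consecutive failures before the check counts as failed
    fall 3
    # Consecutive successes before the check counts as passed
    rise 2
    # Priority adjustment on failure. It must exceed the priority gap between
    # the nodes (100 - 90 = 10), otherwise the primary stays master with Nginx down.
    weight -20
    # User and group the script runs as
    user root root
    # Initial delay for the check script
    # init_delay 5
}
# VRRP instance
vrrp_instance VI_1 {
    # Initial state: MASTER on the primary, BACKUP on the backup
    state MASTER
    # Network interface
    interface eth0
    # Virtual router ID, identical on every node of the same cluster
    virtual_router_id 51
    # Priority; the higher value wins
    priority 100
    # VRRP advertisement interval in seconds
    advert_int 1
    # Authentication
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    # Virtual IP address
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 label eth0:0
    }
    # Non-preemptive mode (optional)
    # nopreempt
    # Preemption delay (optional)
    # preempt_delay 300
    # Health-check script to track
    track_script {
        check_nginx
    }
    # Scripts to run on state transitions
    notify_master "/usr/local/keepalived/scripts/notify.sh MASTER 100"
    notify_backup "/usr/local/keepalived/scripts/notify.sh BACKUP 100"
    notify_fault "/usr/local/keepalived/scripts/notify.sh FAULT 100"
}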
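# Virtual server (optional, only for LVS load balancing; VIP failover between
# the two Nginx nodes works without this block)
virtual_server 192.168.1.100 80 {
    # Health-check polling interval in seconds
    delay_loop 6
    # Load-balancing algorithm
    lb_algo rr
    # Forwarding method
    lb_kind NAT
    # Persistence timeout in seconds
    persistence_timeout 50
    # Protocol
    protocol TCP
    # Real servers
    real_server 192.168.1.10 80 {
        # Weight
        weight 1
        # Health-check method
        TCP_CHECK {
            # Connection timeout in seconds
            connect_timeout 3
            # Number of retries
            nb_get_retry 3
            # Delay before each retry in seconds
            delay_before_retry 3
            # Port to connect to
            connect_port 80
        }
    }
    real_server 192.168.1.11 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
}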
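3.2 Keepalived Configuration on the Backup Server
Configuration file for the backup server:
nginx
# /etc/keepalived/keepalived.conf
! Configuration File for keepalived
# Global settings
global_defs {
    # Router identifier, usually the hostname
    router_id nginx-backup
    # Recipients of notification e-mails
    notification_email {
        admin@example.com
        ops@example.com
    }
    # Sender address
    notification_email_from keepalived@example.com
    # SMTP server
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    # User that runs the scripts
    script_user root
    enable_script_security
    # Default VRRP protocol version (a global option, not a vrrp_instance option)
    vrrp_version 2
    # Strict mode and its iptables integration are global options too. They stay
    # disabled here: strict mode does not accept the PASS authentication used below
    # and can block traffic to the VIP.
    # vrrp_strict
    # vrrp_iptables
    # Multicast group for VRRP advertisements
    # vrrp_mcast_group4 224.0.0.18
}
# Health-check script; it must be defined before the vrrp_instance that tracks it
vrrp_script check_nginx {
    # Path of the check script
    script "/usr/local/keepalived/scripts/check_nginx.sh"
    # Check interval in seconds
    interval 2
    # Timeout in seconds
    timeout 2
    # Consecutive failures before the check counts as failed
    fall 3
    # Consecutive successes before the check counts as passed
    rise 2
    # Priority adjustment on failure; keep it identical on both nodes
    weight -20
    # User and group the script runs as
    user root root
    # Initial delay for the check script
    # init_delay 5
}
# VRRP instance
vrrp_instance VI_1 {
    # Initial state: MASTER on the primary, BACKUP on the backup
    state BACKUP
    # Network interface
    interface eth0
    # Virtual router ID, identical on every node of the same cluster
    virtual_router_id 51
    # Priority; the higher value wins
    priority 90
    # VRRP advertisement interval in seconds
    advert_int 1
    # Authentication
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    # Virtual IP address
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 label eth0:0
    }
    # Non-preemptive mode (optional)
    # nopreempt
    # Preemption delay (optional)
    # preempt_delay 300
    # Health-check script to track
    track_script {
        check_nginx
    }
    # Scripts to run on state transitions
    notify_master "/usr/local/keepalived/scripts/notify.sh MASTER 90"
    notify_backup "/usr/local/keepalived/scripts/notify.sh BACKUP 90"
    notify_fault "/usr/local/keepalived/scripts/notify.sh FAULT 90"
}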
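# Virtual server (optional, only for LVS load balancing; VIP failover between
# the two Nginx nodes works without this block)
virtual_server 192.168.1.100 80 {
    # Health-check polling interval in seconds
    delay_loop 6
    # Load-balancing algorithm
    lb_algo rr
    # Forwarding method
    lb_kind NAT
    # Persistence timeout in seconds
    persistence_timeout 50
    # Protocol
    protocol TCP
    # Real servers
    real_server 192.168.1.10 80 {
        # Weight
        weight 1
        # Health-check method
        TCP_CHECK {
            # Connection timeout in seconds
            connect_timeout 3
            # Number of retries
            nb_get_retry 3
            # Delay before each retry in seconds
            delay_before_retry 3
            # Port to connect to
            connect_port 80
        }
    }
    real_server 192.168.1.11 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
}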
3.3 Configuration Reference
Global parameters
nginx
global_defs {
    router_id nginx-master              # router identifier
    notification_email {                # notification recipients
        admin@example.com
        ops@example.com
    }
    notification_email_from keepalived@example.com
    smtp_server 127.0.0.1               # SMTP server
    smtp_connect_timeout 30             # SMTP connection timeout
    script_user root                    # user that runs the scripts
    enable_script_security              # enable script security checks
}
VRRP instance parameters
nginx
vrrp_instance VI_1 {
    state MASTER                        # initial state: MASTER/BACKUP
    interface eth0                      # network interface
    virtual_router_id 51                # virtual router ID
    priority 100                        # priority
    advert_int 1                        # advertisement interval
    authentication {                    # authentication
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {                 # virtual IP
        192.168.1.100/24 dev eth0 label eth0:0
    }
    track_script {                      # health-check script
        check_nginx
    }
    notify_master "/path/to/script.sh MASTER"   # state-transition notifications
    notify_backup "/path/to/script.sh BACKUP"
    notify_fault "/path/to/script.sh FAULT"
}
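Two of these parameters have to agree across the nodes: `virtual_router_id` must be identical, and the primary needs the higher `priority`. As a rough sketch, a script can pull both values out of the two configuration files and compare them. The helpers `vrrp_value` and `check_pair` are hypothetical names, and the parser assumes one `keyword value` pair per line, as in the files above.

```shell
#!/bin/bash
# vrrp_value <config-file> <keyword>
# Prints the first value found for a keyword such as virtual_router_id or priority.
# Commented-out lines never match because their first field is "#" or "!".
vrrp_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# check_pair <primary-conf> <backup-conf>
# Succeeds when both files share the virtual_router_id and the primary has the higher priority.
check_pair() {
    [ "$(vrrp_value "$1" virtual_router_id)" = "$(vrrp_value "$2" virtual_router_id)" ] &&
        [ "$(vrrp_value "$1" priority)" -gt "$(vrrp_value "$2" priority)" ]
}

# Usage sketch, with a copy of the backup's file fetched beforehand (path is an assumption):
# check_pair /etc/keepalived/keepalived.conf /tmp/backup-keepalived.conf && echo "pair looks consistent"
```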
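Health-check script parameters
nginx
vrrp_script check_nginx {
    script "/usr/local/keepalived/scripts/check_nginx.sh"   # script path
    interval 2                          # check interval (seconds)
    timeout 2                           # timeout (seconds)
    fall 3                              # failures before the check counts as failed
    rise 2                              # successes before the check counts as passed
    weight -20                          # priority adjustment; must exceed the gap between the nodes
    user root root                      # user and group the script runs as
}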
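The `weight` value decides whether a failed check actually moves the VIP: Keepalived adds a negative weight to the priority while the script fails and adds a positive weight while it succeeds. A small sketch of that rule, using the priorities 100 and 90 from this article, shows which adjustments cause a takeover. `effective_priority` is an illustrative helper, not a Keepalived command.

```shell
#!/bin/bash
# effective_priority <base-priority> <weight> <script-ok: 1 or 0>
# Negative weight: applied while the check fails. Positive weight: applied while it succeeds.
effective_priority() {
    local base=$1 weight=$2 ok=$3
    if [ "$weight" -lt 0 ] && [ "$ok" -eq 0 ]; then
        echo $((base + weight))
    elif [ "$weight" -gt 0 ] && [ "$ok" -eq 1 ]; then
        echo $((base + weight))
    else
        echo "$base"
    fi
}

backup=$(effective_priority 90 -20 1)    # healthy backup keeps its base priority
for weight in -5 -20; do
    primary=$(effective_priority 100 "$weight" 0)    # primary with a failing Nginx check
    if [ "$primary" -lt "$backup" ]; then
        echo "weight $weight: primary drops to $primary, backup ($backup) takes over"
    else
        echo "weight $weight: primary drops to $primary, still above backup ($backup), no failover"
    fi
done
```

The absolute value of a negative weight therefore has to be larger than the priority gap between the two nodes.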
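IV. Starting and Verifying the Services
4.1 Starting Keepalived
Start Keepalived on the primary server
bash
#!/bin/bash
# Start Keepalived on the primary server
# Check the configuration syntax (-t needs a keepalived release with --config-test;
# older packages may not support it)
keepalived -t
if [ $? -eq 0 ]; then
    echo "Keepalived configuration syntax is valid"
else
    echo "Keepalived configuration syntax error, please check the file"
    exit 1
fi
# Start the Keepalived service
systemctl start keepalived
systemctl enable keepalived
# Check the service status
systemctl status keepalived
# Check the Keepalived processes
ps aux | grep keepalived
# Check the virtual IP
ip addr show eth0
# Show the Keepalived version
keepalived -v
echo "Keepalived started on the primary server"
Start Keepalived on the backup server
bash
#!/bin/bash
# Start Keepalived on the backup server
# Check the configuration syntax (-t needs a keepalived release with --config-test;
# older packages may not support it)
keepalived -t
if [ $? -eq 0 ]; then
    echo "Keepalived configuration syntax is valid"
else
    echo "Keepalived configuration syntax error, please check the file"
    exit 1
fi
# Start the Keepalived service
systemctl start keepalived
systemctl enable keepalived
# Check the service status
systemctl status keepalived
# Check the Keepalived processes
ps aux | grep keepalived
# Check the virtual IP (the backup server should not hold the VIP)
ip addr show eth0
# Show the Keepalived version
keepalived -v
echo "Keepalived started on the backup server"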
4.2 Verifying the High-Availability State
Check the primary server
bash
#!/bin/bash
# Check the state of the primary server
echo "=== Primary server status check ==="
# Keepalived service state
echo "1. Keepalived service state:"
systemctl is-active keepalived
# Keepalived processes
echo -e "\n2. Keepalived processes:"
ps aux | grep keepalived | grep -v grep
# Virtual IP
echo -e "\n3. Virtual IP state:"
ip addr show eth0 | grep 192.168.1.100
# VRRP advertisements (give up after 5 seconds instead of waiting forever)
echo -e "\n4. VRRP state:"
timeout 5 tcpdump -i eth0 -n 'host 224.0.0.18' -c 3 2>/dev/null || echo "No VRRP advertisements captured within 5 seconds"
# Nginx service state
echo -e "\n5. Nginx service state:"
systemctl is-active nginx
# Nginx processes
echo -e "\n6. Nginx processes:"
ps aux | grep nginx | grep -v grep
# Nginx listening port
echo -e "\n7. Nginx listening port:"
netstat -tlnp | grep :80
# Nginx response test
echo -e "\n8. Nginx response test:"
curl -I http://localhost
echo -e "\n=== Primary server status check complete ==="
Check the backup server
bash
#!/bin/bash
# Check the state of the backup server
echo "=== Backup server status check ==="
# Keepalived service state
echo "1. Keepalived service state:"
systemctl is-active keepalived
# Keepalived processes
echo -e "\n2. Keepalived processes:"
ps aux | grep keepalived | grep -v grep
# Virtual IP (the backup server should not hold the VIP)
echo -e "\n3. Virtual IP state:"
ip addr show eth0 | grep 192.168.1.100 || echo "Virtual IP not bound (expected)"
# VRRP advertisements (give up after 5 seconds instead of waiting forever)
echo -e "\n4. VRRP state:"
timeout 5 tcpdump -i eth0 -n 'host 224.0.0.18' -c 3 2>/dev/null || echo "No VRRP advertisements captured within 5 seconds"
# Nginx service state
echo -e "\n5. Nginx service state:"
systemctl is-active nginx
# Nginx processes
echo -e "\n6. Nginx processes:"
ps aux | grep nginx | grep -v grep
# Nginx listening port
echo -e "\n7. Nginx listening port:"
netstat -tlnp | grep :80
# Nginx response test
echo -e "\n8. Nginx response test:"
curl -I http://localhost
echo -e "\n=== Backup server status check complete ==="
Test access through the virtual IP
bash
#!/bin/bash
# Test access through the virtual IP
echo "=== Virtual IP access test ==="
VIP="192.168.1.100"
# HTTP access
echo "1. HTTP access:"
curl -I http://$VIP
# Nginx status page
echo -e "\n2. Nginx status page:"
curl http://$VIP/status
# Reachability
echo -e "\n3. Reachability:"
ping -c 3 $VIP
# Port reachability
echo -e "\n4. Port reachability:"
telnet $VIP 80 << EOF
quit
EOF
echo -e "\n=== Virtual IP access test complete ==="
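4.3 Failover Testing
Simulate an Nginx failure on the primary server
bash
#!/bin/bash
# Simulated Nginx failure on the primary server
# Run this on the primary server; it assumes key-based SSH access to both nodes
echo "=== Simulated primary server failure test ==="
# Record the start time
START_TIME=$(date)
echo "Test started at: $START_TIME"
# Stop Nginx on the primary server
echo "1. Stopping Nginx on the primary server:"
systemctl stop nginx
# Detection takes about interval x fall = 6 seconds, so wait a little longer than that
sleep 10
# Check the primary server
echo -e "\n2. Primary server state:"
systemctl is-active nginx
systemctl is-active keepalived
# Check whether the virtual IP moved to the backup server
echo -e "\n3. Virtual IP location:"
echo "Virtual IP on the primary server:"
ssh root@192.168.1.10 "ip addr show eth0 | grep 192.168.1.100 || echo 'virtual IP released'"
echo "Virtual IP on the backup server:"
ssh root@192.168.1.11 "ip addr show eth0 | grep 192.168.1.100 || echo 'virtual IP not bound'"
# Test access through the virtual IP
echo -e "\n4. Access through the virtual IP:"
curl -I http://192.168.1.100
# Check Nginx on the backup server
echo -e "\n5. Nginx state on the backup server:"
ssh root@192.168.1.11 "systemctl is-active nginx"
# Record the failover time
END_TIME=$(date)
echo -e "\nFailover finished at: $END_TIME"
# Restore the primary server
echo -e "\n6. Restoring the primary server:"
ssh root@192.168.1.10 "systemctl start nginx"
sleep 10
# Check whether the virtual IP moved back to the primary server
echo -e "\n7. Virtual IP location after recovery:"
echo "Virtual IP on the primary server:"
ssh root@192.168.1.10 "ip addr show eth0 | grep 192.168.1.100 || echo 'virtual IP not bound'"
echo "Virtual IP on the backup server:"
ssh root@192.168.1.11 "ip addr show eth0 | grep 192.168.1.100 || echo 'virtual IP released'"
# Record the recovery time
RECOVERY_TIME=$(date)
echo -e "\nRecovery finished at: $RECOVERY_TIME"
echo -e "\n=== Failover test complete ==="
Simulate a primary server outage
bash
#!/bin/bash
# Simulated outage of the primary server
# Run this from a third machine; it assumes key-based SSH access to both nodes
echo "=== Simulated primary server outage test ==="
# Record the start time
START_TIME=$(date)
echo "Test started at: $START_TIME"
# Shut down the primary server
echo "1. Shutting down the primary server:"
ssh root@192.168.1.10 "shutdown -h now" || echo "Primary server is down"
# Wait for the failover
echo -e "\n2. Waiting for failover (30 seconds):"
sleep 30
# Check the backup server
echo -e "\n3. Backup server state:"
ssh root@192.168.1.11 "systemctl is-active keepalived"
ssh root@192.168.1.11 "systemctl is-active nginx"
# Check whether the virtual IP moved to the backup server
echo -e "\n4. Virtual IP location:"
ssh root@192.168.1.11 "ip addr show eth0 | grep 192.168.1.100"
# Test access through the virtual IP
echo -e "\n5. Access through the virtual IP:"
curl -I http://192.168.1.100
# Record the failover time
END_TIME=$(date)
echo -e "\nFailover finished at: $END_TIME"
# Power the primary server back on (manual step)
echo -e "\n6. Start the primary server manually, then press Enter to continue..."
read
# Wait for recovery
echo -e "\n7. Waiting for the primary server to recover (30 seconds):"
sleep 30
# Check the primary server
echo -e "\n8. Primary server state:"
ssh root@192.168.1.10 "systemctl is-active keepalived"
ssh root@192.168.1.10 "systemctl is-active nginx"
# Check whether the virtual IP moved back to the primary server
echo -e "\n9. Virtual IP location after recovery:"
echo "Virtual IP on the primary server:"
ssh root@192.168.1.10 "ip addr show eth0 | grep 192.168.1.100 || echo 'virtual IP not bound'"
echo "Virtual IP on the backup server:"
ssh root@192.168.1.11 "ip addr show eth0 | grep 192.168.1.100 || echo 'virtual IP released'"
# Record the recovery time
RECOVERY_TIME=$(date)
echo -e "\nRecovery finished at: $RECOVERY_TIME"
echo -e "\n=== Primary server outage test complete ==="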
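The timestamps in these tests only bracket the failover. To estimate how long clients actually see errors, one option is to probe the VIP at a fixed rate from a third machine while a test runs and count the failed probes. The sketch below assumes that approach; `count_failed_probes` is a hypothetical helper that accepts any command as the probe.

```shell
#!/bin/bash
# count_failed_probes <number-of-probes> <interval-seconds> <command...>
# Runs the command once per interval and prints how many runs failed.
count_failed_probes() {
    local probes=$1 interval=$2 failed=0 i
    shift 2
    for ((i = 0; i < probes; i++)); do
        "$@" > /dev/null 2>&1 || failed=$((failed + 1))
        sleep "$interval"
    done
    echo "$failed"
}

# Usage sketch (commented out because it needs the VIP to be reachable):
# probe the VIP every 0.5 s for about a minute while the failover test runs.
# failed=$(count_failed_probes 120 0.5 curl -f -s -m 1 -o /dev/null http://192.168.1.100/)
# echo "Failed probes: $failed (roughly $((failed / 2)) seconds of outage)"
```

The estimate is only as fine as the probe interval plus the curl timeout, so treat it as an upper bound rather than an exact measurement.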
V. Monitoring and Maintenance
5.1 Monitoring
Keepalived status monitoring script
bash
#!/bin/bash
# /usr/local/keepalived/scripts/monitor_keepalived.sh
# Keepalived monitoring script
# Usage: ./monitor_keepalived.sh
# Settings
VIP="192.168.1.100"
MASTER_IP="192.168.1.10"
BACKUP_IP="192.168.1.11"
LOG_FILE="/var/log/keepalived/monitor.log"
ALERT_EMAIL="admin@example.com"
# Determine the current role of this server
get_server_role() {
    local vip_bound=$(ip addr show | grep -c "$VIP")
    local keepalived_state=$(systemctl is-active keepalived)
    if [ "$vip_bound" -gt 0 ] && [ "$keepalived_state" = "active" ]; then
        echo "MASTER"
    elif [ "$vip_bound" -eq 0 ] && [ "$keepalived_state" = "active" ]; then
        echo "BACKUP"
    else
        echo "UNKNOWN"
    fi
}
# Check the Keepalived service
check_keepalived_service() {
    local status=$(systemctl is-active keepalived)
    if [ "$status" != "active" ]; then
        echo "Keepalived service is not active: $status"
        return 1
    fi
    return 0
}
# Check the VRRP process
check_vrrp_process() {
    # keepalived forks its VRRP handler as a child with the same process name,
    # so expect at least two keepalived processes (parent plus child)
    local proc_count=$(pgrep -c -x keepalived)
    if [ "$proc_count" -lt 2 ]; then
        echo "VRRP process not found"
        return 1
    fi
    return 0
}
# Check the virtual IP
check_virtual_ip() {
    local vip_bound=$(ip addr show | grep -c "$VIP")
    if [ "$vip_bound" -eq 0 ]; then
        echo "Virtual IP not bound"
        return 1
    fi
    return 0
}
# Check the Nginx service
check_nginx_service() {
    local status=$(systemctl is-active nginx)
    if [ "$status" != "active" ]; then
        echo "Nginx service is not active: $status"
        return 1
    fi
    return 0
}
# Check network connectivity
check_network_connectivity() {
    if ! ping -c 1 -W 1 8.8.8.8 > /dev/null 2>&1; then
        echo "Network connectivity failed"
        return 1
    fi
    return 0
}
# Send an alert
send_alert() {
    local message=$1
    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    local hostname=$(hostname)
    echo "[$timestamp] [$hostname] ALERT: $message" >> $LOG_FILE
    # E-mail alert
    echo "[$timestamp] [$hostname] ALERT: $message" | mail -s "Keepalived Alert: $hostname" $ALERT_EMAIL
}
# Log the monitoring result
log_monitor_info() {
    local role=$(get_server_role)
    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    local hostname=$(hostname)
    echo "[$timestamp] [$hostname] Role: $role, Keepalived: $(systemctl is-active keepalived), Nginx: $(systemctl is-active nginx)" >> $LOG_FILE
}
# Main monitoring routine
main() {
    local hostname=$(hostname)
    local role=$(get_server_role)
    echo "=== Keepalived Monitor ==="
    echo "Hostname: $hostname"
    echo "Role: $role"
    echo "Timestamp: $(date '+%Y-%m-%d %H:%M:%S')"
    echo ""
    # Run the individual checks
    local checks_passed=0
    local total_checks=5
    echo "1. Checking the Keepalived service..."
    if check_keepalived_service; then
        echo " ✓ Keepalived service is active"
        ((checks_passed++))
    else
        echo " ✗ Keepalived service check failed"
        send_alert "Keepalived service check failed on $hostname"
    fi
    echo "2. Checking the VRRP process..."
    if check_vrrp_process; then
        echo " ✓ VRRP process is running"
        ((checks_passed++))
    else
        echo " ✗ VRRP process check failed"
        send_alert "VRRP process check failed on $hostname"
    fi
    echo "3. Checking the virtual IP..."
    if [ "$role" = "MASTER" ]; then
        if check_virtual_ip; then
            echo " ✓ Virtual IP is bound"
            ((checks_passed++))
        else
            echo " ✗ Virtual IP check failed"
            send_alert "Virtual IP check failed on $hostname"
        fi
    else
        echo " ✓ Virtual IP not bound (expected for BACKUP)"
        ((checks_passed++))
    fi
    echo "4. Checking the Nginx service..."
    if check_nginx_service; then
        echo " ✓ Nginx service is active"
        ((checks_passed++))
    else
        echo " ✗ Nginx service check failed"
        send_alert "Nginx service check failed on $hostname"
    fi
    echo "5. Checking network connectivity..."
    if check_network_connectivity; then
        echo " ✓ Network connectivity is OK"
        ((checks_passed++))
    else
        echo " ✗ Network connectivity check failed"
        send_alert "Network connectivity check failed on $hostname"
    fi
    echo ""
    echo "Result: $checks_passed/$total_checks checks passed"
    # Log the monitoring result
    log_monitor_info
    # If any check failed, try to recover
    if [ "$checks_passed" -lt "$total_checks" ]; then
        echo "Trying to recover the services..."
        # Restart Keepalived
        systemctl restart keepalived
        sleep 5
        # Restart Nginx
        systemctl restart nginx
        sleep 3
        echo "Recovery attempt finished"
    fi
    echo ""
    echo "=== Monitoring complete ==="
}
# Run the main routine
main
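The role detection above treats any line containing the VIP string as a match. That works for 192.168.1.100, but a pattern such as 192.168.1.10 would also match 192.168.1.100. A stricter variant, sketched here under the assumption that the one-line-per-address output of `ip -o -4 addr show` is available, compares the address field exactly. `has_vip` is a name invented for this example.

```shell
#!/bin/bash
# has_vip <address>
# Reads `ip -o -4 addr show` output on stdin and succeeds only when the given
# address is configured exactly (field 4 holds "address/prefix").
has_vip() {
    awk -v vip="$1" '{ split($4, a, "/"); if (a[1] == vip) found = 1 } END { exit !found }'
}

# Usage on a live node:
if ip -o -4 addr show 2>/dev/null | has_vip "192.168.1.100"; then
    echo "This node currently holds the VIP"
else
    echo "This node does not hold the VIP"
fi
```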
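Schedule the monitoring
bash
# Add the cron job (append to the existing crontab instead of replacing it)
(crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/keepalived/scripts/monitor_keepalived.sh") | crontab -
# List the cron jobs
crontab -l
# Restart the crond service
systemctl restart crond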
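5.2 Log Management
Keepalived log configuration
bash
# Create the log directory
mkdir -p /var/log/keepalived
chown root:root /var/log/keepalived
chmod 755 /var/log/keepalived
# Make keepalived log to the local0 facility (-S 0); without this option the
# local0 rule below never receives its messages. Restarting keepalived on the
# active node briefly moves the VIP.
sed -i 's/^KEEPALIVED_OPTIONS=.*/KEEPALIVED_OPTIONS="-D -S 0"/' /etc/sysconfig/keepalived
systemctl restart keepalived
# Configure rsyslog
cat > /etc/rsyslog.d/keepalived.conf << EOF
# Keepalived logging
local0.* /var/log/keepalived/keepalived.log
& stop
EOF
# Restart rsyslog
systemctl restart rsyslog
# Configure logrotate (rsyslog writes the file, so rsyslog is told to reopen it)
cat > /etc/logrotate.d/keepalived << EOF
/var/log/keepalived/*.log {
daily
missingok
rotate 30
compress
delaycompress
notifempty
create 644 root root
postrotate
systemctl kill -s HUP rsyslog.service
endscript
}
EOF
# Test logrotate
logrotate -f /etc/logrotate.d/keepalived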
Log analysis script
bash
#!/bin/bash
# /usr/local/keepalived/scripts/analyze_logs.sh
# Keepalived log analysis script
# Usage: ./analyze_logs.sh
# Settings
LOG_DIR="/var/log/keepalived"
REPORT_FILE="/tmp/keepalived_report.txt"
ALERT_EMAIL="admin@example.com"
# State transitions
analyze_state_transitions() {
    echo "=== State transitions ===" >> $REPORT_FILE
    echo "Analyzed at: $(date)" >> $REPORT_FILE
    # Find state-transition records
    grep -n "Transition to MASTER\|Transition to BACKUP\|Transition to FAULT" $LOG_DIR/keepalived.log | tail -10 >> $REPORT_FILE
    echo "" >> $REPORT_FILE
}
# Fault events
analyze_fault_events() {
    echo "=== Fault events ===" >> $REPORT_FILE
    # Find fault-related records
    grep -n "Fault\|ERROR\|Failed" $LOG_DIR/keepalived.log | tail -10 >> $REPORT_FILE
    echo "" >> $REPORT_FILE
}
# VRRP communication
analyze_vrrp_communication() {
    echo "=== VRRP communication ===" >> $REPORT_FILE
    # Find VRRP communication records
    grep -n "VRRP_Script\|VRRP_Instance\|Received\|Sending" $LOG_DIR/keepalived.log | tail -10 >> $REPORT_FILE
    echo "" >> $REPORT_FILE
}
# Health checks
analyze_health_checks() {
    echo "=== Health checks ===" >> $REPORT_FILE
    # Find health-check records
    grep -n "check_nginx\|Health check\|Script" $LOG_DIR/keepalived.log | tail -10 >> $REPORT_FILE
    echo "" >> $REPORT_FILE
}
# Statistics
generate_statistics() {
    echo "=== Statistics ===" >> $REPORT_FILE
    # Count the state transitions
    master_transitions=$(grep -c "Transition to MASTER" $LOG_DIR/keepalived.log)
    backup_transitions=$(grep -c "Transition to BACKUP" $LOG_DIR/keepalived.log)
    fault_transitions=$(grep -c "Transition to FAULT" $LOG_DIR/keepalived.log)
    echo "State transitions:" >> $REPORT_FILE
    echo " Transitions to MASTER: $master_transitions" >> $REPORT_FILE
    echo " Transitions to BACKUP: $backup_transitions" >> $REPORT_FILE
    echo " Transitions to FAULT: $fault_transitions" >> $REPORT_FILE
    # Count the errors
    error_count=$(grep -c "ERROR\|Failed" $LOG_DIR/keepalived.log)
    echo " Errors: $error_count" >> $REPORT_FILE
    echo "" >> $REPORT_FILE
}
# Anomaly checks
check_abnormalities() {
    echo "=== Anomaly checks ===" >> $REPORT_FILE
    # Frequent state transitions (take the last 100 lines first, then count)
    recent_transitions=$(tail -n 100 $LOG_DIR/keepalived.log | grep -c "Transition to")
    if [ "$recent_transitions" -gt 10 ]; then
        echo "Warning: too many state transitions in the last 100 log lines ($recent_transitions)" >> $REPORT_FILE
    fi
    # Frequent errors
    recent_errors=$(tail -n 100 $LOG_DIR/keepalived.log | grep -c "ERROR\|Failed")
    if [ "$recent_errors" -gt 5 ]; then
        echo "Warning: too many errors in the last 100 log lines ($recent_errors)" >> $REPORT_FILE
    fi
    echo "" >> $REPORT_FILE
}
# Send the report
send_report() {
    local subject="Keepalived log analysis report - $(date '+%Y-%m-%d %H:%M:%S')"
    # Send the report by e-mail
    cat $REPORT_FILE | mail -s "$subject" $ALERT_EMAIL
    echo "Report sent to $ALERT_EMAIL"
}
# Main routine
main() {
    echo "Analyzing the Keepalived logs..."
    # Empty the report file
    > $REPORT_FILE
    # Run the analyses
    analyze_state_transitions
    analyze_fault_events
    analyze_vrrp_communication
    analyze_health_checks
    generate_statistics
    check_abnormalities
    # Print the report
    echo "Analysis finished, report:"
    echo "================================"
    cat $REPORT_FILE
    echo "================================"
    # Send the report
    send_report
    echo "Log analysis complete"
}
# Run the main routine
main
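Counting matches "in the last 100 lines" depends on the order of the pipeline: `grep -c` on the file counts the whole file, and piping its single-number output through `tail` changes nothing. A minimal sketch of the windowed count follows; `count_recent` is a hypothetical helper.

```shell
#!/bin/bash
# count_recent <log-file> <line-count> <pattern>
# Prints how many of the last <line-count> lines match the (basic regex) pattern.
count_recent() {
    tail -n "$2" "$1" | grep -c -- "$3"
}

# Usage sketch:
# count_recent /var/log/keepalived/keepalived.log 100 "Transition to"
```

The exact wording of state-change messages varies between Keepalived versions, so it is worth checking a real log before settling on the pattern.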
5.3 维护操作
日常维护检查清单
bash
#!/bin/bash
# /usr/local/keepalived/scripts/daily_check.sh
# Keepalived日常维护检查脚本
# 用法:./daily_check.sh
# 配置参数
VIP="192.168.1.100"
MASTER_IP="192.168.1.10"
BACKUP_IP="192.168.1.11"
LOG_FILE="/var/log/keepalived/daily_check.log"
# 记录检查时间
echo "=== Keepalived日常检查 ===" > $LOG_FILE
echo "检查时间: $(date)" >> $LOG_FILE
echo "" >> $LOG_FILE
# 检查主服务器状态
check_master_server() {
echo "1. 检查主服务器状态" >> $LOG_FILE
# 检查Keepalived服务
local keepalived_status=$(ssh root@$MASTER_IP "systemctl is-active keepalived")
echo " Keepalived服务状态: $keepalived_status" >> $LOG_FILE
# 检查Nginx服务
local nginx_status=$(ssh root@$MASTER_IP "systemctl is-active nginx")
echo " Nginx服务状态: $nginx_status" >> $LOG_FILE
# 检查虚拟IP
local vip_bound=$(ssh root@$MASTER_IP "ip addr show | grep -c '$VIP'")
if [ "$vip_bound" -gt 0 ]; then
echo " 虚拟IP状态: 已绑定" >> $LOG_FILE
else
echo " 虚拟IP状态: 未绑定" >> $LOG_FILE
fi
# 检查系统资源
local cpu_usage=$(ssh root@$MASTER_IP "top -bn1 | grep 'Cpu(s)' | awk '{print \$2}' | cut -d'%' -f1")
local memory_usage=$(ssh root@$MASTER_IP "free -m | grep 'Mem:' | awk '{printf \"%.2f\", \$3/\$2*100}'")
local disk_usage=$(ssh root@$MASTER_IP "df -h / | awk 'NR==2 {print \$5}' | cut -d'%' -f1")
echo " CPU使用率: ${cpu_usage}%" >> $LOG_FILE
echo " 内存使用率: ${memory_usage}%" >> $LOG_FILE
echo " 磁盘使用率: ${disk_usage}%" >> $LOG_FILE
echo "" >> $LOG_FILE
}
# 检查备用服务器状态
check_backup_server() {
echo "2. 检查备用服务器状态" >> $LOG_FILE
# 检查Keepalived服务
local keepalived_status=$(ssh root@$BACKUP_IP "systemctl is-active keepalived")
echo " Keepalived服务状态: $keepalived_status" >> $LOG_FILE
# 检查Nginx服务
local nginx_status=$(ssh root@$BACKUP_IP "systemctl is-active nginx")
echo " Nginx服务状态: $nginx_status" >> $LOG_FILE
# 检查虚拟IP
local vip_bound=$(ssh root@$BACKUP_IP "ip addr show | grep -c '$VIP'")
if [ "$vip_bound" -gt 0 ]; then
echo " 虚拟IP状态: 已绑定" >> $LOG_FILE
else
echo " 虚拟IP状态: 未绑定" >> $LOG_FILE
fi
# 检查系统资源
local cpu_usage=$(ssh root@$BACKUP_IP "top -bn1 | grep 'Cpu(s)' | awk '{print \$2}' | cut -d'%' -f1")
local memory_usage=$(ssh root@$BACKUP_IP "free -m | grep 'Mem:' | awk '{printf \"%.2f\", \$3/\$2*100}'")
local disk_usage=$(ssh root@$BACKUP_IP "df -h / | awk 'NR==2 {print \$5}' | cut -d'%' -f1")
echo " CPU使用率: ${cpu_usage}%" >> $LOG_FILE
echo " 内存使用率: ${memory_usage}%" >> $LOG_FILE
echo " 磁盘使用率: ${disk_usage}%" >> $LOG_FILE
echo "" >> $LOG_FILE
}
# 检查网络连通性
check_network_connectivity() {
echo "3. 检查网络连通性" >> $LOG_FILE
# 测试主服务器连通性
if ping -c 1 -W 1 $MASTER_IP > /dev/null 2>&1; then
echo " 主服务器连通性: 正常" >> $LOG_FILE
else
echo " 主服务器连通性: 失败" >> $LOG_FILE
fi
# 测试备用服务器连通性
if ping -c 1 -W 1 $BACKUP_IP > /dev/null 2>&1; then
echo " 备用服务器连通性: 正常" >> $LOG_FILE
else
echo " 备用服务器连通性: 失败" >> $LOG_FILE
fi
# 测试虚拟IP连通性
if ping -c 1 -W 1 $VIP > /dev/null 2>&1; then
echo " 虚拟IP连通性: 正常" >> $LOG_FILE
else
echo " 虚拟IP连通性: 失败" >> $LOG_FILE
fi
echo "" >> $LOG_FILE
}
# 检查服务可用性
check_service_availability() {
echo "4. 检查服务可用性" >> $LOG_FILE
# 测试HTTP服务
local http_code=$(curl -o /dev/null -s -w "%{http_code}" http://$VIP)
echo " HTTP服务状态: $http_code" >> $LOG_FILE
# 测试Nginx状态页面
local status_code=$(curl -o /dev/null -s -w "%{http_code}" http://$VIP/status)
echo " 状态页面状态: $status_code" >> $LOG_FILE
echo "" >> $LOG_FILE
}
# 检查日志文件
check_log_files() {
echo "5. 检查日志文件" >> $LOG_FILE
# 检查Keepalived日志大小
local log_size=$(du -m /var/log/keepalived/keepalived.log | cut -f1)
echo " Keepalived日志大小: ${log_size}MB" >> $LOG_FILE
# 检查最近的错误
local recent_errors=$(grep -c "ERROR\|Failed" /var/log/keepalived/keepalived.log | tail -100)
echo " 最近错误数量: $recent_errors" >> $LOG_FILE
# 检查最近的状态转换
local recent_transitions=$(grep -c "Transition to" /var/log/keepalived/keepalived.log | tail -100)
echo " 最近状态转换数量: $recent_transitions" >> $LOG_FILE
echo "" >> $LOG_FILE
}
# Generate the check report
generate_report() {
    echo "6. Check report" >> "$LOG_FILE"
    # Set to 1 as soon as any alert condition is hit
    local alert_needed=0
    # Service status on both nodes
    local master_keepalived=$(ssh root@$MASTER_IP "systemctl is-active keepalived")
    local master_nginx=$(ssh root@$MASTER_IP "systemctl is-active nginx")
    local backup_keepalived=$(ssh root@$BACKUP_IP "systemctl is-active keepalived")
    local backup_nginx=$(ssh root@$BACKUP_IP "systemctl is-active nginx")
    if [ "$master_keepalived" != "active" ] || [ "$master_nginx" != "active" ]; then
        echo " Warning: a service on the master server is not running" >> "$LOG_FILE"
        alert_needed=1
    fi
    if [ "$backup_keepalived" != "active" ] || [ "$backup_nginx" != "active" ]; then
        echo " Warning: a service on the backup server is not running" >> "$LOG_FILE"
        alert_needed=1
    fi
    # Resource usage on both nodes
    local master_cpu=$(ssh root@$MASTER_IP "top -bn1 | grep 'Cpu(s)' | awk '{print \$2}' | cut -d'%' -f1")
    local master_memory=$(ssh root@$MASTER_IP "free -m | grep 'Mem:' | awk '{printf \"%.2f\", \$3/\$2*100}'")
    local backup_cpu=$(ssh root@$BACKUP_IP "top -bn1 | grep 'Cpu(s)' | awk '{print \$2}' | cut -d'%' -f1")
    local backup_memory=$(ssh root@$BACKUP_IP "free -m | grep 'Mem:' | awk '{printf \"%.2f\", \$3/\$2*100}'")
    # ${var:-0} keeps bc from failing on an empty value when an ssh call returns nothing
    if (( $(echo "${master_cpu:-0} > 80" | bc -l) )); then
        echo " Warning: master server CPU usage is high (${master_cpu}%)" >> "$LOG_FILE"
        alert_needed=1
    fi
    if (( $(echo "${master_memory:-0} > 80" | bc -l) )); then
        echo " Warning: master server memory usage is high (${master_memory}%)" >> "$LOG_FILE"
        alert_needed=1
    fi
    if (( $(echo "${backup_cpu:-0} > 80" | bc -l) )); then
        echo " Warning: backup server CPU usage is high (${backup_cpu}%)" >> "$LOG_FILE"
        alert_needed=1
    fi
    if (( $(echo "${backup_memory:-0} > 80" | bc -l) )); then
        echo " Warning: backup server memory usage is high (${backup_memory}%)" >> "$LOG_FILE"
        alert_needed=1
    fi
    if [ "$alert_needed" -eq 1 ]; then
        echo " Recommendation: the system needs attention" >> "$LOG_FILE"
    else
        echo " Status: all checks passed" >> "$LOG_FILE"
    fi
    echo "" >> "$LOG_FILE"
}
# Main function
main() {
    echo "Starting the Keepalived daily check..."
    # Run each check in turn
    check_master_server
    check_backup_server
    check_network_connectivity
    check_service_availability
    check_log_files
    generate_report
    # Show the result
    echo "Daily check finished, see the full report in: $LOG_FILE"
    echo "================================"
    tail -n 20 "$LOG_FILE"
    echo "================================"
}
# Run the main function
main
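The threshold comparisons in generate_report shell out to bc, which is not always present on a minimal install. As a minimal sketch, the same comparison can be done with awk, which also lets empty or non-numeric values (for example from a failed ssh call) be rejected up front. The helper name `exceeds_threshold` and the sample values are illustrative assumptions, not part of the original script.

```shell
#!/bin/bash
# exceeds_threshold VALUE LIMIT  (hypothetical helper, not in the original script)
# Returns 0 when VALUE is a plain decimal number strictly greater than LIMIT.
# Empty or non-numeric input returns 1, so a failed ssh call raises no false alarm.
exceeds_threshold() {
    local value=$1 limit=$2
    # Accept only plain decimals such as 7, 12.5 or 0.3
    [[ $value =~ ^[0-9]+([.][0-9]+)?$ ]] || return 1
    # awk compares the two values numerically; exit status 0 means "greater"
    awk -v v="$value" -v l="$limit" 'BEGIN { exit !(v > l) }'
}

# Usage with a sample value standing in for the ssh result
if exceeds_threshold "85.40" 80; then
    echo "warning: usage above 80%"
fi
```

Inside generate_report, a call such as `exceeds_threshold "$master_cpu" 80` could stand in for the `(( $(echo ... | bc -l) ))` test, assuming bash is the interpreter.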
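Scheduling the checks
bash
# Add the daily check. The existing crontab is read first and re-submitted,
# because piping a bare echo into `crontab -` replaces the whole crontab.
(crontab -l 2>/dev/null; echo "0 9 * * * /usr/local/keepalived/scripts/daily_check.sh") | crontab -
# Add the weekly log analysis job
(crontab -l 2>/dev/null; echo "0 0 * * 0 /usr/local/keepalived/scripts/analyze_logs.sh") | crontab -
# List the scheduled jobs
crontab -l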
6. Case Studies
6.1 High-availability deployment for an e-commerce site
Architecture
                       +-----------------+
                       |  Load balancer  |
                       | (Keepalived+VIP)|
                       +--------+--------+
                                |
          +---------------------+---------------------+
          |                     |                     |
+---------v---------+ +---------v---------+ +---------v---------+
|   Web server 1    | |   Web server 2    | |   Web server 3    |
| (Nginx+Keepalived)| | (Nginx+Keepalived)| | (Nginx+Keepalived)|
+---------+---------+ +---------+---------+ +---------+---------+
          |                     |                     |
          +---------------------+---------------------+
                                |
                     +----------+----------+
                     | App server cluster  |
                     +---------------------+
Example configuration
Master server configuration:
nginx
# /etc/keepalived/keepalived.conf
global_defs {
    router_id nginx-master
    notification_email {
        admin@example.com
        ops@example.com
    }
    notification_email_from keepalived@example.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    script_user root
    enable_script_security
}
vrrp_script check_nginx {
    script "/usr/local/keepalived/scripts/check_nginx.sh"
    interval 2
    timeout 2
    fall 3
    rise 2
    # A negative weight lowers the effective priority while the check fails.
    # The VIP only moves once that priority drops below the peer's priority.
    weight -5
    # vrrp_script syntax is "user USERNAME [GROUPNAME]"
    user root root
}
vrrp_script check_app {
    script "/usr/local/keepalived/scripts/check_app.sh"
    interval 5
    timeout 2
    fall 3
    rise 2
    weight -3
    user root root
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        # Only the first 8 characters of auth_pass are used
        auth_pass ecommerce2024
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 label eth0:0
    }
    track_script {
        check_nginx
        check_app
    }
    notify_master "/usr/local/keepalived/scripts/notify.sh MASTER 100"
    notify_backup "/usr/local/keepalived/scripts/notify.sh BACKUP 100"
    notify_fault "/usr/local/keepalived/scripts/notify.sh FAULT 100"
    # Per keepalived.conf(5), preempt_delay only takes effect when the
    # initial state of the instance is BACKUP
    preempt_delay 300
}
vrrp_instance VI_2 {
    state BACKUP
    interface eth0
    virtual_router_id 52
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass ecommerce2024
    }
    virtual_ipaddress {
        192.168.1.101/24 dev eth0 label eth0:1
    }
    track_script {
        check_nginx
        check_app
    }
    notify_master "/usr/local/keepalived/scripts/notify.sh MASTER 90"
    notify_backup "/usr/local/keepalived/scripts/notify.sh BACKUP 90"
    notify_fault "/usr/local/keepalived/scripts/notify.sh FAULT 90"
}
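Whether a failed check actually moves the VIP depends on simple arithmetic: each failing tracked script adds its (negative) weight to the instance priority, and the peer only takes over once the result falls below its own priority. The sketch below works this through for VI_1 with the weights shown above. The peer priority of 90 is an assumption (the peer's configuration for VI_1 is not shown here), and the function names are illustrative.

```shell
#!/bin/bash
# effective_priority BASE WEIGHT...
# Adds the weight of every FAILING tracked script to the base priority.
effective_priority() {
    local prio=$1; shift
    local w
    for w in "$@"; do
        prio=$((prio + w))
    done
    echo "$prio"
}

# will_fail_over EFFECTIVE_PRIORITY PEER_PRIORITY
# True when the peer's priority is strictly higher than ours.
will_fail_over() {
    [ "$1" -lt "$2" ]
}

peer=90   # assumed peer priority for VI_1, not taken from the original
for failing in "-5" "-3" "-5 -3"; do
    # $failing is left unquoted on purpose so "-5 -3" splits into two weights
    eff=$(effective_priority 100 $failing)
    if will_fail_over "$eff" "$peer"; then
        verdict="peer takes over"
    else
        verdict="master keeps the VIP"
    fi
    echo "failing weights [$failing] -> effective priority $eff: $verdict"
done
```

Under that assumption, even with both checks failing the priority only drops to 92, so the master would keep the VIP. If a single failed check should trigger failover, either the weight magnitude has to exceed the priority gap (for example -20 with a gap of 10), or the check script has to stop Keepalived itself.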
Application health check script:
bash
#!/bin/bash
# /usr/local/keepalived/scripts/check_app.sh
# Check that the application port is listening
if ! netstat -tlnp | grep -q ":8080"; then
    echo "Application port 8080 not listening"
    exit 1
fi
# Check the application health endpoint
if ! curl -f http://localhost:8080/health > /dev/null 2>&1; then
    echo "Application health check failed"
    exit 1
fi
# Check the database connection
if ! curl -f http://localhost:8080/db-check > /dev/null 2>&1; then
    echo "Database connection check failed"
    exit 1
fi
# Check the cache service
if ! curl -f http://localhost:8080/cache-check > /dev/null 2>&1; then
    echo "Cache service check failed"
    exit 1
fi
echo "Application health check passed"
exit 0
6.2 High-availability deployment for a microservice architecture
Architecture
                       +-----------------+
                       |   API gateway   |
                       | (Keepalived+VIP)|
                       +--------+--------+
                                |
          +---------------------+---------------------+
          |                     |                     |
+---------v---------+ +---------v---------+ +---------v---------+
|   API gateway 1   | |   API gateway 2   | |   API gateway 3   |
| (Nginx+Keepalived)| | (Nginx+Keepalived)| | (Nginx+Keepalived)|
+---------+---------+ +---------+---------+ +---------+---------+
          |                     |                     |
          +---------------------+---------------------+
                                |
                       +--------+--------+
                       | Service registry|
                       +--------+--------+
                                |
          +---------------------+---------------------+
          |                     |                     |
+---------v---------+ +---------v---------+ +---------v---------+
| Service cluster 1 | | Service cluster 2 | | Service cluster 3 |
+-------------------+ +-------------------+ +-------------------+
Example configuration
API gateway configuration:
nginx
# /etc/keepalived/keepalived.conf
global_defs {
    router_id api-gateway-master
    notification_email {
        admin@example.com
        ops@example.com
    }
    notification_email_from keepalived@example.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    script_user root
    enable_script_security
}
vrrp_script check_nginx {
    script "/usr/local/keepalived/scripts/check_nginx.sh"
    interval 2
    timeout 2
    fall 3
    rise 2
    # A negative weight lowers the effective priority while the check fails.
    # The VIP only moves once that priority drops below the peer's priority.
    weight -5
    # vrrp_script syntax is "user USERNAME [GROUPNAME]"
    user root root
}
vrrp_script check_services {
    script "/usr/local/keepalived/scripts/check_services.sh"
    interval 5
    timeout 2
    fall 3
    rise 2
    weight -3
    user root root
}
vrrp_script check_discovery {
    script "/usr/local/keepalived/scripts/check_discovery.sh"
    interval 10
    timeout 5
    fall 2
    rise 2
    weight -2
    user root root
}
vrrp_instance VI_API {
    state MASTER
    interface eth0
    virtual_router_id 100
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        # Only the first 8 characters of auth_pass are used
        auth_pass microservice2024
    }
    virtual_ipaddress {
        192.168.1.200/24 dev eth0 label eth0:0
    }
    track_script {
        check_nginx
        check_services
        check_discovery
    }
    notify_master "/usr/local/keepalived/scripts/notify.sh MASTER 100"
    notify_backup "/usr/local/keepalived/scripts/notify.sh BACKUP 100"
    notify_fault "/usr/local/keepalived/scripts/notify.sh FAULT 100"
    # Per keepalived.conf(5), preempt_delay only takes effect when the
    # initial state of the instance is BACKUP
    preempt_delay 300
}
vrrp_instance VI_INTERNAL {
    state BACKUP
    interface eth1
    virtual_router_id 101
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass microservice2024
    }
    virtual_ipaddress {
        10.0.0.100/24 dev eth1 label eth1:0
    }
    track_script {
        check_nginx
        check_services
    }
    notify_master "/usr/local/keepalived/scripts/notify.sh MASTER 90"
    notify_backup "/usr/local/keepalived/scripts/notify.sh BACKUP 90"
    notify_fault "/usr/local/keepalived/scripts/notify.sh FAULT 90"
}
Service check script:
bash
#!/bin/bash
# /usr/local/keepalived/scripts/check_services.sh
# Key microservices and the port each one listens on.
# The port numbers are placeholders (assumed for illustration);
# replace them with the ports used in your deployment.
declare -A services=(
    ["user-service"]=8081
    ["product-service"]=8082
    ["order-service"]=8083
    ["payment-service"]=8084
)
for service in "${!services[@]}"; do
    port=${services[$service]}
    # Check that the service port is listening
    if ! netstat -tlnp | grep -q ":${port} "; then
        echo "Service $service port not listening"
        exit 1
    fi
    # Check the service health endpoint
    if ! curl -f "http://localhost:${port}/health" > /dev/null 2>&1; then
        echo "Service $service health check failed"
        exit 1
    fi
done
# Check the service registry
if ! curl -f http://localhost:8761/eureka/apps > /dev/null 2>&1; then
    echo "Service registry check failed"
    exit 1
fi
# Check the config server
if ! curl -f http://localhost:8888/actuator/health > /dev/null 2>&1; then
    echo "Config server check failed"
    exit 1
fi
echo "All services health check passed"
exit 0
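Performance recommendations:
- Network: tune the network configuration to reduce latency
- Resource management: allocate system resources sensibly to avoid bottlenecks
- Parameter tuning: adjust Keepalived parameters such as advert_int and the check interval, fall, and rise values to match your workload
- Load balancing: combine Keepalived with load balancing to raise overall throughput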
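For the parameter-tuning point, it helps to know how long a failure takes to be noticed. In VRRPv2 (RFC 3768) a backup declares the master down after 3 × advert_int plus a skew time of (256 − priority) / 256 seconds. The sketch below computes that interval for the values used in the examples above. It assumes Keepalived is running VRRP version 2, and the function name is illustrative.

```shell
#!/bin/bash
# master_down_interval ADVERT_INT PRIORITY
# Prints the VRRPv2 master-down interval in seconds:
#   3 * advert_int + (256 - priority) / 256
# PRIORITY is the priority of the backup node doing the waiting.
master_down_interval() {
    awk -v a="$1" -v p="$2" 'BEGIN { printf "%.3f\n", 3 * a + (256 - p) / 256 }'
}

# Backup at priority 90 with advert_int 1, as in the example configs
master_down_interval 1 90
# The same backup if advert_int were raised to 2
master_down_interval 2 90
```

This covers the case where the master node disappears entirely. A failure detected by a tracked script takes roughly interval × fall seconds to register first, for example about 6 seconds with interval 2 and fall 3.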
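Security recommendations:
- Access control: restrict who can reach Keepalived's management interfaces
- Authentication: configure VRRP authentication to keep unauthorized nodes out
- Log security: protect log files so sensitive information does not leak
- Network security: configure firewall rules to protect the system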
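On the authentication point, one detail is easy to miss: with auth_type PASS, Keepalived uses only the first 8 characters of auth_pass, so longer values give no extra protection. As a minimal sketch, the helper below lists over-long passwords in a configuration file. The function name and the sample file contents are illustrative assumptions.

```shell
#!/bin/bash
# long_auth_passes FILE  (hypothetical helper)
# Prints every auth_pass value in FILE that is longer than 8 characters.
long_auth_passes() {
    awk '$1 == "auth_pass" && length($2) > 8 { print $2 }' "$1"
}

# Demo on a temporary sample file; on a real node the argument would be
# /etc/keepalived/keepalived.conf
sample=$(mktemp)
printf '%s\n' \
    'authentication {' \
    '    auth_type PASS' \
    '    auth_pass ecommerce2024' \
    '}' \
    'authentication {' \
    '    auth_type PASS' \
    '    auth_pass ha-2024' \
    '}' > "$sample"
long_auth_passes "$sample"
rm -f "$sample"
```

Because PASS authentication travels unencrypted in the VRRP advertisements, it mainly guards against misconfigured neighbors. Firewall rules that accept VRRP traffic only from the peer address remain the stronger control.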
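Keepalived with a two-node hot standby is a foundation for enterprise-grade high availability, and with sound configuration and management it yields a stable, reliable environment. In practice, the deployment model and configuration strategy still have to be chosen to fit the specific business requirements and technical architecture.
High availability is a process of continuous improvement that depends on ongoing monitoring, testing, and refinement. I hope this article serves as a useful reference and helps you build more stable and reliable systems.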