Nginx High Availability in Practice: Keepalived + Dual-Node Hot Standby Deployment

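Preface

In production, a single point of failure is the enemy of any architecture. However fast an Nginx server is, the whole service goes down with it when it crashes. To keep the business running, we need a highly available architecture that fails over automatically and switches traffic without interruption.

This article shows how to build a highly available Nginx cluster with Keepalived and a dual-node hot standby. It walks through a complete example, from architecture design to deployment and configuration, so that you can set up an enterprise-grade HA solution yourself.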

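1. High Availability Fundamentals

1.1 What is high availability

High availability (HA) is a system's ability, by design, to minimize downtime and keep its service continuously available. It is usually measured in "nines":

Availability levels:

  • 99%: about 3.65 days of downtime per year
  • 99.9%: about 8.76 hours of downtime per year
  • 99.99%: about 52.6 minutes of downtime per year
  • 99.999%: about 5.26 minutes of downtime per year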
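
The figures above follow from one formula: yearly downtime = (1 − availability) × one year. The sketch below reproduces them; the function name `downtime_minutes_per_year` is only illustrative, and a 365-day year is assumed.

```shell
#!/bin/sh
# Hypothetical helper: yearly downtime budget in minutes for a given
# availability percentage. Assumes a 365-day year (525,600 minutes).
downtime_minutes_per_year() {
    awk -v a="$1" 'BEGIN { printf "%.2f\n", (100 - a) / 100 * 365 * 24 * 60 }'
}

for level in 99 99.9 99.99 99.999; do
    echo "$level% -> $(downtime_minutes_per_year "$level") minutes/year"
done
```

Dividing the result by 60 or 1440 gives the hour and day figures quoted in the list.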

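Core elements of high availability:

  • Redundancy: deploy critical components redundantly
  • Failure detection: monitor system state in real time
  • Failover: switch to a standby system automatically
  • Data consistency: keep data synchronized and consistent across nodes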

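1.2 High availability architecture patterns

Active-Standby

Client → Virtual IP → Primary server (Active)
                ↓
            Standby server (Standby)

Characteristics:

  • One primary server handles traffic
  • One or more standby servers wait idle
  • Traffic switches to a standby automatically when the primary fails
  • Lower resource utilization, but simple to implement

Active-Active

Client → Load balancer → Server 1 (Active)
                    ↓
                Server 2 (Active)

Characteristics:

  • Several servers handle traffic at the same time
  • A load balancer distributes the requests
  • A failed server is removed from rotation automatically
  • Higher resource utilization, but more complex to configure

Cluster

Client → Cluster manager → Node 1
                    ↓
                Node 2
                    ↓
                Node 3

Characteristics:

  • Multiple nodes form a cluster
  • Cluster management is centralized
  • Nodes can be added and removed dynamically
  • Suited to large-scale systems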

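1.3 Introduction to Keepalived

Keepalived overview

Keepalived is a high-availability solution built on the VRRP protocol.

Core features:

  • Health checking: monitors server state
  • Failure detection: notices failures promptly
  • Failover: switches to the standby server automatically
  • Virtual IP management: adds and removes the virtual IP address

How it works:

  • Built on VRRP (Virtual Router Redundancy Protocol)
  • Nodes elect a master over multicast
  • The master periodically sends heartbeat (advertisement) packets
  • Backup nodes listen for those heartbeats
  • When the heartbeats time out, failover is triggered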
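
VRRPv2 (RFC 3768) defines how long a backup waits before it treats the heartbeats as timed out: Master_Down_Interval = 3 × Advertisement_Interval + Skew_Time, where Skew_Time = (256 − Priority) / 256 seconds. The sketch below only does that arithmetic. The function name is illustrative, and the sample values (a 1-second interval, priorities 90 and 100) are assumptions for a typical two-node pair.

```shell
#!/bin/sh
# Hypothetical helper: VRRPv2 master-down interval in seconds.
# $1 = advertisement interval (seconds), $2 = this node's VRRP priority (1-254)
master_down_interval() {
    awk -v adv="$1" -v prio="$2" \
        'BEGIN { printf "%.3f\n", 3 * adv + (256 - prio) / 256 }'
}

echo "backup with priority 90:  $(master_down_interval 1 90) s"
echo "backup with priority 100: $(master_down_interval 1 100) s"
```

With a 1-second advertisement interval, a crashed master is therefore detected in under four seconds. The node with the higher priority has the smaller skew and takes over first.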
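
Keepalived components

Core components:

  • vrrpd: the VRRP protocol implementation
  • checkers: the health-check module
  • IPVS wrapper: the interface to the kernel IPVS load balancer
  • Netlink Reflector: tracks and manages network interfaces

Configuration files:

  • /etc/keepalived/keepalived.conf: main configuration file
  • /etc/sysconfig/keepalived: startup options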

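2. Installing and Configuring Keepalived

2.1 Preparing the system environment

Server plan

Role             Hostname       IP address     Virtual IP      OS
Primary server   nginx-master   192.168.1.10   192.168.1.100   CentOS 7
Standby server   nginx-backup   192.168.1.11   192.168.1.100   CentOS 7

System initialization
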
Primary server initialization:

#!/bin/bash
# System initialization script for the primary server

# Set the hostname
hostnamectl set-hostname nginx-master

# Disable the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

# Populate /etc/hosts
cat >> /etc/hosts << EOF
192.168.1.10 nginx-master
192.168.1.11 nginx-backup
192.168.1.100 nginx-vip
EOF

# Synchronize time (ntpd is shipped in the ntp package, not in ntpdate)
yum install -y ntp ntpdate
ntpdate -u cn.pool.ntp.org
systemctl enable ntpd
systemctl start ntpd

# Install basic tools
yum install -y wget curl vim net-tools telnet

# Configure kernel parameters (the interface is assumed to be eth0)
cat >> /etc/sysctl.conf << EOF
# Allow binding to non-local IP addresses
net.ipv4.ip_nonlocal_bind = 1

# Enable IP forwarding
net.ipv4.ip_forward = 1

# Disable sending ICMP redirects
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.eth0.send_redirects = 0

# Restrict ARP replies and announcements (this is not proxy ARP)
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
EOF

# Apply the kernel parameters
sysctl -p

echo "Primary server initialization complete"

Standby server initialization:

#!/bin/bash
# System initialization script for the standby server

# Set the hostname
hostnamectl set-hostname nginx-backup

# Disable the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

# Populate /etc/hosts
cat >> /etc/hosts << EOF
192.168.1.10 nginx-master
192.168.1.11 nginx-backup
192.168.1.100 nginx-vip
EOF

# Synchronize time (ntpd is shipped in the ntp package, not in ntpdate)
yum install -y ntp ntpdate
ntpdate -u cn.pool.ntp.org
systemctl enable ntpd
systemctl start ntpd

# Install basic tools
yum install -y wget curl vim net-tools telnet

# Configure kernel parameters (the interface is assumed to be eth0)
cat >> /etc/sysctl.conf << EOF
# Allow binding to non-local IP addresses
net.ipv4.ip_nonlocal_bind = 1

# Enable IP forwarding
net.ipv4.ip_forward = 1

# Disable sending ICMP redirects
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.eth0.send_redirects = 0

# Restrict ARP replies and announcements (this is not proxy ARP)
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
EOF

# Apply the kernel parameters
sysctl -p

echo "Standby server initialization complete"

2.2 Installing and configuring Nginx

Installing Nginx on the primary server

#!/bin/bash
# Nginx installation script for the primary server

# Install build dependencies
yum install -y gcc gcc-c++ make cmake autoconf automake pcre pcre-devel zlib zlib-devel openssl openssl-devel

# Create the user and group
groupadd nginx
useradd -r -g nginx -s /sbin/nologin nginx

# Download the Nginx source
cd /usr/local/src
wget http://nginx.org/download/nginx-1.24.0.tar.gz
tar -zxvf nginx-1.24.0.tar.gz
cd nginx-1.24.0

# Build and install
# --pid-path matches the PIDFile in the systemd unit below;
# -fPIC is needed because the linker options request a PIE binary (-pie)
./configure \
--prefix=/usr/local/nginx \
--pid-path=/var/run/nginx.pid \
--user=nginx \
--group=nginx \
--with-http_ssl_module \
--with-http_stub_status_module \
--with-http_realip_module \
--with-http_gzip_static_module \
--with-pcre \
--with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC' \
--with-ld-opt='-Wl,-z,relro -Wl,-z,now -pie'

make && make install

# Create the required directories
mkdir -p /var/log/nginx
mkdir -p /var/cache/nginx
chown -R nginx:nginx /var/log/nginx
chown -R nginx:nginx /var/cache/nginx

# Create the systemd unit file
cat > /usr/lib/systemd/system/nginx.service << 'EOF'
[Unit]
Description=The nginx HTTP and reverse proxy server
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=forking
PIDFile=/var/run/nginx.pid
ExecStartPre=/usr/local/nginx/sbin/nginx -t
ExecStart=/usr/local/nginx/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true

[Install]
WantedBy=multi-user.target
EOF

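# Reload systemd
systemctl daemon-reload

# Start Nginx
systemctl start nginx
systemctl enable nginx

# Verify the installation (the binary is not on PATH, so use the full path)
/usr/local/nginx/sbin/nginx -v
systemctl status nginx

echo "Nginx installation on the primary server complete"

Installing Nginx on the standby server
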
#!/bin/bash
# Nginx installation script for the standby server

# Install build dependencies
yum install -y gcc gcc-c++ make cmake autoconf automake pcre pcre-devel zlib zlib-devel openssl openssl-devel

# Create the user and group
groupadd nginx
useradd -r -g nginx -s /sbin/nologin nginx

# Download the Nginx source
cd /usr/local/src
wget http://nginx.org/download/nginx-1.24.0.tar.gz
tar -zxvf nginx-1.24.0.tar.gz
cd nginx-1.24.0

# Build and install
# --pid-path matches the PIDFile in the systemd unit below;
# -fPIC is needed because the linker options request a PIE binary (-pie)
./configure \
--prefix=/usr/local/nginx \
--pid-path=/var/run/nginx.pid \
--user=nginx \
--group=nginx \
--with-http_ssl_module \
--with-http_stub_status_module \
--with-http_realip_module \
--with-http_gzip_static_module \
--with-pcre \
--with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC' \
--with-ld-opt='-Wl,-z,relro -Wl,-z,now -pie'

make && make install

# Create the required directories
mkdir -p /var/log/nginx
mkdir -p /var/cache/nginx
chown -R nginx:nginx /var/log/nginx
chown -R nginx:nginx /var/cache/nginx

# Create the systemd unit file
cat > /usr/lib/systemd/system/nginx.service << 'EOF'
[Unit]
Description=The nginx HTTP and reverse proxy server
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=forking
PIDFile=/var/run/nginx.pid
ExecStartPre=/usr/local/nginx/sbin/nginx -t
ExecStart=/usr/local/nginx/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true

[Install]
WantedBy=multi-user.target
EOF

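# Reload systemd
systemctl daemon-reload

# Start Nginx
systemctl start nginx
systemctl enable nginx

# Verify the installation (the binary is not on PATH, so use the full path)
/usr/local/nginx/sbin/nginx -v
systemctl status nginx

echo "Nginx installation on the standby server complete"

Synchronizing the Nginx configuration

Nginx configuration on the primary server:
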
# /usr/local/nginx/conf/nginx.conf

user nginx nginx;
worker_processes auto;
worker_cpu_affinity auto;
worker_rlimit_nofile 65535;

error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections 65535;
    use epoll;
    multi_accept on;
}

http {
    include /usr/local/nginx/conf/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 1000;

    gzip on;
    gzip_min_length 1k;
    gzip_buffers 4 16k;
    gzip_http_version 1.1;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
    gzip_vary on;

    server {
        listen 80;
        server_name localhost;
        
        location / {
            root /usr/local/nginx/html;
            index index.html index.htm;
        }

        location /status {
            stub_status on;
            access_log off;
            allow 127.0.0.1;
            allow 192.168.1.0/24;
            deny all;
        }

        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root /usr/local/nginx/html;
        }
    }
}

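Copy the configuration to the standby server:

# Run on the primary server
scp /usr/local/nginx/conf/nginx.conf root@192.168.1.11:/usr/local/nginx/conf/nginx.conf

# On the standby server, validate the configuration and then reload it
/usr/local/nginx/sbin/nginx -t && systemctl reload nginx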

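2.3 Installing and configuring Keepalived

Installing Keepalived

Keepalived installation on the primary server:

#!/bin/bash
# Keepalived installation script for the primary server

# Install Keepalived
yum install -y keepalived

# Back up the original configuration file
cp /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak

# Create the directory for health-check scripts
mkdir -p /usr/local/keepalived/scripts
chown -R root:root /usr/local/keepalived
chmod -R 755 /usr/local/keepalived

echo "Keepalived installation on the primary server complete"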

Keepalived installation on the standby server:

#!/bin/bash
# Keepalived installation script for the standby server

# Install Keepalived
yum install -y keepalived

# Back up the original configuration file
cp /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak

# Create the directory for health-check scripts
mkdir -p /usr/local/keepalived/scripts
chown -R root:root /usr/local/keepalived
chmod -R 755 /usr/local/keepalived

echo "Keepalived installation on the standby server complete"

Health-check scripts

Nginx health-check script:

#!/bin/bash
# /usr/local/keepalived/scripts/check_nginx.sh

# Check that an Nginx process exists
if ! pgrep -x nginx > /dev/null; then
    echo "Nginx process not found"
    exit 1
fi

# Check that port 80 is listening (the trailing space avoids matching :8080 etc.)
if ! netstat -tln | grep -q ':80 '; then
    echo "Nginx port 80 not listening"
    exit 1
fi

# Check that the status page responds; --max-time keeps the probe
# inside the timeout configured for the vrrp_script
if ! curl -fs --max-time 1 http://localhost/status > /dev/null 2>&1; then
    echo "Nginx status page not responding"
    exit 1
fi

# Check the Nginx configuration syntax
if ! /usr/local/nginx/sbin/nginx -t > /dev/null 2>&1; then
    echo "Nginx configuration test failed"
    exit 1
fi

echo "Nginx health check passed"
exit 0

Notification script:

#!/bin/bash
# /usr/local/keepalived/scripts/notify.sh

# Keepalived notification script
# Usage: notify.sh <MASTER|BACKUP|FAULT> <priority>

TYPE=$1
PRIORITY=$2

# Send an e-mail notification
send_notification() {
    local subject="Keepalived Alert: $TYPE on $(hostname)"
    local message="Keepalived state changed to $TYPE with priority $PRIORITY on $(hostname) at $(date)"

    echo "$message" | mail -s "$subject" admin@example.com
}

# Write a log entry
log_event() {
    local message="Keepalived state changed to $TYPE with priority $PRIORITY"
    logger -t keepalived "$message"
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $message" >> /var/log/keepalived/notify.log
}

# Act on the new state
case "$TYPE" in
    "MASTER")
        log_event
        send_notification
        # Start Nginx if it is not running
        systemctl start nginx
        # Add further actions for a node that becomes master here
        ;;
    "BACKUP")
        log_event
        send_notification
        # Add actions for a node that becomes backup here
        ;;
    "FAULT")
        log_event
        send_notification
        # Stop Nginx (if required). Note that a stopped Nginx keeps the health
        # check failing until someone starts it again by hand.
        systemctl stop nginx
        # Add further fault-handling actions here
        ;;
    *)
        echo "Unknown state: $TYPE"
        exit 1
        ;;
esac

exit 0

Set the script permissions:

# Run on both servers
chmod +x /usr/local/keepalived/scripts/check_nginx.sh
chmod +x /usr/local/keepalived/scripts/notify.sh

# Create the log directory
mkdir -p /var/log/keepalived
chown root:root /var/log/keepalived
chmod 755 /var/log/keepalived
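
When `enable_script_security` is set, keepalived refuses to run a script as root if any part of its path is writable by a non-root user, which is why the directories above are owned by root with mode 755. The sketch below is a rough pre-flight check of that condition, not keepalived's own logic. The helper names are hypothetical, and it assumes GNU `stat` from coreutils.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if an octal mode (e.g. 755) has
# neither the group-write nor the other-write bit set.
mode_is_safe() {
    [ $(( 0$1 & 022 )) -eq 0 ]
}

# Walk from the script up to "/" and report every path component that is
# not owned by uid 0 or is writable by group/other.
check_script_path() {
    local path=$1 rc=0 owner mode
    while :; do
        owner=$(stat -c '%u' "$path") || return 2
        mode=$(stat -c '%a' "$path")
        if [ "$owner" -ne 0 ] || ! mode_is_safe "$mode"; then
            echo "unsafe: $path (uid=$owner mode=$mode)"
            rc=1
        fi
        [ "$path" = "/" ] && break
        path=$(dirname "$path")
    done
    return $rc
}
```

A typical call would be `check_script_path /usr/local/keepalived/scripts/check_nginx.sh`; it prints nothing and returns 0 when every component passes.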

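3. Keepalived Configuration in Detail

3.1 Keepalived configuration on the primary server

Primary server configuration file:

# /etc/keepalived/keepalived.conf

! Configuration File for keepalived

# Global settings
global_defs {
    # Router identifier, usually the hostname
    router_id nginx-master

    # Recipients of e-mail notifications
    notification_email {
        admin@example.com
        ops@example.com
    }

    # Sender address
    notification_email_from keepalived@example.com

    # Mail server address
    smtp_server 127.0.0.1
    smtp_connect_timeout 30

    # User that runs the scripts, plus script path security checks
    script_user root
    enable_script_security

    # VRRP protocol version
    vrrp_version 2

    # vrrp_strict, vrrp_iptables and vrrp_mcast_group4 are global_defs options,
    # not vrrp_instance options. vrrp_strict stays disabled here because strict
    # mode does not support the authentication block used below.
    # vrrp_strict
    # vrrp_mcast_group4 224.0.0.18
}

# Health-check script definition
# (placed before the vrrp_instance so that track_script can resolve it)
vrrp_script check_nginx {
    # Path of the check script
    script "/usr/local/keepalived/scripts/check_nginx.sh"

    # Check interval, in seconds
    interval 2

    # Timeout, in seconds
    timeout 2

    # Number of consecutive failures before the check counts as failed
    fall 3

    # Number of consecutive successes before the check counts as passed
    rise 2

    # Priority adjustment while the check fails. Its absolute value must exceed
    # the priority gap between master (100) and backup (90); with -5 the master
    # would drop only to 95 and never hand over the VIP.
    weight -20

    # User and group that run the check script
    user root root

    # Initial delay for the check script
    # init_delay 5
}

# VRRP instance
vrrp_instance VI_1 {
    # Initial state: MASTER on the primary server, BACKUP on the standby server
    state MASTER

    # Network interface name
    interface eth0

    # Virtual router ID, must be identical on all nodes of the same cluster
    virtual_router_id 51

    # Priority, higher values win
    priority 100

    # VRRP advertisement interval, in seconds
    advert_int 1

    # Authentication
    authentication {
        auth_type PASS
        auth_pass 1111
    }

    # Virtual IP address
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 label eth0:0
    }

    # Non-preemptive mode (optional; requires state BACKUP on both nodes)
    # nopreempt

    # Preemption delay (optional)
    # preempt_delay 300

    # Health-check script to track
    track_script {
        check_nginx
    }

    # State-transition notification scripts
    notify_master "/usr/local/keepalived/scripts/notify.sh MASTER 100"
    notify_backup "/usr/local/keepalived/scripts/notify.sh BACKUP 100"
    notify_fault "/usr/local/keepalived/scripts/notify.sh FAULT 100"
}

# Virtual server (optional, only for LVS/IPVS load balancing; the plain VIP
# failover described in this article works without this block)
virtual_server 192.168.1.100 80 {
    # Health-check polling interval, in seconds
    delay_loop 6

    # Load-balancing algorithm
    lb_algo rr

    # Load-balancing (forwarding) mode
    lb_kind NAT

    # Persistence timeout, in seconds
    persistence_timeout 50

    # Protocol
    protocol TCP

    # Real servers
    real_server 192.168.1.10 80 {
        # Weight
        weight 1

        # Health-check method
        TCP_CHECK {
            # Connection timeout
            connect_timeout 3

            # Number of retries
            nb_get_retry 3

            # Delay before each retry
            delay_before_retry 3

            # Port to connect to
            connect_port 80
        }
    }

    real_server 192.168.1.11 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
}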

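3.2 Keepalived configuration on the standby server

Standby server configuration file:

# /etc/keepalived/keepalived.conf

! Configuration File for keepalived

# Global settings
global_defs {
    # Router identifier, usually the hostname
    router_id nginx-backup

    # Recipients of e-mail notifications
    notification_email {
        admin@example.com
        ops@example.com
    }

    # Sender address
    notification_email_from keepalived@example.com

    # Mail server address
    smtp_server 127.0.0.1
    smtp_connect_timeout 30

    # User that runs the scripts, plus script path security checks
    script_user root
    enable_script_security

    # VRRP protocol version
    vrrp_version 2

    # vrrp_strict, vrrp_iptables and vrrp_mcast_group4 are global_defs options,
    # not vrrp_instance options. vrrp_strict stays disabled here because strict
    # mode does not support the authentication block used below.
    # vrrp_strict
    # vrrp_mcast_group4 224.0.0.18
}

# Health-check script definition
# (placed before the vrrp_instance so that track_script can resolve it)
vrrp_script check_nginx {
    # Path of the check script
    script "/usr/local/keepalived/scripts/check_nginx.sh"

    # Check interval, in seconds
    interval 2

    # Timeout, in seconds
    timeout 2

    # Number of consecutive failures before the check counts as failed
    fall 3

    # Number of consecutive successes before the check counts as passed
    rise 2

    # Priority adjustment while the check fails; keep it identical to the
    # primary server and larger than the priority gap (100 vs 90)
    weight -20

    # User and group that run the check script
    user root root

    # Initial delay for the check script
    # init_delay 5
}

# VRRP instance
vrrp_instance VI_1 {
    # Initial state: MASTER on the primary server, BACKUP on the standby server
    state BACKUP

    # Network interface name
    interface eth0

    # Virtual router ID, must be identical on all nodes of the same cluster
    virtual_router_id 51

    # Priority, higher values win
    priority 90

    # VRRP advertisement interval, in seconds
    advert_int 1

    # Authentication
    authentication {
        auth_type PASS
        auth_pass 1111
    }

    # Virtual IP address
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 label eth0:0
    }

    # Non-preemptive mode (optional; requires state BACKUP on both nodes)
    # nopreempt

    # Preemption delay (optional)
    # preempt_delay 300

    # Health-check script to track
    track_script {
        check_nginx
    }

    # State-transition notification scripts
    notify_master "/usr/local/keepalived/scripts/notify.sh MASTER 90"
    notify_backup "/usr/local/keepalived/scripts/notify.sh BACKUP 90"
    notify_fault "/usr/local/keepalived/scripts/notify.sh FAULT 90"
}

# Virtual server (optional, only for LVS/IPVS load balancing; the plain VIP
# failover described in this article works without this block)
virtual_server 192.168.1.100 80 {
    # Health-check polling interval, in seconds
    delay_loop 6

    # Load-balancing algorithm
    lb_algo rr

    # Load-balancing (forwarding) mode
    lb_kind NAT

    # Persistence timeout, in seconds
    persistence_timeout 50

    # Protocol
    protocol TCP

    # Real servers
    real_server 192.168.1.10 80 {
        # Weight
        weight 1

        # Health-check method
        TCP_CHECK {
            # Connection timeout
            connect_timeout 3

            # Number of retries
            nb_get_retry 3

            # Delay before each retry
            delay_before_retry 3

            # Port to connect to
            connect_port 80
        }
    }

    real_server 192.168.1.11 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
}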

3.3 Configuration file reference

Global parameters

global_defs {
    router_id nginx-master          # Router identifier
    notification_email {           # E-mail notification recipients
        admin@example.com
        ops@example.com
    }
    notification_email_from keepalived@example.com
    smtp_server 127.0.0.1         # Mail server
    smtp_connect_timeout 30        # Mail connection timeout
    script_user root               # User that runs scripts
    enable_script_security         # Enable script security checks
}

VRRP instance parameters

vrrp_instance VI_1 {
    state MASTER                   # Initial state: MASTER/BACKUP
    interface eth0                # Network interface
    virtual_router_id 51          # Virtual router ID
    priority 100                  # Priority
    advert_int 1                  # Advertisement interval
    authentication {              # Authentication
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {            # Virtual IP
        192.168.1.100/24 dev eth0 label eth0:0
    }
    track_script {                # Health-check script
        check_nginx
    }
    notify_master "/path/to/script.sh MASTER"  # State-transition notifications
    notify_backup "/path/to/script.sh BACKUP"
    notify_fault "/path/to/script.sh FAULT"
}
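
The `priority` value and the `weight` of a tracked script work together. With a negative weight, keepalived lowers the advertised priority by that amount while the check fails, so failover only happens if the result falls below the peer's priority. The sketch below models just that arithmetic; the helper name is hypothetical, and the priorities 100 and 90 are the ones used in this article.

```shell
#!/bin/sh
# Hypothetical helper: priority a node advertises.
# $1 = base priority, $2 = track_script weight (negative), $3 = check exit code
effective_priority() {
    local base=$1 weight=$2 check_rc=$3
    if [ "$check_rc" -ne 0 ] && [ "$weight" -lt 0 ]; then
        echo $(( base + weight ))
    else
        echo "$base"
    fi
}

peer=90
for w in -5 -20; do
    p=$(effective_priority 100 "$w" 1)
    if [ "$p" -lt "$peer" ]; then verdict="failover"; else verdict="no failover"; fi
    echo "weight $w: master advertises $p against peer $peer -> $verdict"
done
```

In other words, the absolute value of the weight has to be larger than the priority gap between the two nodes. A weight of -5 against a gap of 10 leaves the VIP on a node whose Nginx is down.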
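
Health-check script parameters

vrrp_script check_nginx {
    script "/usr/local/keepalived/scripts/check_nginx.sh"  # Script path
    interval 2                    # Check interval (seconds)
    timeout 2                     # Timeout (seconds)
    fall 3                        # Consecutive failures before "failed"
    rise 2                        # Consecutive successes before "passed"
    weight -20                    # Priority adjustment; must exceed the priority gap (100 vs 90)
    user root root                # User and group that run the script
}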
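
The `fall` and `rise` thresholds act as a debounce: a single failed or successful probe does not flip the check result. The sketch below is a simplified model of that behaviour, not keepalived's actual implementation. It replays a sequence of script exit codes and prints the resulting state after each probe; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical model of fall/rise debouncing.
# $1 = fall, $2 = rise, remaining arguments = successive script exit codes
simulate_checks() {
    local fall=$1 rise=$2 state=up fails=0 oks=0 out="" rc
    shift 2
    for rc in "$@"; do
        if [ "$rc" -eq 0 ]; then
            oks=$((oks + 1)); fails=0
            if [ "$state" = down ] && [ "$oks" -ge "$rise" ]; then state=up; fi
        else
            fails=$((fails + 1)); oks=0
            if [ "$state" = up ] && [ "$fails" -ge "$fall" ]; then state=down; fi
        fi
        out="$out$state "
    done
    echo "${out% }"
}

# fall=3, rise=2: one success, three failures, two successes
simulate_checks 3 2 0 1 1 1 0 0
```

With `interval 2` and `fall 3`, a dead Nginx is therefore acted on after roughly 2 × 3 = 6 seconds. Recovery is recognized after about 2 × 2 = 4 seconds.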

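4. Starting and Verifying the Services

4.1 Starting the Keepalived service

Starting Keepalived on the primary server

#!/bin/bash
# Start Keepalived on the primary server

# Check the configuration syntax. The -t/--config-test option exists only in
# newer keepalived releases, so probe for it before relying on it.
if keepalived --help 2>&1 | grep -q -- '--config-test'; then
    if keepalived -t; then
        echo "Keepalived configuration syntax OK"
    else
        echo "Keepalived configuration syntax error, please check"
        exit 1
    fi
else
    echo "This keepalived build has no --config-test option, skipping syntax check"
fi

# Start the Keepalived service
systemctl start keepalived
systemctl enable keepalived

# Check the service status
systemctl status keepalived

# Check the Keepalived processes
ps aux | grep keepalived

# Check the virtual IP
ip addr show eth0

# Check the VRRP state in the recent log entries
# (keepalived -v only prints the version)
journalctl -u keepalived -n 20 --no-pager

echo "Keepalived started on the primary server"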
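
Starting Keepalived on the standby server

#!/bin/bash
# Start Keepalived on the standby server

# Check the configuration syntax. The -t/--config-test option exists only in
# newer keepalived releases, so probe for it before relying on it.
if keepalived --help 2>&1 | grep -q -- '--config-test'; then
    if keepalived -t; then
        echo "Keepalived configuration syntax OK"
    else
        echo "Keepalived configuration syntax error, please check"
        exit 1
    fi
else
    echo "This keepalived build has no --config-test option, skipping syntax check"
fi

# Start the Keepalived service
systemctl start keepalived
systemctl enable keepalived

# Check the service status
systemctl status keepalived

# Check the Keepalived processes
ps aux | grep keepalived

# Check the virtual IP (the standby server should not hold the VIP)
ip addr show eth0

# Check the VRRP state in the recent log entries
# (keepalived -v only prints the version)
journalctl -u keepalived -n 20 --no-pager

echo "Keepalived started on the standby server"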

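4.2 Verifying the high-availability state

Checking the primary server

#!/bin/bash
# Check the state of the primary server

echo "=== Primary server status check ==="

# Keepalived service status
echo "1. Keepalived service status:"
systemctl is-active keepalived

# Keepalived processes
echo -e "\n2. Keepalived processes:"
ps aux | grep keepalived | grep -v grep

# Virtual IP
echo -e "\n3. Virtual IP status:"
ip addr show eth0 | grep 192.168.1.100

# VRRP advertisements (requires the tcpdump package; stops after 3 packets or 5 seconds)
echo -e "\n4. VRRP status:"
timeout 5 tcpdump -i eth0 -n -c 3 vrrp 2>/dev/null || echo "No VRRP advertisements captured within 5 seconds"

# Nginx service status
echo -e "\n5. Nginx service status:"
systemctl is-active nginx

# Nginx processes
echo -e "\n6. Nginx processes:"
ps aux | grep nginx | grep -v grep

# Nginx listening port
echo -e "\n7. Nginx listening port:"
netstat -tlnp | grep ':80 '

# Nginx response test
echo -e "\n8. Nginx response test:"
curl -I http://localhost

echo -e "\n=== Primary server status check complete ==="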
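
Checking the standby server

#!/bin/bash
# Check the state of the standby server

echo "=== Standby server status check ==="

# Keepalived service status
echo "1. Keepalived service status:"
systemctl is-active keepalived

# Keepalived processes
echo -e "\n2. Keepalived processes:"
ps aux | grep keepalived | grep -v grep

# Virtual IP (the standby server should not hold the VIP)
echo -e "\n3. Virtual IP status:"
ip addr show eth0 | grep 192.168.1.100 || echo "Virtual IP not bound (expected)"

# VRRP advertisements (requires the tcpdump package; stops after 3 packets or 5 seconds)
echo -e "\n4. VRRP status:"
timeout 5 tcpdump -i eth0 -n -c 3 vrrp 2>/dev/null || echo "No VRRP advertisements captured within 5 seconds"

# Nginx service status
echo -e "\n5. Nginx service status:"
systemctl is-active nginx

# Nginx processes
echo -e "\n6. Nginx processes:"
ps aux | grep nginx | grep -v grep

# Nginx listening port
echo -e "\n7. Nginx listening port:"
netstat -tlnp | grep ':80 '

# Nginx response test
echo -e "\n8. Nginx response test:"
curl -I http://localhost

echo -e "\n=== Standby server status check complete ==="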
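
Testing access through the virtual IP

#!/bin/bash
# Test access through the virtual IP

echo "=== Virtual IP access test ==="

VIP="192.168.1.100"

# HTTP access
echo "1. HTTP access:"
curl -I "http://$VIP"

# Nginx status page
echo -e "\n2. Nginx status page:"
curl "http://$VIP/status"

# Connectivity
echo -e "\n3. Connectivity:"
ping -c 3 "$VIP"

# Port connectivity (bash's /dev/tcp avoids an interactive telnet session)
echo -e "\n4. Port connectivity:"
if timeout 3 bash -c "</dev/tcp/$VIP/80"; then
    echo "Port 80 on $VIP is reachable"
else
    echo "Port 80 on $VIP is NOT reachable"
fi

echo -e "\n=== Virtual IP access test complete ==="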

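4.3 Failover tests

Simulating an Nginx failure on the primary server

#!/bin/bash
# Simulate a service failure on the primary server
# (run from a machine that can reach both nodes over SSH)

echo "=== Primary server failure test ==="

# Record the start time
START_TIME=$(date)
echo "Test started at: $START_TIME"

# Stop Nginx on the primary server
echo "1. Stopping Nginx on the primary server:"
ssh root@192.168.1.10 "systemctl stop nginx"
# Detection takes about interval x fall = 2 x 3 = 6 seconds, plus the election
sleep 10

# Check the primary server
echo -e "\n2. Primary server status:"
ssh root@192.168.1.10 "systemctl is-active nginx; systemctl is-active keepalived"

# Check whether the virtual IP moved to the standby server
echo -e "\n3. Virtual IP after failover:"
echo "Primary server:"
ssh root@192.168.1.10 "ip addr show eth0 | grep 192.168.1.100 || echo 'virtual IP released'"

echo "Standby server:"
ssh root@192.168.1.11 "ip addr show eth0 | grep 192.168.1.100 || echo 'virtual IP not bound'"

# Test access through the virtual IP
echo -e "\n4. Access through the virtual IP:"
curl -I http://192.168.1.100

# Check Nginx on the standby server
echo -e "\n5. Nginx status on the standby server:"
ssh root@192.168.1.11 "systemctl is-active nginx"

# Record the failover time
END_TIME=$(date)
echo -e "\nFailover finished at: $END_TIME"

# Recover the primary server
echo -e "\n6. Recovering the primary server:"
ssh root@192.168.1.10 "systemctl start nginx"
sleep 10

# Check whether the virtual IP moved back to the primary server
echo -e "\n7. Virtual IP after recovery:"
echo "Primary server:"
ssh root@192.168.1.10 "ip addr show eth0 | grep 192.168.1.100 || echo 'virtual IP not bound'"

echo "Standby server:"
ssh root@192.168.1.11 "ip addr show eth0 | grep 192.168.1.100 || echo 'virtual IP released'"

# Record the recovery time
RECOVERY_TIME=$(date)
echo -e "\nRecovery finished at: $RECOVERY_TIME"

echo -e "\n=== Failover test complete ==="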
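
Simulating a crash of the primary server

#!/bin/bash
# Simulate a crash of the primary server
# (run from a machine that can reach both nodes over SSH)

echo "=== Primary server crash test ==="

# Record the start time
START_TIME=$(date)
echo "Test started at: $START_TIME"

# Shut down the primary server
echo "1. Shutting down the primary server:"
ssh root@192.168.1.10 "shutdown -h now" || echo "Primary server is shut down"

# Wait for the failover
echo -e "\n2. Waiting for failover (30 seconds):"
sleep 30

# Check the standby server
echo -e "\n3. Standby server status:"
ssh root@192.168.1.11 "systemctl is-active keepalived"
ssh root@192.168.1.11 "systemctl is-active nginx"

# Check whether the virtual IP moved to the standby server
echo -e "\n4. Virtual IP after failover:"
ssh root@192.168.1.11 "ip addr show eth0 | grep 192.168.1.100"

# Test access through the virtual IP
echo -e "\n5. Access through the virtual IP:"
curl -I http://192.168.1.100

# Record the failover time
END_TIME=$(date)
echo -e "\nFailover finished at: $END_TIME"

# Power the primary server back on (manual step)
echo -e "\n6. Start the primary server manually, then press Enter to continue..."
read -r

# Wait for recovery
echo -e "\n7. Waiting for the primary server to recover (30 seconds):"
sleep 30

# Check the primary server
echo -e "\n8. Primary server status:"
ssh root@192.168.1.10 "systemctl is-active keepalived"
ssh root@192.168.1.10 "systemctl is-active nginx"

# Check whether the virtual IP moved back to the primary server
echo -e "\n9. Virtual IP after recovery:"
echo "Primary server:"
ssh root@192.168.1.10 "ip addr show eth0 | grep 192.168.1.100 || echo 'virtual IP not bound'"

echo "Standby server:"
ssh root@192.168.1.11 "ip addr show eth0 | grep 192.168.1.100 || echo 'virtual IP released'"

# Record the recovery time
RECOVERY_TIME=$(date)
echo -e "\nRecovery finished at: $RECOVERY_TIME"

echo -e "\n=== Primary server crash test complete ==="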
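
The tests above only record wall-clock timestamps. To estimate how many client requests actually fail during a failover, you can probe the VIP continuously from a third machine while the test runs. A minimal sketch follows; the function name `count_failures` and the `PROBE_INTERVAL` variable are hypothetical, and the probe command is whatever you pass in.

```shell
#!/bin/sh
# Hypothetical helper: run a probe command N times and count the failures.
# $1 = number of attempts, remaining arguments = probe command
# PROBE_INTERVAL (seconds between probes) defaults to 1.
count_failures() {
    local attempts=$1 failures=0 i=0
    shift
    while [ "$i" -lt "$attempts" ]; do
        "$@" > /dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
        sleep "${PROBE_INTERVAL:-1}"
    done
    echo "$failures"
}

# Example (start it in a second terminal just before stopping Nginx):
#   count_failures 60 curl -fs --max-time 1 -o /dev/null http://192.168.1.100/
```

Multiplying the failure count by the probe interval gives a rough figure for the outage a client would see.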

5. Monitoring and Maintenance

5.1 Monitoring

Keepalived status monitoring script

#!/bin/bash
# /usr/local/keepalived/scripts/monitor_keepalived.sh

# Keepalived monitoring script
# Usage: ./monitor_keepalived.sh

# Settings
VIP="192.168.1.100"
MASTER_IP="192.168.1.10"
BACKUP_IP="192.168.1.11"
LOG_FILE="/var/log/keepalived/monitor.log"
ALERT_EMAIL="admin@example.com"

# Count how often the VIP is bound locally (exact match on "<VIP>/")
vip_bound_count() {
    ip -o -4 addr show | grep -cF " $VIP/"
}

# Determine the current role of this server
get_server_role() {
    local vip_bound=$(vip_bound_count)
    local keepalived_state=$(systemctl is-active keepalived)

    if [ "$vip_bound" -gt 0 ] && [ "$keepalived_state" = "active" ]; then
        echo "MASTER"
    elif [ "$vip_bound" -eq 0 ] && [ "$keepalived_state" = "active" ]; then
        echo "BACKUP"
    else
        echo "UNKNOWN"
    fi
}

# Check the Keepalived service
check_keepalived_service() {
    local status=$(systemctl is-active keepalived)
    if [ "$status" != "active" ]; then
        echo "Keepalived service is not active: $status"
        return 1
    fi
    return 0
}

# Check the VRRP process
check_vrrp_process() {
    # keepalived forks child processes (VRRP, checker) that keep the parent's
    # process name, so expect at least two processes named exactly "keepalived"
    local proc_count=$(pgrep -x keepalived | wc -l)
    if [ "$proc_count" -lt 2 ]; then
        echo "VRRP process not found"
        return 1
    fi
    return 0
}

# Check the virtual IP
check_virtual_ip() {
    local vip_bound=$(vip_bound_count)
    if [ "$vip_bound" -eq 0 ]; then
        echo "Virtual IP not bound"
        return 1
    fi
    return 0
}

# Check the Nginx service
check_nginx_service() {
    local status=$(systemctl is-active nginx)
    if [ "$status" != "active" ]; then
        echo "Nginx service is not active: $status"
        return 1
    fi
    return 0
}

# Check network connectivity (assumes outbound ICMP to 8.8.8.8 is allowed;
# on an isolated network, point this at the default gateway instead)
check_network_connectivity() {
    if ! ping -c 1 -W 1 8.8.8.8 > /dev/null 2>&1; then
        echo "Network connectivity failed"
        return 1
    fi
    return 0
}

# Send an alert
send_alert() {
    local message=$1
    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    local hostname=$(hostname)

    echo "[$timestamp] [$hostname] ALERT: $message" >> "$LOG_FILE"

    # Send the alert by e-mail
    echo "[$timestamp] [$hostname] ALERT: $message" | mail -s "Keepalived Alert: $hostname" "$ALERT_EMAIL"
}

# Log the monitoring result
log_monitor_info() {
    local role=$(get_server_role)
    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    local hostname=$(hostname)

    echo "[$timestamp] [$hostname] Role: $role, Keepalived: $(systemctl is-active keepalived), Nginx: $(systemctl is-active nginx)" >> "$LOG_FILE"
}

# Main monitoring function
main() {
    local hostname=$(hostname)
    local role=$(get_server_role)

    echo "=== Keepalived Monitor ==="
    echo "Hostname: $hostname"
    echo "Role: $role"
    echo "Timestamp: $(date '+%Y-%m-%d %H:%M:%S')"
    echo ""

    # Run the individual checks
    local checks_passed=0
    local total_checks=5

    echo "1. Checking Keepalived service status..."
    if check_keepalived_service; then
        echo "   ✓ Keepalived service is active"
        ((checks_passed++))
    else
        echo "   ✗ Keepalived service check failed"
        send_alert "Keepalived service check failed on $hostname"
    fi

    echo "2. Checking VRRP process..."
    if check_vrrp_process; then
        echo "   ✓ VRRP process is running"
        ((checks_passed++))
    else
        echo "   ✗ VRRP process check failed"
        send_alert "VRRP process check failed on $hostname"
    fi

    echo "3. Checking virtual IP..."
    if [ "$role" = "MASTER" ]; then
        if check_virtual_ip; then
            echo "   ✓ Virtual IP is bound"
            ((checks_passed++))
        else
            echo "   ✗ Virtual IP check failed"
            send_alert "Virtual IP check failed on $hostname"
        fi
    else
        echo "   ✓ Virtual IP not bound (expected for BACKUP)"
        ((checks_passed++))
    fi

    echo "4. Checking Nginx service status..."
    if check_nginx_service; then
        echo "   ✓ Nginx service is active"
        ((checks_passed++))
    else
        echo "   ✗ Nginx service check failed"
        send_alert "Nginx service check failed on $hostname"
    fi

    echo "5. Checking network connectivity..."
    if check_network_connectivity; then
        echo "   ✓ Network connectivity is OK"
        ((checks_passed++))
    else
        echo "   ✗ Network connectivity check failed"
        send_alert "Network connectivity check failed on $hostname"
    fi

    echo ""
    echo "Result: $checks_passed/$total_checks checks passed"

    # Log the monitoring result
    log_monitor_info

    # If a check failed, try to recover
    if [ "$checks_passed" -lt "$total_checks" ]; then
        echo "Attempting recovery..."

        # Restart only what is actually down: restarting a healthy keepalived
        # on the master would itself trigger a failover
        if ! check_keepalived_service > /dev/null || ! check_vrrp_process > /dev/null; then
            systemctl restart keepalived
            sleep 5
        fi

        if ! check_nginx_service > /dev/null; then
            systemctl restart nginx
            sleep 3
        fi

        echo "Recovery attempt finished"
    fi

    echo ""
    echo "=== Monitoring complete ==="
}

# Run the main function
main
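
Scheduling the monitor

# Add the cron job without overwriting existing crontab entries
( crontab -l 2>/dev/null | grep -vF monitor_keepalived.sh; \
  echo "*/5 * * * * /usr/local/keepalived/scripts/monitor_keepalived.sh" ) | crontab -

# List the cron jobs
crontab -l

# crond picks up crontab changes on its own; just confirm that it is running
systemctl is-active crond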

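5.2 Log management

Keepalived log configuration

# Create the log directory
mkdir -p /var/log/keepalived
chown root:root /var/log/keepalived
chmod 755 /var/log/keepalived

# Make keepalived log to the local0 facility (-S 0); by default it does not
# use local0, so the rsyslog rule below would never match.
# Restarting keepalived on the active master triggers a failover.
sed -i 's/^KEEPALIVED_OPTIONS=.*/KEEPALIVED_OPTIONS="-D -S 0"/' /etc/sysconfig/keepalived
systemctl restart keepalived

# Configure rsyslog (overwrite rather than append, so reruns do not duplicate the rule)
cat > /etc/rsyslog.d/keepalived.conf << EOF
# Keepalived log configuration
local0.* /var/log/keepalived/keepalived.log
& stop
EOF

# Restart rsyslog
systemctl restart rsyslog

# Configure logrotate. The log file is written by rsyslog, so rsyslog (not
# keepalived) must reopen it after rotation.
cat > /etc/logrotate.d/keepalived << 'EOF'
/var/log/keepalived/*.log {
    daily
    missingok
    rotate 30
    compress
    delaycompress
    notifempty
    create 644 root root
    postrotate
        systemctl kill -s HUP rsyslog.service >/dev/null 2>&1 || true
    endscript
}
EOF

# Dry-run logrotate to validate the configuration
logrotate -d /etc/logrotate.d/keepalived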

Log analysis script

#!/bin/bash
# /usr/local/keepalived/scripts/analyze_logs.sh

# Keepalived log analysis script
# Usage: ./analyze_logs.sh

# Settings
LOG_DIR="/var/log/keepalived"
REPORT_FILE="/tmp/keepalived_report.txt"
ALERT_EMAIL="admin@example.com"

# State transitions are matched on keepalived's "Entering <STATE> STATE" log lines
STATE_PATTERN="Entering MASTER STATE\|Entering BACKUP STATE\|Entering FAULT STATE"

# Analyze state transitions
analyze_state_transitions() {
    echo "=== State transitions ===" >> "$REPORT_FILE"
    echo "Analyzed at: $(date)" >> "$REPORT_FILE"

    # Find state-transition records
    grep -n "$STATE_PATTERN" "$LOG_DIR/keepalived.log" | tail -10 >> "$REPORT_FILE"

    echo "" >> "$REPORT_FILE"
}

# Analyze fault events
analyze_fault_events() {
    echo "=== Fault events ===" >> "$REPORT_FILE"

    # Find fault-related records
    grep -n "Fault\|ERROR\|Failed" "$LOG_DIR/keepalived.log" | tail -10 >> "$REPORT_FILE"

    echo "" >> "$REPORT_FILE"
}

# Analyze VRRP communication
analyze_vrrp_communication() {
    echo "=== VRRP communication ===" >> "$REPORT_FILE"

    # Find VRRP communication records
    grep -n "VRRP_Script\|VRRP_Instance\|Received\|Sending" "$LOG_DIR/keepalived.log" | tail -10 >> "$REPORT_FILE"

    echo "" >> "$REPORT_FILE"
}

# Analyze health checks
analyze_health_checks() {
    echo "=== Health checks ===" >> "$REPORT_FILE"

    # Find health-check records
    grep -n "check_nginx\|Health check\|Script" "$LOG_DIR/keepalived.log" | tail -10 >> "$REPORT_FILE"

    echo "" >> "$REPORT_FILE"
}

# Generate the statistics report
generate_statistics() {
    echo "=== Statistics ===" >> "$REPORT_FILE"

    # Count state transitions
    master_transitions=$(grep -c "Entering MASTER STATE" "$LOG_DIR/keepalived.log")
    backup_transitions=$(grep -c "Entering BACKUP STATE" "$LOG_DIR/keepalived.log")
    fault_transitions=$(grep -c "Entering FAULT STATE" "$LOG_DIR/keepalived.log")

    echo "State transition counts:" >> "$REPORT_FILE"
    echo "  Transitions to MASTER: $master_transitions" >> "$REPORT_FILE"
    echo "  Transitions to BACKUP: $backup_transitions" >> "$REPORT_FILE"
    echo "  Transitions to FAULT: $fault_transitions" >> "$REPORT_FILE"

    # Count errors
    error_count=$(grep -c "ERROR\|Failed" "$LOG_DIR/keepalived.log")
    echo "  Errors: $error_count" >> "$REPORT_FILE"

    echo "" >> "$REPORT_FILE"
}

# Check for abnormal conditions
check_abnormalities() {
    echo "=== Abnormal conditions ===" >> "$REPORT_FILE"

    # Frequent state transitions: take the last 100 lines first, then count
    recent_transitions=$(tail -n 100 "$LOG_DIR/keepalived.log" | grep -c "$STATE_PATTERN")
    if [ "$recent_transitions" -gt 10 ]; then
        echo "Warning: too many state transitions in the last 100 log lines ($recent_transitions)" >> "$REPORT_FILE"
    fi

    # Frequent errors
    recent_errors=$(tail -n 100 "$LOG_DIR/keepalived.log" | grep -c "ERROR\|Failed")
    if [ "$recent_errors" -gt 5 ]; then
        echo "Warning: too many errors in the last 100 log lines ($recent_errors)" >> "$REPORT_FILE"
    fi

    echo "" >> "$REPORT_FILE"
}

# Send the report
send_report() {
    local subject="Keepalived log analysis report - $(date '+%Y-%m-%d %H:%M:%S')"

    # Send the report by e-mail
    mail -s "$subject" "$ALERT_EMAIL" < "$REPORT_FILE"

    echo "Report sent to $ALERT_EMAIL"
}

# Main function
main() {
    echo "Analyzing Keepalived logs..."

    # Truncate the report file
    > "$REPORT_FILE"

    # Run the analyses
    analyze_state_transitions
    analyze_fault_events
    analyze_vrrp_communication
    analyze_health_checks
    generate_statistics
    check_abnormalities

    # Show the report
    echo "Analysis finished, report summary:"
    echo "================================"
    cat "$REPORT_FILE"
    echo "================================"

    # Send the report
    send_report

    echo "Log analysis complete"
}

# Run the main function
main
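
One detail of the "recent lines" checks is easy to get backwards: to count matches within only the newest N lines, the log has to pass through `tail` before `grep -c`, not after it. A small self-contained sketch; the helper name `count_recent` is hypothetical, and the sample lines assume keepalived's "Entering ... STATE" wording.

```shell
#!/bin/sh
# Hypothetical helper: count lines matching a pattern within the last N lines.
# $1 = pattern (basic regex), $2 = number of trailing lines, $3 = log file
count_recent() {
    tail -n "$2" "$3" | grep -c "$1"
}

# Demo on a throwaway file
demo_log=$(mktemp)
printf '%s\n' \
    '(VI_1) Entering MASTER STATE' \
    'unrelated line' \
    '(VI_1) Entering BACKUP STATE' \
    'unrelated line' > "$demo_log"
echo "whole file:   $(count_recent 'Entering .* STATE' 4 "$demo_log")"
echo "last 2 lines: $(count_recent 'Entering .* STATE' 2 "$demo_log")"
rm -f "$demo_log"
```

Piping `grep -c` into `tail` instead would hand `tail` a single number, so the "recent" count would silently cover the whole file.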

5.3 维护操作

日常维护检查清单
bash 复制代码
#!/bin/bash
# /usr/local/keepalived/scripts/daily_check.sh

# Daily maintenance check script for Keepalived
# Usage: ./daily_check.sh

# Configuration
VIP="192.168.1.100"
MASTER_IP="192.168.1.10"
BACKUP_IP="192.168.1.11"
LOG_FILE="/var/log/keepalived/daily_check.log"

# Start a fresh report and record the check time
echo "=== Keepalived daily check ===" > $LOG_FILE
echo "Check time: $(date)" >> $LOG_FILE
echo "" >> $LOG_FILE

# Check one server over SSH: service state, VIP binding, resource usage
# $1 = report heading, $2 = server IP
check_server() {
    local title=$1 ip=$2
    echo "$title" >> $LOG_FILE
    
    # Keepalived service
    local keepalived_status=$(ssh root@$ip "systemctl is-active keepalived")
    echo "   Keepalived service: $keepalived_status" >> $LOG_FILE
    
    # Nginx service
    local nginx_status=$(ssh root@$ip "systemctl is-active nginx")
    echo "   Nginx service: $nginx_status" >> $LOG_FILE
    
    # Virtual IP (-F treats the dots in the address literally)
    local vip_bound=$(ssh root@$ip "ip addr show | grep -cF '$VIP'")
    # Default to 0 so that a failed SSH call does not break the integer test
    if [ "${vip_bound:-0}" -gt 0 ]; then
        echo "   Virtual IP: bound" >> $LOG_FILE
    else
        echo "   Virtual IP: not bound" >> $LOG_FILE
    fi
    
    # System resources (the CPU figure is the user-space share reported by top)
    local cpu_usage=$(ssh root@$ip "top -bn1 | grep 'Cpu(s)' | awk '{print \$2}' | cut -d'%' -f1")
    local memory_usage=$(ssh root@$ip "free -m | grep 'Mem:' | awk '{printf \"%.2f\", \$3/\$2*100}'")
    local disk_usage=$(ssh root@$ip "df -h / | awk 'NR==2 {print \$5}' | cut -d'%' -f1")
    
    echo "   CPU usage: ${cpu_usage}%" >> $LOG_FILE
    echo "   Memory usage: ${memory_usage}%" >> $LOG_FILE
    echo "   Disk usage: ${disk_usage}%" >> $LOG_FILE
    
    echo "" >> $LOG_FILE
}

# Check the master server
check_master_server() {
    check_server "1. Master server status" "$MASTER_IP"
}

# Check the backup server
check_backup_server() {
    check_server "2. Backup server status" "$BACKUP_IP"
}

# Check network connectivity
check_network_connectivity() {
    echo "3. Network connectivity" >> $LOG_FILE
    
    # Ping the master, the backup, and the virtual IP once each (1-second timeout)
    local entry name ip
    for entry in "Master server:$MASTER_IP" "Backup server:$BACKUP_IP" "Virtual IP:$VIP"; do
        name=${entry%%:*}
        ip=${entry#*:}
        if ping -c 1 -W 1 "$ip" > /dev/null 2>&1; then
            echo "   $name connectivity: OK" >> $LOG_FILE
        else
            echo "   $name connectivity: FAILED" >> $LOG_FILE
        fi
    done
    
    echo "" >> $LOG_FILE
}

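# Check service availability through the virtual IP
check_service_availability() {
    echo "4. Service availability" >> $LOG_FILE
    
    # HTTP service (--max-time keeps a hung connection from stalling the script)
    local http_code=$(curl -o /dev/null -s --max-time 5 -w "%{http_code}" http://$VIP)
    echo "   HTTP status code: $http_code" >> $LOG_FILE
    
    # Nginx status page
    local status_code=$(curl -o /dev/null -s --max-time 5 -w "%{http_code}" http://$VIP/status)
    echo "   Status page status code: $status_code" >> $LOG_FILE
    
    echo "" >> $LOG_FILE
}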

# Check log files
check_log_files() {
    echo "5. Log files" >> $LOG_FILE
    
    # Size of the Keepalived log
    local log_size=$(du -m /var/log/keepalived/keepalived.log | cut -f1)
    echo "   Keepalived log size: ${log_size}MB" >> $LOG_FILE
    
    # Errors in the last 100 log lines (tail first, then count)
    local recent_errors=$(tail -n 100 /var/log/keepalived/keepalived.log | grep -c "ERROR\|Failed")
    echo "   Recent errors: $recent_errors" >> $LOG_FILE
    
    # State transitions in the last 100 log lines
    local recent_transitions=$(tail -n 100 /var/log/keepalived/keepalived.log | grep -c "Transition to")
    echo "   Recent state transitions: $recent_transitions" >> $LOG_FILE
    
    echo "" >> $LOG_FILE
}

# Generate the summary
generate_report() {
    echo "6. Summary" >> $LOG_FILE
    
    # Set to 1 as soon as any check below finds a problem
    local alert_needed=0
    
    # Service status
    local master_keepalived=$(ssh root@$MASTER_IP "systemctl is-active keepalived")
    local master_nginx=$(ssh root@$MASTER_IP "systemctl is-active nginx")
    local backup_keepalived=$(ssh root@$BACKUP_IP "systemctl is-active keepalived")
    local backup_nginx=$(ssh root@$BACKUP_IP "systemctl is-active nginx")
    
    if [ "$master_keepalived" != "active" ] || [ "$master_nginx" != "active" ]; then
        echo "   Warning: a service on the master server is not active" >> $LOG_FILE
        alert_needed=1
    fi
    
    if [ "$backup_keepalived" != "active" ] || [ "$backup_nginx" != "active" ]; then
        echo "   Warning: a service on the backup server is not active" >> $LOG_FILE
        alert_needed=1
    fi
    
    # Resource usage
    local master_cpu=$(ssh root@$MASTER_IP "top -bn1 | grep 'Cpu(s)' | awk '{print \$2}' | cut -d'%' -f1")
    local master_memory=$(ssh root@$MASTER_IP "free -m | grep 'Mem:' | awk '{printf \"%.2f\", \$3/\$2*100}'")
    local backup_cpu=$(ssh root@$BACKUP_IP "top -bn1 | grep 'Cpu(s)' | awk '{print \$2}' | cut -d'%' -f1")
    local backup_memory=$(ssh root@$BACKUP_IP "free -m | grep 'Mem:' | awk '{printf \"%.2f\", \$3/\$2*100}'")
    
    # Empty values (for example after a failed SSH call) default to 0 so that bc gets valid input
    if (( $(echo "${master_cpu:-0} > 80" | bc -l) )); then
        echo "   Warning: high CPU usage on the master server (${master_cpu}%)" >> $LOG_FILE
        alert_needed=1
    fi
    
    if (( $(echo "${master_memory:-0} > 80" | bc -l) )); then
        echo "   Warning: high memory usage on the master server (${master_memory}%)" >> $LOG_FILE
        alert_needed=1
    fi
    
    if (( $(echo "${backup_cpu:-0} > 80" | bc -l) )); then
        echo "   Warning: high CPU usage on the backup server (${backup_cpu}%)" >> $LOG_FILE
        alert_needed=1
    fi
    
    if (( $(echo "${backup_memory:-0} > 80" | bc -l) )); then
        echo "   Warning: high memory usage on the backup server (${backup_memory}%)" >> $LOG_FILE
        alert_needed=1
    fi
    
    if [ "$alert_needed" -eq 1 ]; then
        echo "   Action: the system needs attention" >> $LOG_FILE
    else
        echo "   Status: all checks passed" >> $LOG_FILE
    fi
    
    echo "" >> $LOG_FILE
}

# Main function
main() {
    echo "Starting the Keepalived daily check..."
    
    # Run each check
    check_master_server
    check_backup_server
    check_network_connectivity
    check_service_availability
    check_log_files
    generate_report
    
    # Show the tail of the report
    echo "Daily check finished. Full report: $LOG_FILE"
    echo "================================"
    tail -20 $LOG_FILE
    echo "================================"
}

# Run the main function
main
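
The resource checks in the summary step compare decimal percentages against a limit of 80 by piping an expression to `bc`, which fails with a syntax error when a value is empty or `bc` is not installed. As one possible alternative, the comparison can be done with `awk` after validating the input. This is a minimal sketch; the helper name `exceeds_threshold` is a hypothetical name introduced for illustration.

```bash
#!/bin/bash
# Sketch: compare a decimal value against a limit without bc.
# exceeds_threshold is a hypothetical helper name.

# exceeds_threshold VALUE LIMIT
# Succeeds (exit status 0) when VALUE is a number greater than LIMIT.
exceeds_threshold() {
    local value=$1 limit=$2
    # Empty or non-numeric input counts as "not exceeded" instead of raising an error
    [[ $value =~ ^[0-9]+([.][0-9]+)?$ ]] || return 1
    # awk compares the two values numerically; exit status 0 means "greater"
    awk -v v="$value" -v l="$limit" 'BEGIN { exit !(v > l) }'
}

# Demo: a high value, a normal value, and an empty value (as after a failed SSH call)
for sample in 85.5 12.3 ""; do
    if exceeds_threshold "$sample" 80; then
        echo "'$sample' exceeds 80"
    else
        echo "'$sample' does not exceed 80"
    fi
done
```

Whether a missing value should count as healthy is a design choice; a stricter script might raise an alert when the measurement itself cannot be collected.
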
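
Scheduling the checks
bash
# Add the daily check job (append to the existing crontab; "echo ... | crontab -" alone would replace it)
(crontab -l 2>/dev/null; echo "0 9 * * * /usr/local/keepalived/scripts/daily_check.sh") | crontab -

# Add the weekly log analysis job
(crontab -l 2>/dev/null; echo "0 0 * * 0 /usr/local/keepalived/scripts/analyze_logs.sh") | crontab -

# List the scheduled jobs
crontab -l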
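
Because `crontab -` replaces the whole crontab with whatever it reads on standard input, installing jobs one at a time requires merging each new line with the existing entries. The sketch below shows one way to do that without creating duplicates when the setup is run more than once. The function name `merge_cron_entry` is a hypothetical helper, and the usage line is left as a comment so that the example does not modify any real crontab.

```bash
#!/bin/bash
# Sketch: add a cron line to an existing crontab without duplicating it.
# merge_cron_entry is a hypothetical helper name.

# merge_cron_entry ENTRY
# Reads the current crontab on stdin and prints it with ENTRY appended,
# unless an identical line is already present.
merge_cron_entry() {
    local entry=$1 existing
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -x matches whole lines, -F treats the entry as a fixed string (cron lines contain "*")
    if ! printf '%s\n' "$existing" | grep -qxF -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Usage on a real host (commented out because it changes the current user's crontab):
#   crontab -l 2>/dev/null \
#     | merge_cron_entry "0 9 * * * /usr/local/keepalived/scripts/daily_check.sh" \
#     | crontab -
```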

VI. Case Studies

6.1 High-Availability Deployment for an E-commerce Site

Architecture
                          +-------------------+
                          |   Load balancer   |
                          | (Keepalived+VIP)  |
                          +---------+---------+
                                    |
          +-------------------------+-------------------------+
          |                         |                         |
+---------v---------+     +---------v---------+     +---------v---------+
|   Web server 1    |     |   Web server 2    |     |   Web server 3    |
| (Nginx+Keepalived)|     | (Nginx+Keepalived)|     | (Nginx+Keepalived)|
+---------+---------+     +---------+---------+     +---------+---------+
          |                         |                         |
          +-------------------------+-------------------------+
                                    |
                          +---------+---------+
                          | App server cluster|
                          +-------------------+

Sample configuration

Master server configuration:

nginx
# /etc/keepalived/keepalived.conf

global_defs {
    router_id nginx-master
    notification_email {
        admin@example.com
        ops@example.com
    }
    notification_email_from keepalived@example.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    script_user root
    enable_script_security
}

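vrrp_script check_nginx {
    script "/usr/local/keepalived/scripts/check_nginx.sh"
    interval 2
    timeout 2
    fall 3
    rise 2
    # The weight must exceed the priority gap between the two nodes (assumed to be
    # 100 vs. 90 here); otherwise a failed check lowers the priority without
    # triggering failover
    weight -20
    # Syntax is "user USERNAME [GROUPNAME]"
    user root root
}

vrrp_script check_app {
    script "/usr/local/keepalived/scripts/check_app.sh"
    interval 5
    timeout 2
    fall 3
    rise 2
    # -3 alone does not close a 10-point gap; raise it if an application
    # failure by itself should also trigger failover
    weight -3
    user root root
}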

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass ecommerce2024
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0 label eth0:0
    }
    track_script {
        check_nginx
        check_app
    }
    notify_master "/usr/local/keepalived/scripts/notify.sh MASTER 100"
    notify_backup "/usr/local/keepalived/scripts/notify.sh BACKUP 100"
    notify_fault "/usr/local/keepalived/scripts/notify.sh FAULT 100"
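    # NOTE: according to the keepalived.conf man page, preempt_delay only takes
    # effect when the initial state of the instance is BACKUP
    preempt_delay 300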
}

vrrp_instance VI_2 {
    state BACKUP
    interface eth0
    virtual_router_id 52
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass ecommerce2024
    }
    virtual_ipaddress {
        192.168.1.101/24 dev eth0 label eth0:1
    }
    track_script {
        check_nginx
        check_app
    }
    notify_master "/usr/local/keepalived/scripts/notify.sh MASTER 90"
    notify_backup "/usr/local/keepalived/scripts/notify.sh BACKUP 90"
    notify_fault "/usr/local/keepalived/scripts/notify.sh FAULT 90"
}
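
Whether a failed health check actually moves the virtual IP depends on arithmetic: Keepalived adds the negative `weight` of each failing tracked script to the instance's base `priority`, and the peer takes over only once its own priority is higher than the result. The sketch below works through that calculation. It assumes a base priority of 100 on this node and 90 on the peer (the two priority values that appear in the configuration above); the function names and the sample weights are illustrative assumptions, and the tie-breaking rule for equal priorities is ignored.

```bash
#!/bin/bash
# Sketch: effective VRRP priority after tracked scripts fail.
# Function names are hypothetical; this models only negative weights.

# effective_priority BASE [WEIGHT...]
# Prints BASE plus the (negative) weight of every script that is failing.
effective_priority() {
    local prio=$1 w
    shift
    for w in "$@"; do
        prio=$(( prio + w ))
    done
    echo "$prio"
}

# will_fail_over PEER_PRIORITY BASE [WEIGHT...]
# Succeeds when the peer's priority is higher than this node's effective priority.
will_fail_over() {
    local peer=$1
    shift
    [ "$(effective_priority "$@")" -lt "$peer" ]
}

# Demo: peer at 90, this node at 100
for weights in "-5" "-5 -3" "-20"; do
    # $weights is intentionally unquoted so that "-5 -3" becomes two arguments
    if will_fail_over 90 100 $weights; then
        echo "failing weights [$weights]: priority $(effective_priority 100 $weights), peer takes over"
    else
        echo "failing weights [$weights]: priority $(effective_priority 100 $weights), no failover"
    fi
done
```

The practical takeaway is that the weight of any check that should trigger failover on its own needs to be larger in magnitude than the priority gap between the nodes.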

Application health check script:

bash
#!/bin/bash
# /usr/local/keepalived/scripts/check_app.sh

# Check that the application port is listening
# (the trailing space keeps ":8080" from matching longer ports such as 80801)
if ! netstat -tln | grep -q ":8080 "; then
    echo "Application port 8080 not listening"
    exit 1
fi

# Check the application's health endpoint
if ! curl -f http://localhost:8080/health > /dev/null 2>&1; then
    echo "Application health check failed"
    exit 1
fi

# Check the database connection
if ! curl -f http://localhost:8080/db-check > /dev/null 2>&1; then
    echo "Database connection check failed"
    exit 1
fi

# Check the cache service
if ! curl -f http://localhost:8080/cache-check > /dev/null 2>&1; then
    echo "Cache service check failed"
    exit 1
fi

echo "Application health check passed"
exit 0
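
The port test in this script relies on matching text in a socket listing, and a loose pattern such as `:8080` also matches ports like 80801. As a hedged alternative, the sketch below parses `ss -ltn` style output and compares the port field exactly. The function name `port_is_listening` is a hypothetical helper, and the sample listing is made up for the demo; on a real host the input would come from `ss -ltn`.

```bash
#!/bin/bash
# Sketch: exact-match test for a listening TCP port.
# port_is_listening is a hypothetical helper name.

# port_is_listening PORT
# Reads "ss -ltn" style output on stdin and succeeds when some local
# address ends in exactly :PORT.
port_is_listening() {
    local port=$1
    # Column 4 is "Local Address:Port", e.g. 0.0.0.0:8080 or [::]:8080;
    # the port is whatever follows the last colon. NR > 1 skips the header.
    awk -v p="$port" '
        NR > 1 { n = split($4, a, ":"); if (a[n] == p) found = 1 }
        END    { exit !found }
    '
}

# Demo with a canned listing
sample='State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      128    0.0.0.0:80801      0.0.0.0:*
LISTEN 0      128    [::]:8080          [::]:*'

for port in 8080 808; do
    if printf '%s\n' "$sample" | port_is_listening "$port"; then
        echo "port $port is listening"
    else
        echo "port $port is not listening"
    fi
done

# Usage on a live host: ss -ltn | port_is_listening 8080
```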

6.2 High-Availability Deployment for a Microservice Architecture

Architecture
                          +-------------------+
                          |    API gateway    |
                          | (Keepalived+VIP)  |
                          +---------+---------+
                                    |
          +-------------------------+-------------------------+
          |                         |                         |
+---------v---------+     +---------v---------+     +---------v---------+
|   API gateway 1   |     |   API gateway 2   |     |   API gateway 3   |
| (Nginx+Keepalived)|     | (Nginx+Keepalived)|     | (Nginx+Keepalived)|
+---------+---------+     +---------+---------+     +---------+---------+
          |                         |                         |
          +-------------------------+-------------------------+
                                    |
                          +---------+---------+
                          | Service registry  |
                          +---------+---------+
                                    |
          +-------------------------+-------------------------+
          |                         |                         |
+---------v---------+     +---------v---------+     +---------v---------+
| Microservice      |     | Microservice      |     | Microservice      |
| cluster 1         |     | cluster 2         |     | cluster 3         |
+-------------------+     +-------------------+     +-------------------+

Sample configuration

API gateway configuration:

nginx
# /etc/keepalived/keepalived.conf

global_defs {
    router_id api-gateway-master
    notification_email {
        admin@example.com
        ops@example.com
    }
    notification_email_from keepalived@example.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    script_user root
    enable_script_security
}

vrrp_script check_nginx {
    script "/usr/local/keepalived/scripts/check_nginx.sh"
    interval 2
    timeout 2
    fall 3
    rise 2
    # The weight must exceed the priority gap between the two nodes (assumed to be
    # 100 vs. 90 here); otherwise a failed check lowers the priority without
    # triggering failover
    weight -20
    # Syntax is "user USERNAME [GROUPNAME]"
    user root root
}

vrrp_script check_services {
    script "/usr/local/keepalived/scripts/check_services.sh"
    interval 5
    timeout 2
    fall 3
    rise 2
    weight -3
    user root root
}

vrrp_script check_discovery {
    script "/usr/local/keepalived/scripts/check_discovery.sh"
    interval 10
    timeout 5
    fall 2
    rise 2
    weight -2
    user root root
}

vrrp_instance VI_API {
    state MASTER
    interface eth0
    virtual_router_id 100
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass microservice2024
    }
    virtual_ipaddress {
        192.168.1.200/24 dev eth0 label eth0:0
    }
    track_script {
        check_nginx
        check_services
        check_discovery
    }
    notify_master "/usr/local/keepalived/scripts/notify.sh MASTER 100"
    notify_backup "/usr/local/keepalived/scripts/notify.sh BACKUP 100"
    notify_fault "/usr/local/keepalived/scripts/notify.sh FAULT 100"
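    # NOTE: according to the keepalived.conf man page, preempt_delay only takes
    # effect when the initial state of the instance is BACKUP
    preempt_delay 300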
}

vrrp_instance VI_INTERNAL {
    state BACKUP
    interface eth1
    virtual_router_id 101
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass microservice2024
    }
    virtual_ipaddress {
        10.0.0.100/24 dev eth1 label eth1:0
    }
    track_script {
        check_nginx
        check_services
    }
    notify_master "/usr/local/keepalived/scripts/notify.sh MASTER 90"
    notify_backup "/usr/local/keepalived/scripts/notify.sh BACKUP 90"
    notify_fault "/usr/local/keepalived/scripts/notify.sh FAULT 90"
}

Service check script:

bash
#!/bin/bash
# /usr/local/keepalived/scripts/check_services.sh

# Key microservices and their ports.
# NOTE: these port numbers are placeholders for illustration; use the ports
# your services actually listen on.
declare -A service_ports=(
    ["user-service"]=8081
    ["product-service"]=8082
    ["order-service"]=8083
    ["payment-service"]=8084
)

for service in "${!service_ports[@]}"; do
    port=${service_ports[$service]}
    
    # Check that the service port is listening
    if ! netstat -tln | grep -q ":${port} "; then
        echo "Service $service port $port not listening"
        exit 1
    fi
    
    # Check the service's health endpoint
    if ! curl -f "http://localhost:${port}/health" > /dev/null 2>&1; then
        echo "Service $service health check failed"
        exit 1
    fi
done

# Check the service registry
if ! curl -f http://localhost:8761/eureka/apps > /dev/null 2>&1; then
    echo "Service registry check failed"
    exit 1
fi

# Check the config server
if ! curl -f http://localhost:8888/actuator/health > /dev/null 2>&1; then
    echo "Config server check failed"
    exit 1
fi

echo "All services health check passed"
exit 0
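
The registry check above only confirms that the `/eureka/apps` endpoint answers; a registry that responds but has no applications registered would still pass. A stricter check could count the registered applications in the response. The sketch below assumes the endpoint returns Eureka's default XML format with one `<application>` element per registered application; the function name `count_registered_apps` and the sample XML are illustrative assumptions.

```bash
#!/bin/bash
# Sketch: count applications in a Eureka /eureka/apps XML response.
# count_registered_apps is a hypothetical helper name.

# count_registered_apps
# Reads the XML on stdin and prints the number of <application> elements.
count_registered_apps() {
    # grep -o prints each match on its own line; the closing ">" in the
    # pattern keeps the <applications> wrapper element from being counted
    grep -o '<application>' | wc -l | tr -d ' '
}

# Demo with a canned response (structure simplified for illustration)
sample='<applications><application><name>USER-SERVICE</name></application><application><name>ORDER-SERVICE</name></application></applications>'
echo "registered applications: $(printf '%s' "$sample" | count_registered_apps)"

# Usage on a live host (assumes the registry listens on port 8761 as in the script above):
#   apps=$(curl -sf --max-time 3 http://localhost:8761/eureka/apps | count_registered_apps)
#   [ "$apps" -gt 0 ] || { echo "No applications registered"; exit 1; }
```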

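Performance tuning recommendations:

  • Network: tune the network configuration to reduce latency
  • Resource management: allocate system resources sensibly to avoid bottlenecks
  • Parameter tuning: adjust Keepalived parameters such as advert_int and the health-check interval, fall, and rise values to match the actual environment
  • Load balancing: combine Keepalived with load balancing to increase processing capacity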
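
When tuning parameters, it helps to estimate how long a backup waits before declaring the master dead. Under VRRPv2 (RFC 3768), the backup's timer is Master_Down_Interval = 3 × Advertisement_Interval + Skew_Time, where Skew_Time = (256 − Priority) / 256 seconds. The sketch below evaluates that formula for the `advert_int 1` and `priority 90` values used in this article's examples. It assumes VRRPv2 timing; the function name is a hypothetical helper, and the result does not include the extra delay added by health-check `interval` and `fall` settings.

```bash
#!/bin/bash
# Sketch: VRRPv2 master-down interval (RFC 3768).
# master_down_interval is a hypothetical helper name.

# master_down_interval ADVERT_INT PRIORITY
# Prints the number of seconds a backup with PRIORITY waits without
# advertisements before it takes over:
#   3 * ADVERT_INT + (256 - PRIORITY) / 256
master_down_interval() {
    awk -v a="$1" -v p="$2" 'BEGIN { printf "%.3f\n", 3 * a + (256 - p) / 256 }'
}

# Backup with priority 90 and advert_int 1
master_down_interval 1 90

# The same backup with advert_int 2: detection takes roughly twice as long
master_down_interval 2 90
```

Lowering `advert_int` shortens failure detection at the cost of more advertisement traffic and a higher risk of spurious failovers on a lossy network.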

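Security recommendations:

  • Access control: restrict access to Keepalived's management interfaces
  • Authentication: configure VRRP authentication to keep unauthorized nodes out
  • Log security: protect log files so that sensitive information does not leak
  • Network security: configure firewall rules to protect the system

The Keepalived two-node hot-standby setup is a foundation for enterprise high-availability architectures; with sound configuration and management, it provides a stable and reliable environment. In practice, the deployment layout and configuration strategy still have to be chosen to fit the specific business requirements and technical architecture.

High availability is a process of continuous improvement that requires ongoing monitoring, testing, and refinement. I hope this article serves as a useful reference and helps you build more stable and reliable systems.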