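Service Instance Auto-Scaling Tool

Introduction

This project uses Nginx and a Java application to scale service instances dynamically. When the response time of the health check endpoint exceeds a configured threshold, the system starts additional service instances to share the load; when the load drops, it removes instances that are no longer needed, keeping the service stable and highly available.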

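Features

- Dynamic scaling: adjusts the number of service instances based on the response time of the health check endpoint.
- Load balancing: Nginx distributes requests across instances, and the list of backend instances is updated dynamically.
- Cooldown: prevents scaling operations from running too frequently, which keeps the system stable.
- Logging: every scaling operation is recorded in autoscale.log to make troubleshooting easier.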

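Quick Start

Requirements

Operating system: Linux (CentOS or Ubuntu recommended; note that the bundled install.sh uses yum)
Software dependencies:
- JDK 1.8+
- Nginx
- curl and bc (used by auto_scale.sh for the health check)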

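Installation

Install Nginx and JDK

Run the following command to install the required software and replace the default Nginx configuration with the project's nginx.conf:
```bash
./install.sh
```

Prepare the Java application

Place the compiled JAR file at /root/app.jar (the path can be changed through JAR_PATH in auto_scale.sh).

Start the auto-scaling script

Run auto_scale.sh to start automatic scaling:
```bash
./auto_scale.sh
```

Verify that it is running

- Check that Nginx is running: systemctl status nginx
- Follow the log file: tail -f autoscale.log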

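Auto-Scaling Logic

Configuration Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| `PORT_RANGE` | Range of ports available to instances | 8081-9000 |
| `MIN_INSTANCES` | Minimum number of instances | 2 |
| `MAX_INSTANCES` | Maximum number of instances | 10 |
| `HEALTH_CHECK_URL` | URL of the health check endpoint | `http://localhost/health` |
| `RESPONSE_TIME_THRESHOLD` | Response time threshold (milliseconds) | 500 |
| `COOL_DOWN_TIME` | Cooldown period (seconds) | 60 |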
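
PORT_RANGE is a single "start-end" string, so it has to be split before the ports can be iterated. The following is a minimal sketch of that step with basic validation added; `parse_port_range` is a hypothetical helper and is not part of the project's scripts.

```bash
#!/bin/bash

# Hypothetical helper: split "start-end" into two numbers using
# parameter expansion, rejecting malformed or reversed ranges.
parse_port_range() {
    local range=$1
    local start=${range%-*}   # text before the last "-"
    local end=${range#*-}     # text after the first "-"
    if ! [[ $start =~ ^[0-9]+$ && $end =~ ^[0-9]+$ ]] || ((start > end)); then
        echo "invalid port range: $range" >&2
        return 1
    fi
    echo "$start $end"
}

# Example: read START_PORT END_PORT <<< "$(parse_port_range "8081-9000")"
```

Validating the range once at startup means a typo in the configuration fails immediately rather than surfacing later as a failed port search.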

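Scale-Out Trigger

When the response time of the health check endpoint exceeds RESPONSE_TIME_THRESHOLD (500 ms by default), or the endpoint returns a non-2xx status code, the system attempts to scale out.
If the number of instances has already reached MAX_INSTANCES, no further instances are added.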

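Scale-In Trigger

When the number of instances exceeds MIN_INSTANCES and the health check indicates no load pressure, the system attempts to scale in.
If the number of instances has already dropped to MIN_INSTANCES, no further instances are removed.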
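
The scale-out and scale-in rules above can be summarized as one decision function. This is a sketch under stated assumptions, not the project's code: `decide_action` is a hypothetical name, the cooldown is deliberately left out, and the comparison is done in awk because the measured response time may be fractional.

```bash
#!/bin/bash

MIN_INSTANCES=2
MAX_INSTANCES=10
RESPONSE_TIME_THRESHOLD=500   # milliseconds

# Hypothetical helper: prints "up", "down", or "none" for a measured
# response time (ms, may be fractional) and the current instance count.
decide_action() {
    local response_ms=$1 count=$2
    # awk exits with status 0 when the response time is above the threshold
    if awk -v t="$response_ms" -v max="$RESPONSE_TIME_THRESHOLD" \
        'BEGIN { exit !(t + 0 > max + 0) }'; then
        if ((count < MAX_INSTANCES)); then echo up; else echo none; fi
    elif ((count > MIN_INSTANCES)); then
        echo down
    else
        echo none
    fi
}

# Example: decide_action 600.5 3   prints "up"
```

Keeping the decision separate from the actions makes the limits easy to check: at MAX_INSTANCES an overloaded service yields "none", never "down".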

Cooldown

After each scaling operation, the system suspends any further scaling operations for COOL_DOWN_TIME (60 seconds by default) to avoid frequent adjustments.
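
A cooldown of this kind can be sketched as a comparison between the current time and the time of the last scaling action. The function name `in_cool_down` and its optional time argument are assumptions made here so the logic can be exercised without waiting; they are not taken from the project.

```bash
#!/bin/bash

COOL_DOWN_TIME=60      # seconds
LAST_ACTION_TIME=0     # epoch seconds of the last scaling action

# Returns 0 (true) while the cooldown window is still open.
# The optional argument overrides "now" (epoch seconds).
in_cool_down() {
    local now=${1:-$(date +%s)}
    ((now - LAST_ACTION_TIME < COOL_DOWN_TIME))
}

# Example:
#   LAST_ACTION_TIME=$(date +%s)
#   in_cool_down && echo "cooling down, skip scaling"
```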

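Log Management

All scaling operations are logged to autoscale.log. A sample of the log:

```
2023-10-01 10:00:00 Current instance count: 2
2023-10-01 10:00:00 Current response time: 600 ms
2023-10-01 10:00:00 Scale-out: started new instance on port 8083
2023-10-01 10:00:00 Nginx configuration updated: added port 8083
```

To follow the log in real time:

```bash
tail -f autoscale.log
```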
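
The script writes each entry with a bare `$(date)`, so real timestamps follow the system's default date format rather than the `YYYY-MM-DD HH:MM:SS` form in the sample. A small logging helper would produce that form consistently. This is a sketch only: `log` and `LOG_FILE` are hypothetical names, not part of the project.

```bash
#!/bin/bash

LOG_FILE=${LOG_FILE:-autoscale.log}

# Hypothetical helper: append one timestamped line to the log file.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

# Example: log "Current instance count: 2"
```

A fixed timestamp format also makes the log easier to filter and sort with tools such as grep and awk.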

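Configuration Files

`auto_scale.sh`

This script implements the core auto-scaling logic: health checks, scale-out, scale-in, and dynamic updates of the Nginx configuration.

`install.sh`

Automates the installation of Nginx and JDK and sets up the initial environment.

`nginx.conf`

The Nginx configuration file, which defines the load-balancing rules and the health check endpoint. The key parts are:

- upstream backend: the IP addresses and ports of the service instances, added dynamically.
- location /: forwards requests to the backend service.
- location = /health: defines the path of the health check endpoint.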
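
To illustrate what "added dynamically" means for the upstream block, the sketch below renders the block from a list of ports. `build_upstream` is a hypothetical helper; the sketch assumes, as the project's script does, that all instances listen on 127.0.0.1.

```bash
#!/bin/bash

# Hypothetical helper: print an "upstream backend" block with one
# server line per port passed as an argument.
build_upstream() {
    local port
    printf 'upstream backend {\n'
    for port in "$@"; do
        printf '    server 127.0.0.1:%s;\n' "$port"
    done
    printf '}\n'
}

# Example: build_upstream 8081 8082
```

Generating the block as plain text first makes it possible to inspect or diff the result before it is written into the live configuration.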

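FAQ

Q: What should I do if the scaling script does not run correctly?

A: Check the following:

- Confirm that Nginx and JDK are installed correctly.
- Make sure the JAR file is at the configured path.
- Look for error messages in autoscale.log.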
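
Part of this checklist can be automated. The following is a minimal sketch of a pre-flight check; `missing_commands` is a hypothetical helper, and the command list in the usage comment reflects the tools the scaling script invokes.

```bash
#!/bin/bash

# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH (prints nothing if all are present).
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Example:
#   missing=$(missing_commands nginx java curl bc pgrep)
#   [ -z "$missing" ] || echo "Missing commands: $missing"
```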

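Q: How do I adjust the scaling parameters?

A: Edit auto_scale.sh and change the following parameters:

- PORT_RANGE: the range of available ports.
- MIN_INSTANCES and MAX_INSTANCES: the minimum and maximum number of instances.
- RESPONSE_TIME_THRESHOLD: the response time threshold for the health check.
- COOL_DOWN_TIME: the cooldown period.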

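Q: How do I test the health check endpoint?

A: Use the following command to measure the response time of the health check endpoint (curl reports time_total in seconds):

```bash
curl -o /dev/null -s -w "%{time_total}\n" http://localhost/health
```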
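
Because curl reports the time in seconds while RESPONSE_TIME_THRESHOLD is in milliseconds, the value has to be converted before the two can be compared. A small sketch of that conversion follows; `to_millis` is a hypothetical helper that rounds to the nearest millisecond.

```bash
#!/bin/bash

# Hypothetical helper: convert fractional seconds to whole milliseconds.
to_millis() {
    awk -v s="$1" 'BEGIN { printf "%d\n", s * 1000 + 0.5 }'
}

# Example:
#   t=$(curl -o /dev/null -s -w '%{time_total}' http://localhost/health)
#   echo "health check took $(to_millis "$t") ms"
```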

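Future Improvements

- Multi-node deployment: scaling currently works on a single machine only; it could be extended to a multi-node cluster.
- Monitoring integration: feed the scaling logs into Prometheus or Grafana to provide a clearer monitoring view.
- Adaptive thresholds: adjust the scaling triggers dynamically based on historical data.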

Source Download

Service Instance Auto-Scaling Tool

Core Scripts

script/auto_scale.sh

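```bash
#!/bin/bash

# Configuration
PORT_RANGE="8081-9000"         # Port range available to instances
START_PORT=$(echo "$PORT_RANGE" | cut -d'-' -f1)
END_PORT=$(echo "$PORT_RANGE" | cut -d'-' -f2)

MIN_INSTANCES=2                # Minimum number of instances
MAX_INSTANCES=10               # Maximum number of instances
HEALTH_CHECK_URL="http://localhost/health" # Health check endpoint
RESPONSE_TIME_THRESHOLD=500    # Response time threshold (milliseconds)
COOL_DOWN_TIME=60              # Cooldown period (seconds)
JAR_PATH="/root/app.jar"       # Path to the JAR file

# Global state
CURRENT_INSTANCE_COUNT=0
LAST_ACTION_TIME=0
LAST_UPSTREAMS=""              # Upstream list most recently written to nginx.conf

# Path of the PID-to-port mapping file
PORT_MAPPING_FILE="$(pwd)/app_port_mapping.txt"

# Create the mapping file if it does not exist
if [ ! -f "$PORT_MAPPING_FILE" ]; then
    touch "$PORT_MAPPING_FILE"
fi

# Refuse to start without the JAR; otherwise the startup loop below would never finish
if [ ! -f "$JAR_PATH" ]; then
    echo "$(date) JAR file not found: $JAR_PATH" >> autoscale.log
    exit 1
fi

# Count the running instances
get_instance_count() {
    CURRENT_INSTANCE_COUNT=$(pgrep -f "java -jar $JAR_PATH" | wc -l)
}

# Check whether the cooldown period is still active
is_in_cool_down() {
    local current_time
    current_time=$(date +%s)
    if ((current_time - LAST_ACTION_TIME < COOL_DOWN_TIME)); then
        return 0 # Still cooling down
    else
        return 1 # Cooldown has elapsed
    fi
}

# Scale out: start one more instance (returns 1 if no instance was started)
scale_up() {
    if ((CURRENT_INSTANCE_COUNT >= MAX_INSTANCES)); then
        echo "$(date) Maximum instance count reached, cannot scale out" >> autoscale.log
        return 1
    fi

    # Find the first port in the range that is not recorded in the mapping file.
    # "port" stays empty when every port is taken.
    local port="" candidate
    for candidate in $(seq "$START_PORT" "$END_PORT"); do
        if ! grep -q ":$candidate\$" "$PORT_MAPPING_FILE"; then
            port=$candidate
            break
        fi
    done

    if [ -z "$port" ]; then
        echo "$(date) No free port available in the configured range" >> autoscale.log
        return 1
    fi

    # Start the new instance
    nohup java -jar "$JAR_PATH" --server.port="$port" > "app_$port.log" 2>&1 &
    local pid=$!

    # Record the PID and port in the mapping file
    echo "$pid:$port" >> "$PORT_MAPPING_FILE"
    echo "$(date) Scale-out: started new instance on port $port, PID $pid" >> autoscale.log

    # Update the Nginx configuration
    update_nginx_config

    # Record the time of the last action
    LAST_ACTION_TIME=$(date +%s)
}

# Look up the listening port of a process by PID
get_port_by_pid() {
    local pid=$1
    local port
    port=$(lsof -Pn -p "$pid" | grep LISTEN | awk '{print $9}' | grep -oE ':[0-9]+' | cut -d':' -f2)
    echo "$port"
}

# Scale in: stop the most recently started instance
scale_down() {
    if ((CURRENT_INSTANCE_COUNT <= MIN_INSTANCES)); then
        echo "$(date) Minimum instance count reached, cannot scale in" >> autoscale.log
        return
    fi

    # Get the PID and port of the last instance
    local last_line last_pid last_port
    last_line=$(tail -n 1 "$PORT_MAPPING_FILE")
    if [ -z "$last_line" ]; then
        echo "$(date) Mapping file is empty, cannot scale in" >> autoscale.log
        return
    fi

    last_pid=$(echo "$last_line" | cut -d':' -f1)
    last_port=$(echo "$last_line" | cut -d':' -f2)

    # Stop the instance
    kill "$last_pid" 2>/dev/null
    sed -i '$d' "$PORT_MAPPING_FILE" # Remove the last record
    echo "$(date) Scale-in: stopped instance on port $last_port, PID $last_pid" >> autoscale.log

    # Update the Nginx configuration
    update_nginx_config

    # Record the time of the last action
    LAST_ACTION_TIME=$(date +%s)
}

# Update the Nginx configuration
update_nginx_config() {
    # Collect the ports of all instances that are still running
    local ports=() pid port p
    while IFS=: read -r pid port; do
        if kill -0 "$pid" 2>/dev/null; then
            ports+=("127.0.0.1:$port")
        else
            # The process no longer exists: remove it from the mapping file
            sed -i "/^$pid:$port$/d" "$PORT_MAPPING_FILE"
        fi
    done < "$PORT_MAPPING_FILE"

    # Nothing to do if no instance is running
    if [ ${#ports[@]} -eq 0 ]; then
        echo "$(date) No ports available, skipping Nginx configuration update" >> autoscale.log
        return
    fi

    # Skip the rewrite and restart when the upstream list has not changed;
    # otherwise the main loop would restart Nginx every 10 seconds
    if [ "${ports[*]}" = "$LAST_UPSTREAMS" ]; then
        return
    fi

    # Build the upstream block
    local upstream_config="upstream backend {\n"
    for p in "${ports[@]}"; do
        upstream_config+="    server $p;\n"
    done
    upstream_config+="}\n"

    # Replace the upstream block in nginx.conf
    sed -i "/upstream backend {/,/}/d" /etc/nginx/nginx.conf
    sed -i "/http {/a\\$upstream_config" /etc/nginx/nginx.conf

    # Restart Nginx
    systemctl restart nginx
    LAST_UPSTREAMS="${ports[*]}"

    echo "$(date) Nginx configuration updated: ${ports[*]}" >> autoscale.log
}

# Check service health.
# Returns 0 when the service is healthy and within the threshold, 1 otherwise.
check_health() {
    # Send a request and capture the status code and response time
    local response http_code response_time
    response=$(curl -o /dev/null -s -w "%{http_code} %{time_total}" "$HEALTH_CHECK_URL")
    http_code=$(echo "$response" | awk '{print $1}')
    response_time=$(echo "$response" | awk '{print $2}')

    # Convert the response time from seconds to milliseconds
    response_time=$(echo "$response_time * 1000" | bc)

    echo "$(date) Current response time: $response_time ms, HTTP status: $http_code" >> autoscale.log

    # Treat any non-2xx status code as unhealthy
    if [ "$http_code" -lt 200 ] || [ "$http_code" -ge 300 ]; then
        echo "$(date) Service unhealthy, HTTP status $http_code" >> autoscale.log
        if is_in_cool_down; then
            echo "$(date) Cooling down, skipping scale-out" >> autoscale.log
        else
            scale_up
        fi
        return 1
    fi

    # Use bc for the floating-point comparison against the threshold
    if [ "$(echo "$response_time > $RESPONSE_TIME_THRESHOLD" | bc)" -eq 1 ]; then
        echo "$(date) Response time exceeds threshold ($RESPONSE_TIME_THRESHOLD ms)" >> autoscale.log
        if is_in_cool_down; then
            echo "$(date) Cooling down, skipping scale-out" >> autoscale.log
        else
            scale_up
        fi
        return 1
    fi

    return 0
}

# Cleanup on exit
cleanup() {
    echo "$(date) Starting cleanup..." >> autoscale.log

    # Kill all Java processes started by this script
    pids=$(pgrep -f "java -jar $JAR_PATH")
    if [ -n "$pids" ]; then
        echo "$(date) Killing Java processes: $pids" >> autoscale.log
        kill $pids
    fi

    # Remove the port mapping file
    rm -f "$PORT_MAPPING_FILE"
    echo "$(date) Removed port mapping file" >> autoscale.log

    # Restore the Nginx configuration
    NGINX_CONF_PATH="/etc/nginx/nginx.conf"
    CUSTOM_NGINX_CONF_PATH="$(pwd)/nginx.conf"

    if [ -f "$CUSTOM_NGINX_CONF_PATH" ]; then
        cp "$CUSTOM_NGINX_CONF_PATH" "$NGINX_CONF_PATH"
        echo "$(date) Nginx configuration restored" >> autoscale.log
    else
        echo "$(date) Custom Nginx configuration not found; make sure nginx.conf exists in the current directory" >> autoscale.log
        exit 1
    fi

    # Stop Nginx
    systemctl stop nginx

    echo "$(date) Cleanup complete" >> autoscale.log
    exit 0
}

# Handle SIGTERM and SIGINT
trap cleanup SIGTERM SIGINT

# Main loop
while true; do
    get_instance_count
    echo "$(date) Current instance count: $CURRENT_INSTANCE_COUNT" >> autoscale.log

    # Start instances until the minimum is reached
    if ((CURRENT_INSTANCE_COUNT < MIN_INSTANCES)); then
        echo "$(date) Instance count below minimum, starting new instances" >> autoscale.log
        while ((CURRENT_INSTANCE_COUNT < MIN_INSTANCES)); do
            scale_up || break
            get_instance_count
        done

        # Give the new instances time to start before the first health check
        echo "$(date) Scale-out finished, waiting $COOL_DOWN_TIME seconds before the next health check" >> autoscale.log
        sleep "$COOL_DOWN_TIME"
    fi

    # Scale in only when the health check reports no load pressure
    if check_health && ((CURRENT_INSTANCE_COUNT > MIN_INSTANCES)); then
        if is_in_cool_down; then
            echo "$(date) Cooling down, skipping scale-in" >> autoscale.log
        else
            scale_down
        fi
    fi

    # Refresh the Nginx configuration (also prunes instances that have died)
    update_nginx_config

    sleep 10 # Check every 10 seconds
done
```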
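
The scale-in step sends a plain `kill` (SIGTERM) and moves on immediately. If instances need time to finish in-flight requests, a stop routine that waits for the process to exit and only then escalates may be useful. The following is a sketch of that idea, not the project's behavior: `stop_instance` and its timeout parameter are hypothetical.

```bash
#!/bin/bash

# Hypothetical helper: send SIGTERM, wait up to <timeout> seconds for
# the process to exit, then fall back to SIGKILL.
stop_instance() {
    local pid=$1 timeout=${2:-10} waited=0
    kill "$pid" 2>/dev/null || return 0      # process is already gone
    while kill -0 "$pid" 2>/dev/null; do
        if ((waited >= timeout)); then
            kill -9 "$pid" 2>/dev/null       # did not exit in time
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Example: stop_instance "$last_pid" 15
```

Removing the instance from the Nginx upstream list before stopping it would further reduce the chance of requests being routed to a process that is shutting down.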

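script/install.sh

```bash
#!/bin/bash

# Install Nginx
echo "Installing Nginx..."
if yum install -y nginx; then
    echo "Nginx installed successfully"
else
    echo "Nginx installation failed, please check the system environment"
    exit 1
fi

# Replace the default Nginx configuration with the custom one
echo "Replacing the Nginx configuration file..."
NGINX_CONF_PATH="/etc/nginx/nginx.conf"
CUSTOM_NGINX_CONF_PATH="$(pwd)/nginx.conf" # Expects nginx.conf in the current directory

if [ -f "$CUSTOM_NGINX_CONF_PATH" ]; then
    cp "$CUSTOM_NGINX_CONF_PATH" "$NGINX_CONF_PATH"
    echo "Nginx configuration file replaced"
else
    echo "Custom Nginx configuration not found; make sure nginx.conf exists in the current directory"
    exit 1
fi

# Install the JDK
echo "Installing JDK..."
if yum install -y java-1.8.0-openjdk-devel.x86_64; then
    echo "JDK installed successfully"
else
    echo "JDK installation failed, please check the system environment"
    exit 1
fi

# Final notice to the user
echo "Installation complete! Make sure the JAR file is at the configured path and runs correctly."
```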
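
The install script calls yum directly, which is not available on Ubuntu by default. The sketch below shows one way to pick a package manager and a matching JDK package. Both function names are hypothetical, and the Ubuntu package name `openjdk-8-jdk` is an assumption that should be checked against the target release.

```bash
#!/bin/bash

# Hypothetical helper: print the first supported package manager found.
detect_pkg_manager() {
    local pm
    for pm in dnf yum apt-get; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: map a package manager to a JDK 8 package name.
# The apt-get entry is an assumption, not taken from the project.
jdk_package_for() {
    case $1 in
        dnf|yum) echo "java-1.8.0-openjdk-devel" ;;
        apt-get) echo "openjdk-8-jdk" ;;
        *) return 1 ;;
    esac
}

# Example:
#   pm=$(detect_pkg_manager) || { echo "no supported package manager"; exit 1; }
#   "$pm" install -y nginx "$(jdk_package_for "$pm")"
```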

script/nginx.conf

```nginx
worker_processes  1;

events {
    worker_connections  1024;
}

http {
    upstream backend {
        # Service instance addresses (IP:port) are added here dynamically
    }

    server {
        listen 80;

        location / {
            proxy_pass http://backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }

        # Health check endpoint
        location = /health {
            proxy_pass http://backend/health;
        }
    }
}
```