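A complete set of automation scripts for standing up a 5-node load-balanced cluster with minimal manual steps.
5-Node K3s Load-Balanced Cluster: One-Click Deployment
Architecture at a Glance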
                     ┌─────────────────────┐
                     │ Load balancer entry │
                     │ (Nginx/Keepalived)  │
                     │ VIP: 192.168.1.9    │
                     └──────────┬──────────┘
                                │
         ┌──────────────────────┼──────────────────────┐
         │                      │                      │
┌────────┴────────┐    ┌────────┴────────┐    ┌────────┴────────┐
│    Master-1     │◄──►│    Master-2     │◄──►│    Master-3     │
│  192.168.1.10   │    │  192.168.1.11   │    │  192.168.1.12   │
│ K3s Server + DB │    │ K3s Server + DB │    │ K3s Server + DB │
│   etcd member   │    │   etcd member   │    │   etcd member   │
└────────┬────────┘    └────────┬────────┘    └────────┬────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
                    ┌───────────┴───────────┐
                    │                       │
           ┌────────┴────────┐     ┌────────┴────────┐
           │    Worker-1     │     │    Worker-2     │
           │  192.168.1.13   │     │  192.168.1.14   │
           │ (app workloads) │     │ (app workloads) │
           └─────────────────┘     └─────────────────┘
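High-availability design: the three masters form an etcd cluster, and the workers reach the API through the VIP. A 3-member etcd cluster keeps quorum with 2 members, so the cluster tolerates the loss of any one master.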
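The quorum arithmetic behind that claim can be sketched in a few lines of bash. The function names `etcd_quorum` and `etcd_fault_tolerance` are illustrative, not part of the original scripts; the formula is the usual majority rule for etcd.

```shell
#!/usr/bin/env bash
# Quorum size for an etcd cluster of N members: floor(N/2) + 1.
etcd_quorum() {
  local members=$1
  echo $(( members / 2 + 1 ))
}

# Number of members that can fail while a quorum is still available.
etcd_fault_tolerance() {
  local members=$1
  echo $(( members - (members / 2 + 1) ))
}

for n in 1 3 5; do
  echo "members=$n quorum=$(etcd_quorum "$n") tolerated_failures=$(etcd_fault_tolerance "$n")"
done
```

With three masters the quorum is 2, which is why one master can fail but two cannot.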
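Step 1: Prepare 5 servers
| Node | IP | Spec | Role |
|---|---|---|---|
| node1 | 192.168.1.10 | 2 cores / 4 GB or more | Master-1 |
| node2 | 192.168.1.11 | 2 cores / 4 GB or more | Master-2 |
| node3 | 192.168.1.12 | 2 cores / 4 GB or more | Master-3 |
| node4 | 192.168.1.13 | 2 cores / 4 GB or more | Worker-1 |
| node5 | 192.168.1.14 | 2 cores / 4 GB or more | Worker-2 |
Virtual IP (VIP): 192.168.1.9 (the highly available entry point for the masters)
Step 2: One-click deployment scripts
1. Common initialization script (run on all nodes)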
bash
#!/bin/bash
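# ============================================
# File: 01-init-all-nodes.sh
# Purpose: common initialization for every node
# Run: on each of the 5 servers (uses apt and ufw, so it assumes Debian/Ubuntu; run as root)
# ============================================
set -e
# Configuration variables (adjust to your environment)
NODE_IP=$1 # this node's IP
NODE_NAME=$2 # this node's hostname
VIP="192.168.1.9" # virtual IP
if [ -z "$NODE_IP" ] || [ -z "$NODE_NAME" ]; then
echo "Usage: ./01-init-all-nodes.sh <IP> <hostname>"
echo "Example: ./01-init-all-nodes.sh 192.168.1.10 k3s-master1"
exit 1
fi
echo "========== Initializing node: $NODE_NAME ($NODE_IP) =========="
# 1. Set the hostname
hostnamectl set-hostname "$NODE_NAME"
echo "✓ Hostname set to: $NODE_NAME"
# 2. Configure hosts so all nodes can resolve each other (this overwrites /etc/hosts)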
cat > /etc/hosts <<EOF
127.0.0.1 localhost
$VIP k3s-api
192.168.1.10 k3s-master1
192.168.1.11 k3s-master2
192.168.1.12 k3s-master3
192.168.1.13 k3s-worker1
192.168.1.14 k3s-worker2
EOF
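echo "✓ Hosts configured"
# 3. Disable swap (the kubelet expects swap to be off by default)
swapoff -a
sed -i '/swap/d' /etc/fstab
echo "✓ Swap disabled"
# 4. Disable the firewall (acceptable while learning; use fine-grained rules in production)
systemctl stop ufw 2>/dev/null || true
systemctl disable ufw 2>/dev/null || true
iptables -F
echo "✓ Firewall disabled"
# 5. Load kernel modules (required by K3s)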
cat > /etc/modules-load.d/k3s.conf <<EOF
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
echo "✓ Kernel modules loaded"
# 6. Configure sysctl
cat > /etc/sysctl.d/k3s.conf <<EOF
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system > /dev/null
echo "✓ Sysctl configured"
# 7. Install basic tools
apt-get update > /dev/null
apt-get install -y curl wget vim net-tools htop tree jq \
open-iscsi nfs-common 2>/dev/null
echo "✓ Basic tools installed"
# 8. Install Docker (only for building images while learning; K3s itself runs on containerd)
if ! command -v docker &> /dev/null; then
curl -fsSL https://get.docker.com | sh
systemctl enable docker
systemctl start docker
# Configure registry mirrors
mkdir -p /etc/docker
cat > /etc/docker/daemon.json <<EOF
{
"registry-mirrors": [
"https://docker.mirrors.ustc.edu.cn",
"https://hub-mirror.c.163.com"
],
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2"
}
EOF
systemctl restart docker
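echo "✓ Docker installed"
else
echo "✓ Docker already present, skipping installation"
fi
# 9. Time synchronization
timedatectl set-timezone Asia/Shanghai
apt-get install -y chrony
systemctl enable chrony
systemctl start chrony
echo "✓ Time synchronization configured"
echo ""
echo "========== $NODE_NAME initialization complete =========="
echo "IP: $NODE_IP"
echo "Next, run the deployment script for this node's role"
How to run (once on each of the 5 servers):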
bash
# On node1
chmod +x 01-init-all-nodes.sh
./01-init-all-nodes.sh 192.168.1.10 k3s-master1
# On node2
./01-init-all-nodes.sh 192.168.1.11 k3s-master2
# On node3
./01-init-all-nodes.sh 192.168.1.12 k3s-master3
# On node4
./01-init-all-nodes.sh 192.168.1.13 k3s-worker1
# On node5
./01-init-all-nodes.sh 192.168.1.14 k3s-worker2
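The init script replaces /etc/hosts wholesale, which also drops entries that were already there (for example IPv6 loopback lines). As a gentler alternative, the sketch below appends an entry only when the hostname is missing. The function name `ensure_hosts_entry` and the `HOSTS_FILE` override are assumptions made for illustration; they are not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add "<ip> <name>" to a hosts file only if <name> is absent.
# HOSTS_FILE defaults to /etc/hosts; override it to try the function safely.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

ensure_hosts_entry() {
  local ip=$1 name=$2
  # Look for the hostname as a whole field on any non-comment line.
  if awk -v n="$name" '
      !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }' "$HOSTS_FILE"; then
    return 0   # already present, nothing to do
  fi
  printf '%s %s\n' "$ip" "$name" >> "$HOSTS_FILE"
}

# Example run against a scratch file instead of the real /etc/hosts.
HOSTS_FILE=$(mktemp)
printf '127.0.0.1 localhost\n' > "$HOSTS_FILE"
ensure_hosts_entry 192.168.1.9 k3s-api
ensure_hosts_entry 192.168.1.10 k3s-master1
ensure_hosts_entry 192.168.1.10 k3s-master1   # second call is a no-op
cat "$HOSTS_FILE"
```

Because the function is idempotent, re-running the init script would not pile up duplicate lines.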
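2. Master HA load-balancer script (run on master nodes)
bash
#!/bin/bash
# ============================================
# File: 02-setup-master-lb.sh
# Purpose: deploy the Nginx load balancer + Keepalived on the 3 masters
# Run: on all three masters (no token is needed for this step)
# Note: K3s server binds port 6443 on these same hosts by default, and the
# Nginx listener below also uses 6443. Two processes cannot both bind
# 0.0.0.0:6443, so if one of them fails to start, move the Nginx listener
# to a free port and point the join URLs at that port.
# ============================================
set -e
MASTER_IP=$1
MASTER_INDEX=$2 # 1, 2, or 3
VIP="192.168.1.9"
if [ -z "$MASTER_IP" ] || [ -z "$MASTER_INDEX" ]; then
echo "Usage: ./02-setup-master-lb.sh <this-node-IP> <master-index 1/2/3>"
exit 1
fi
echo "========== Deploying load balancer on Master-$MASTER_INDEX ($MASTER_IP) =========="
# Install Nginx (with the stream module used below) and Keepalived
apt-get install -y nginx libnginx-mod-stream keepalived
# Configure Nginx as a TCP load balancer for the K3s API
cat > /etc/nginx/nginx.conf <<EOF
user www-data;
worker_processes auto;
pid /run/nginx.pid;
# On Debian/Ubuntu the stream module is typically a dynamic module loaded from here
include /etc/nginx/modules-enabled/*.conf;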
events {
worker_connections 1024;
}
stream {
upstream k3s_api {
least_conn;
server 192.168.1.10:6443 max_fails=3 fail_timeout=5s;
server 192.168.1.11:6443 max_fails=3 fail_timeout=5s;
server 192.168.1.12:6443 max_fails=3 fail_timeout=5s;
}
server {
listen 6443;
proxy_pass k3s_api;
proxy_timeout 300s;
proxy_connect_timeout 1s;
}
}
EOF
systemctl restart nginx
systemctl enable nginx
echo "✓ Nginx TCP load balancer configured"
# Configure Keepalived (VRRP moves the VIP between nodes)
# Master-1 priority 100, Master-2 priority 90, Master-3 priority 80
PRIORITY=$((110 - MASTER_INDEX * 10))
STATE="BACKUP"
if [ "$MASTER_INDEX" -eq 1 ]; then
STATE="MASTER"
fi
cat > /etc/keepalived/keepalived.conf <<EOF
global_defs {
router_id K3S_API_LB_$MASTER_INDEX
script_user root
enable_script_security
}
vrrp_script check_nginx {
script "/usr/bin/systemctl is-active --quiet nginx"
interval 2
weight -20
fall 2
rise 2
}
vrrp_instance VI_1 {
state $STATE
interface $(ip route | grep default | awk '{print $5}' | head -n1)
virtual_router_id 51
priority $PRIORITY
advert_int 1
authentication {
auth_type PASS
auth_pass K3sCluster2024
}
virtual_ipaddress {
$VIP/24
}
track_script {
check_nginx
}
}
EOF
systemctl restart keepalived
systemctl enable keepalived
echo "✓ Keepalived configured (priority: $PRIORITY, state: $STATE)"
# Wait for the VIP to come up (only the current VRRP master holds it)
sleep 3
ip addr | grep -F "$VIP/" && echo "✓ VIP $VIP is bound on this node" || echo "⚠ VIP not bound on this node (expected on BACKUP nodes; otherwise check Keepalived)"
echo ""
echo "========== Master-$MASTER_INDEX load balancer deployed =========="
How to run:
bash
# Master-1 (priority 100, MASTER state)
./02-setup-master-lb.sh 192.168.1.10 1
# Master-2 (priority 90, BACKUP state)
./02-setup-master-lb.sh 192.168.1.11 2
# Master-3 (priority 80, BACKUP state)
./02-setup-master-lb.sh 192.168.1.12 3
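To see why a failed Nginx on Master-1 hands the VIP to Master-2, it helps to work through the priority arithmetic. The sketch below reuses the script's formula (110 - index * 10) and the `weight -20` from `check_nginx`. It is a simplified model: the function names are illustrative, and real VRRP behavior also involves preemption, tie-breaking and nodes that are entirely down, none of which is modeled here.

```shell
#!/usr/bin/env bash
# Base priority, same formula as the script: 100, 90, 80 for index 1, 2, 3.
base_priority() {
  echo $(( 110 - $1 * 10 ))
}

# Effective priority after the check_nginx result is applied.
# nginx_ok=1 means the check passes; a failing check applies weight -20.
effective_priority() {
  local index=$1 nginx_ok=$2 prio
  prio=$(base_priority "$index")
  if [ "$nginx_ok" -ne 1 ]; then
    prio=$(( prio - 20 ))
  fi
  echo "$prio"
}

# Pick the VIP holder: the highest effective priority wins.
# Arguments are the nginx_ok flags for Master-1, Master-2, Master-3.
vip_holder() {
  local best=0 best_prio=-1 index=1 prio flag
  for flag in "$@"; do
    prio=$(effective_priority "$index" "$flag")
    if [ "$prio" -gt "$best_prio" ]; then
      best=$index
      best_prio=$prio
    fi
    index=$(( index + 1 ))
  done
  echo "Master-$best"
}

echo "all healthy:            $(vip_holder 1 1 1)"
echo "nginx down on Master-1: $(vip_holder 0 1 1)"
```

With Nginx down on Master-1 its priority drops from 100 to 80, below Master-2's 90, so the VIP moves.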
3. K3s server deployment script (run on master nodes)
bash
#!/bin/bash
# ============================================
# File: 03-install-k3s-server.sh
# Purpose: deploy K3s server (HA mode with embedded etcd)
# Run: on Master-1 first; Master-2/3 then join with the same token
# ============================================
set -e
MASTER_IP=$1
MASTER_INDEX=$2
VIP="192.168.1.9"
if [ -z "$MASTER_IP" ] || [ -z "$MASTER_INDEX" ]; then
echo "Usage: ./03-install-k3s-server.sh <this-node-IP> <master-index 1/2/3>"
exit 1
fi
echo "========== Deploying K3s Server-$MASTER_INDEX ($MASTER_IP) =========="
# Install arguments shared by all servers. ServiceLB is disabled because MetalLB
# replaces it; Traefik stays enabled because script 05 configures it.
IFACE=$(ip route | grep default | awk '{print $5}' | head -n1)
INSTALL_K3S_EXEC="server \
--tls-san $VIP \
--tls-san k3s-api \
--tls-san $MASTER_IP \
--node-ip $MASTER_IP \
--advertise-address $MASTER_IP \
--flannel-iface $IFACE \
--disable servicelb"
if [ "$MASTER_INDEX" -eq 1 ]; then
    # First master: initialize the embedded etcd cluster
    echo ">> Initializing the K3s cluster..."
    curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | INSTALL_K3S_MIRROR=cn INSTALL_K3S_EXEC="$INSTALL_K3S_EXEC --cluster-init" sh -
    # Save the token
    TOKEN=$(cat /var/lib/rancher/k3s/server/node-token)
    echo "$TOKEN" > /root/k3s-node-token
    echo ""
    echo "========================================"
    echo "⚠ Save the following token; the other masters and the workers need it:"
    echo "$TOKEN"
    echo "========================================"
    echo ""
else
    # Master-2/3: join the existing cluster (--server instead of --cluster-init)
    read -r -p "Enter the node token from Master-1: " TOKEN
    echo ">> Joining the existing K3s cluster..."
    curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | INSTALL_K3S_MIRROR=cn INSTALL_K3S_EXEC="$INSTALL_K3S_EXEC --server https://$VIP:6443" K3S_TOKEN="$TOKEN" sh -
fi
# Configure kubectl
mkdir -p ~/.kube
cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
chmod 600 ~/.kube/config
export KUBECONFIG=~/.kube/config
# Wait for a node to report Ready (-w keeps "NotReady" from matching)
echo ">> Waiting for K3s to become ready..."
sleep 10
until kubectl get nodes | grep -qw "Ready"; do
echo " waiting..."
sleep 5
done
echo "✓ K3s Server-$MASTER_INDEX deployed"
kubectl get nodes -o wide
echo ""
echo "========== K3s Server-$MASTER_INDEX done =========="
How to run:
bash
# Master-1 (initializes the cluster)
./03-install-k3s-server.sh 192.168.1.10 1
# Save the printed token, then run on Master-2/3
./03-install-k3s-server.sh 192.168.1.11 2
# Enter the token when prompted
./03-install-k3s-server.sh 192.168.1.12 3
# Enter the token when prompted
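A plain text match on "Ready" in the `kubectl get nodes` output is easy to get wrong, because "NotReady" contains the same substring. One way to check readiness more strictly is to compare the STATUS column exactly, as in this sketch. The helper name `count_ready_nodes` and the sample rows are illustrative, not output captured from a real cluster.

```shell
#!/usr/bin/env bash
# Count nodes whose STATUS column is exactly "Ready".
# Reads `kubectl get nodes --no-headers` style output on stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Sample input standing in for real kubectl output (hypothetical values).
sample='k3s-master1 Ready control-plane,etcd,master 10m v1.28.x
k3s-master2 NotReady control-plane,etcd,master 8m v1.28.x
k3s-master3 Ready control-plane,etcd,master 6m v1.28.x'

printf '%s\n' "$sample" | count_ready_nodes

# Against a live cluster, assuming kubectl is already configured:
#   kubectl get nodes --no-headers | count_ready_nodes
```

After all three servers have joined, the count on a live cluster should reach 3 before the workers are added. Note that cordoned nodes report a combined status such as `Ready,SchedulingDisabled` and are not counted by this exact match.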
4. K3s agent deployment script (run on worker nodes)
bash
#!/bin/bash
# ============================================
# File: 04-install-k3s-agent.sh
# Purpose: join a worker node to the K3s cluster
# Run: on Worker-1 and Worker-2
# ============================================
set -e
WORKER_IP=$1
VIP="192.168.1.9"
if [ -z "$WORKER_IP" ]; then
echo "Usage: ./04-install-k3s-agent.sh <this-node-IP>"
exit 1
fi
read -r -p "Enter the K3s node token: " TOKEN
echo "========== Deploying K3s agent ($WORKER_IP) =========="
# --node-ip pins the agent to the IP passed on the command line
curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | INSTALL_K3S_MIRROR=cn K3S_URL="https://$VIP:6443" K3S_TOKEN="$TOKEN" INSTALL_K3S_EXEC="--node-ip $WORKER_IP" sh -
echo "✓ K3s agent deployed"
# Verification has to be done from a master node
echo ""
echo "On any master node, run the following to verify:"
echo " kubectl get nodes -o wide"
How to run:
bash
# Worker-1
./04-install-k3s-agent.sh 192.168.1.13
# Enter the token when prompted
# Worker-2
./04-install-k3s-agent.sh 192.168.1.14
# Enter the token when prompted
5. Core components deployment script (run on Master-1)
bash
#!/bin/bash
# ============================================
# File: 05-deploy-core-components.sh
# Purpose: deploy the core cluster components in one go
# Run: on Master-1
# ============================================
set -e
echo "========== Deploying core cluster components =========="
# Configure kubectl
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
# 1. Deploy MetalLB (load balancer that replaces the K3s default ServiceLB)
echo ">> 1. Deploying MetalLB..."
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.12/config/manifests/metallb-native.yaml
# Wait for MetalLB to become ready
sleep 30
kubectl wait --namespace metallb-system \
--for=condition=ready pod \
--selector=app=metallb \
--timeout=90s
# Configure the MetalLB IP pool (adjust to your network)
cat <<EOF | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.100-192.168.1.150
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
EOF
echo "✓ MetalLB deployed, IP pool: 192.168.1.100-150"
# 2. Configure Traefik Ingress (bundled with K3s unless disabled at install time; this customizes its Helm values)
echo ">> 2. Configuring Traefik Ingress..."
kubectl apply -f - <<EOF
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    service:
      type: LoadBalancer
    ports:
      web:
        exposedPort: 80
      websecure:
        exposedPort: 443
    additionalArguments:
      - "--api.insecure=true"
      - "--ping=true"
      - "--metrics.prometheus=true"
EOF
echo "✓ Traefik configured"
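# 3. Deploy Longhorn distributed storage
echo ">> 3. Deploying Longhorn..."
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/deploy/longhorn.yaml
echo ">> Waiting for Longhorn to become ready (about 2-3 minutes)..."
# Give the pods time to be created; kubectl wait fails if nothing matches the selector yet
sleep 30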
kubectl wait --namespace longhorn-system \
--for=condition=ready pod \
--selector=app=longhorn-manager \
--timeout=300s
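# Make Longhorn the only default StorageClass (K3s also marks local-path as default)
kubectl patch storageclass local-path -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}' || true
kubectl patch storageclass longhorn -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
echo "✓ Longhorn deployed and set as the default StorageClass"
# 4. Deploy the monitoring stack (Prometheus + Grafana)
echo ">> 4. Deploying the monitoring stack..."
# Helm is not installed by the earlier scripts, so install it if it is missing
if ! command -v helm &> /dev/null; then
    curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
fi
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
kubectl create namespace monitoring 2>/dev/null || true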
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--set grafana.adminPassword='Admin@123' \
--set prometheus.prometheusSpec.retention=15d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=longhorn \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=20Gi \
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
--wait
# Expose Grafana and Prometheus
kubectl patch svc monitoring-grafana -n monitoring -p '{"spec":{"type":"LoadBalancer"}}'
kubectl patch svc monitoring-kube-prometheus-prometheus -n monitoring -p '{"spec":{"type":"LoadBalancer"}}'
echo "✓ Monitoring stack deployed"
echo " Grafana login: admin / Admin@123"
# 5. Deploy a sample application (Nginx) to verify the cluster
echo ">> 5. Deploying the verification app..."
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-demo
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx-demo
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      containers:
        - name: nginx
          image: nginx:alpine
          ports:
            - containerPort: 80
          resources:
            requests:
              memory: "64Mi"
              cpu: "50m"
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-demo
spec:
  selector:
    app: nginx-demo
  ports:
    - port: 80
      targetPort: 80
  type: LoadBalancer
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-demo
spec:
  rules:
    - host: demo.cluster.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx-demo
                port:
                  number: 80
EOF
echo "✓ Verification app deployed (Nginx, 5 replicas)"
# 6. Collect access information
echo ""
echo "========== Deployment complete! Access information =========="
echo ""
# jsonpath prints an empty string (exit code 0) until an IP is assigned, so the fallback is applied on output
GRAFANA_IP=$(kubectl get svc monitoring-grafana -n monitoring -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || true)
PROMETHEUS_IP=$(kubectl get svc monitoring-kube-prometheus-prometheus -n monitoring -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || true)
NGINX_IP=$(kubectl get svc nginx-demo -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || true)
echo "🌐 Grafana dashboard: http://${GRAFANA_IP:-pending} (admin/Admin@123)"
echo "📊 Prometheus: http://${PROMETHEUS_IP:-pending}:9090"
echo "🚀 Nginx demo app: http://${NGINX_IP:-pending}"
echo "📦 Longhorn storage UI: kubectl port-forward svc/longhorn-frontend -n longhorn-system 8080:80"
echo ""
echo "📋 Cluster status:"
kubectl get nodes -o wide
echo ""
echo "📋 System pod status:"
kubectl get pods -n kube-system
echo ""
echo "📋 Monitoring pod status:"
kubectl get pods -n monitoring
echo ""
echo "📋 Demo app status:"
kubectl get pods,svc,ingress
echo ""
echo "========== All components deployed =========="
How to run:
bash
# On Master-1
chmod +x 05-deploy-core-components.sh
./05-deploy-core-components.sh
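The MetalLB pool (192.168.1.100-150) must not overlap addresses already in use, such as the VIP or the node IPs. A small pre-flight check can make that explicit. The sketch below is only an illustration: the helper names `ip_to_int` and `ip_in_range` are made up for this example, and it only looks at the addresses named in this guide, not at DHCP leases or other hosts on the network.

```shell
#!/usr/bin/env bash
# Convert a dotted IPv4 address to an integer for range comparisons.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Succeeds (exit 0) if the IP lies inside the inclusive range START-END.
ip_in_range() {
  local ip start end
  ip=$(ip_to_int "$1")
  start=$(ip_to_int "$2")
  end=$(ip_to_int "$3")
  [ "$ip" -ge "$start" ] && [ "$ip" -le "$end" ]
}

POOL_START=192.168.1.100
POOL_END=192.168.1.150

# Addresses already used in this guide: the VIP and the five nodes.
for ip in 192.168.1.9 192.168.1.10 192.168.1.11 192.168.1.12 192.168.1.13 192.168.1.14; do
  if ip_in_range "$ip" "$POOL_START" "$POOL_END"; then
    echo "CONFLICT: $ip is inside the MetalLB pool"
  else
    echo "ok: $ip is outside the pool"
  fi
done
```

If you change the pool to match your own network, re-running the check with the new `POOL_START` and `POOL_END` is a quick way to catch an overlap before MetalLB starts handing out addresses.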
Step 3: Overall execution flow
bash
# The ssh commands below assume the scripts were already copied to root's home directory on every node
# ========== Step 1: initialize each of the 5 servers ==========
# node1 (192.168.1.10)
ssh root@192.168.1.10 './01-init-all-nodes.sh 192.168.1.10 k3s-master1'
# node2 (192.168.1.11)
ssh root@192.168.1.11 './01-init-all-nodes.sh 192.168.1.11 k3s-master2'
# node3 (192.168.1.12)
ssh root@192.168.1.12 './01-init-all-nodes.sh 192.168.1.12 k3s-master3'
# node4 (192.168.1.13)
ssh root@192.168.1.13 './01-init-all-nodes.sh 192.168.1.13 k3s-worker1'
# node5 (192.168.1.14)
ssh root@192.168.1.14 './01-init-all-nodes.sh 192.168.1.14 k3s-worker2'
# ========== Step 2: deploy the load balancer on the 3 masters ==========
ssh root@192.168.1.10 './02-setup-master-lb.sh 192.168.1.10 1'
ssh root@192.168.1.11 './02-setup-master-lb.sh 192.168.1.11 2'
ssh root@192.168.1.12 './02-setup-master-lb.sh 192.168.1.12 3'
# ========== Step 3: deploy K3s server ==========
ssh root@192.168.1.10 './03-install-k3s-server.sh 192.168.1.10 1'
# Save the printed token
# -t allocates a terminal so the token prompt is displayed
ssh -t root@192.168.1.11 './03-install-k3s-server.sh 192.168.1.11 2'
# Enter the token when prompted
ssh -t root@192.168.1.12 './03-install-k3s-server.sh 192.168.1.12 3'
# Enter the token when prompted
# ========== Step 4: join the workers ==========
ssh -t root@192.168.1.13 './04-install-k3s-agent.sh 192.168.1.13'
# Enter the token when prompted
ssh -t root@192.168.1.14 './04-install-k3s-agent.sh 192.168.1.14'
# Enter the token when prompted
# ========== Step 5: deploy the core components ==========
ssh root@192.168.1.10 './05-deploy-core-components.sh'
# ========== Done! Open Grafana to check the cluster state ==========
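The flow above runs the scripts over ssh, which only works if they already exist on each node. A minimal sketch of one way to copy them and start the initialization from a single workstation follows. It assumes passwordless root ssh to every node and that the five scripts sit in the current directory; the `run` wrapper, the `DRY_RUN` switch and the function name are illustrative. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Inventory taken from the node table: "<ip> <hostname>" per line.
NODES='192.168.1.10 k3s-master1
192.168.1.11 k3s-master2
192.168.1.12 k3s-master3
192.168.1.13 k3s-worker1
192.168.1.14 k3s-worker2'

# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "$*"
  else
    # Detach stdin so ssh/scp cannot swallow the inventory being read by the loop.
    "$@" < /dev/null
  fi
}

distribute_and_init() {
  local ip name
  while read -r ip name; do
    run scp 01-init-all-nodes.sh 02-setup-master-lb.sh 03-install-k3s-server.sh \
        04-install-k3s-agent.sh 05-deploy-core-components.sh "root@$ip:"
    run ssh "root@$ip" "chmod +x ./*.sh && ./01-init-all-nodes.sh $ip $name"
  done <<< "$NODES"
}

distribute_and_init
```

Reviewing the dry-run output first is a cheap way to confirm that each hostname is paired with the right IP before anything touches the servers.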
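Step 4: Verify the cluster state
After the scripts finish, you should see output similar to:
📋 Cluster status:
NAME          STATUS   ROLES                       AGE   VERSION
k3s-master1   Ready    control-plane,etcd,master   10m   v1.28.x
k3s-master2   Ready    control-plane,etcd,master   8m    v1.28.x
k3s-master3   Ready    control-plane,etcd,master   6m    v1.28.x
k3s-worker1   Ready    <none>                      4m    v1.28.x
k3s-worker2   Ready    <none>                      4m    v1.28.x
🌐 Grafana dashboard: http://192.168.1.100 (admin/Admin@123)
📊 Prometheus: http://192.168.1.101:9090
🚀 Nginx demo app: http://192.168.1.102
The external IPs are assigned by MetalLB from the pool, so the exact addresses may differ.
In Grafana you should find:
- Cluster overview dashboards: live CPU / memory / disk status for all 5 nodes
- Pod distribution dashboards: how workloads are scheduled across nodes
- Network dashboards: cluster network traffic
- Alerting dashboards: state of the preconfigured alert rules
Technology stack summary (for review after the build)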
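| Layer | Technology | Role |
|---|---|---|
| Load balancer entry | Nginx + Keepalived + VIP | Master high availability, automatic VIP failover |
| Container orchestration | K3s (lightweight Kubernetes) | Cluster management, scheduling, self-healing |
| Datastore | etcd (embedded in K3s) | Cluster state storage, 3-node high availability |
| Networking | Flannel (built into K3s) | Pod-to-pod communication across nodes |
| Service load balancing | MetalLB | Assigns external IPs to Services |
| Ingress | Traefik | HTTP/HTTPS routing, TLS termination |
| Storage | Longhorn | Distributed block storage with replicated data |
| Monitoring | Prometheus + Grafana | Metrics collection, visualization, alerting |
| Application deployment | Deployment + Service + Ingress | Declarative application management |
Possible improvements (for later iterations)
| Area | Current approach | Improved approach |
|---|---|---|
| Master high availability | Nginx + Keepalived | Cloud provider load balancer or dedicated hardware LB |
| Storage performance | Longhorn (replicated storage) | Rook-Ceph, or local SSD + snapshots |
| Application database | None | Deploy a MySQL/PostgreSQL Operator |
| CI/CD | Manual deployment | GitLab CI / ArgoCD |
| Security | Firewall disabled | Fine-grained iptables rules + NetworkPolicy |
| Logging | No central collection | Loki + Grafana for unified logs |
| Backup | None | Velero cluster backups |
| Multi-cluster | Single cluster | Rancher multi-cluster management |
Once these scripts have run, you should have a complete, highly available 5-node K3s cluster with monitoring dashboards. The scripts are meant to be copied and used directly after adjusting the IP addresses to match your network.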