Building an Enterprise-Grade DevOps Lab Environment: 013 Unified Base Configuration for All Nodes and Batch kubeconfig Distribution

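Prerequisite: passwordless SSH login is already configured for all 9 nodes. Everything below runs remotely in batch from ControlNodeA (192.168.0.151). Apart from the one-time passwordless sudo setup in section 3.2, you do not need to log in to any other node by hand.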

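Building on that passwordless SSH setup, this article initializes the operating system on the 8 remaining nodes, installs containerd v2.2.1 on all of them, and then sets up tiered kubeconfig permissions and distributes the files in batch, so that every node meets the mandatory prerequisites for joining the Kubernetes cluster.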

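1. Lab Environment and Node Roles

The lab runs on the Proxmox VE virtualization platform, and every node runs Ubuntu 22.04 LTS. The node plan and the configuration each node needs: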

| Hostname | IP address | Role | Configuration required |
|---|---|---|---|
| ControlNodeA | 192.168.0.151 | Primary control-plane node (already configured) | Only runs the batch configuration scripts |
| ControlNodeB | 192.168.0.152 | Control-plane replica node | System init + containerd + admin kubeconfig |
| ControlNodeC | 192.168.0.153 | Control-plane replica node | System init + containerd + admin kubeconfig |
| WorkNodeA | 192.168.0.154 | Application worker node | System init + containerd |
| WorkNodeB | 192.168.0.155 | Application worker node | System init + containerd |
| DataMidNode | 192.168.0.156 | Middleware / data storage node | System init + containerd |
| DevOpsToolNode | 192.168.0.157 | Central operations jump host | System init + kubectl + ops kubeconfig |
| ObservabilityNode | 192.168.0.158 | Observability node | System init + containerd |
| DSDRNode | 192.168.0.159 | Disaster recovery / data persistence node | System init + containerd |

Core principle: system parameters, kernel modules, and the containerd version and configuration must be identical on every node. That consistency is the foundation of a stable Kubernetes cluster.

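2. Unified Base System Configuration for All Nodes

Every node must have all five of the following system settings:

  1. Refresh the package index and install base dependencies
  2. Permanently disable swap (required by Kubernetes: by default the kubelet refuses to start while swap is enabled)
  3. Load the overlay and br_netfilter kernel modules
  4. Let iptables process bridged traffic and enable IP forwarding
  5. Set the correct hostname and bind it in /etc/hosts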
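
Before running anything, it can help to see how some of these requirements can be checked on a node. The following is a minimal read-only sketch, assuming a Linux host where active swap areas are listed in `/proc/swaps`, loaded modules in `/proc/modules`, and the forwarding flag in `/proc/sys/net/ipv4/ip_forward`. The function name `preflight_check` and its optional path parameters are hypothetical, added only so it can be pointed at test files.

```bash
#!/bin/bash
# preflight_check [SWAPS] [MODULES] [IP_FORWARD]: hypothetical read-only check of swap,
# kernel modules and IP forwarding. Paths default to the real /proc files.
preflight_check() {
    local swaps=${1:-/proc/swaps}
    local modules=${2:-/proc/modules}
    local ip_forward=${3:-/proc/sys/net/ipv4/ip_forward}
    local status=0 mod

    # /proc/swaps starts with a header line; every further line is an active swap area
    if [ "$(tail -n +2 "$swaps" | wc -l)" -ne 0 ]; then
        echo "FAIL: swap is still active"
        status=1
    fi
    # Each loaded module is a line starting with its name
    # (a feature built into the kernel rather than loaded as a module would not appear here)
    for mod in overlay br_netfilter; do
        if ! grep -q "^${mod} " "$modules"; then
            echo "FAIL: module ${mod} is not loaded"
            status=1
        fi
    done
    # IP forwarding must read 1
    if [ "$(cat "$ip_forward")" != "1" ]; then
        echo "FAIL: net.ipv4.ip_forward is not 1"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "PASS"
    fi
    return "$status"
}
```

Run without arguments on a node, it inspects the live system; a non-zero return code makes it easy to chain into the batch scripts below.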

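2.1 Create the Node Inventory File

First, on ControlNodeA, create an inventory file with one line per node in the form `IP hostname role`. Every batch step below reads it: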

```bash
# tee cannot create missing directories, so create /opt/cluster first
sudo mkdir -p /opt/cluster
sudo tee /opt/cluster/all_nodes.txt > /dev/null <<EOF
192.168.0.151 ControlNodeA control
192.168.0.152 ControlNodeB control
192.168.0.153 ControlNodeC control
192.168.0.154 WorkNodeA worker
192.168.0.155 WorkNodeB worker
192.168.0.156 DataMidNode worker
192.168.0.157 DevOpsToolNode ops
192.168.0.158 ObservabilityNode worker
192.168.0.159 DSDRNode worker
EOF
# The batch scripts read all_nodes.txt from the current directory
cp /opt/cluster/all_nodes.txt .
```
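
Every batch script in this article walks this file with `while read -r ip hostname role`. As a sketch of how the `role` column could drive node selection, here is a hypothetical helper `list_nodes` (not part of the original scripts) that skips blank lines, comments and ControlNodeA, and optionally keeps only one role.

```bash
#!/bin/bash
# list_nodes FILE [ROLE]: hypothetical helper that prints "ip hostname" for every node
# in FILE except ControlNodeA, optionally only those whose role equals ROLE.
list_nodes() {
    local file=$1 want_role=${2:-}
    local ip hostname role
    while read -r ip hostname role; do
        # Skip blank lines and comments
        [[ -z "$ip" || "$ip" == \#* ]] && continue
        # ControlNodeA runs the batch jobs itself
        [ "$hostname" = "ControlNodeA" ] && continue
        # Apply the optional role filter
        [ -n "$want_role" ] && [ "$role" != "$want_role" ] && continue
        echo "$ip $hostname"
    done < "$file"
}
```

For example, `list_nodes all_nodes.txt control` would print only ControlNodeB and ControlNodeC, the two nodes that receive the admin kubeconfig later on.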

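2.2 Write the System Initialization Script

Create `init_node_system.sh` containing every system configuration step. It is written to be idempotent, so running it again on a node does no harm: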

```bash
#!/bin/bash
set -euo pipefail

# Hostname for this node, passed as the first argument
NODE_HOSTNAME=${1:?"usage: $0 <hostname>"}

echo "====================================="
echo "Initializing node: ${NODE_HOSTNAME}"
echo "====================================="

# 1. Refresh the package index and install base tools
echo "1. Updating package index and installing dependencies..."
apt update -y
apt install -y ca-certificates curl gnupg lsb-release bash-completion

# 2. Disable swap now, and comment out active swap entries in fstab so it stays off after reboot
#    ([[:space:]] also matches tab-separated entries; lines already starting with # are skipped)
echo "2. Disabling swap permanently..."
swapoff -a
sed -i -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' /etc/fstab

# 3. Load the kernel modules Kubernetes needs, now and at every boot
echo "3. Loading kernel modules..."
cat <<EOF > /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter

# 4. Let iptables see bridged traffic and enable IPv4 forwarding (persisted in sysctl.d)
echo "4. Configuring kernel network parameters..."
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sysctl --system > /dev/null

# 5. Set the hostname and bind it in /etc/hosts
echo "5. Setting hostname..."
hostnamectl set-hostname "${NODE_HOSTNAME}"
# Only append the entry if it is not already there
grep -q "127.0.0.1 ${NODE_HOSTNAME}" /etc/hosts || echo "127.0.0.1 ${NODE_HOSTNAME}" >> /etc/hosts

# 6. Print the results
echo "====================================="
echo "Node initialization finished; results:"
echo "====================================="
echo "Hostname: $(hostname)"
echo "Swap total: $(free -h | awk '/^Swap/ {print $2}')"
# Anchor on the line start: the "bridge" module line also mentions br_netfilter in its "Used by" column
echo "Kernel modules loaded: $(lsmod | grep -cE '^(overlay|br_netfilter) ') (expected 2)"
echo "IP forwarding: $(sysctl -n net.ipv4.ip_forward) (expected 1)"
echo "====================================="
echo "Node ${NODE_HOSTNAME} initialized successfully!"
echo "====================================="
```
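
Idempotency is easiest to check on the fstab edit. A tab-separated `/swap.img` line, as Ubuntu 22.04 installations typically have, is a good test case: a pattern that requires literal spaces around `swap` would miss it. The sketch below wraps the edit in a hypothetical function `disable_swap_in_fstab` that takes any file path, so it can be tried on a copy of `/etc/fstab` first; it assumes GNU sed (`-i`, `-E`).

```bash
#!/bin/bash
# disable_swap_in_fstab FILE: hypothetical helper that comments out active swap entries.
# "swap" must be a whitespace-delimited field (so tabs match too), and lines already
# starting with '#' are skipped, so a second run changes nothing.
disable_swap_in_fstab() {
    sed -i -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}
```

Usage: `cp /etc/fstab /tmp/fstab.test && disable_swap_in_fstab /tmp/fstab.test && diff /etc/fstab /tmp/fstab.test` shows exactly which lines would change.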

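2.3 Run System Initialization in Batch

Make the script executable, then run it on every node that has not been configured yet. Because `ssh -n` leaves `sudo` no terminal to prompt on, this step needs passwordless sudo for `jack` on each node (the one-time setup shown in section 3.2):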

```bash
# Make the script executable
chmod +x init_node_system.sh

# Initialize every node except ControlNodeA, which is already configured
while read -r ip hostname role; do
    if [ "${hostname}" != "ControlNodeA" ]; then
        echo "====================================="
        echo "Initializing remotely: ${hostname} (${ip})"
        echo "====================================="
        # Copy the script to the target node
        scp init_node_system.sh "jack@${ip}:/home/jack/"
        # Run it remotely with the hostname as its argument.
        # Key point: -n stops ssh from swallowing the rest of the while-read input
        ssh -n "jack@${ip}" "sudo bash /home/jack/init_node_system.sh ${hostname}"
        echo "Node ${hostname} initialized!"
        echo ""
    fi
done < all_nodes.txt
```
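
The `-n` flag matters because every command inside a `while read` loop inherits the loop's standard input. This local sketch makes the effect visible without any SSH: `cat > /dev/null` stands in for an `ssh` call that reads stdin, and redirecting it from `/dev/null` (which is what `ssh -n` does) leaves the remaining lines for `read`. The function `count_iterations` is purely illustrative.

```bash
#!/bin/bash
# count_iterations MODE: feed three node lines into a while-read loop and count how many
# iterations run. "cat > /dev/null" stands in for ssh without -n.
count_iterations() {
    local mode=$1 n=0 ip host role
    while read -r ip host role; do
        n=$((n + 1))
        if [ "$mode" = "greedy" ]; then
            cat > /dev/null              # like plain ssh: swallows the rest of stdin
        else
            cat < /dev/null > /dev/null  # like ssh -n: stdin comes from /dev/null
        fi
    done <<'EOF'
192.168.0.152 ControlNodeB control
192.168.0.153 ControlNodeC control
192.168.0.154 WorkNodeA worker
EOF
    echo "$n"
}

count_iterations greedy   # prints 1: only the first node is processed
count_iterations safe     # prints 3: every node is processed
```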

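2.4 Verify the System Configuration in Batch

Create `verify_cluster.sh` with the content below and run it to check the system configuration of all 8 nodes in one pass: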

```bash
#!/bin/bash
# verify_cluster.sh: check the base system configuration on every node except ControlNodeA

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

echo "====================================="
echo "Batch verification of node system configuration"
echo "====================================="

if [ ! -f all_nodes.txt ]; then
    echo -e "${RED}Error: all_nodes.txt not found${NC}"
    exit 1
fi

total=0
success=0
failed=0

while read -r ip hostname role; do
    # Skip blank lines and comments
    [[ -z "${ip}" || "${ip}" =~ ^# ]] && continue
    [ "${hostname}" == "ControlNodeA" ] && continue

    total=$((total + 1))
    echo "-------------------------------------"
    echo -e "${YELLOW}Node: ${hostname} (${ip})${NC}"
    echo "-------------------------------------"

    # Check reachability first
    if ping -c 1 -W 2 "${ip}" > /dev/null 2>&1; then
        # No -n here: the heredoc becomes ssh's stdin, so it cannot eat the loop's input.
        # With -n the remote bash would receive nothing and the checks would silently not run.
        if ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no "jack@${ip}" bash <<'EOF' 2>/dev/null; then
echo "Hostname: $(hostname)"
echo "Swap:"
free -h | grep Swap
echo "Kernel modules:"
lsmod | grep -E '^(overlay|br_netfilter) ' || echo "  overlay/br_netfilter not loaded"
echo "Network parameters:"
sudo sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward 2>/dev/null || echo "  could not read network parameters (sudo required)"
EOF
            success=$((success + 1))
            echo -e "${GREEN}✓ ${hostname} checked${NC}"
        else
            failed=$((failed + 1))
            echo -e "${RED}✗ ${hostname}: SSH connection failed${NC}"
        fi
    else
        failed=$((failed + 1))
        echo -e "${RED}✗ ${hostname}: ping failed${NC}"
    fi
    echo ""
done < all_nodes.txt

echo "====================================="
echo -e "Result: total ${total} | ${GREEN}ok ${success}${NC} | ${RED}failed ${failed}${NC}"
echo "====================================="
```

Make it executable and run it:

```bash
chmod +x verify_cluster.sh
./verify_cluster.sh
```
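
The verification script counts a node as successful as soon as the SSH session exits cleanly, even if swap is still on; a human has to read the printed values. One hypothetical way to make the verdict automatic is to have each node print `key=value` lines and judge them on ControlNodeA. The function `evaluate_report` and the key names `swap_devices`, `modules`, `ip_forward` and `bridge_nf` are assumptions for this sketch, not part of the original script.

```bash
#!/bin/bash
# evaluate_report: hypothetical judge for a node report made of key=value lines on stdin.
# Succeeds only if no swap area is active, both modules are loaded and both sysctls are 1.
evaluate_report() {
    local key value swap="" modules="" fwd="" bridge=""
    while IFS='=' read -r key value; do
        case "$key" in
            swap_devices) swap=$value ;;
            modules)      modules=$value ;;
            ip_forward)   fwd=$value ;;
            bridge_nf)    bridge=$value ;;
        esac
    done
    [ "$swap" = "0" ] && [ "$modules" = "2" ] && [ "$fwd" = "1" ] && [ "$bridge" = "1" ]
}

# One way a node could print such a report (run remotely, e.g. inside the ssh heredoc):
#   echo "swap_devices=$(tail -n +2 /proc/swaps | wc -l)"
#   echo "modules=$(grep -cE '^(overlay|br_netfilter) ' /proc/modules)"
#   echo "ip_forward=$(sysctl -n net.ipv4.ip_forward)"
#   echo "bridge_nf=$(sysctl -n net.bridge.bridge-nf-call-iptables)"
```

Piping the remote output into `evaluate_report` would then let the success counter reflect the actual configuration rather than mere SSH reachability.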

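3. Install and Configure containerd v2.2.1 on All Nodes

Every node must run the same containerd version with the following key settings:

  1. Enable the SystemdCgroup driver (matching the kubelet's systemd cgroup driver)
  2. Switch the pause image to the Alibaba Cloud mirror (avoids download timeouts from within mainland China)
  3. Enable the service at boot and verify its status
  4. Install crictl for local container debugging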
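
The two `config.toml` edits can be tried on a copy before touching a node. This sketch assumes the pause image key is `sandbox` (containerd 2.x default configuration) or `sandbox_image` (containerd 1.x); the function name `tune_containerd_config` is hypothetical and GNU sed is assumed.

```bash
#!/bin/bash
# tune_containerd_config FILE IMAGE: hypothetical helper that enables the systemd cgroup
# driver and points the sandbox (pause) image at IMAGE.
tune_containerd_config() {
    local file=$1 image=$2
    # Flip the runc option written by "containerd config default"
    sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' "$file"
    # Replace the value of "sandbox" (2.x) or "sandbox_image" (1.x), keeping indentation;
    # the " = " after the key keeps it from touching keys such as "sandboxer"
    sed -i -E "s|^([[:space:]]*sandbox(_image)?) = .*|\1 = '${image}'|" "$file"
}
```

Usage: `containerd config default > /tmp/config.toml && tune_containerd_config /tmp/config.toml registry.aliyuncs.com/google_containers/pause:3.9`, then inspect the result with `grep -E 'SystemdCgroup|sandbox' /tmp/config.toml`.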

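3.1 Write the containerd Installation Script

Create `install_containerd.sh`. crictl is downloaded through a GitHub mirror and the pause image comes from Alibaba Cloud, which keeps installation fast from within mainland China. Note that `apt install containerd` installs whatever version the configured APT sources provide, which is not necessarily 2.2.1, so check the version the script prints at the end: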

```bash
#!/bin/bash
set -euo pipefail

CONTAINERD_VERSION="2.2.1"
PAUSE_IMAGE="registry.aliyuncs.com/google_containers/pause:3.9"
CRICTL_VERSION="v1.30.0"

echo "====================================="
echo "Installing containerd v${CONTAINERD_VERSION}"
echo "====================================="

# 1. Install containerd (version depends on the configured APT sources; checked at the end)
echo "1. Installing containerd..."
sudo apt update -y
sudo apt install -y containerd

# 2. Generate the default configuration and adjust it
echo "2. Generating and tuning the configuration file..."
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
# Enable the systemd cgroup driver (required)
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# Point the pause image at the Alibaba Cloud mirror; the key is "sandbox" in 2.x and "sandbox_image" in 1.x
sudo sed -i -E "s|^([[:space:]]*sandbox(_image)?) = .*|\1 = '${PAUSE_IMAGE}'|" /etc/containerd/config.toml

# 3. Enable the service at boot and restart it to load the new configuration
echo "3. Starting containerd..."
sudo systemctl daemon-reload
sudo systemctl enable --now containerd
sudo systemctl restart containerd

# 4. Install crictl (downloaded through the ghproxy.net GitHub mirror)
echo "4. Installing crictl ${CRICTL_VERSION}..."
sudo wget -q "https://ghproxy.net/https://github.com/kubernetes-sigs/cri-tools/releases/download/${CRICTL_VERSION}/crictl-${CRICTL_VERSION}-linux-amd64.tar.gz" -O crictl.tar.gz
sudo tar zxf crictl.tar.gz -C /usr/local/bin/
sudo rm -f crictl.tar.gz

# 5. Point crictl at the containerd socket
echo "5. Configuring crictl..."
cat <<EOF | sudo tee /etc/crictl.yaml > /dev/null
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF

# 6. Print the results
echo "====================================="
echo "containerd version: $(containerd --version | awk '{print $3}')"
echo "Service status: $(systemctl is-active containerd)"
echo "SystemdCgroup: $(grep SystemdCgroup /etc/containerd/config.toml | awk '{print $3}')"
echo "Pause image: $(grep -E '^[[:space:]]*sandbox(_image)? =' /etc/containerd/config.toml | awk -F"'" '{print $2}')"
echo "crictl connection: $(sudo crictl info > /dev/null 2>&1 && echo 'ok' || echo 'failed')"
echo "====================================="
echo "containerd installed and configured!"
echo "====================================="
```

Make the script executable:

```bash
chmod +x install_containerd.sh
```
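
Because the installed version depends on the APT sources (and may well be a 1.x release on stock Ubuntu 22.04), it is worth failing loudly on a mismatch instead of only printing the version. A minimal sketch; `check_containerd_version` is hypothetical and assumes the third field of `containerd --version` is the version, as in `containerd github.com/containerd/containerd/v2 v2.2.1 <commit>`.

```bash
#!/bin/bash
# check_containerd_version EXPECTED [VERSION_LINE]: hypothetical guard comparing the version
# reported by "containerd --version" (or VERSION_LINE, for testing) with EXPECTED.
check_containerd_version() {
    local expected=$1
    local line=${2:-$(containerd --version 2>/dev/null)}
    local actual
    # Third field is the version; strip an optional leading "v"
    actual=$(echo "$line" | awk '{print $3}')
    actual=${actual#v}
    if [ "$actual" = "$expected" ]; then
        echo "containerd ${actual} OK"
    else
        echo "containerd version mismatch: expected ${expected}, got ${actual:-none}" >&2
        return 1
    fi
}
```

Appended to the end of an install script that runs under `set -e`, a call such as `check_containerd_version "${CONTAINERD_VERSION}"` would turn a silent version drift into a failed installation.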

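3.2 Install and Configure containerd in Batch
Before the batch install, enable passwordless sudo for `jack` by hand on each node. The batch scripts run `sudo` over non-interactive SSH, where no password prompt is possible. On every node that needs it: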

```bash
# Using ControlNodeB as an example; repeat for every other node
ssh jack@192.168.0.152

# Run the following on the remote node
echo 'jack ALL=(ALL) NOPASSWD: ALL' | sudo tee /etc/sudoers.d/jack-nopasswd
sudo chmod 440 /etc/sudoers.d/jack-nopasswd
# Check the file syntax (a broken sudoers file can lock you out of sudo)
sudo visudo -cf /etc/sudoers.d/jack-nopasswd
# Verify: -n makes sudo fail instead of prompting for a password
sudo -n true && echo "configured" || echo "failed"

# Leave the remote node
exit
```

Then run the batch installation. Save the following as `batch_install_containerd.sh`:

```bash
#!/bin/bash
# batch_install_containerd.sh: install containerd on every node except ControlNodeA
# pipefail makes "ssh ... | tee" return ssh's exit status instead of tee's,
# so a failed remote install is really counted as a failure
set -o pipefail

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

# Install script to push to each node
INSTALL_SCRIPT="install_containerd.sh"
# Log directory (absolute path, so changing directories does not matter)
LOG_DIR="${HOME}/install_log"

echo "====================================="
echo -e "${BLUE}Batch install of containerd${NC}"
echo "====================================="

# Check that the install script and the node list exist
if [ ! -f "${INSTALL_SCRIPT}" ]; then
    echo -e "${RED}Error: ${INSTALL_SCRIPT} not found${NC}"
    exit 1
fi
if [ ! -f all_nodes.txt ]; then
    echo -e "${RED}Error: all_nodes.txt not found${NC}"
    exit 1
fi

# Create the log directory (including parents) and make sure it is writable
mkdir -p "${LOG_DIR}" || { echo -e "${RED}Error: cannot create log directory ${LOG_DIR}${NC}"; exit 1; }
if [ ! -w "${LOG_DIR}" ]; then
    echo -e "${RED}Error: log directory ${LOG_DIR} is not writable${NC}"
    exit 1
fi

# Counters
total=0
success=0
failed=0
declare -a failed_nodes
start_time=$(date +%s)

while read -r ip hostname role; do
    # Skip blank lines, comments and ControlNodeA
    [[ -z "${ip}" || "${ip}" =~ ^# ]] && continue
    [ "${hostname}" == "ControlNodeA" ] && continue

    total=$((total + 1))
    log_file="${LOG_DIR}/${hostname}_$(date +%Y%m%d_%H%M%S).log"
    if ! touch "${log_file}"; then
        echo -e "${RED}✗ Cannot create log file: ${log_file}${NC}"
        failed=$((failed + 1))
        failed_nodes+=("${hostname} (log file creation failed)")
        continue
    fi

    echo "====================================="
    echo -e "${YELLOW}[${total}] Installing containerd on: ${hostname} (${ip})${NC}"
    echo "====================================="

    # 1. Check reachability
    if ! ping -c 1 -W 2 "${ip}" > /dev/null 2>&1; then
        echo -e "${RED}✗ ${hostname} does not answer ping${NC}" | tee -a "${log_file}"
        failed=$((failed + 1))
        failed_nodes+=("${hostname} (network unreachable)")
        continue
    fi

    # 2. Copy the install script (make sure /home/jack exists first)
    echo -e "${BLUE}[1/2] Copying install script...${NC}" | tee -a "${log_file}"
    ssh -n -o ConnectTimeout=5 "jack@${ip}" "mkdir -p /home/jack" >> "${log_file}" 2>&1
    if scp -o ConnectTimeout=5 "${INSTALL_SCRIPT}" "jack@${ip}:/home/jack/" >> "${log_file}" 2>&1; then
        echo -e "${GREEN}✓ Script copied${NC}" | tee -a "${log_file}"
    else
        echo -e "${RED}✗ Script copy failed${NC}" | tee -a "${log_file}"
        failed=$((failed + 1))
        failed_nodes+=("${hostname} (script copy failed)")
        continue
    fi

    # 3. Run the installer remotely; tee shows the output and appends it to the log
    echo -e "${BLUE}[2/2] Running install script...${NC}" | tee -a "${log_file}"
    if ssh -n -o ConnectTimeout=10 "jack@${ip}" "sudo bash /home/jack/${INSTALL_SCRIPT}" 2>&1 | tee -a "${log_file}"; then
        success=$((success + 1))
        echo -e "${GREEN}✓ ${hostname}: containerd installed${NC}" | tee -a "${log_file}"
    else
        failed=$((failed + 1))
        failed_nodes+=("${hostname} (install failed, see log: ${log_file})")
        echo -e "${RED}✗ ${hostname}: containerd install failed${NC}" | tee -a "${log_file}"
    fi
    echo ""
done < all_nodes.txt

# Summary
end_time=$(date +%s)
duration=$((end_time - start_time))
echo "====================================="
echo -e "${BLUE}Installation summary${NC}"
echo "====================================="
echo "Total nodes: ${total}"
echo -e "${GREEN}Succeeded: ${success}${NC}"
echo -e "${RED}Failed: ${failed}${NC}"
echo "Elapsed: ${duration} s"
if [ ${#failed_nodes[@]} -gt 0 ]; then
    echo ""
    echo -e "${RED}Failed nodes:${NC}"
    for node in "${failed_nodes[@]}"; do
        echo -e "  ${RED}✗${NC} ${node}"
    done
    echo -e "${YELLOW}Hint: see the logs in ${LOG_DIR}${NC}"
fi
echo "====================================="
```

Make it executable and run it:

```bash
chmod +x batch_install_containerd.sh
./batch_install_containerd.sh
```
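
Whether a batch script like this counts failures correctly depends on `pipefail`: without it, `ssh ... | tee` reports the exit status of `tee`, which is almost always 0, so every node looks successful. A minimal local illustration, with `false` standing in for a failed remote install (`pipeline_status` is an illustrative name):

```bash
#!/bin/bash
# pipeline_status: run a failing command through tee and report the pipeline's exit status,
# first with pipefail off, then on ("false" stands in for a failed ssh).
pipeline_status() {
    (
        set +e
        set +o pipefail
        false | tee /dev/null > /dev/null
        echo "without pipefail: $?"
        set -o pipefail
        false | tee /dev/null > /dev/null
        echo "with pipefail: $?"
    )
}

pipeline_status
# without pipefail: 0
# with pipefail: 1
```

An alternative that avoids changing shell options is to read `${PIPESTATUS[0]}` right after the pipeline.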

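3.3 Verify the containerd Configuration in Batch

Save the following as `verify_containerd.sh`, then make it executable and run it: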

```bash
#!/bin/bash
# verify_containerd.sh: check containerd on every node except ControlNodeA

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

echo "====================================="
echo -e "${BLUE}Batch verification of containerd configuration${NC}"
echo "====================================="

total=0
success=0
failed=0

while read -r ip hostname role; do
    [[ -z "${ip}" || "${ip}" =~ ^# ]] && continue
    [ "${hostname}" == "ControlNodeA" ] && continue

    total=$((total + 1))
    echo "-------------------------------------"
    echo -e "${YELLOW}Node: ${hostname} (${ip})${NC}"
    echo "-------------------------------------"

    # Check reachability
    if ! ping -c 1 -W 2 "${ip}" > /dev/null 2>&1; then
        echo -e "${RED}✗ ping failed${NC}"
        failed=$((failed + 1))
        echo ""
        continue
    fi

    # Run the checks as jack; the heredoc is ssh's stdin, so no -n here
    if ssh -o ConnectTimeout=5 "jack@${ip}" bash <<'EOF' 2>/dev/null; then
echo "containerd version:"
containerd --version 2>/dev/null || echo "  not installed"
echo "Service status:"
sudo systemctl is-active containerd 2>/dev/null || echo "  service not running"
sudo systemctl is-enabled containerd 2>/dev/null || echo "  not enabled at boot"
echo "Key settings:"
sudo grep -E 'SystemdCgroup|sandbox' /etc/containerd/config.toml 2>/dev/null || echo "  config file missing or unreadable"
echo "crictl connection:"
if command -v crictl > /dev/null 2>&1; then
    sudo crictl info > /dev/null 2>&1 && echo "  ok" || echo "  failed"
else
    echo "  crictl not installed"
fi
EOF
        echo -e "${GREEN}✓ ${hostname} checked${NC}"
        success=$((success + 1))
    else
        echo -e "${RED}✗ ${hostname}: connection failed${NC}"
        failed=$((failed + 1))
    fi
    echo ""
done < all_nodes.txt

echo "====================================="
echo -e "${BLUE}Summary${NC}"
echo "====================================="
echo -e "Total: ${total} | ${GREEN}ok: ${success}${NC} | ${RED}failed: ${failed}${NC}"
echo "====================================="
```

```bash
chmod +x verify_containerd.sh
./verify_containerd.sh
```
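
The remote `grep` prints raw TOML lines, which are hard to compare across eight nodes. A small parser can reduce them to one line per node; this sketch assumes the key names used above (`SystemdCgroup`, and `sandbox` or `sandbox_image`), and `containerd_settings` is a hypothetical name.

```bash
#!/bin/bash
# containerd_settings [FILE]: hypothetical helper printing "SystemdCgroup=<v> sandbox=<image>"
# so the line can be compared across nodes (for example with sort | uniq -c).
containerd_settings() {
    local file=${1:-/etc/containerd/config.toml}
    local cgroup sandbox
    # "SystemdCgroup = true" -> third whitespace-separated field
    cgroup=$(awk '$1 == "SystemdCgroup" {print $3; exit}' "$file")
    # Split on single or double quotes and take the quoted value
    sandbox=$(awk -F"['\"]" '/^[ \t]*sandbox(_image)? =/ {print $2; exit}' "$file")
    echo "SystemdCgroup=${cgroup:-unset} sandbox=${sandbox:-unset}"
}
```

To run it remotely without copying a file, one option is `ssh jack@<ip> "$(declare -f containerd_settings); containerd_settings"`, which sends the function definition along with the call (assuming bash is the remote login shell). If every node prints the same line, the configuration is consistent.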

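4. Tiered kubeconfig Configuration and Batch Distribution

With the base configuration done on every node, configure and distribute the kubeconfig files. kubectl is installed only on the control-plane nodes and the operations jump host; worker nodes get no cluster management tools at all.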
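
The access tiers map directly onto the `role` column of `all_nodes.txt`. A sketch of that mapping, which a distribution script could use to decide what to copy where; `kubeconfig_for_role` is hypothetical, and the file names match the ones generated in 4.1 (`admin.kubeconfig`, `cluster-ops.kubeconfig`).

```bash
#!/bin/bash
# kubeconfig_for_role ROLE: hypothetical mapping from the role column of all_nodes.txt to the
# kubeconfig file a node should receive (empty output = no kubectl, no credentials).
kubeconfig_for_role() {
    case "$1" in
        control) echo "admin.kubeconfig" ;;        # full admin rights via admin.conf
        ops)     echo "cluster-ops.kubeconfig" ;;  # ServiceAccount token for daily operations
        worker)  echo "" ;;                        # workers get no cluster credentials
        *)       echo "unknown role: $1" >&2; return 1 ;;
    esac
}
```

Returning an error for unknown roles means a typo in the inventory file stops the distribution instead of silently skipping a node.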

4.1 Generate Role-Specific kubeconfig Files on ControlNodeA

Step 1: Back up the original admin configuration

```bash
# Back up the original admin.conf in case something goes wrong
sudo cp /etc/kubernetes/admin.conf /etc/kubernetes/admin.conf.bak
```

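Step 2: Generate the kubeconfig for the cluster operations role (cluster-ops)

Create a dedicated ServiceAccount for day-to-day operations instead of handing out admin.conf itself. Note that the binding below uses the built-in `cluster-admin` ClusterRole, so the account is still fully privileged; bind a narrower built-in role such as `edit` or `view` if the ops tier should have fewer rights. Run the following on ControlNodeA: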

```bash
#!/bin/bash
echo "====================================="
echo "Creating the cluster-ops account and its kubeconfig"
echo "====================================="

# 1. Create the ServiceAccount (dry-run + apply makes reruns harmless)
echo "1. Creating ServiceAccount..."
kubectl create serviceaccount cluster-ops -n kube-system --dry-run=client -o yaml | kubectl apply -f -

# 2. Bind it to the built-in cluster-admin ClusterRole
echo "2. Creating ClusterRoleBinding..."
kubectl create clusterrolebinding cluster-ops-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:cluster-ops \
  --dry-run=client -o yaml | kubectl apply -f -

# 3. Create a long-lived token Secret (since Kubernetes 1.24 none is created automatically)
echo "3. Creating token Secret..."
cat <<EOF | tee /tmp/cluster-ops-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: cluster-ops-token
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: cluster-ops
type: kubernetes.io/service-account-token
EOF
kubectl apply -f /tmp/cluster-ops-secret.yaml
rm -f /tmp/cluster-ops-secret.yaml

# 4. Wait for the token controller to fill in the Secret
echo "4. Waiting for the token..."
OPS_TOKEN=""
for i in {1..10}; do
  OPS_TOKEN=$(kubectl get secret cluster-ops-token -n kube-system -o jsonpath='{.data.token}' 2>/dev/null | base64 --decode)
  if [ -n "${OPS_TOKEN}" ]; then
    echo "✓ Token retrieved (attempt ${i})"
    break
  fi
  echo "  waiting... (${i}/10)"
  sleep 3
done
if [ -z "${OPS_TOKEN}" ]; then
  echo "Error: could not get the token; check the Secret status"
  kubectl describe secret cluster-ops-token -n kube-system
  exit 1
fi

# 5. Read the CA certificate from the current kubeconfig
echo "5. Reading CA certificate..."
CA_CERT=$(kubectl config view --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
if [ -z "${CA_CERT}" ]; then
  echo "Error: could not read the CA certificate"
  exit 1
fi
echo "✓ CA certificate read"

# 6. Read the API server address
APISERVER=$(kubectl config view --raw -o jsonpath='{.clusters[0].cluster.server}')
echo "✓ API server: ${APISERVER}"

# 7. Write the kubeconfig and restrict it to mode 600 right away (it contains the token)
echo "7. Writing kubeconfig..."
cat <<EOF | tee /home/jack/cluster-ops.kubeconfig > /dev/null
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: ${CA_CERT}
    server: ${APISERVER}
  name: k8s-cluster
contexts:
- context:
    cluster: k8s-cluster
    user: cluster-ops
  name: cluster-ops@k8s-cluster
current-context: cluster-ops@k8s-cluster
users:
- name: cluster-ops
  user:
    token: ${OPS_TOKEN}
EOF
chmod 600 /home/jack/cluster-ops.kubeconfig
echo "✓ kubeconfig written: /home/jack/cluster-ops.kubeconfig"

# 8. Verify the new account
echo ""
echo "====================================="
echo "Verifying the cluster-ops account"
echo "====================================="
echo "Cluster info:"
kubectl --kubeconfig=/home/jack/cluster-ops.kubeconfig cluster-info 2>&1 | head -3
echo ""
echo "Nodes:"
kubectl --kubeconfig=/home/jack/cluster-ops.kubeconfig get nodes
echo ""
echo "====================================="
echo "Done!"
echo "====================================="
echo "kubeconfig file: /home/jack/cluster-ops.kubeconfig"
echo ""
echo "Usage:"
echo "  kubectl --kubeconfig=/home/jack/cluster-ops.kubeconfig get nodes"
echo "  or"
echo "  export KUBECONFIG=/home/jack/cluster-ops.kubeconfig"
echo "====================================="
```

Step 3: Prepare the admin kubeconfig

Control-plane nodes need full cluster-management privileges, so they use the original admin.conf directly:

```bash
# Copy admin.conf into the current directory
sudo cp /etc/kubernetes/admin.conf admin.kubeconfig
# Hand it to the current user and restrict it to owner read/write
sudo chown "$(id -u):$(id -g)" admin.kubeconfig
chmod 600 admin.kubeconfig
```
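
Both kubeconfig files carry full credentials, so a distribution script can refuse to copy one whose permissions are looser than 600. A minimal sketch using GNU `stat -c %a` (Linux coreutils); `assert_private` is a hypothetical name.

```bash
#!/bin/bash
# assert_private FILE: hypothetical guard that succeeds only if FILE exists and its
# permission bits are exactly 600 (owner read/write, nobody else).
assert_private() {
    local mode
    mode=$(stat -c '%a' "$1" 2>/dev/null) || { echo "missing: $1" >&2; return 1; }
    if [ "$mode" != "600" ]; then
        echo "refusing $1: mode ${mode}, expected 600" >&2
        return 1
    fi
}
```

Usage: `assert_private admin.kubeconfig && assert_private cluster-ops.kubeconfig` before any `scp` keeps a world-readable credential from ever leaving ControlNodeA.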

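4.2 Write the kubectl and kubeconfig Batch Distribution Script

Before distributing anything, make sure swap is really off on every node. If `free` still reports swap somewhere (for example because a tab-separated fstab entry was not commented out), the following `force_disable_swap_v2.sh` turns swap off, comments out the fstab entries, and deletes the swap file on each node: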

```bash
#!/bin/bash
# force_disable_swap_v2.sh
echo "====================================="
echo "Force-disabling Swap on all nodes"
echo "====================================="

while read -r ip hostname role; do
    [[ -z "$ip" || "$ip" =~ ^# ]] && continue
    [ "${hostname}" == "ControlNodeA" ] && continue

    echo "-------------------------------------"
    echo "Processing: ${hostname} (${ip})"
    # The heredoc is ssh's stdin, so ssh cannot consume all_nodes.txt
    ssh jack@"${ip}" bash << 'EOF'
echo "=== Before ==="
free | grep Swap
sudo swapon --show
echo ""
echo "=== Turning Swap off ==="
sudo swapoff -a -v
echo ""
echo "=== Commenting out swap entries in fstab ==="
sudo cp /etc/fstab /etc/fstab.bak
# Only touch lines that are not already commented, so re-runs do not stack "# # "
sudo sed -i '/^[^#].*[[:space:]]swap[[:space:]]/s/^/# /' /etc/fstab
echo "fstab swap lines:"
grep swap /etc/fstab
echo ""
echo "=== Removing swap files ==="
sudo rm -f /swap.img /swapfile
echo ""
echo "=== After ==="
free | grep Swap
sudo swapon --show 2>&1 || echo "No swap devices"
EOF
    echo ""
    echo "✓ ${hostname} done"
    echo ""
done < all_nodes.txt

echo "====================================="
echo "Done! Verification:"
echo "====================================="

# Verify all nodes
while read -r ip hostname role; do
    [[ -z "$ip" || "$ip" =~ ^# ]] && continue
    [ "${hostname}" == "ControlNodeA" ] && continue
    # \$2 is escaped so that awk on the remote side sees $2
    result=$(ssh -n jack@"${ip}" "free | grep Swap | awk '{print \$2}'")
    if [ "$result" -eq 0 ]; then
        echo "✓ ${hostname}: Swap fully disabled"
    else
        echo "✗ ${hostname}: Swap still has ${result} KB"
    fi
done < all_nodes.txt
```

```bash
# Run the fix-up script
chmod +x force_disable_swap_v2.sh
./force_disable_swap_v2.sh
```
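The `sed` call in the script matches on the word "swap" in a line. An alternative sketch keys on the filesystem-type field (the 3rd fstab column) instead, which ignores comments that merely mention swap and makes re-runs harmless. The helper name `comment_out_swap` is hypothetical, and writing through a temp file and then `cat` is an assumption chosen to keep the original file's owner and mode.

```bash
#!/usr/bin/env bash
# Hypothetical helper: comment out active swap entries in an fstab-format file.
# Only uncommented lines whose 3rd field is exactly "swap" are changed,
# so running it twice leaves the file as it was after the first run.
comment_out_swap() {
  local fstab="$1" tmp
  tmp=$(mktemp) || return 1
  awk '!/^[[:space:]]*#/ && $3 == "swap" { print "# " $0; next } { print }' \
    "$fstab" > "$tmp" && cat "$tmp" > "$fstab"
  rm -f "$tmp"
}

# Usage as root on a node where the function is defined:
#   sudo cp /etc/fstab /etc/fstab.bak
#   sudo bash -c "$(declare -f comment_out_swap); comment_out_swap /etc/fstab"
```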

Create the deploy_kubectl.sh script, which distributes the matching configuration to each node according to its role:

```bash
#!/bin/bash
# deploy_kubectl.sh

# ============================================
# Colors
# ============================================
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

# ============================================
# Variables
# ============================================
KUBECTL_BIN="/usr/bin/kubectl"
ADMIN_KUBECONFIG="./admin.kubeconfig"
OPS_KUBECONFIG="./cluster-ops.kubeconfig"
ALL_NODES="./all_nodes.txt"
SSH_USER="jack"

echo "====================================="
echo -e "${BLUE}Deploying kubectl and kubeconfig${NC}"
echo "====================================="
echo ""

# Check that every required file exists
for file in "${KUBECTL_BIN}" "${ADMIN_KUBECONFIG}" "${OPS_KUBECONFIG}" "${ALL_NODES}"; do
    if [ ! -f "${file}" ]; then
        echo -e "${RED}Error: file ${file} does not exist!${NC}"
        exit 1
    fi
    echo -e "${GREEN}✓${NC} ${file}"
done
echo ""

while read -r ip hostname role; do
    [[ -z "$ip" || "$ip" =~ ^# ]] && continue
    [ "${hostname}" == "ControlNodeA" ] && continue   # already set up locally

    echo "====================================="
    echo -e "${YELLOW}Node: ${hostname} (${ip})  role: ${role}${NC}"
    echo "====================================="

    # ============================================
    # Worker nodes: install nothing
    # ============================================
    if [ "${role}" == "worker" ]; then
        echo -e "${YELLOW}Skipped (worker node, no cluster management tools)${NC}"
        echo ""
        continue
    fi

    # ============================================
    # control / ops nodes: install kubectl
    # ============================================
    echo -e "${BLUE}[1/3] Installing kubectl...${NC}"
    # < /dev/null keeps scp from reading the node list on stdin
    if ! scp "${KUBECTL_BIN}" "${SSH_USER}@${ip}:/home/${SSH_USER}/kubectl" < /dev/null 2>/dev/null; then
        echo -e "${RED}✗ Failed to copy kubectl${NC}"
        continue
    fi
    ssh -n "${SSH_USER}@${ip}" "sudo mv /home/${SSH_USER}/kubectl /usr/local/bin/kubectl && sudo chmod +x /usr/local/bin/kubectl"
    echo -e "${GREEN}✓ kubectl installed${NC}"

    # ============================================
    # Create ~/.kube and distribute kubeconfig
    # ============================================
    echo -e "${BLUE}[2/3] Distributing kubeconfig...${NC}"
    ssh -n "${SSH_USER}@${ip}" "mkdir -p /home/${SSH_USER}/.kube && chmod 700 /home/${SSH_USER}/.kube"

    # control nodes -> admin.kubeconfig, ops node -> cluster-ops.kubeconfig
    if [ "${role}" == "control" ]; then
        scp "${ADMIN_KUBECONFIG}" "${SSH_USER}@${ip}:/home/${SSH_USER}/.kube/config" < /dev/null 2>/dev/null
        echo -e "  → distributed ${GREEN}admin${NC} kubeconfig"
    elif [ "${role}" == "ops" ]; then
        scp "${OPS_KUBECONFIG}" "${SSH_USER}@${ip}:/home/${SSH_USER}/.kube/config" < /dev/null 2>/dev/null
        echo -e "  → distributed ${YELLOW}ops${NC} kubeconfig"
    fi

    # ============================================
    # Permissions and shell completion
    # ============================================
    echo -e "${BLUE}[3/3] Configuring environment...${NC}"
    # No -n here: the heredoc must reach the remote bash through stdin
    ssh "${SSH_USER}@${ip}" bash << 'EOF'
chmod 600 /home/jack/.kube/config
kubectl completion bash > /home/jack/.kube/completion.bash
# Look for the line that is actually appended, so re-runs do not duplicate it
if ! grep -q '.kube/completion.bash' /home/jack/.bashrc; then
    echo 'source /home/jack/.kube/completion.bash' >> /home/jack/.bashrc
fi
echo "✓ Configuration done"
EOF

    echo -e "${GREEN}✓ ${hostname} deployed${NC}"
    echo ""
done < "${ALL_NODES}"

echo "====================================="
echo -e "${GREEN}Deployment complete!${NC}"
echo "====================================="
echo ""
echo "Permission summary:"
echo "  Control-plane nodes (ControlNodeB, ControlNodeC) → admin.kubeconfig (cluster admin)"
echo "  Ops jump host (DevOpsToolNode)                   → cluster-ops.kubeconfig (ops permissions)"
echo "  Worker nodes (the other 5)                        → no kubectl installed"
echo ""
echo "Verify with:"
echo "  ./verify_kubectl.sh"
echo "====================================="
```
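The role check is the part of deploy_kubectl.sh most worth keeping consistent as nodes are added. A sketch that isolates it as a pure function makes the mapping testable on its own; `kubeconfig_for_role` is a hypothetical name, and the role strings `control`, `ops` and `worker` are the ones the script reads from all_nodes.txt. Unlike the script, this version treats an unknown role like a worker and distributes nothing.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the role logic of deploy_kubectl.sh:
#   control -> admin kubeconfig, ops -> cluster-ops kubeconfig,
#   worker or any unknown role -> no kubeconfig (non-zero status).
kubeconfig_for_role() {
  case "$1" in
    control) echo "./admin.kubeconfig" ;;
    ops)     echo "./cluster-ops.kubeconfig" ;;
    *)       return 1 ;;
  esac
}

# Usage inside the node loop:
#   if src=$(kubeconfig_for_role "$role"); then
#     scp "$src" "jack@${ip}:/home/jack/.kube/config" < /dev/null
#   fi
```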

4.3 Run the batch distribution

```bash
# Make the script executable
chmod +x deploy_kubectl.sh
# Run the batch distribution
./deploy_kubectl.sh
```
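All the batch scripts here read all_nodes.txt in a `while read` loop and call ssh or scp inside it. If any command in the loop body reads stdin, it swallows the remaining lines and the loop ends early, which is why `ssh -n` appears throughout. One way to rule this out structurally is to read the list on a separate file descriptor. A sketch with the hypothetical helper name `each_remote_node`, assuming the same `ip hostname role` column layout the scripts use:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "ip hostname role" for every remote node in an
# all_nodes.txt-style file, skipping blank lines, comments and ControlNodeA.
# The list is read on fd 3, so an ssh/scp call placed in the loop body
# cannot consume the remaining lines through stdin.
each_remote_node() {
  local list="$1" ip hostname role
  while read -r ip hostname role <&3; do
    [[ -z "$ip" || "$ip" == \#* ]] && continue
    [[ "$hostname" == "ControlNodeA" ]] && continue
    echo "$ip $hostname $role"
  done 3< "$list"
}
```

The function only prints the filtered list; the part that carries over to the scripts is the `read ... <&3` / `done 3< file` pattern itself.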

Create verify_kubectl.sh (verification script):

```bash
#!/bin/bash
# verify_kubectl.sh
echo "====================================="
echo "kubectl deployment check"
echo "====================================="
printf "%-20s %-10s %-15s %-25s\n" "Hostname" "Role" "kubectl" "kubeconfig"
echo "-------------------------------------------------------------------------"

while read -r ip hostname role; do
    [[ -z "$ip" || "$ip" =~ ^# ]] && continue

    # Local node: ControlNodeA
    if [ "${hostname}" == "ControlNodeA" ]; then
        if command -v kubectl &>/dev/null; then
            ver=$(kubectl version --client -o json 2>/dev/null | grep gitVersion | awk -F'"' '{print $4}')
            [ -z "$ver" ] && ver="installed"
        else
            ver="not installed"
        fi
        printf "%-20s %-10s %-15s %-25s\n" "${hostname}" "control" "${ver}" "admin"
        continue
    fi

    # Worker nodes
    if [ "${role}" == "worker" ]; then
        printf "%-20s %-10s %-15s %-25s\n" "${hostname}" "worker" "---" "---"
        continue
    fi

    # control / ops nodes: the remote script is single-quoted, so nothing expands locally
    result=$(ssh -n -o ConnectTimeout=5 -o BatchMode=yes jack@"${ip}" '
        if command -v kubectl >/dev/null 2>&1; then
            ver=$(kubectl version --client -o json 2>/dev/null | grep gitVersion | awk -F"\"" "{print \$4}")
            [ -n "$ver" ] && echo "kubectl:$ver" || echo "kubectl:installed"
        else
            echo "kubectl:not installed"
        fi
        if [ -f /home/jack/.kube/config ]; then
            # Report the user the kubeconfig authenticates as, instead of assuming "admin"
            user=$(kubectl config view --minify -o jsonpath="{.contexts[0].context.user}" 2>/dev/null)
            kubectl get nodes >/dev/null 2>&1 && echo "config:${user:-ok}" || echo "config:configured (no API access)"
        else
            echo "config:none"
        fi
    ' 2>/dev/null)

    if [ -z "$result" ]; then
        printf "%-20s %-10s %-15s %-25s\n" "${hostname}" "${role}" "unreachable" "unreachable"
        continue
    fi

    kubectl_ver=$(echo "$result" | grep "kubectl:" | cut -d: -f2-)
    config_type=$(echo "$result" | grep "config:" | cut -d: -f2-)
    printf "%-20s %-10s %-15s %-25s\n" "${hostname}" "${role}" "${kubectl_ver:-unknown}" "${config_type:-unknown}"
done < all_nodes.txt

echo ""
echo "====================================="
echo "Verification complete!"
echo "====================================="
```

```bash
# Make the script executable
chmod +x verify_kubectl.sh
# Run the verification script
./verify_kubectl.sh
```
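The `grep gitVersion | awk -F'"' '{print $4}'` pipeline depends on `kubectl version --client -o json` printing one key per line. A slightly sturdier sketch matches the key and its value with sed, so it also copes with single-line JSON. The function name `extract_git_version` is hypothetical, and the JSON in the test is a hand-written sample, not captured kubectl output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read kubectl's JSON version output on stdin and print
# the first "gitVersion" value found (empty output if there is none).
extract_git_version() {
  sed -n 's/.*"gitVersion"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage:
#   kubectl version --client -o json 2>/dev/null | extract_git_version
```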

5. Final acceptance check of all nodes

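Save the following script as final_check.sh and run it to check the final state of all 9 nodes in one pass and confirm that every setting meets the requirements: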

```bash
#!/bin/bash
# final_check.sh
echo "====================================="
echo "Final acceptance report for all nodes"
echo "====================================="
echo ""
printf "%-20s %-15s %-10s %-12s %-12s %-10s\n" "Hostname" "IP" "Role" "Swap" "containerd" "kubectl"
echo "--------------------------------------------------------------------------------"

while read -r ip hostname role; do
    [[ -z "$ip" || "$ip" =~ ^# ]] && continue

    if [ "${hostname}" == "ControlNodeA" ]; then
        swap_kb=$(free | grep Swap | awk '{print $2}')
        [ "$swap_kb" -eq 0 ] && swap_status="off" || swap_status="${swap_kb} KB"
        # is-active always prints a state (active/inactive/failed...), even when it exits non-zero
        containerd_status=$(systemctl is-active containerd 2>/dev/null)
        kubectl_status=$(command -v kubectl &>/dev/null && kubectl get nodes &>/dev/null && echo "OK" || echo "not installed")
    else
        # One SSH call per check (simple and reliable)
        swap_kb=$(ssh -n -o ConnectTimeout=5 jack@"${ip}" "free | grep Swap | awk '{print \$2}'" 2>/dev/null)
        containerd_val=$(ssh -n -o ConnectTimeout=5 jack@"${ip}" "systemctl is-active containerd 2>/dev/null" 2>/dev/null)
        kubectl_val=$(ssh -n -o ConnectTimeout=5 jack@"${ip}" "command -v kubectl >/dev/null 2>&1 && kubectl get nodes >/dev/null 2>&1 && echo 'OK' || echo 'not installed'" 2>/dev/null)

        if [ -n "$swap_kb" ]; then
            [ "$swap_kb" -eq 0 ] && swap_status="off" || swap_status="${swap_kb} KB"
        else
            swap_status="connection failed"
        fi
        containerd_status="${containerd_val:-connection failed}"
        kubectl_status="${kubectl_val:-connection failed}"
    fi

    printf "%-20s %-15s %-10s %-12s %-12s %-10s\n" \
        "${hostname}" "${ip}" "${role}" "${swap_status}" "${containerd_status}" "${kubectl_status}"
done < all_nodes.txt

echo ""
echo "================================================================================"
echo "Acceptance check complete!"
echo "================================================================================"
```

```bash
# Make the script executable
chmod +x final_check.sh
# Run the acceptance check
./final_check.sh
```
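The Swap column logic appears twice in final_check.sh, once for the local node and once for remote nodes. Factoring it into one function that reads `free` output on stdin keeps both branches consistent. A minimal sketch; `swap_status_from_free` is a hypothetical name and the status labels are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn `free` output (on stdin) into a Swap status string.
# Empty input, e.g. because the ssh call failed, is reported as "connection failed".
swap_status_from_free() {
  local kb
  kb=$(awk '/^Swap:/ { print $2; exit }')
  if [ -z "$kb" ]; then
    echo "connection failed"
  elif [ "$kb" -eq 0 ]; then
    echo "off"
  else
    echo "${kb} KB"
  fi
}

# Usage:
#   free | swap_status_from_free
#   ssh -n -o ConnectTimeout=5 jack@192.168.0.152 free 2>/dev/null | swap_status_from_free
```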

6. Quick troubleshooting for common issues

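| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| Swap still shows capacity | /etc/fstab contains more than one swap entry | `sudo sed -i '/swap/d' /etc/fstab && sudo swapoff -a` |
| containerd service fails to start | Malformed configuration file | Regenerate the default config with `sudo containerd config default \| sudo tee /etc/containerd/config.toml`, re-apply any customizations made earlier, then `sudo systemctl restart containerd` |
| crictl cannot connect | Permissions on the socket file | Run crictl with sudo (for example `sudo crictl ps`). `sudo chmod 666 /run/containerd/containerd.sock` also works, but it gives every local user root-equivalent access to containerd and is lost whenever containerd recreates the socket |
| kubectl times out connecting to the cluster | Firewall blocks port 6443 | On ControlNodeA: `sudo ufw allow 6443/tcp` |
| Permission errors or warnings about the kubeconfig | File permissions too open | `chmod 600 ~/.kube/config` |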
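When kubectl times out, it helps to tell "port blocked" apart from "kubeconfig wrong" before changing firewall rules. A sketch using bash's built-in `/dev/tcp` redirection plus coreutils `timeout`, so no extra tools need to be installed; `port_open` is a hypothetical helper name and the 3-second timeout is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if HOST:PORT accepts a TCP connection within 3 s.
# Uses bash's /dev/tcp pseudo-device, so it must run under bash (not sh/dash).
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage, e.g. from DevOpsToolNode:
#   port_open 192.168.0.151 6443 && echo "6443 reachable" || echo "6443 blocked or API server down"
```

If the port is reachable but kubectl still fails, the kubeconfig (server address, token, CA data) is the more likely culprit than the firewall.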

Summary

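Building on the existing passwordless-SSH setup, this article used automation scripts to complete system initialization, a unified containerd installation and batch kubeconfig distribution for 8 nodes in one pass. All 9 nodes now meet every prerequisite for joining the Kubernetes cluster, which prepares the ground for the next step: expanding into a highly available cluster with 3 control-plane nodes and 6 worker nodes.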

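This article is part of the series "Building a DevOps Enterprise-Grade Simulation Lab Environment". Everything was written against real hardware (32 cores / 64 threads, 128 GB RAM, 6 TB disk) to stay as close as possible to real enterprise deployments.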

DevOps and SRE enthusiasts are welcome to leave a comment below to discuss and learn from each other.
