23. Enterprise-Grade K8s Architecture Design and Implementation
1. K8s enterprise architecture design
1.1 Kubernetes cluster architecture: high availability

1.2 K8s production resource sizing
Worker node count | Minimum worker nodes | Worker node spec | Control-plane node count | Control-plane node spec | Etcd node spec | Combined Master & Etcd spec |
---|---|---|---|---|---|---|
0-100 | 3 | 8C32G/16C64G | 3 | / | / | 8C32G+128G SSD |
100-250 | 3 | 8C32G/16C64G | 3 | / | / | 16C32G+256G SSD |
250-500 | 3 | 8C32G/16C64G | 3 | 16C32G+ | 8C32G+512G SSD*5 | / |
1.3 K8s production disk layout
Node | Root partition (100G) | Etcd data disk (100G NVMe SSD) | Data disk (500G SSD) |
---|---|---|---|
Control-plane node | / | /var/lib/etcd | /data /var/lib/kubelet /var/lib/containers |
Worker node | / | - | /data /var/lib/kubelet /var/lib/containers |
1.4 K8s cluster network segmentation
- Node CIDR: 192.168.181.0/24
- Service CIDR: 10.96.0.0/16
- Pod CIDR: 172.16.0.0/16
- Reserved Service IPs:
  - CoreDNS Service IP: 10.96.0.10
  - APIServer Service IP: 10.96.0.1
2. Basic environment configuration
2.1 Cluster plan (learning/test environment)
Hostname | IP address | OS | Resources | System disk | Etcd disk | Data disk (can be an LVM logical volume in production) |
---|---|---|---|---|---|---|
k8s-master01 | 192.168.200.61 | Rocky9.4 | 4C4G | 40G | 20G | 40G |
k8s-master02 | 192.168.200.62 | Rocky9.4 | 4C4G | 40G | 20G | 40G |
k8s-master03 | 192.168.200.63 | Rocky9.4 | 4C4G | 40G | 20G | 40G |
k8s-node01 | 192.168.200.64 | Rocky9.4 | 4C4G | 40G | / | 40G |
k8s-node02 | 192.168.200.65 | Rocky9.4 | 4C4G | 40G | / | 40G |
VIP | 192.168.200.100 | / | / | / | / | / |
2.2 Disk mounting
[root@k8s-master01 ~]# fdisk -l|grep "Disk /dev/nvme0n"
Disk /dev/nvme0n1: 40 GiB, 42949672960 bytes, 83886080 sectors
Disk /dev/nvme0n2: 20 GiB, 21474836480 bytes, 41943040 sectors
Disk /dev/nvme0n3: 40 GiB, 42949672960 bytes, 83886080 sectors
# Create the etcd directory and the data directory
[root@k8s-master01 ~]# mkdir -p /var/lib/etcd /data
# Create partitions
[root@k8s-master01 ~]# fdisk /dev/nvme0n2
[root@k8s-master01 ~]# fdisk /dev/nvme0n3
# Format the partitions
[root@k8s-master01 ~]# mkfs.xfs /dev/nvme0n2p1
[root@k8s-master01 ~]# mkfs.xfs /dev/nvme0n3p1
# Check the UUIDs
[root@k8s-master01 ~]# blkid /dev/nvme0n2p1
/dev/nvme0n2p1: UUID="fe42cf86-59e1-4f02-9612-9942536f23ca" TYPE="xfs" PARTUUID="71ff24c2-01"
[root@k8s-master01 ~]# blkid /dev/nvme0n3p1
/dev/nvme0n3p1: UUID="f1cbe99b-71f8-48d9-822a-62fb6b608c38" TYPE="xfs" PARTUUID="9ab6883d-01"
# Configure automatic mounting at boot
[root@k8s-master01 ~]# vim /etc/fstab
[root@k8s-master01 ~]# tail -2 /etc/fstab
UUID="fe42cf86-59e1-4f02-9612-9942536f23ca" /var/lib/etcd xfs defaults 0 0
UUID="f1cbe99b-71f8-48d9-822a-62fb6b608c38" /data xfs defaults 0 0
# Mount the disks
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# mount -a
[root@k8s-master01 ~]# df -hT | grep /dev/nvme0n
/dev/nvme0n1p1 xfs 960M 330M 631M 35% /boot
/dev/nvme0n2p1 xfs 20G 175M 20G 1% /var/lib/etcd
/dev/nvme0n3p1 xfs 40G 318M 40G 1% /data
[root@k8s-master01 ~]# mkdir -p /data/kubelet /data/containers
# Create symlinks so kubelet and container data live on the data disk
[root@k8s-master01 ~]# ln -s /data/kubelet/ /var/lib/
[root@k8s-master01 ~]# ln -s /data/containers/ /var/lib/
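Before continuing, it can help to confirm that the symlinked paths really resolve to the data disk (a quick optional check, assuming the layout created above):
# The symlinks should point at /data, and /data should be the 40G data disk
[root@k8s-master01 ~]# ls -ld /var/lib/kubelet /var/lib/containers
[root@k8s-master01 ~]# df -h /data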
2.3 Basic environment configuration
2.3.1 Configure hosts (all nodes)
[root@k8s-master01 ~]# vim /etc/hosts
[root@k8s-master01 ~]# tail -5 /etc/hosts
192.168.200.61 k8s-master01
192.168.200.62 k8s-master02
192.168.200.63 k8s-master03
192.168.200.64 k8s-node01
192.168.200.65 k8s-node02
2.3.2 Configure the Aliyun mirror repositories (all nodes)
[root@k8s-master01 ~]# sed -e 's|^mirrorlist=|#mirrorlist=|g' -e 's|^#baseurl=http://dl.rockylinux.org/$contentdir|baseurl=https://mirrors.aliyun.com/rockylinux|g' -i.bak /etc/yum.repos.d/*.repo
[root@k8s-master01 ~]# dnf makecache
2.3.3 Disable the firewall, SELinux, dnsmasq, and swap; enable rsyslog (all nodes)
# Disable the firewall and dnsmasq
[root@k8s-master01 ~]# systemctl disable --now firewalld
[root@k8s-master01 ~]# systemctl disable --now dnsmasq
# Disable SELinux
[root@k8s-master01 ~]# setenforce 0
[root@k8s-master01 ~]# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/sysconfig/selinux
[root@k8s-master01 ~]# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
# Disable the swap partition
[root@k8s-master01 ~]# swapoff -a && sysctl -w vm.swappiness=0
[root@k8s-master01 ~]# sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab
2.3.4 Time synchronization (all nodes)
# Install ntpdate (the ntpsec package provides the ntpdate command)
[root@k8s-master01 ~]# dnf install epel-release -y
[root@k8s-master01 ~]# dnf config-manager --set-enabled epel
[root@k8s-master01 ~]# dnf install ntpsec -y
# Set the Asia/Shanghai time zone and sync the time
[root@k8s-master01 ~]# ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
[root@k8s-master01 ~]# echo 'Asia/Shanghai' >/etc/timezone
[root@k8s-master01 ~]# ntpdate time2.aliyun.com
# Add a crontab entry to keep time in sync
[root@k8s-master01 ~]# crontab -e
[root@k8s-master01 ~]# crontab -l
*/5 * * * * /usr/sbin/ntpdate time2.aliyun.com
2.3.5 Configure resource limits (all nodes)
[root@k8s-master01 ~]# ulimit -SHn 65535
[root@k8s-master01 ~]# vim /etc/security/limits.conf
[root@k8s-master01 ~]# tail -6 /etc/security/limits.conf
* soft nofile 65536
* hard nofile 131072
* soft nproc 65535
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited
2.3.6 Upgrade the system (all nodes)
[root@k8s-master01 ~]# yum update -y
2.3.7 Configure passwordless SSH login (Master01 node)
# Generate the SSH key pair
[root@k8s-master01 ~]# ssh-keygen -t rsa
# Distribute the public key to all nodes
[root@k8s-master01 ~]# for i in k8s-master01 k8s-master02 k8s-master03 k8s-node01 k8s-node02;do ssh-copy-id -i .ssh/id_rsa.pub $i;done
2.4 Kernel configuration (all nodes)
2.4.1 Install ipvsadm
[root@k8s-master01 ~]# yum install ipvsadm ipset sysstat conntrack libseccomp -y
2.4.2 Configure the IPVS modules
# Load the IPVS modules:
[root@k8s-master01 ~]# modprobe -- ip_vs
[root@k8s-master01 ~]# modprobe -- ip_vs_rr
[root@k8s-master01 ~]# modprobe -- ip_vs_wrr
[root@k8s-master01 ~]# modprobe -- ip_vs_sh
[root@k8s-master01 ~]# modprobe -- nf_conntrack
# Create ipvs.conf so the modules are loaded automatically at boot
[root@k8s-master01 ~]# vim /etc/modules-load.d/ipvs.conf
[root@k8s-master01 ~]# cat /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_lc
ip_vs_wlc
ip_vs_rr
ip_vs_wrr
ip_vs_lblc
ip_vs_lblcr
ip_vs_dh
ip_vs_sh
ip_vs_fo
ip_vs_nq
ip_vs_sed
ip_vs_ftp
ip_vs_sh
nf_conntrack
ip_tables
ip_set
xt_set
ipt_set
ipt_rpfilter
ipt_REJECT
ipip
# Any errors reported here can be ignored
[root@k8s-master01 ~]# systemctl enable --now systemd-modules-load.service
2.4.3 Kernel tuning
cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
fs.may_detach_mounts = 1
net.ipv4.conf.all.route_localnet = 1
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_watches=89100
fs.file-max=52706963
fs.nr_open=52706963
net.netfilter.nf_conntrack_max=2310720
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl =15
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 327680
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.ip_conntrack_max = 65536
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_timestamps = 0
net.core.somaxconn = 16384
EOF
# Apply the kernel settings
[root@k8s-master01 ~]# sysctl --system
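A few of the applied values can be spot-checked right away (optional; sysctl accepts several keys in one call):
# Spot-check values written by /etc/sysctl.d/k8s.conf
[root@k8s-master01 ~]# sysctl net.ipv4.ip_forward vm.overcommit_memory net.core.somaxconn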
2.4.4 Reboot
[root@k8s-master01 ~]# reboot
# Verify that the kernel modules were loaded automatically
[root@k8s-master01 ~]# lsmod | grep --color=auto -e ip_vs -e nf_conntrack
ip_vs_ftp 12288 0
nf_nat 65536 1 ip_vs_ftp
ip_vs_sed 12288 0
ip_vs_nq 12288 0
ip_vs_fo 12288 0
ip_vs_sh 12288 0
ip_vs_dh 12288 0
ip_vs_lblcr 12288 0
ip_vs_lblc 12288 0
ip_vs_wrr 12288 0
ip_vs_rr 12288 0
ip_vs_wlc 12288 0
ip_vs_lc 12288 0
ip_vs 237568 25 ip_vs_wlc,ip_vs_rr,ip_vs_dh,ip_vs_lblcr,ip_vs_sh,ip_vs_fo,ip_vs_nq,ip_vs_lblc,ip_vs_wrr,ip_vs_lc,ip_vs_sed,ip_vs_ftp
nf_conntrack 229376 2 nf_nat,ip_vs
nf_defrag_ipv6 24576 2 nf_conntrack,ip_vs
nf_defrag_ipv4 12288 1 nf_conntrack
libcrc32c 12288 4 nf_conntrack,nf_nat,xfs,ip_vs
3. High-availability component installation (Master nodes)
On a public cloud, use the provider's own load balancer (for example Alibaba Cloud SLB/NLB or Tencent Cloud ELB) instead of HAProxy and KeepAlived, since most public clouds do not support keepalived.
3.1 Install HAProxy and KeepAlived
[root@k8s-master01 ~]# yum install keepalived haproxy -y
3.2 Configure HAProxy
[root@k8s-master01 ~]# vim /etc/haproxy/haproxy.cfg
[root@k8s-master01 ~]# cat /etc/haproxy/haproxy.cfg
global
maxconn 2000
ulimit-n 16384
log 127.0.0.1 local0 err
stats timeout 30s
defaults
log global
mode http
option httplog
timeout connect 5000
timeout client 50000
timeout server 50000
timeout http-request 15s
timeout http-keep-alive 15s
frontend monitor-in
bind *:33305
mode http
option httplog
monitor-uri /monitor
frontend k8s-master
bind 0.0.0.0:16443
bind 127.0.0.1:16443
mode tcp
option tcplog
tcp-request inspect-delay 5s
default_backend k8s-master
backend k8s-master
mode tcp
option tcplog
option tcp-check
balance roundrobin
default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
server k8s-master01 192.168.200.61:6443 check
server k8s-master02 192.168.200.62:6443 check
server k8s-master03 192.168.200.63:6443 check
3.3 Configure KeepAlived
1. k8s-master01 node
[root@k8s-master01 ~]# vim /etc/keepalived/keepalived.conf
[root@k8s-master01 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
router_id LVS_DEVEL
script_user root
enable_script_security
}
vrrp_script chk_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 5
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state MASTER
interface ens160
mcast_src_ip 192.168.200.61
virtual_router_id 51
priority 101
advert_int 2
authentication {
auth_type PASS
auth_pass K8SHA_KA_AUTH
}
virtual_ipaddress {
192.168.200.100
}
track_script {
chk_apiserver
}
}
2. k8s-master02 node
[root@k8s-master02 ~]# vim /etc/keepalived/keepalived.conf
[root@k8s-master02 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
router_id LVS_DEVEL
script_user root
enable_script_security
}
vrrp_script chk_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 5
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state BACKUP
interface ens160
mcast_src_ip 192.168.200.62
virtual_router_id 51
priority 100
advert_int 2
authentication {
auth_type PASS
auth_pass K8SHA_KA_AUTH
}
virtual_ipaddress {
192.168.200.100
}
track_script {
chk_apiserver
}
}
3. k8s-master03 node
[root@k8s-master03 ~]# vim /etc/keepalived/keepalived.conf
[root@k8s-master03 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
router_id LVS_DEVEL
script_user root
enable_script_security
}
vrrp_script chk_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 5
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state BACKUP
interface ens160
mcast_src_ip 192.168.200.63
virtual_router_id 51
priority 100
advert_int 2
authentication {
auth_type PASS
auth_pass K8SHA_KA_AUTH
}
virtual_ipaddress {
192.168.200.100
}
track_script {
chk_apiserver
}
}
3.4 Configure the KeepAlived health-check script
# Create the script
[root@k8s-master01 ~]# vim /etc/keepalived/check_apiserver.sh
[root@k8s-master01 ~]# cat /etc/keepalived/check_apiserver.sh
#!/bin/bash
err=0
for k in $(seq 1 3)
do
check_code=$(pgrep haproxy)
if [[ $check_code == "" ]]; then
err=$(expr $err + 1)
sleep 1
continue
else
err=0
break
fi
done
if [[ $err != "0" ]]; then
echo "systemctl stop keepalived"
/usr/bin/systemctl stop keepalived
exit 1
else
exit 0
fi
# Make the script executable:
[root@k8s-master01 ~]# chmod +x /etc/keepalived/check_apiserver.sh
# Start haproxy and keepalived on all master nodes:
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# systemctl enable --now haproxy
[root@k8s-master01 ~]# systemctl enable --now keepalived
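Once both services are up, you can check which master currently holds the VIP and whether HAProxy answers on its monitor endpoint (a quick sketch; the interface name ens160, the VIP, and port 33305 come from the configurations above):
# The VIP should appear on exactly one master (normally master01, which has the highest priority)
[root@k8s-master01 ~]# ip addr show ens160 | grep 192.168.200.100
# The HAProxy monitor frontend should answer with 200 OK
[root@k8s-master01 ~]# curl -i http://127.0.0.1:33305/monitor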
3.5 Verify that the keepalived VIP works
[root@k8s-master01 ~]# ping -c2 192.168.200.100
PING 192.168.200.100 (192.168.200.100) 56(84) bytes of data.
64 bytes from 192.168.200.100: icmp_seq=1 ttl=64 time=0.200 ms
64 bytes from 192.168.200.100: icmp_seq=2 ttl=64 time=0.072 ms
--- 192.168.200.100 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1042ms
rtt min/avg/max/mdev = 0.072/0.136/0.200/0.064 ms
[root@k8s-master01 ~]# echo|telnet 192.168.200.100 16443
Trying 192.168.200.100...
Connected to 192.168.200.100.
Escape character is '^]'.
Connection closed by foreign host.
- If the VIP cannot be pinged and the telnet test fails, troubleshoot as follows:
  - Confirm that the VIP is correct
  - On all nodes, the firewall must be disabled and inactive: systemctl status firewalld
  - On all nodes, SELinux must be disabled: getenforce
  - On the master nodes, check the haproxy and keepalived status: systemctl status keepalived haproxy
  - On the master nodes, check the listening ports: netstat -lntp
- If none of the above shows a problem, confirm whether:
  - the machine is on a public cloud
  - the machine is on a private cloud (such as OpenStack)
Public clouds generally do not support keepalived, and private clouds may have similar restrictions; check with your private-cloud administrator.
4. Runtime installation (all nodes)
4.1 Configure the installation sources
[root@k8s-master01 ~]# yum install wget jq psmisc vim net-tools telnet yum-utils device-mapper-persistent-data lvm2 git -y
[root@k8s-master01 ~]# yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
4.2 Install docker-ce and containerd
[root@k8s-master01 ~]# yum install docker-ce containerd -y
4.3 Configure the kernel modules required by Containerd
[root@k8s-master01 ~]# cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
4.4 Load the modules:
[root@k8s-master01 ~]# modprobe -- overlay
[root@k8s-master01 ~]# modprobe -- br_netfilter
4.5 Configure the kernel parameters required by Containerd:
[root@k8s-master01 ~]# cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
# Apply the settings:
[root@k8s-master01 ~]# sysctl --system
4.6 Generate the Containerd configuration file:
[root@k8s-master01 ~]# mkdir -p /etc/containerd
[root@k8s-master01 ~]# containerd config default | tee /etc/containerd/config.toml
Change Containerd's cgroup driver to systemd and point the pause image at the Aliyun mirror:
[root@k8s-master01 ~]# sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml
[root@k8s-master01 ~]# sed -i 's#k8s.gcr.io/pause#registry.cn-hangzhou.aliyuncs.com/google_containers/pause#g' /etc/containerd/config.toml
[root@k8s-master01 ~]# sed -i 's#registry.gcr.io/pause#registry.cn-hangzhou.aliyuncs.com/google_containers/pause#g' /etc/containerd/config.toml
[root@k8s-master01 ~]# sed -i 's#registry.k8s.io/pause#registry.cn-hangzhou.aliyuncs.com/google_containers/pause#g' /etc/containerd/config.toml
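A quick check that both substitutions took effect (the keys below exist in the default file produced by containerd config default):
[root@k8s-master01 ~]# grep -E "SystemdCgroup|sandbox_image" /etc/containerd/config.toml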
Start Containerd and enable it at boot:
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# systemctl enable --now containerd
Configure the runtime endpoint for the crictl client (optional):
[root@k8s-master01 ~]# cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
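With the endpoint configured, crictl should now be able to talk to containerd (an optional sanity check):
# Both commands should return without connection errors; the container list is empty at this point
[root@k8s-master01 ~]# crictl info
[root@k8s-master01 ~]# crictl ps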
5. Install the Kubernetes components (all nodes)
[root@k8s-master01 ~]# cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.33/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.33/rpm/repodata/repomd.xml.key
EOF
Install the latest 1.33 versions of kubeadm, kubelet, and kubectl on all nodes:
[root@k8s-master01 ~]# yum install kubeadm-1.33* kubelet-1.33* kubectl-1.33* -y
Enable kubelet at boot on all nodes (the cluster has not been initialized yet, so kubelet has no configuration file and cannot start; this is expected and can be ignored for now):
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# systemctl enable --now kubelet
6. Cluster initialization
6.1 Create the kubeadm configuration file (Master01 node)
[root@k8s-master01 ~]# vim kubeadm-config.yaml
[root@k8s-master01 ~]# cat kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta4
bootstrapTokens:
- groups:
- system:bootstrappers:kubeadm:default-node-token
token: 7t2weq.bjbawausm0jaxury
ttl: 24h0m0s
usages:
- signing
- authentication
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: 192.168.200.61
bindPort: 6443
nodeRegistration:
criSocket: unix:///var/run/containerd/containerd.sock
imagePullPolicy: IfNotPresent
imagePullSerial: true
name: k8s-master01
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
timeouts:
controlPlaneComponentHealthCheck: 4m0s
discovery: 5m0s
etcdAPICall: 2m0s
kubeletHealthCheck: 4m0s
kubernetesAPICall: 1m0s
tlsBootstrap: 5m0s
upgradeManifests: 5m0s
---
apiServer:
certSANs:
- 192.168.200.100
apiVersion: kubeadm.k8s.io/v1beta4
caCertificateValidityPeriod: 876000h0m0s
certificateValidityPeriod: 876000h0m0s
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 192.168.200.100:16443
controllerManager: {}
dns: {}
encryptionAlgorithm: RSA-2048
etcd:
local:
dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.33.5
networking:
dnsDomain: cluster.local
podSubnet: 172.16.0.0/16
serviceSubnet: 10.96.0.0/16
proxy: {}
scheduler: {}
# Migrate the kubeadm config to the current schema:
[root@k8s-master01 ~]# kubeadm config migrate --old-config kubeadm-config.yaml --new-config new.yaml
# Adjust the timeout values if needed
[root@k8s-master01 ~]# vim new.yaml
[root@k8s-master01 ~]# sed -n "22,23p" new.yaml
timeouts:
controlPlaneComponentHealthCheck: 4m0s
6.2 Copy new.yaml to the other master nodes (Master01 node)
[root@k8s-master01 ~]# for i in k8s-master02 k8s-master03; do scp new.yaml $i:/root/; done
6.3 Pre-pull the images (all Master nodes) (the other master nodes do not need to change anything in new.yaml, not even the IP address)
[root@k8s-master01 ~]# kubeadm config images pull --config /root/new.yaml
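Optionally confirm that the images are now cached locally (crictl uses the containerd endpoint configured in section 4.6):
[root@k8s-master01 ~]# crictl images | grep google_containers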
6.4 Initialize the cluster (Master01 node)
Initialization generates the certificates and configuration files under /etc/kubernetes; after that, the other Master nodes simply join Master01.
[root@k8s-master01 ~]# kubeadm init --config /root/new.yaml --upload-certs
...
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes running the following command on each as root:
kubeadm join 192.168.200.100:16443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:323b80f1fc4d058e265b1f1af904f5dea2b0931f9e82aae3e3879231f35a498d \
--control-plane --certificate-key 981bb3fde1edb1f6e961f78343a033a1aeaf1da98e4b28b7804e1a8ca159dd87
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.200.100:16443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:323b80f1fc4d058e265b1f1af904f5dea2b0931f9e82aae3e3879231f35a498d
6.5 Configure environment variables for accessing the Kubernetes cluster (Master01 node)
[root@k8s-master01 ~]# cat <<EOF >> /root/.bashrc
export KUBECONFIG=/etc/kubernetes/admin.conf
EOF
[root@k8s-master01 ~]# source /root/.bashrc
# Nodes showing NotReady at this stage is expected and not a problem
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 NotReady control-plane 106s v1.33.5
If other nodes (including machines outside the cluster) also need to manage the cluster with kubectl, copy admin.conf to them and use it there.
6.6 If initialization fails, reset with the following command and then initialize again (do not run this if initialization succeeded):
kubeadm reset -f; ipvsadm --clear; rm -rf ~/.kube
If initialization keeps failing, check the system log. On CentOS/Rocky Linux the path is /var/log/messages; on Ubuntu it is /var/log/syslog:
tail -f /var/log/messages | grep -v "not found"
- Common causes of failure:
  - The Containerd configuration was changed incorrectly; re-check it against the containerd installation section
  - A new.yaml problem, for example forgetting to change port 16443 to 6443 for a non-HA cluster
  - A new.yaml problem, for example overlapping network segments causing IP address conflicts
  - The VIP is unreachable, so initialization cannot succeed; in that case the messages log contains VIP timeout errors
6.7 Highly available Masters (run the join command on master02 and master03)
To add the other Master nodes to the cluster, just run the join command below.
Note: never run this on master01 again, and do not copy the command from this document verbatim; use the command that your own master01 printed after initialization.
[root@k8s-master02 ~]# kubeadm join 192.168.200.100:16443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:323b80f1fc4d058e265b1f1af904f5dea2b0931f9e82aae3e3879231f35a498d \
--control-plane --certificate-key 981bb3fde1edb1f6e961f78343a033a1aeaf1da98e4b28b7804e1a8ca159dd87
Check the current status (NotReady is still expected):
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 NotReady control-plane 7m17s v1.33.5
k8s-master02 NotReady control-plane 78s v1.33.5
k8s-master03 NotReady control-plane 87s v1.33.5
6.8 Configure the worker nodes (run the join command on node01 and node02)
Worker nodes mainly run your business workloads. In production, Master nodes should not run Pods beyond the system components; in a test environment you may allow Pods on the Masters to save resources.
[root@k8s-node01 ~]# kubeadm join 192.168.200.100:16443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:323b80f1fc4d058e265b1f1af904f5dea2b0931f9e82aae3e3879231f35a498d
After all nodes have joined, check the cluster status (NotReady still does not matter):
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 NotReady control-plane 10m v1.33.5
k8s-master02 NotReady control-plane 4m16s v1.33.5
k8s-master03 NotReady control-plane 4m25s v1.33.5
k8s-node01 NotReady <none> 71s v1.33.5
k8s-node02 NotReady <none> 57s v1.33.5
7. Installing Calico
7.1 Prevent NetworkManager from managing Calico's network interfaces, to avoid conflicts or interference (all nodes)
[root@k8s-master01 ~]# cat >>/etc/NetworkManager/conf.d/calico.conf<<EOF
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:tunl*;interface-name:vxlan.calico;interface-name:vxlan-v6.calico;interface-name:wireguard.cali;interface-name:wg-v6.cali
EOF
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# systemctl restart NetworkManager
7.2 Install Calico (run only on master01; the ".x" in the branch name does not need to be changed)
[root@k8s-master01 ~]# cd /root/;git clone https://gitee.com/dukuan/k8s-ha-install.git
[root@k8s-master01 ~]# cd /root/k8s-ha-install && git checkout manual-installation-v1.33.x && cd calico/
Get the Pod CIDR from the kube-controller-manager manifest:
[root@k8s-master01 calico]# POD_SUBNET=`cat /etc/kubernetes/manifests/kube-controller-manager.yaml | grep cluster-cidr= | awk -F= '{print $NF}'`
Substitute it into the Calico manifest and install:
[root@k8s-master01 calico]# sed -i "s#POD_CIDR#${POD_SUBNET}#g" calico.yaml
[root@k8s-master01 calico]# kubectl apply -f calico.yaml
All nodes now become Ready:
[root@k8s-master01 calico]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 35m v1.33.5
k8s-master02 Ready control-plane 29m v1.33.5
k8s-master03 Ready control-plane 29m v1.33.5
k8s-node01 Ready <none> 26m v1.33.5
k8s-node02 Ready <none> 26m v1.33.5
Check the container (Pod) status:
[root@k8s-master01 calico]# kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-8678987965-4j5bp 1/1 Running 0 16m
calico-node-92bnb 1/1 Running 0 16m
calico-node-9gqpm 1/1 Running 0 16m
calico-node-gdz59 1/1 Running 0 16m
calico-node-hrfkr 1/1 Running 0 16m
calico-node-tdgh8 1/1 Running 0 16m
coredns-746c97786-gz7hp 1/1 Running 0 45m
coredns-746c97786-sf7mw 1/1 Running 0 45m
etcd-k8s-master01 1/1 Running 0 45m
etcd-k8s-master02 1/1 Running 0 39m
etcd-k8s-master03 1/1 Running 0 39m
....
8. Metrics deployment (master01 node)
In recent Kubernetes versions, resource metrics are collected by metrics-server, which reports memory, disk, CPU, and network usage for nodes and Pods.
(Copy front-proxy-ca.crt from the Master01 node to all worker nodes)
[root@k8s-master01 calico]# scp /etc/kubernetes/pki/front-proxy-ca.crt k8s-node01:/etc/kubernetes/pki/front-proxy-ca.crt
[root@k8s-master01 calico]# scp /etc/kubernetes/pki/front-proxy-ca.crt k8s-node02:/etc/kubernetes/pki/front-proxy-ca.crt
Install metrics-server:
[root@k8s-master01 calico]# cd /root/k8s-ha-install/kubeadm-metrics-server
[root@k8s-master01 kubeadm-metrics-server]# kubectl create -f comp.yaml
Check the status:
[root@k8s-master01 kubeadm-metrics-server]# kubectl get po -n kube-system -l k8s-app=metrics-server
NAME READY STATUS RESTARTS AGE
metrics-server-7d9d8df576-zzq9j 1/1 Running 0 57s
Once the Pod reaches 1/1 Running, check node and Pod resource usage:
[root@k8s-master01 kubeadm-metrics-server]# kubectl top node
NAME CPU(cores) CPU(%) MEMORY(bytes) MEMORY(%)
k8s-master01 727m 18% 977Mi 27%
k8s-master02 636m 15% 945Mi 26%
k8s-master03 626m 15% 916Mi 26%
k8s-node01 300m 7% 550Mi 15%
k8s-node02 331m 8% 433Mi 12%
9. Dashboard deployment (master01 node)
9.1 Installation
The Dashboard displays the various resources in the cluster; it can also stream Pod logs and run commands inside containers in real time. Install it as follows:
[root@k8s-master01 kubeadm-metrics-server]# cd /root/k8s-ha-install/dashboard/
[root@k8s-master01 dashboard]# kubectl create -f .
9.2 Log in to the Dashboard
Change the Dashboard Service type to NodePort:
[root@k8s-master01 dashboard]# kubectl edit svc kubernetes-dashboard -n kubernetes-dashboard
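If you prefer a non-interactive change, the same edit can be made with a patch (a one-line sketch; the Service name and namespace are the ones created by the manifests above):
[root@k8s-master01 dashboard]# kubectl patch svc kubernetes-dashboard -n kubernetes-dashboard -p '{"spec":{"type":"NodePort"}}'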
Check the assigned port:
[root@k8s-master01 dashboard]# kubectl get svc -n kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dashboard-metrics-scraper ClusterIP 10.96.128.108 <none> 8000/TCP 9m40s
kubernetes-dashboard NodePort 10.96.119.44 <none> 443:32506/TCP 9m42s
Using your own NodePort, the Dashboard can be reached at the IP of any host running kube-proxy plus that port, for example https://192.168.200.61:32506 (replace the IP and port with your own). Choose "Token" as the sign-in method.
Create a login token:
[root@k8s-master01 dashboard]# kubectl create token admin-user -n kube-system
eyJhbGciOiJSUzI1NiIsImtpZCI6IjFsVVlxQWhNZ2RWVlRXRWNLX2VjZmNJZlhUbDNMazM0bzR3bWNMcmhoNkEifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNzYwMjczMTYxLCJpYXQiOjE3NjAyNjk1NjEsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwianRpIjoiMmVmYjhjZWYtMzg2Ni00ZGJhLWEzM2MtNGY4OWE2Mjk2MGFmIiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJhZG1pbi11c2VyIiwidWlkIjoiZGY2ZWIwNTMtYzA3Ni00MDFjLWE0N2MtNjI5MTZiNjNkOTgyIn19LCJuYmYiOjE3NjAyNjk1NjEsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTphZG1pbi11c2VyIn0.byzK70mYJaoCJL3sh1sxaVUi88Q24MPWkEe4PQ2yHIKRbuYPJh8PHkyXmmdRL6VVJd7k_927P5VJp_2e9ScMOcyqADSu44CkVwHGBI9C66hvJpTnAm4XwUmTrotc-5lDebTrbjLgUe7eD54CpIbng7FM0eg98lcyv4o-6Zto-cjMG_92s_oCC1W9DMVvPctd8_q3wZmY2v6hx8vFd95wbRrr4JxTtZlPoKAyisbUSATw2MUG8at82QpZzNoXIJGQf0DXEJxOxbU_DCJ6xemB8urgfHWT4L0tu1v35nL6_uaRXEKCfxQxrLzstfl_TwQzN09AE66-kNAv6VUUzDqP3Q
Paste the token into the token field and click Sign in to access the Dashboard.
10. Some required configuration changes (master01 node)
Switch kube-proxy to IPVS mode. The IPVS setting was commented out when the cluster was initialized, so change it manually:
[root@k8s-master01 dashboard]# kubectl edit cm kube-proxy -n kube-system
mode: ipvs
Roll the kube-proxy Pods so they pick up the change:
[root@k8s-master01 dashboard]# kubectl patch daemonset kube-proxy -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}}}" -n kube-system
Verify the kube-proxy mode:
[root@k8s-master01 dashboard]# curl 127.0.0.1:10249/proxyMode
ipvs
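Since kube-proxy is now programming IPVS rules, they can also be inspected directly with ipvsadm (installed in section 2.4.1):
[root@k8s-master01 dashboard]# ipvsadm -Ln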
11. K8s cluster maintenance and management
11.1 Decommissioning a node
11.1.1 Decommissioning steps
If a node needs to be taken out of service, the following steps remove it gracefully:
- 1. Add a taint so the node no longer accepts new Pods
- 2. Check whether the node runs any important services
  - Move important services to other nodes
- 3. Confirm whether the node is an ingress entry point
  - Check its port traffic
- 4. Use drain to evict the remaining Pods
- 5. Re-check the other services on the node
  - Base components and the like
- 6. Look for abnormal Pods
  - Any Pending Pods
  - Any Pods not in a Running state
- 7. Delete the node with kubectl delete
- 8. Take the node offline
  - kubeadm reset -f
  - systemctl disable --now kubelet
11.1.2 Performing the decommission
Suppose k8s-node02 is the node to be decommissioned. First taint it so that no new Pods are scheduled onto it:
[root@k8s-master01 ~]# kubectl taint node k8s-node02 offline=true:NoSchedule
Check whether it runs any important services:
[root@k8s-master01 ~]# kubectl get po -A -owide | grep k8s-node02
kube-system calico-node-gdz59 1/1 Running 2 (61m ago) 6d17h 192.168.200.65 k8s-node02 <none> <none>
kube-system kube-proxy-d6pp4 1/1 Running 2 (61m ago) 6d16h 192.168.200.65 k8s-node02 <none> <none>
kube-system metrics-server-7d9d8df576-zzq9j 1/1 Running 6 (59m ago) 6d17h 172.16.58.199 k8s-node02 <none> <none>
kubernetes-dashboard dashboard-metrics-scraper-69b4796d9b-dmqd9 1/1 Running 2 (61m ago) 6d16h 172.16.58.201 k8s-node02 <none> <none>
kubernetes-dashboard kubernetes-dashboard-778584b9dd-kg2hc 1/1 Running 3 (61m ago) 6d16h 172.16.58.200 k8s-node02 <none> <none>
Suppose dashboard-metrics-scraper, kubernetes-dashboard, and metrics-server are important services. Use rollout restart to reschedule them (if they have multiple replicas, you can also delete just the Pods on this node to avoid recreating everything):
[root@k8s-master01 ~]# kubectl rollout restart deploy dashboard-metrics-scraper -n kubernetes-dashboard
[root@k8s-master01 ~]# kubectl rollout restart deploy kubernetes-dashboard -n kubernetes-dashboard
[root@k8s-master01 ~]# kubectl rollout restart deploy metrics-server -n kube-system
# Check the Pods again:
[root@k8s-master01 ~]# kubectl get po -A -owide | grep k8s-node02
kube-system calico-node-gdz59 1/1 Running 2 (72m ago) 6d17h 192.168.200.65 k8s-node02 <none> <none>
kube-system kube-proxy-d6pp4 1/1 Running 2 (72m ago) 6d16h 192.168.200.65 k8s-node02 <none> <none>
Other checks can be run as needed:
# Drain the node
[root@k8s-master01 ~]# kubectl drain k8s-node02 --ignore-daemonsets
# Check for Pending Pods (empty output means all Pods are fine):
[root@k8s-master01 ~]# kubectl get po -A | grep -i pending
# Check for Pods that are not Running (empty output means all Pods are fine):
[root@k8s-master01 ~]# kubectl get po -A | grep -Ev '1/1|2/2|3/3|NAMESPACE'
# Now delete the node:
[root@k8s-master01 ~]# kubectl delete node k8s-node02
# On the node itself, reset it and handle the machine as needed:
[root@k8s-node02 ~]# kubeadm reset -f
[root@k8s-node02 ~]# systemctl disable --now kubelet
# Check the remaining nodes:
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 6d18h v1.33.5
k8s-master02 Ready control-plane 6d18h v1.33.5
k8s-master03 Ready control-plane 6d18h v1.33.5
k8s-node01 Ready <none> 6d18h v1.33.5
11.2 Adding a node
11.2.1 Basic environment configuration; set the new node's hostname **(see section 2.3)**:
11.2.2 Kernel configuration **(see section 2.4)**
11.2.3 Install Containerd **(see section 4)**
11.2.4 Install the Kubernetes components **(see section 5)**
11.2.5 Configure the repo on the new node (mind the version number), then join it:
# Copy the front-proxy certificate from a Master node to the new node:
[root@k8s-node02 ~]# mkdir -p /etc/kubernetes/pki/
[root@k8s-master01 ~]# scp /etc/kubernetes/pki/front-proxy-ca.crt 192.168.200.65:/etc/kubernetes/pki/
# Generate a new join token on a Master node:
[root@k8s-master01 ~]# kubeadm token create --print-join-command
# Run the join command on the new node:
[root@k8s-node02 ~]# kubeadm join 192.168.200.100:16443 --token gtuckg.ahx37p3zq54jrgy3 --discovery-token-ca-cert-hash sha256:323b80f1fc4d058e265b1f1af904f5dea2b0931f9e82aae3e3879231f35a498d
# Check the node status from a Master node:
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 6d18h v1.33.5
k8s-master02 Ready control-plane 6d18h v1.33.5
k8s-master03 Ready control-plane 6d18h v1.33.5
k8s-node01 Ready <none> 6d18h v1.33.5
k8s-node02 Ready <none> 27s v1.33.5
# From a Master node, check that the Pods on the new node are healthy:
[root@k8s-master01 ~]# kubectl get po -A -owide | grep k8s-node02
kube-system calico-node-lw4s5 1/1 Running 0 2m36s 192.168.200.65 k8s-node02 <none> <none>
kube-system kube-proxy-kd6sf 1/1 Running 0 2m36s 192.168.200.65 k8s-node02 <none> <none>
11.3 Cluster upgrade
11.3.1 Upgrade procedure and notes
Official documentation:
- Upgrade procedure:
  - Upgrade the Master nodes
  - Put the worker nodes into maintenance
  - Upgrade the worker nodes
- Notes:
  - kubeadm cannot skip minor versions when upgrading
  - Back up first if possible
  - Keep swap disabled
# Suppose the target version is 1.34; first configure the 1.34 repo (all nodes):
[root@k8s-master01 ~]# cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.34/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.34/rpm/repodata/repomd.xml.key
EOF
11.3.2 Upgrade the master nodes:
# The master nodes must be upgraded one at a time; start with Master01:
[root@k8s-master01 ~]# yum install -y kubeadm-'1.34*' kubelet-'1.34*' kubectl-'1.34*' --disableexcludes=kubernetes
# Check the installed versions:
[root@k8s-master01 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"34", EmulationMajor:"", EmulationMinor:"", MinCompatibilityMajor:"", MinCompatibilityMinor:"", GitVersion:"v1.34.1", GitCommit:"93248f9ae092f571eb870b7664c534bfc7d00f03", GitTreeState:"clean", BuildDate:"2025-09-09T19:43:15Z", GoVersion:"go1.24.6", Compiler:"gc", Platform:"linux/amd64"}
[root@k8s-master01 ~]# kubectl version
Client Version: v1.34.1
Kustomize Version: v5.7.1
Server Version: v1.33.5
# Review the upgrade plan:
[root@k8s-master01 ~]# kubeadm upgrade plan
....
Upgrade to the latest stable version:
COMPONENT NODE CURRENT TARGET
kube-apiserver k8s-master01 v1.33.5 v1.34.1
kube-apiserver k8s-master02 v1.33.5 v1.34.1
kube-apiserver k8s-master03 v1.33.5 v1.34.1
kube-controller-manager k8s-master01 v1.33.5 v1.34.1
kube-controller-manager k8s-master02 v1.33.5 v1.34.1
kube-controller-manager k8s-master03 v1.33.5 v1.34.1
kube-scheduler k8s-master01 v1.33.5 v1.34.1
kube-scheduler k8s-master02 v1.33.5 v1.34.1
kube-scheduler k8s-master03 v1.33.5 v1.34.1
kube-proxy 1.33.5 v1.34.1
CoreDNS v1.12.0 v1.12.1
etcd k8s-master01 3.5.21-0 3.6.4-0
etcd k8s-master02 3.5.21-0 3.6.4-0
etcd k8s-master03 3.5.21-0 3.6.4-0
You can now apply the upgrade by executing the following command:
kubeadm upgrade apply v1.34.1
_____________________________________________________________________
# Apply the upgrade:
[root@k8s-master01 ~]# kubeadm upgrade apply v1.34.1
# Restart kubelet:
[root@k8s-master01 ~]# systemctl daemon-reload
[root@k8s-master01 ~]# systemctl restart kubelet
# Confirm the versions:
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 6d19h v1.34.1
k8s-master02 Ready control-plane 6d18h v1.33.5
k8s-master03 Ready control-plane 6d18h v1.33.5
k8s-node01 Ready <none> 6d18h v1.33.5
k8s-node02 Ready <none> 30m v1.33.5
[root@k8s-master01 ~]# grep "image:" /etc/kubernetes/manifests/*.yaml
/etc/kubernetes/manifests/etcd.yaml: image: registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.6.4-0
/etc/kubernetes/manifests/kube-apiserver.yaml: image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.34.1
/etc/kubernetes/manifests/kube-controller-manager.yaml: image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.34.1
/etc/kubernetes/manifests/kube-scheduler.yaml: image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.34.1
11.3.3 Upgrade the other master nodes
# Next, upgrade the other master nodes; install the packages first:
[root@k8s-master02 ~]# yum install -y kubeadm-'1.34*' kubelet-'1.34*' kubectl-'1.34*' --disableexcludes=kubernetes
# Upgrade the node:
[root@k8s-master02 ~]# kubeadm upgrade node
# Restart kubelet:
[root@k8s-master02 ~]# systemctl daemon-reload
[root@k8s-master02 ~]# systemctl restart kubelet
# Check the status:
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 6d19h v1.34.1
k8s-master02 Ready control-plane 6d19h v1.34.1
k8s-master03 Ready control-plane 6d19h v1.34.1
k8s-node01 Ready <none> 6d19h v1.33.5
k8s-node02 Ready <none> 40m v1.33.5
11.3.4 Upgrade the worker nodes
Upgrading a worker node is simple: install the new packages and restart kubelet. Note, however, that the node should be put into maintenance first (similar to the decommissioning steps; in a test environment you can simply restart it without maintenance).
Suppose k8s-node01 is the node to be upgraded. First taint it so that no new Pods are scheduled onto it:
[root@k8s-master01 ~]# kubectl taint node k8s-node01 upgrade=true:NoSchedule
Check whether it runs any important services:
[root@k8s-master01 ~]# kubectl get po -A -owide | grep k8s-node01
kube-system calico-kube-controllers-8678987965-4j5bp 1/1 Running 2 (7m41s ago) 6d18h 172.16.85.200 k8s-node01 <none> <none>
kube-system calico-node-tdgh8 1/1 Running 1 (70m ago) 6d18h 192.168.200.64 k8s-node01 <none> <none>
kube-system kube-proxy-278gx 1/1 Running 0 4m17s 192.168.200.64 k8s-node01 <none> <none>
kube-system metrics-server-74767fc66c-lv5w7 1/1 Running 0 63m 172.16.85.208 k8s-node01 <none> <none>
kubernetes-dashboard dashboard-metrics-scraper-5b47ccc9c7-45lds 1/1 Running 0 64m 172.16.85.204 k8s-node01 <none> <none>
kubernetes-dashboard kubernetes-dashboard-65fd974fd6-gfgpq 1/1 Running 0 62m 172.16.85.209 k8s-node01 <none> <none>
Suppose dashboard-metrics-scraper, kubernetes-dashboard, and metrics-server are important services. Use rollout restart to reschedule them (if they have multiple replicas, you can also delete just the Pods on this node to avoid recreating everything):
[root@k8s-master01 ~]# kubectl rollout restart deploy dashboard-metrics-scraper -n kubernetes-dashboard
[root@k8s-master01 ~]# kubectl rollout restart deploy kubernetes-dashboard -n kubernetes-dashboard
[root@k8s-master01 ~]# kubectl rollout restart deploy metrics-server -n kube-system
# Check the Pods again:
[root@k8s-master01 ~]# kubectl get po -A -owide | grep k8s-node01
kube-system calico-kube-controllers-8678987965-4j5bp 1/1 Running 2 (10m ago) 6d18h 172.16.85.200 k8s-node01 <none> <none>
kube-system calico-node-tdgh8 1/1 Running 1 (73m ago) 6d18h 192.168.200.64 k8s-node01 <none> <none>
kube-system kube-proxy-278gx 1/1 Running 0 7m22s 192.168.200.64 k8s-node01 <none> <none>
Other checks can be run as needed:
# Drain the node
[root@k8s-master01 ~]# kubectl drain k8s-node01 --ignore-daemonsets
# Check for Pending Pods (empty output means all Pods are fine):
[root@k8s-master01 ~]# kubectl get po -A | grep -i pending
# Check for Pods that are not Running (empty output means all Pods are fine):
[root@k8s-master01 ~]# kubectl get po -A | grep -Ev '1/1|2/2|3/3|NAMESPACE'
# Upgrade the worker node packages
[root@k8s-node01 ~]# yum install -y kubeadm-'1.34*' kubelet-'1.34*' kubectl-'1.34*' --disableexcludes=kubernetes
# Restart kubelet:
[root@k8s-node01 ~]# systemctl daemon-reload
[root@k8s-node01 ~]# systemctl restart kubelet
# After the upgrade, uncordon the node and remove the taint:
[root@k8s-master01 ~]# kubectl uncordon k8s-node01
[root@k8s-master01 ~]# kubectl taint node k8s-node01 upgrade-
# Check the status:
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 6d19h v1.34.1
k8s-master02 Ready control-plane 6d19h v1.34.1
k8s-master03 Ready control-plane 6d19h v1.34.1
k8s-node01 Ready <none> 6d19h v1.34.1
k8s-node02 Ready <none> 65m v1.34.1
# Repeat the same steps on the other worker nodes
Appendix: What makes a truly production-ready cluster?
1. All nodes are healthy **(every node's STATUS is Ready)**
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 6d19h v1.34.1
k8s-master02 Ready control-plane 6d19h v1.34.1
k8s-master03 Ready control-plane 6d19h v1.34.1
k8s-node01 Ready <none> 6d19h v1.34.1
k8s-node02 Ready <none> 65m v1.34.1
2. All Pods are healthy **(every Pod is Running, the numbers on both sides of READY match, and the RESTARTS count is not increasing)**
[root@k8s-master01 ~]# kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
....
kube-system etcd-k8s-master01 1/1 Running 2 (23m ago) 6d17h
kube-system etcd-k8s-master02 1/1 Running 2 (23m ago) 6d17h
kube-system etcd-k8s-master03 1/1 Running 2 (23m ago) 6d17h
kube-system kube-scheduler-k8s-master01 1/1 Running 2 (23m ago) 6d17h
kube-system kube-scheduler-k8s-master02 1/1 Running 2 (23m ago) 6d17h
kube-system kube-scheduler-k8s-master03 1/1 Running 2 (23m ago) 6d17h
kube-system metrics-server-7d9d8df576-zzq9j 1/1 Running 6 (20m ago) 6d16h
....
3. The cluster network segments have no conflicts **(Service CIDR 10.96.x.x, node network 192.168.x.x, Pod CIDR 172.16.x.x)**
[root@k8s-master01 ~]# kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6d17h
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 6d17h
kube-system metrics-server ClusterIP 10.96.87.203 <none> 443/TCP 6d16h
kubernetes-dashboard dashboard-metrics-scraper ClusterIP 10.96.128.108 <none> 8000/TCP 6d16h
kubernetes-dashboard kubernetes-dashboard NodePort 10.96.119.44 <none> 443:32506/TCP 6d16h
[root@k8s-master01 ~]# kubectl get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master01 Ready control-plane 6d17h v1.33.5 192.168.200.61 <none> Rocky Linux 9.6 (Blue Onyx) 5.14.0-570.49.1.el9_6.x86_64 containerd://1.7.28
k8s-master02 Ready control-plane 6d17h v1.33.5 192.168.200.62 <none> Rocky Linux 9.6 (Blue Onyx) 5.14.0-570.49.1.el9_6.x86_64 containerd://1.7.28
k8s-master03 Ready control-plane 6d17h v1.33.5 192.168.200.63 <none> Rocky Linux 9.6 (Blue Onyx) 5.14.0-570.49.1.el9_6.x86_64 containerd://1.7.28
k8s-node01 Ready <none> 6d17h v1.33.5 192.168.200.64 <none> Rocky Linux 9.6 (Blue Onyx) 5.14.0-570.49.1.el9_6.x86_64 containerd://1.7.28
k8s-node02 Ready <none> 6d17h v1.33.5 192.168.200.65 <none> Rocky Linux 9.6 (Blue Onyx) 5.14.0-570.49.1.el9_6.x86_64 containerd://1.7.28
[root@k8s-master01 ~]# kubectl get po -A -owide | grep coredns
kube-system coredns-746c97786-gz7hp 1/1 Running 2 (30m ago) 6d17h 172.16.85.201 k8s-node01 <none> <none>
kube-system coredns-746c97786-sf7mw 1/1 Running 2 (30m ago) 6d17h 172.16.85.200 k8s-node01 <none> <none>
4. Resources can be created normally
kubectl create deploy cluster-test --image=registry.cn-beijing.aliyuncs.com/dotbalo/debug-tools -- sleep 3600
5. Pods must be able to resolve Services (same namespace and across namespaces); one way to run the two lookups below is shown after this item:
a) nslookup kubernetes
b) nslookup kube-dns.kube-system
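These lookups can be run from inside the cluster-test Pod created in step 4 (a sketch; it assumes the debug-tools image ships nslookup):
[root@k8s-master01 ~]# kubectl exec -it deploy/cluster-test -- nslookup kubernetes
[root@k8s-master01 ~]# kubectl exec -it deploy/cluster-test -- nslookup kube-dns.kube-system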
6. Every node must be able to reach the kubernetes Service on port 443 and the kube-dns Service on port 53
[root@k8s-master02 ~]# curl https://10.96.0.1:443 -k
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
"reason": "Forbidden",
"details": {},
"code": 403
}
[root@k8s-node02 ~]# curl http://10.96.0.10:53 -k
curl: (52) Empty reply from server
7. Pods must be able to communicate with each other (same namespace and across namespaces)
8. Pods must be able to communicate with each other (same node and across nodes); a quick spot-check sketch follows below
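A minimal way to spot-check items 7 and 8, reusing the cluster-test deployment from step 4 (a sketch; it assumes the debug-tools image includes ping, and the target IP is one you read from the -owide output):
# Scale the test deployment so its Pods land on different nodes
[root@k8s-master01 ~]# kubectl scale deploy cluster-test --replicas=3
[root@k8s-master01 ~]# kubectl get po -l app=cluster-test -owide
# From one Pod, ping the IP of a cluster-test Pod running on another node
[root@k8s-master01 ~]# kubectl exec -it deploy/cluster-test -- ping -c2 <another Pod's IP>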