Long time no see! Today I'm sharing some solid hands-on material (really a write-up of my own notes), with no filler from start to finish.
Kubernetes 多主多从集群部署完整文档
1. Machine List
Note: the master, lb, and nfs machines run CentOS 7; the rest run Ubuntu 22.04 LTS.
Hostname | IP Address | Notes |
---|---|---|
lb1 | 192.168.1.120 | Load balancer 1 |
lb2 | 192.168.1.119 | Load balancer 2 |
master-a | 192.168.1.74 | Master node 1 |
master-b | 192.168.1.93 | Master node 2 |
master-c | 192.168.1.107 | Master node 3 |
node01 | 192.168.1.13 | Worker node 1 |
master01 | 192.168.1.53 | Former single-master cluster node (now a worker) |
vip | 192.168.1.150 | Virtual IP address |
2. Load Balancer Setup
2.1 lb1 setup
2.1.1 Base environment
```bash
# Back up the existing yum repo file
cp /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bak
# Switch to the Aliyun yum mirror
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
# Clean and rebuild the yum cache
yum clean all
yum makecache
# Set the hostname
hostnamectl set-hostname lb1
# Set the timezone
timedatectl set-timezone Asia/Shanghai
# Stop and disable the firewall
systemctl stop firewalld
systemctl disable firewalld
# Disable SELinux
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
```
2.1.2 Install required packages
```bash
# Basic tools
yum install -y curl socat conntrack ebtables ipset ipvsadm
# Load-balancing components
yum install -y keepalived haproxy psmisc
```
2.1.3 Configure HAProxy
Edit the configuration file /etc/haproxy/haproxy.cfg:
```config
global
    log /dev/log local0 warning
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    log global
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend kube-apiserver
    bind *:6443
    mode tcp
    option tcplog
    default_backend kube-apiserver

backend kube-apiserver
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server kube-apiserver-1 192.168.1.74:6443 check
    server kube-apiserver-2 192.168.1.93:6443 check
    server kube-apiserver-3 192.168.1.107:6443 check
```
Start the HAProxy service:
```bash
systemctl restart haproxy
systemctl enable haproxy
```
2.1.4 Configure Keepalived
Edit the configuration file /etc/keepalived/keepalived.conf:
```config
global_defs {
    notification_email {
    }
    router_id LVS_DEVEL
    vrrp_skip_check_adv_addr
    vrrp_garp_interval 0
    vrrp_gna_interval 0
}

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance haproxy-vip {
    state BACKUP
    priority 100
    interface ens192               # change to the actual NIC name
    virtual_router_id 60
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    unicast_src_ip 192.168.1.120   # this host's IP
    unicast_peer {
        192.168.1.119              # peer (lb2) IP
    }
    virtual_ipaddress {
        192.168.1.150/24           # the VIP
    }
    track_script {
        chk_haproxy
    }
}
```
Start the Keepalived service:
```bash
systemctl restart keepalived
systemctl enable keepalived
```
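Once both load balancers are up, it helps to confirm which node currently owns the VIP; only the active Keepalived node should list it. A quick sketch (the address is the VIP from the config above):

```shell
# Print the VIP if this host currently holds it; on the standby node
# (or any host without the address) this falls through to the message
ip -4 addr show | grep -wF '192.168.1.150' || echo "VIP not on this host"
```

Stopping HAProxy on the active node (so `killall -0 haproxy` fails) should move the VIP to the peer within a few seconds.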
2.2 lb2 setup
lb2 is configured almost identically to lb1; the differences are:
- the hostname is set to lb2
- unicast_src_ip in the Keepalived config becomes 192.168.1.119
- unicast_peer in the Keepalived config becomes 192.168.1.120
The full procedure mirrors the lb1 setup; the details follow.
2.2.1 Base environment
```bash
# Back up the existing yum repo file
cp /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bak
# Switch to the Aliyun yum mirror
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
# Clean and rebuild the yum cache
yum clean all
yum makecache
# Set the hostname
hostnamectl set-hostname lb2
# Set the timezone
timedatectl set-timezone Asia/Shanghai
# Stop and disable the firewall
systemctl stop firewalld
systemctl disable firewalld
# Disable SELinux
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
```
2.2.2 Install required packages
```bash
# Basic tools
yum install -y curl socat conntrack ebtables ipset ipvsadm
# Load-balancing components
yum install -y keepalived haproxy psmisc
```
2.2.3 Configure HAProxy
Edit the configuration file /etc/haproxy/haproxy.cfg (note the backend IPs are 192.168.1.x, identical to lb1):
```config
global
    log /dev/log local0 warning
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    log global
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend kube-apiserver
    bind *:6443
    mode tcp
    option tcplog
    default_backend kube-apiserver

backend kube-apiserver
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server kube-apiserver-1 192.168.1.74:6443 check
    server kube-apiserver-2 192.168.1.93:6443 check
    server kube-apiserver-3 192.168.1.107:6443 check
```
Start the HAProxy service:
```bash
systemctl restart haproxy
systemctl enable haproxy
```
2.2.4 Configure Keepalived
Edit the configuration file /etc/keepalived/keepalived.conf:
```config
global_defs {
    notification_email {
    }
    router_id LVS_DEVEL
    vrrp_skip_check_adv_addr
    vrrp_garp_interval 0
    vrrp_gna_interval 0
}

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

vrrp_instance haproxy-vip {
    state BACKUP
    priority 100                   # consider lowering this (e.g. 90) so lb1 is preferred
    interface ens192               # change to the actual NIC name
    virtual_router_id 60
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    unicast_src_ip 192.168.1.119   # this host's IP
    unicast_peer {
        192.168.1.120              # peer (lb1) IP
    }
    virtual_ipaddress {
        192.168.1.150/24           # the VIP
    }
    track_script {
        chk_haproxy
    }
}
```
Start the Keepalived service:
```bash
systemctl restart keepalived
systemctl enable keepalived
```
3. NFS Server Setup
```bash
# Unpack the NFS package bundle
tar zxf nfs/nfs.tar.gz
# Install the NFS server (offline RPMs)
yum -y localinstall nfs-rpm/*.rpm
# Configure the NFS export
cat > /etc/exports << EOF
/nfs-data/data *(rw,sync,no_root_squash,no_subtree_check)
EOF
# Start the NFS service and pick up the export
systemctl start nfs
systemctl enable nfs
systemctl restart nfs
# On all other nodes, install the NFS client tools
yum install -y nfs-utils
```
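Each line in /etc/exports has the form `path client(options)`. A quick sketch that splits the entry above into its parts, as a reminder of what it grants: `rw,sync` is synchronous read-write, `no_root_squash` keeps remote root as root on the share (convenient for Kubernetes provisioners, but it loosens security), and `no_subtree_check` skips subtree verification:

```shell
# Split the export entry into its path and client/option spec
awk '{print "path:    " $1; print "clients: " $2}' <<'EOF'
/nfs-data/data *(rw,sync,no_root_squash,no_subtree_check)
EOF
```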
4. Master Node Setup
4.1 master-a setup
4.1.1 Base environment
```bash
# Switch to the Aliyun yum mirror
cp /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bak
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
yum clean all
yum makecache
# Set the hostname
hostnamectl set-hostname master-a
# Set the timezone
timedatectl set-timezone Asia/Shanghai
# Disable the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
# Populate the hosts file
cat >> /etc/hosts << EOF
192.168.1.53 master01
192.168.1.13 node01
192.168.1.81 ai3
192.168.1.74 master-a
192.168.1.93 master-b
192.168.1.107 master-c
EOF
# Set up time synchronization
yum install -y ntpdate
ntpdate cn.pool.ntp.org
echo "*/5 * * * * root /usr/sbin/ntpdate cn.pool.ntp.org &>/dev/null" >> /etc/crontab
```
4.1.2 Install Docker
```bash
# Unpack the Docker binaries
tar xf docker-20.10.23.tgz -C /usr/local
# Copy them into the PATH
cp /usr/local/docker/* /usr/bin/
# Create the Docker systemd unit
cat > /usr/lib/systemd/system/docker.service <<EOF
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/dockerd
ExecReload=/bin/kill -s HUP \$MAINPID
LimitNOFILE=infinity
LimitNPROC=infinity
TimeoutStartSec=0
Delegate=yes
KillMode=process
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s

[Install]
WantedBy=multi-user.target
EOF
# Configure the Docker daemon
mkdir -p /etc/docker
cat > /etc/docker/daemon.json << EOF
{
  "insecure-registries": ["192.168.1.13:5000"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "data-root": "/home/docker",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF
# Start Docker
systemctl daemon-reload
systemctl start docker
systemctl enable docker
docker -v
```
4.1.3 Install Kubernetes components
```bash
# Unpack the Kubernetes package bundle
tar -xvf k8s-v1.23.16.tar
# Disable swap
swapoff -a
sed -i 's/.*swap.*/#&/g' /etc/fstab
# Kernel parameters
cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
vm.swappiness = 0
net.ipv6.conf.all.disable_ipv6 = 0
net.ipv6.conf.default.disable_ipv6 = 0
net.ipv6.conf.lo.disable_ipv6 = 0
net.ipv6.conf.all.forwarding = 1
EOF
sysctl -p /etc/sysctl.d/k8s.conf
# Allow forwarding
iptables -P FORWARD ACCEPT
# Install the Kubernetes RPMs (offline)
yum -y localinstall k8s-rpm/*.rpm
# Install NFS (offline RPMs)
mkdir tmp
tar zxf nfs/nfs.tar.gz -C tmp
yum -y localinstall tmp/nfs-rpm/*.rpm
systemctl start nfs
systemctl enable nfs
```
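One gotcha with the sysctl file above: the `net.bridge.*` keys only exist once the `br_netfilter` kernel module is loaded, so on a fresh host `sysctl -p` can error on them. A short sketch (an addition to the steps above, not from the original notes) that loads the module, persists it across reboots, and reads a value back:

```shell
# br_netfilter provides the net.bridge.bridge-nf-call-* sysctls
modprobe br_netfilter 2>/dev/null || true
echo br_netfilter > /etc/modules-load.d/k8s.conf   # load on boot as well
sysctl -p /etc/sysctl.d/k8s.conf
sysctl -n net.ipv4.ip_forward    # expect 1 once k8s.conf is applied
```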
4.1.4 Initialize the Kubernetes cluster
Create the kubeadm configuration file kubeadm-config.yaml:
```yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.1.74
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  imagePullPolicy: IfNotPresent
  name: master-a
  taints: null
---
apiServer:
  certSANs:
  - master-a
  - master-b
  - master-c
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 192.168.1.150:6443
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: 192.168.1.13:5000
kind: ClusterConfiguration
kubernetesVersion: 1.23.17
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
scheduler: {}
```
Initialize the cluster:
```bash
kubeadm init --config kubeadm-config.yaml --upload-certs
```
Configure kubectl:
```bash
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
Install the network plugin:
```bash
kubectl apply -f flannel/kube-flannel.yml
```
Bundle the CA material and admin.conf, then copy the bundle to the other masters:
```bash
cd /etc/kubernetes
tar zcf /root/k8s_conf.tar.gz pki/ca.crt pki/ca.key pki/sa.key pki/sa.pub pki/front-proxy-ca.crt pki/front-proxy-ca.key pki/etcd/ca.crt pki/etcd/ca.key admin.conf
scp /root/k8s_conf.tar.gz [email protected]:~/
scp /root/k8s_conf.tar.gz [email protected]:~/
```
Generate the join command:
```bash
kubeadm token create --print-join-command
```
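If the printed join command is lost, the `--discovery-token-ca-cert-hash` value can be recomputed from the cluster CA at any time. It is the SHA-256 digest of the DER-encoded public key of the CA certificate, at the default kubeadm PKI path:

```shell
# Recompute the discovery token CA cert hash from the cluster CA
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex \
  | sed 's/^.* //'
```

Prefix the 64-character result with `sha256:` when passing it to `kubeadm join`.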
4.2 master-b setup
master-b is configured much like master-a; the differences are:
- the hostname is set to master-b
- the certificate/config bundle produced on master-a is unpacked before joining
- the node joins the cluster instead of initializing it
Steps:
```bash
# Switch to the Aliyun yum mirror
cp /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bak
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
yum clean all
yum makecache
# Set the hostname (master-b, not master-a)
hostnamectl set-hostname master-b
# Set the timezone
timedatectl set-timezone Asia/Shanghai
# Disable the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
# Populate the hosts file
cat >> /etc/hosts << EOF
192.168.1.53 master01
192.168.1.13 node01
192.168.1.81 ai3
192.168.1.74 master-a
192.168.1.93 master-b
192.168.1.107 master-c
EOF
# Set up time synchronization
yum install -y ntpdate
ntpdate time.windows.com
```
4.2.2 Install Docker
```bash
# Unpack the Docker binaries
tar xf docker-20.10.23.tgz -C /usr/local
# Copy them into the PATH
cp /usr/local/docker/* /usr/bin/
# Create the Docker systemd unit
cat > /usr/lib/systemd/system/docker.service <<EOF
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/dockerd
ExecReload=/bin/kill -s HUP \$MAINPID
LimitNOFILE=infinity
LimitNPROC=infinity
TimeoutStartSec=0
Delegate=yes
KillMode=process
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s

[Install]
WantedBy=multi-user.target
EOF
# Configure the Docker daemon
mkdir -p /etc/docker
cat > /etc/docker/daemon.json << EOF
{
  "insecure-registries": ["192.168.1.13:5000"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "data-root": "/home/docker",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF
# Start Docker
systemctl daemon-reload
systemctl start docker
systemctl enable docker
docker -v
```
4.2.3 Install Kubernetes components and join the control plane
```bash
# Unpack the Kubernetes package bundle
tar -xvf k8s-v1.23.16.tar
# Unpack the certificate/config bundle copied from master-a
mkdir -p /etc/kubernetes
tar zxf /root/k8s_conf.tar.gz -C /etc/kubernetes/
# Disable swap
swapoff -a
sed -i 's/.*swap.*/#&/g' /etc/fstab
# Kernel parameters
cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
vm.swappiness = 0
net.ipv6.conf.all.disable_ipv6 = 0
net.ipv6.conf.default.disable_ipv6 = 0
net.ipv6.conf.lo.disable_ipv6 = 0
net.ipv6.conf.all.forwarding = 1
EOF
sysctl -p /etc/sysctl.d/k8s.conf
# Allow forwarding
iptables -P FORWARD ACCEPT
# Install the Kubernetes RPMs (offline)
yum -y localinstall k8s-rpm/*.rpm
# Join as an additional control-plane node
kubeadm join 192.168.1.150:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:8e13ce8a9e6ce68c4ba9b6b01ca98cff62a61d4d6c9b6063bd6b37aca19f7890 --control-plane
systemctl restart kubelet
systemctl status kubelet
# Check port usage if kubelet fails to start
lsof -i:10250
# Configure kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Install NFS (offline RPMs)
mkdir tmp
tar zxf nfs/nfs.tar.gz -C tmp
yum -y localinstall tmp/nfs-rpm/*.rpm
systemctl start nfs
systemctl enable nfs
# With internet access, simply:
yum install -y nfs-utils
```
4.3 master-c setup
master-c is configured exactly like master-b, except that the hostname is set to master-c.
5. Worker Node Setup
5.1 Ubuntu node setup
```bash
# Update package lists and install prerequisites
apt-get update && apt-get install -y apt-transport-https ca-certificates curl software-properties-common
# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# Add the Docker repository
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
# Install Docker
apt-get update && apt-get install -y docker-ce=5:20.10.23~3-0~ubuntu-$(lsb_release -cs) docker-ce-cli=5:20.10.23~3-0~ubuntu-$(lsb_release -cs) containerd.io
# Configure the Docker daemon
mkdir -p /etc/docker
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF
# Restart Docker
systemctl daemon-reload
systemctl restart docker
systemctl enable docker
# Install kubeadm, kubelet, and kubectl
apt-get update && apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet=1.23.16-00 kubeadm=1.23.16-00 kubectl=1.23.16-00
# Disable swap
swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab
# Disable the firewall
ufw disable
# Join the cluster as a worker
kubeadm join 192.168.1.150:6443 --token 661ic1.vgfsbtnxte96nldg \
    --discovery-token-ca-cert-hash sha256:8e13ce8a9e6ce68c4ba9b6b01ca98cff62a61d4d6c9b6063bd6b37aca19f7890
# Install the NFS client
apt-get install -y nfs-common
```
6. Troubleshooting
6.1 kubelet start timeout
Symptom:
```
error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher
```
Fix:
```bash
kubeadm reset -f
docker rm -f $(docker ps -a -q)
rm -rf /var/lib/cni/
systemctl daemon-reload
systemctl restart kubelet
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
```
6.2 CNI network plugin issues
Symptom:
```
(combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "f4aac82f1a810b98057c8bb838deec809eb0750d703abcfb4a505ddcfb8406cd" network for pod "eip-nfs-nfs-client-6478c978c9-tqxld": networkPlugin cni failed to set up pod "eip-nfs-nfs-client-6478c978c9-tqxld_kube-system" network: failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.4.1/24
```
Fix (removes the stale bridge so the CNI plugin can recreate it with the right address):
```bash
rm -rf /etc/cni
ip link set cni0 down
ip link delete cni0
```
6.3 HAProxy fails to bind its port
On the HA load balancers, HAProxy may fail to start with "Starting frontend api: cannot bind socket [0.0.0.0:6443]". This is SELinux blocking the bind; allow HAProxy to connect to any port:
```bash
setsebool -P haproxy_connect_any=1
systemctl restart haproxy
```
6.4 Expired Kubernetes certificates
6.4.1 Check certificate expiry
```bash
kubeadm certs check-expiration
```
6.4.2 Renew all certificates
```bash
kubeadm certs renew all
```
6.4.3 Refresh the kubeconfig file
```bash
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
```
6.4.4 Restart the affected services
```bash
systemctl restart kubelet
docker ps | grep -E 'k8s_kube-apiserver|k8s_kube-controller-manager|k8s_kube-scheduler|k8s_etcd_etcd' | awk '{print $1}' | xargs docker restart
```
7. Verify Cluster Status
Run on every master node:
```bash
kubectl get nodes
kubectl get pods --all-namespaces
kubectl get cs
```
All nodes should report Ready, all system Pods should be Running, and every component status should be Healthy.
8. H3C Router Policy Route for the VIP
Configuration path
Advanced Options --->> Policy Routing
Configuration parameters
Setting | Value | Description |
---|---|---|
Interface | VLAN1 | VLAN interface the policy applies to |
Protocol | IP | IP protocol |
Source IP range | 192.168.1.150-192.168.1.150 | Exact match on the single source IP |
Destination IP range | 192.168.1.119-192.168.1.120 | Destination IP range to match |
Source port | empty | No source-port restriction |
Destination port | empty | No destination-port restriction |
Active time | empty | Active around the clock |
Priority | auto | Priority assigned automatically |
Outbound interface | WAN1 | Egress interface for matched traffic |
Enabled | enabled | Enables the policy |
Description | LB-VIP, proxies kube-api port 6443 | Purpose of the policy |
Notes on the policy
- Forces traffic from 192.168.1.150 to 192.168.1.119-120 out through the WAN1 interface
- Intended for the VIP that fronts the k8s API server (port 6443)
- Leaving the source/destination ports empty matches traffic on all ports
- Automatic priority assignment keeps the policy ordered correctly
Caveats
- Make sure the WAN1 interface is configured correctly
- Check that the VLAN1 interface is UP
- Test connectivity after applying the policy