Installing k8s
1. Create the hosts; set the IP and hostname; disable firewalld, SELinux, and NetworkManager
| No. | Hostname | IP |
|----|------------|----------------|
| 1 | k8s-master | 192.168.118.66 |
| 2 | k8s-node01 | 192.168.118.77 |
| 3 | k8s-node02 | 192.168.118.88 |
2. Set up passwordless SSH between the hosts
[root@k8s-master ~]# ssh-keygen
[root@k8s-master ~]# ssh-copy-id root@192.168.118.77
[root@k8s-master ~]# ssh-copy-id root@192.168.118.88
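3. Configure the yum repositories on all three nodes
--docker
--k8s
--Clear and rebuild the cache: yum clean all && yum makecache
--Four repository mirrors: aliyun, epel, kubernetes, docker-ce
4. Hostname mapping on all three nodes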
[root@k8s-master ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.118.66 k8s-master
192.168.118.77 k8s-node01
192.168.118.88 k8s-node02
Test: the hosts can ping each other by hostname
[root@k8s-master ~]# ping k8s-node01
PING k8s-node01 (192.168.118.77) 56(84) bytes of data.
64 bytes from k8s-node01 (192.168.118.77): icmp_seq=1 ttl=64 time=0.426 ms
64 bytes from k8s-node01 (192.168.118.77): icmp_seq=2 ttl=64 time=0.342 ms
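The mapping from step 4 can also be added without opening an editor. The following is a minimal sketch, not part of the original notes: `add_host` is a hypothetical helper name, and it assumes each hostname should appear at most once in the file.

```shell
#!/usr/bin/env bash
# add_host FILE IP NAME: append "IP NAME" to FILE unless NAME is already mapped.
# (add_host is a hypothetical helper, not something the original notes define.)
add_host() {
  local file="$1" ip="$2" name="$3"
  # Look for NAME as a whole field (after the address) on a non-comment line.
  if ! awk -v n="$name" '!/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Usage (as root, on every node):
#   add_host /etc/hosts 192.168.118.66 k8s-master
#   add_host /etc/hosts 192.168.118.77 k8s-node01
#   add_host /etc/hosts 192.168.118.88 k8s-node02
```

Running it twice leaves a single entry, so it is safe to repeat on a node that was already configured.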
5. Install common packages on all three nodes
[root@k8s-master ~]# yum install wget jq psmisc vim net-tools telnet yum-utils device-mapper-persistent-data lvm2 git tree -y
6. On all three nodes, disable the firewall, NetworkManager, SELinux, and swap
systemctl disable --now firewalld
systemctl disable --now NetworkManager
setenforce 0
vim /etc/selinux/config # set SELINUX=disabled so the change survives a reboot
swapoff -a && sysctl -w vm.swappiness=0
sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab
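`setenforce 0` only lasts until the next reboot, which is why the notes also open /etc/selinux/config in an editor. A non-interactive equivalent might look like the sketch below; it assumes the intended value is `SELINUX=disabled`, and `set_selinux_disabled` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# set_selinux_disabled [FILE]: rewrite the SELINUX= line to "disabled".
# Assumption: the notes intend SELINUX=disabled; the helper name is hypothetical.
set_selinux_disabled() {
  local file="${1:-/etc/selinux/config}"
  # The pattern requires "=" right after SELINUX, so SELINUXTYPE= is untouched.
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$file"
}

# Usage (as root): set_selinux_disabled
```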
7. Synchronize the time on all three nodes and schedule it with cron
yum -y install ntpdate
ntpdate time2.aliyun.com
crontab -e
*/5 * * * * /usr/sbin/ntpdate time2.aliyun.com
[root@k8s-master ~]# crontab -l
*/5 * * * * /usr/sbin/ntpdate time2.aliyun.com
8. Configure limits on all three nodes
ulimit -SHn 65535 # limit the number of files a single process can open to 65535
vim /etc/security/limits.conf
# Append the following at the end of the file
* soft nofile 65536
* hard nofile 131072
* soft nproc 65535
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited
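The same six limits can be appended from a script instead of vim. This is a sketch under assumptions: `append_limits` and the marker comment are hypothetical, introduced only so the function can tell whether it has already run.

```shell
#!/usr/bin/env bash
# append_limits [FILE]: append the limits from step 8 once.
# The helper name and the marker line are hypothetical, not from the notes.
append_limits() {
  local file="${1:-/etc/security/limits.conf}"
  local marker="# k8s node limits"
  # Skip if a previous run already added the block.
  grep -qxF "$marker" "$file" 2>/dev/null && return 0
  cat >> "$file" <<EOF
$marker
* soft nofile 65536
* hard nofile 131072
* soft nproc 65535
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited
EOF
}

# Usage (as root): append_limits
```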
9. Download the YAML files from Gitee. The pod YAML files are similar to docker-compose.yaml files; they are used later, once the k8s cluster is up, to add functional pods
cd /root/
git clone https://gitee.com/dukuan/k8s-ha-install.git
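10. Configure IPVS on all three nodes
yum install ipvsadm ipset sysstat conntrack libseccomp -y
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack
vim /etc/modules-load.d/ipvs.conf
# Load the modules below, required by IPVS and related features, at boot.
# The trailing comments are annotations for these notes only: modules-load.d does not support inline comments, so put just the module name on each line of the real file.
ip_vs # IPVS core (load balancing) module
ip_vs_lc # least-connection scheduling
ip_vs_wlc # weighted least-connection scheduling
ip_vs_rr # round-robin scheduling
ip_vs_wrr # weighted round-robin scheduling
ip_vs_lblc # locality-based least-connection scheduling
ip_vs_lblcr # locality-based least-connection scheduling with replication
ip_vs_dh # destination-hashing scheduling
ip_vs_sh # source-hashing scheduling
ip_vs_fo # weighted failover scheduling
ip_vs_nq # never-queue scheduling
ip_vs_sed # shortest-expected-delay scheduling
ip_vs_ftp # FTP protocol helper for IPVS
nf_conntrack # tracks the state of network connections
ip_tables # iptables packet-filtering framework
ip_set # creates and manages IP sets
xt_set # ipset match/target support for iptables
ipt_set # legacy name for the iptables ipset match
ipt_rpfilter # reverse-path filter match for iptables
ipt_REJECT # iptables REJECT target: refuses non-matching packets and returns an error reply
ipip # IP-in-IP tunneling between two networks
# Check that the modules are loaded
lsmod | grep -e ip_vs -e nf_conntrack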
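modules-load.d files expect one bare module name per line; only lines that begin with `#` or `;` are ignored, so descriptions cannot trail a module name. One way to write the file without annotations is sketched below. `write_ipvs_modules` is a hypothetical helper, and the module list simply mirrors the one in these notes.

```shell
#!/usr/bin/env bash
# write_ipvs_modules [FILE]: write the module list from step 10, one name per line.
# (Hypothetical helper; some modules may not exist on every kernel version.)
write_ipvs_modules() {
  local file="${1:-/etc/modules-load.d/ipvs.conf}"
  printf '%s\n' \
    ip_vs ip_vs_lc ip_vs_wlc ip_vs_rr ip_vs_wrr ip_vs_lblc ip_vs_lblcr \
    ip_vs_dh ip_vs_sh ip_vs_fo ip_vs_nq ip_vs_sed ip_vs_ftp \
    nf_conntrack ip_tables ip_set xt_set ipt_set ipt_rpfilter ipt_REJECT ipip \
    > "$file"
}

# Usage (as root):
#   write_ipvs_modules
#   systemctl restart systemd-modules-load.service
#   lsmod | grep -e ip_vs -e nf_conntrack
```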
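11. Kernel parameters for k8s on all three nodes
vim /etc/sysctl.d/k8s.conf
# The trailing comments are annotations for these notes only: sysctl.d files do not support inline comments, so keep just "key = value" on each line of the real file.
net.bridge.bridge-nf-call-iptables = 1 # pass bridged IPv4 traffic through iptables rules
net.bridge.bridge-nf-call-ip6tables = 1 # pass bridged IPv6 traffic through ip6tables rules
fs.may_detach_mounts = 1 # whether file systems may be detached from their mounts; 1 allows it
net.ipv4.conf.all.route_localnet = 1 # allow routing of 127.0.0.0/8 loopback addresses; 1 enables, 0 disables
vm.overcommit_memory=1 # memory allocation policy: 1 always allows overcommit, 0 uses the kernel's heuristic
vm.panic_on_oom=0 # whether the kernel panics on out-of-memory (OOM): 0 no panic, 1 panic
fs.inotify.max_user_watches=89100 # maximum number of files and directories inotify can watch per user
fs.file-max=52706963 # system-wide maximum number of file handles
fs.nr_open=52706963 # maximum number of file descriptors a single process can open
net.netfilter.nf_conntrack_max=2310720 # maximum size of the connection tracking table
net.ipv4.tcp_keepalive_time = 600 # idle time (seconds) before the first TCP keepalive probe is sent
net.ipv4.tcp_keepalive_probes = 3 # maximum number of unanswered keepalive probes
net.ipv4.tcp_keepalive_intvl =15 # interval (seconds) between successive keepalive probes
net.ipv4.tcp_max_tw_buckets = 36000 # maximum number of sockets in TIME_WAIT
net.ipv4.tcp_tw_reuse = 1 # allow reuse of TIME_WAIT sockets; 1 enables, 0 disables
net.ipv4.tcp_max_orphans = 327680 # maximum number of orphaned sockets in the system
net.ipv4.tcp_orphan_retries = 3 # number of retries before an orphaned connection is dropped
net.ipv4.tcp_syncookies = 1 # protects against SYN flood attacks; 1 enables SYN cookies, 0 disables
net.ipv4.tcp_max_syn_backlog = 16384 # maximum length of the SYN request queue
net.ipv4.ip_conntrack_max = 65536 # size of the IP connection tracking table (legacy key; newer kernels may only have net.netfilter.nf_conntrack_max)
net.ipv4.tcp_timestamps = 0 # turns off the TCP timestamps option
net.core.somaxconn = 16384 # maximum length of the listen backlog
# After saving, reboot all nodes to make sure the settings are still loaded after a restart
reboot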
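sysctl.d only ignores lines whose first non-blank character is `#` or `;`, so an annotated listing like the one above has to lose its trailing comments before it can be loaded. A small filter can do that; this is a sketch, `strip_annotations` is a hypothetical helper, and `annotated-k8s.conf` in the usage line is a made-up file name.

```shell
#!/usr/bin/env bash
# strip_annotations: read an annotated listing on stdin and print only
# the loadable lines (trailing "# ..." text and blank lines removed).
strip_annotations() {
  sed -E -e 's/[[:space:]]*#.*$//' -e '/^[[:space:]]*$/d'
}

# Usage (as root; annotated-k8s.conf is a hypothetical copy of the listing):
#   strip_annotations < annotated-k8s.conf > /etc/sysctl.d/k8s.conf
#   sysctl --system
```

The same filter works for the annotated module list in /etc/modules-load.d/ipvs.conf.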
12. On all three nodes, remove podman and install docker-ce, docker-ce-cli, and containerd
[root@k8s-master ~]# yum remove -y podman runc containerd
[root@k8s-master ~]# yum install docker-ce docker-ce-cli containerd.io -y
13. On all three nodes, configure the kernel modules containerd needs: overlay and br_netfilter
[root@k8s-master ~]# cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
> overlay
> br_netfilter
> EOF
overlay
br_netfilter
[root@k8s-master ~]# modprobe overlay
[root@k8s-master ~]# modprobe br_netfilter
[root@k8s-master ~]# cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
> net.bridge.bridge-nf-call-iptables = 1
> net.ipv4.ip_forward = 1
> net.bridge.bridge-nf-call-ip6tables = 1
> EOF
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
[root@k8s-master ~]# sysctl --system
14. containerd configuration file
[root@k8s-master ~]# mkdir -p /etc/containerd
# Dump containerd's default configuration and save it to /etc/containerd/config.toml
[root@k8s-master ~]# containerd config default | tee /etc/containerd/config.toml
[root@k8s-master ~]# vim /etc/containerd/config.toml
Edit lines 63 and 127
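The notes give only the line numbers (63 and 127), not what is changed there, and line numbers shift between containerd versions. In kubeadm setups the two settings usually adjusted in this file are `sandbox_image` and `SystemdCgroup`, so the sketch below assumes those are the intended edits. That is an assumption, as is the helper name `patch_containerd_config`; the pause image is the one pulled later in these notes.

```shell
#!/usr/bin/env bash
# patch_containerd_config [FILE] [PAUSE_IMAGE]
# Assumption: lines 63 and 127 refer to sandbox_image and SystemdCgroup.
patch_containerd_config() {
  local file="${1:-/etc/containerd/config.toml}"
  local pause="${2:-registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9}"
  # Match by key name rather than line number, keeping the original indentation.
  sed -i -E \
    -e "s#^([[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*).*#\1\"${pause}\"#" \
    -e 's#^([[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*).*#\1true#' \
    "$file"
}

# Usage (as root):
#   patch_containerd_config
#   grep -nE 'sandbox_image|SystemdCgroup' /etc/containerd/config.toml
```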
Start containerd and enable it at boot
[root@k8s-master ~]# systemctl enable --now containerd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/containerd.service to /usr/lib/systemd/system/containerd.service.
[root@k8s-master ~]# systemctl status containerd.service
● containerd.service - containerd container runtime
Loaded: loaded (/usr/lib/systemd/system/containerd.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2024-09-11 11:40:38 CST; 52s ago
15. Configure client (crictl) access to the runtime on all three nodes
[root@k8s-master ~]# cat > /etc/crictl.yaml <<EOF
> runtime-endpoint: unix:///run/containerd/containerd.sock
> image-endpoint: unix:///run/containerd/containerd.sock
> timeout: 10
> debug: false
> EOF
16. Install the Kubernetes components on all three nodes
yum -y install kubeadm-1.28* kubectl-1.28* kubelet-1.28*
yum list installed | grep kube
systemctl daemon-reload
systemctl enable --now kubelet.service
systemctl status kubelet.service
--Troubleshooting:
If kubelet fails to start, check that swap has been turned off and look at /var/log/messages. If the log says
/var/lib/kubelet/config.yaml is missing (this file is normally generated by kubeadm init or kubeadm join), kubelet may need to be reinstalled
yum -y remove kubelet-1.28*
yum -y install kubelet-1.28*
systemctl daemon-reload
systemctl enable --now kubelet.service
systemctl status kubelet.service
kubeadm depends on kubelet, so reinstall it as well:
yum -y install kubeadm-1.28*
[root@k8s-master ~]# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2024-09-11 14:25:11 CST; 16s ago
# kubelet listens on three ports: 10248, 10250, and 10255
[root@k8s-master ~]# netstat -lnput | grep kubelet
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 11597/kubelet
tcp6 0 0 :::10250 :::* LISTEN 11597/kubelet
tcp6 0 0 :::10255 :::* LISTEN 11597/kubelet
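The following steps are performed on k8s-master only
17. Initialization
--Pull the images
[root@k8s-master ~]# vim kubeadm-config.yaml
#### Change the IP addresses in the file to this host's IP (k8s-master)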
apiVersion: kubeadm.k8s.io/v1beta3 # version of the kubeadm configuration API (v1beta3)
bootstrapTokens: # bootstrap tokens, used for authentication while the cluster is being initialized
- groups: # groups associated with this token
  - system:bootstrappers:kubeadm:default-node-token
  token: 7t2weq.bjbawausm0jaxury # value of the bootstrap token
  ttl: 24h0m0s # token lifetime, 24 hours here
  usages: # what the token may be used for
  - signing # digital signing
  - authentication # authentication
kind: InitConfiguration # type of this configuration object: an initialization configuration
localAPIEndpoint: # address and port of the local API endpoint
  advertiseAddress: 192.168.118.66
  bindPort: 6443
nodeRegistration: # settings used when the node registers
  criSocket: unix:///var/run/containerd/containerd.sock # socket path of the container runtime (CRI)
  name: k8s-master # node name
  taints: # taints
  - effect: NoSchedule # regular pods are not scheduled on this node
    key: node-role.kubernetes.io/control-plane # marks the node as a control-plane node
---
apiServer: # API server configuration
  certSANs: # extra subject alternative names (SANs) for the API server certificate; the IP is enough
  - 192.168.118.66
  timeoutForControlPlane: 4m0s # control-plane timeout, 4 minutes here
apiVersion: kubeadm.k8s.io/v1beta3 # version of the kubeadm configuration API
certificatesDir: /etc/kubernetes/pki # directory where certificates are stored
clusterName: kubernetes # cluster name, "kubernetes"
controlPlaneEndpoint: 192.168.118.66:6443 # address and port of the control-plane node
controllerManager: {} # controller manager settings; empty means defaults
etcd: # etcd configuration
  local: # local etcd instance
    dataDir: /var/lib/etcd # data directory
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers # image registry used by Kubernetes (Aliyun mirror)
kind: ClusterConfiguration # type of this configuration object: a cluster configuration
kubernetesVersion: v1.28.2 # Kubernetes version
networking: # cluster network settings
  dnsDomain: cluster.local # cluster DNS domain: cluster.local
  podSubnet: 172.16.0.0/16 # pod subnet
  serviceSubnet: 10.96.0.0/16 # service subnet
scheduler: {} # use the default scheduler behavior
# Convert the old kubeadm config file to the new format
[root@k8s-master ~]# kubeadm config migrate --old-config kubeadm-config.yaml --new-config new.yaml
[root@k8s-master ~]# ls
anaconda-ks.cfg k8s-ha-install kubeadm-config.yaml new.yaml
kubeadm config images pull --config new.yaml
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.28.2
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.28.2
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.28.2
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.28.2
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.9-0
[config/images] Pulled registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.10.1
--Initialize
[root@k8s-master ~]# kubeadm init --config /root/new.yaml --upload-certs
[init] Using Kubernetes version: v1.28.2
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-10250]: Port 10250 is in use
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
[root@k8s-master ~]# systemctl stop kubelet.service
Warning: kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
[root@k8s-master ~]# kubeadm init --config /root/new.yaml --upload-certs
kubeadm join 192.168.118.66:6443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:edc1d608d110c230dfb234262534921f2d3855b5d33d662aef79a8684c3896f2
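--Save the token
Save the join command: any node that wants to join our cluster uses it. If the cluster is initialized again, the join command (including the CA certificate hash) changes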
[root@k8s-master ~]# vim token
kubeadm join 192.168.118.66:6443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:edc1d608d110c230dfb234262534921f2d3855b5d33d662aef79a8684c3896f2
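If the saved command is lost, or the 24-hour token from the config file has expired, kubeadm can print a fresh join command on the control-plane node. A minimal sketch, where `save_join_command` is a hypothetical wrapper and `/root/token` simply mirrors the file edited above:

```shell
#!/usr/bin/env bash
# save_join_command [FILE]: create a new bootstrap token and store the
# complete "kubeadm join ..." command in FILE. (Hypothetical wrapper.)
save_join_command() {
  local file="${1:-/root/token}"
  kubeadm token create --print-join-command > "$file"
}

# Usage (as root, on k8s-master):
#   save_join_command
#   cat /root/token
```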
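Troubleshooting
--The host needs at least 2 CPU cores and 2 GB of RAM
--echo 1 > /proc/sys/net/ipv4/ip_forward
--kubelet fails to start
--swap has not been turned off
--the config file is missing
--check the log in /var/log/messages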
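Two of the causes listed above (swap still on, IP forwarding off) can be checked from /proc before running kubeadm. The following is a sketch only: `check_node` is a hypothetical helper, and it covers just these two conditions.

```shell
#!/usr/bin/env bash
# check_node [SWAPS_FILE] [IP_FORWARD_FILE]
# Prints one line per problem found and returns non-zero if any was found.
check_node() {
  local swaps="${1:-/proc/swaps}" fwd="${2:-/proc/sys/net/ipv4/ip_forward}"
  local rc=0
  # /proc/swaps starts with a header line; any further line is an active swap area.
  if awk 'NR > 1' "$swaps" | grep -q .; then
    echo "swap is still enabled"
    rc=1
  fi
  if [ "$(cat "$fwd")" != "1" ]; then
    echo "IP forwarding is disabled"
    rc=1
  fi
  return "$rc"
}

# Usage: check_node && echo "node looks ready for kubeadm"
```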
18. Join the worker nodes to the cluster
--On k8s-node01 and k8s-node02, first run systemctl stop kubelet
[root@k8s-node01 ~]# kubeadm join 192.168.118.66:6443 --token 7t2weq.bjbawausm0jaxury \
> --discovery-token-ca-cert-hash sha256:edc1d608d110c230dfb234262534921f2d3855b5d33d662aef79a8684c3896f2
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-10250]: Port 10250 is in use
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
You have new mail in /var/spool/mail/root
[root@k8s-node01 ~]# systemctl stop kubelet.service
Warning: kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
[root@k8s-node01 ~]# kubeadm join 192.168.118.66:6443 --token 7t2weq.bjbawausm0jaxury --discovery-token-ca-cert-hash sha256:edc1d608d110c230dfb234262534921f2d3855b5d33d662aef79a8684c3896f2
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
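--If joining fails, check:
--kubelet was not stopped
--IP forwarding is not enabled
--token: re-run the initialization or generate a new token
--whether containerd is running normally on the node
--Check node and pod status
# List the nodes: shows the status of the hosts in the cluster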
[root@k8s-master ~]# kubectl get nodes
E0911 15:40:13.584913 12962 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
[root@k8s-master ~]# export KUBECONFIG=/etc/kubernetes/admin.conf
[root@k8s-master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master NotReady control-plane 47m v1.28.2
k8s-node01 NotReady <none> 3m16s v1.28.2
k8s-node02 NotReady <none> 2m35s v1.28.2
[root@k8s-master ~]# vim .bashrc # add the export KUBECONFIG line above so it persists across logins
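The `export KUBECONFIG=...` above only applies to the current shell, which is presumably why .bashrc is edited next. A non-interactive version of that edit could look like this sketch; `persist_kubeconfig` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# persist_kubeconfig [RCFILE]: add the KUBECONFIG export to RCFILE once.
# (Hypothetical helper; assumes the .bashrc edit is meant to add this line.)
persist_kubeconfig() {
  local rc="${1:-$HOME/.bashrc}"
  local line='export KUBECONFIG=/etc/kubernetes/admin.conf'
  # Append only if the exact line is not already present.
  grep -qxF "$line" "$rc" 2>/dev/null || echo "$line" >> "$rc"
}

# Usage (as root, on k8s-master): persist_kubeconfig && source ~/.bashrc
```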
# Check the status of all pods
[root@k8s-master ~]# kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-6554b8b87f-thzlw 0/1 Pending 0 53m
kube-system coredns-6554b8b87f-zdhh9 0/1 Pending 0 53m
kube-system etcd-k8s-master 1/1 Running 0 53m
kube-system kube-apiserver-k8s-master 1/1 Running 0 53m
kube-system kube-controller-manager-k8s-master 1/1 Running 0 53m
kube-system kube-proxy-8jk8f 1/1 Running 0 8m40s
kube-system kube-proxy-dvfx4 1/1 Running 0 9m21s
kube-system kube-proxy-g226c 1/1 Running 0 53m
kube-system kube-scheduler-k8s-master 1/1 Running 0 53m
# Show full pod details
[root@k8s-master ~]# kubectl get po -Aowide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-6554b8b87f-thzlw 0/1 Pending 0 56m <none> <none> <none> <none>
kube-system coredns-6554b8b87f-zdhh9 0/1 Pending 0 56m <none> <none> <none> <none>
kube-system etcd-k8s-master 1/1 Running 0 56m 192.168.118.66 k8s-master <none> <none>
kube-system kube-apiserver-k8s-master 1/1 Running 0 56m 192.168.118.66 k8s-master <none> <none>
kube-system kube-controller-manager-k8s-master 1/1 Running 0 56m 192.168.118.66 k8s-master <none> <none>
kube-system kube-proxy-8jk8f 1/1 Running 0 11m 192.168.118.88 k8s-node02 <none> <none>
kube-system kube-proxy-dvfx4 1/1 Running 0 12m 192.168.118.77 k8s-node01 <none> <none>
kube-system kube-proxy-g226c 1/1 Running 0 56m 192.168.118.66 k8s-master <none> <none>
kube-system kube-scheduler-k8s-master 1/1 Running 0 56m 192.168.118.66 k8s-master <none> <none>
# Status
Pending: waiting; the pod is not running yet
Running: the pod is working normally
ContainerCreating: the container is being created
19. Deploy the calico pods
Locate the provided calico.yaml file
cd /root/k8s-ha-install && git checkout manual-installation-v1.28.x && cd calico/
Edit the config file and replace POD_CIDR in it with 172.16.0.0/16
- name: CALICO_IPV4POOL_CIDR
  value: "172.16.0.0/16"
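The POD_CIDR placeholder can be replaced with sed instead of editing by hand. This sketch assumes the placeholder appears literally as `POD_CIDR` in calico.yaml, as the notes describe; `set_pod_cidr` is a hypothetical helper, and the default value matches podSubnet in the kubeadm config.

```shell
#!/usr/bin/env bash
# set_pod_cidr FILE [CIDR]: replace every literal POD_CIDR in FILE with CIDR.
# (Hypothetical helper; "#" is the sed delimiter because the CIDR contains "/".)
set_pod_cidr() {
  local file="$1" cidr="${2:-172.16.0.0/16}"
  sed -i "s#POD_CIDR#${cidr}#g" "$file"
}

# Usage (in /root/k8s-ha-install/calico):
#   set_pod_cidr calico.yaml
#   grep -n -A1 CALICO_IPV4POOL_CIDR calico.yaml
```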
# Create the pods
kubectl apply -f calico.yaml