k8s -- Deploying a k8s Cluster -- Control Plane Node

Environment

VMware virtual machine running Ubuntu 24.04.1 LTS (no desktop).
Reference for creating a new virtual machine

Note: make sure the network is configured correctly, otherwise downloading the system components later will fail.

Docker Engine is chosen as the container runtime. Install Docker.

Official documentation on container runtimes

  1. Disable the firewall
bash
sudo ufw disable
  2. Temporarily disable swap
bash
sudo swapoff -a
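
swapoff -a only disables swap until the next reboot. A minimal sketch for making it permanent is to comment out the swap entry in /etc/fstab (review the file first; the sed pattern below assumes a standard swap line):

bash
# Comment out any swap entries so swap stays off after a reboot
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab
# Verify that no swap is active
swapon --show
free -h
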
  3. Update the node's local hostname-to-IP resolution
    Add the following entry to the /etc/hosts file:
bash
192.168.40.133  ubuntu-master
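
If the machine's hostname does not already match that entry, it can be set so that the node name and /etc/hosts stay consistent. This is a sketch using the name from this guide; substitute your own hostname:

bash
sudo hostnamectl set-hostname ubuntu-master
hostnamectl status
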
  4. Install Docker (here via the apt repository)
    Official documentation reference

Uninstall any old Docker packages

bash
 for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done

Set up Docker's apt repository

bash
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

Install the Docker packages

bash
 sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

At this point the Docker installation is complete.
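
As an optional sanity check before moving on, confirm the daemon is running and responds to the client:

bash
sudo systemctl status docker --no-pager
sudo docker run --rm hello-world
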

  5. Configure Docker registry mirrors

Edit the /etc/docker/daemon.json file (create it if it does not exist):

json
{
  "registry-mirrors": [
    "https://registry.docker-cn.com",
    "http://hub-mirror.c.163.com",
    "https://docker.mirrors.ustc.edu.cn",
    "https://docker.m.daocloud.io",
    "https://dockerproxy.com",
    "https://docker.nju.edu.cn",
    "https://mirror.baidubce.com",
    "https://mirror.iscas.ac.cn",
    "https://registry.cn-hangzhou.al"
  ],
  "exec-opts": ["native.cgroupdriver=systemd"]
}

Restart Docker

bash
sudo systemctl daemon-reload
sudo systemctl restart docker.service
# Enable Docker to start on boot
sudo systemctl enable docker.service

# Check Docker's cgroup driver; it should report systemd
sudo docker info | grep "Cgroup Driver"

Install cri-dockerd

Use the cri-dockerd adapter to integrate Docker Engine with Kubernetes.

cri-dockerd download page

bash
roy@ubuntu-master:~$ sudo dpkg -i cri-dockerd_0.3.16.3-0.debian-bookworm_amd64.deb
(Reading database ... 84116 files and directories currently installed.)
Preparing to unpack cri-dockerd_0.3.16.3-0.debian-bookworm_amd64.deb ...
Unpacking cri-dockerd (0.3.16~3-0~debian-bookworm) over (0.3.16~3-0~debian-bookworm) ...
Setting up cri-dockerd (0.3.16~3-0~debian-bookworm) ...

Pay attention to the file /usr/lib/systemd/system/cri-docker.service, in particular the ExecStart option, which determines where the sandbox (pause) image is pulled from. I did not modify it on the control plane node, so it keeps the upstream Kubernetes community defaults; the worker nodes will later use the Aliyun mirror configuration instead.

bash
roy@ubuntu-master:~$ cat /usr/lib/systemd/system/cri-docker.service
[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com
After=network-online.target firewalld.service docker.service
Wants=network-online.target
Requires=cri-docker.socket

[Service]
Type=notify
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd://
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target
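
If you do want cri-dockerd itself to fetch the pause (sandbox) image from the Aliyun mirror, as the worker nodes will later, the ExecStart line above could be extended roughly as follows. This is only a sketch, not what was done on this control plane node, and the pause tag should match whatever `kubeadm config images list` reports:

bash
# Sketch: pull the pause (sandbox) image from the Aliyun mirror instead of the default registry
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// \
    --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.10

After editing the unit file, reload systemd and restart the service (the next block performs the reload and enable anyway).
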
bash
roy@ubuntu-master:~$ sudo systemctl daemon-reload
roy@ubuntu-master:~$ sudo systemctl start cri-dockerd
roy@ubuntu-master:~$ sudo systemctl enable cri-dockerd

Install kubeadm, kubelet, and kubectl

Official documentation reference

bash
roy@ubuntu-master:~$ sudo apt-get update

roy@ubuntu-master:~$ sudo apt-get install -y apt-transport-https ca-certificates curl gpg

roy@ubuntu-master:~$ curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg


roy@ubuntu-master:~$ echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list


roy@ubuntu-master:~$ sudo apt-get update

roy@ubuntu-master:~$ sudo apt-get install -y kubelet kubeadm kubectl
# Pin the versions so apt does not upgrade them
roy@ubuntu-master:~$ sudo apt-mark hold kubelet kubeadm kubectl
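
A quick optional check that the tools are installed and on the expected 1.32 minor version:

bash
kubeadm version -o short
kubectl version --client
kubelet --version
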

Configure the kubelet

bash
# On Ubuntu/Debian the kubelet reads extra arguments from /etc/default/kubelet
# (/etc/sysconfig/kubelet is the RHEL-family path)
sudo vi /etc/default/kubelet

KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"
Enable the kubelet to start on boot:

bash
sudo systemctl enable kubelet

Prepare to initialize the control plane node using a configuration file

bash
roy@ubuntu-master:~$ kubeadm config print init-defaults > kubeadm-config.yaml
bash
roy@ubuntu-master:~$ cat kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta4
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.40.133
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  imagePullSerial: true
  name: node
  taints: null
timeouts:
  controlPlaneComponentHealthCheck: 4m0s
  discovery: 5m0s
  etcdAPICall: 2m0s
  kubeletHealthCheck: 4m0s
  kubernetesAPICall: 1m0s
  tlsBootstrap: 5m0s
  upgradeManifests: 5m0s
---
apiServer: {}
apiVersion: kubeadm.k8s.io/v1beta4
caCertificateValidityPeriod: 87600h0m0s
certificateValidityPeriod: 8760h0m0s
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
encryptionAlgorithm: RSA-2048
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: 1.32.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
proxy: {}
scheduler: {}

The main changes:

  • advertiseAddress: change to the IP of the host where the control plane runs
  • criSocket (adjust to your actual runtime): unix:///var/run/cri-dockerd.sock
  • imageRepository: registry.aliyuncs.com/google_containers
  • name: the node name (ubuntu-master in the init output below)
  • podSubnet: 10.244.0.0/16 (added under networking; it must match the network plugin, Flannel here)
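
Putting those edits together, the changed parts of kubeadm-config.yaml look roughly like the fragment below. This is a sketch reconstructed from the list above (fields not shown keep their generated defaults), not a verbatim copy of the final file:

yaml
# InitConfiguration (first document in the file)
localAPIEndpoint:
  advertiseAddress: 192.168.40.133    # IP of this control plane host
nodeRegistration:
  criSocket: unix:///var/run/cri-dockerd.sock
  name: ubuntu-master
---
# ClusterConfiguration (second document in the file)
imageRepository: registry.aliyuncs.com/google_containers
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12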

Check which images are required

bash
roy@ubuntu-master:~$ kubeadm config images list --config kubeadm-config.yaml
registry.aliyuncs.com/google_containers/kube-apiserver:v1.32.0
registry.aliyuncs.com/google_containers/kube-controller-manager:v1.32.0
registry.aliyuncs.com/google_containers/kube-scheduler:v1.32.0
registry.aliyuncs.com/google_containers/kube-proxy:v1.32.0
registry.aliyuncs.com/google_containers/coredns:v1.11.3
registry.aliyuncs.com/google_containers/pause:3.10
registry.aliyuncs.com/google_containers/etcd:3.5.16-0

You can pull the required images in advance to speed up initialization:

bash
roy@ubuntu-master:~$ sudo kubeadm config images pull --config kubeadm-config.yaml

Initialize the control plane node

bash
roy@ubuntu-master:~$ sudo kubeadm init --config kubeadm-config.yaml
[init] Using Kubernetes version: v1.32.0
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action beforehand using 'kubeadm config images pull'
W0126 15:51:04.907789   14073 checks.go:846] detected that the sandbox image "registry.aliyuncs.com/google_containers/pause:3.8" of the container runtime is inconsistent with that used by kubeadm.It is recommended to use "registry.aliyuncs.com/google_containers/pause:3.10" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local ubuntu-master] and IPs [10.96.0.1 192.168.40.133]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost ubuntu-master] and IPs [192.168.40.133 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost ubuntu-master] and IPs [192.168.40.133 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.557237ms
[api-check] Waiting for a healthy API server. This can take up to 4m0s
[api-check] The API server is healthy after 6.00320591s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node ubuntu-master as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node ubuntu-master as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: abcdef.0123456789abcdef
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.40.133:6443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:58699bbeb3655f284b2ac1830bff7ddf170e13bef7bc36737fdca906a3776585
bash
roy@ubuntu-master:~$ sudo mkdir -p $HOME/.kube
roy@ubuntu-master:~$   sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
roy@ubuntu-master:~$   sudo chown $(id -u):$(id -g) $HOME/.kube/config

At this point the node has in fact been initialized successfully. However, running

bash
kubectl get nodes

shows the node as NotReady (you can also look at how the Pods are running); a network plugin still needs to be installed.
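
A couple of commands to observe that state (exact output will vary):

bash
# The node stays NotReady until a CNI plugin is installed
kubectl get nodes -o wide
# The CoreDNS Pods typically sit in Pending for the same reason
kubectl get pods -n kube-system
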

Here I use Flannel.
flannel on GitHub

Flannel download page

bash
roy@ubuntu-master:~$ tar -xzvf flannel-v0.26.3-linux-amd64.tar.gz
flanneld
mk-docker-opts.sh
README.md

Deploy the Flannel Pods

bash
roy@ubuntu-master:~$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
namespace/kube-flannel created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

This approach may well fail because of network problems; in that case, download the yml file first and apply the local copy.
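
A sketch of that offline variant, using the same manifest URL as above (fetch it on a machine that can reach GitHub if necessary, then copy it over):

bash
# Download the manifest first, then apply the local copy
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f kube-flannel.yml
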

At this point you may find that the Flannel Pods are not running; one more key piece of configuration is needed.

The Flannel Pods crash because the br_netfilter module is not loaded or the related settings are not enabled. This module is required for Flannel networking to work correctly. Possible fixes:

  1. Load the br_netfilter module:

    Run the following command to load the br_netfilter module:

    bash
    sudo modprobe br_netfilter
  2. Make sure the module is loaded at boot:

    Edit /etc/modules-load.d/k8s.conf (create the file if it does not exist) and add the line below:

    plaintext
    br_netfilter
  3. Set the sysctl parameters:

    Run the following commands to make sure the relevant sysctl parameters are set:

    bash
    sudo sysctl net.bridge.bridge-nf-call-iptables=1
    sudo sysctl net.bridge.bridge-nf-call-ip6tables=1

    To keep these settings after a reboot, add them to the /etc/sysctl.conf file:

    plaintext
    net.bridge.bridge-nf-call-iptables=1
    net.bridge.bridge-nf-call-ip6tables=1
  4. Restart the relevant services:

    If the steps above do not solve the problem, try rebooting the Kubernetes node or recreating the Flannel Pods so the new settings take effect:

    bash
    kubectl delete pod -n kube-flannel --all

After completing these steps, check the kube-flannel Pod status again to confirm it has recovered, and re-check the logs to verify the problem is resolved.
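
Putting steps 1 to 3 together, a minimal sketch of the persistent setup looks like this. The file names under /etc/modules-load.d/ and /etc/sysctl.d/ are just a common convention; the /etc/sysctl.conf approach above works equally well:

bash
# Load br_netfilter now and ensure it is loaded on every boot
sudo modprobe br_netfilter
echo "br_netfilter" | sudo tee /etc/modules-load.d/k8s.conf

# Persist the bridge sysctls and apply them immediately
cat <<'EOF' | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sudo sysctl --system
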

If problems come up, check the logs of the individual components, including:

Docker, cri-dockerd, the kubelet, and the Flannel Pod logs. Image pulls are a common culprit: Kubernetes garbage-collects images that have not been used for a while and then re-pulls them, possibly fetching a newer tag and failing. If pulls keep failing, pull the images manually and distribute them to the cluster nodes with docker save and docker load.
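
A sketch of that manual workflow, using one of the images listed earlier as an example; the image name, archive name, and destination host are placeholders to adapt to your environment:

bash
# On a machine that can reach the registry
sudo docker pull registry.aliyuncs.com/google_containers/pause:3.10
sudo docker save -o pause-3.10.tar registry.aliyuncs.com/google_containers/pause:3.10

# Copy the archive to the target node, then load it there
scp pause-3.10.tar roy@ubuntu-master:/tmp/
sudo docker load -i /tmp/pause-3.10.tar
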

This is Kuboard, a web UI for the cluster deployed locally.
