Installing a Production k8s Cluster with Kubespray - Troubleshooting
Ansible troubleshooting
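- After installing Ansible, add it to PATH so it can be run directly. Ansible is usually run as a non-root user; in that case, define the become attribute explicitly in the playbook or inventory. Ideally, the account that runs Ansible should be able to use sudo without a password.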
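- Kubespray already ships an Ansible configuration file (ansible.cfg in the root of the repository), and Ansible has to be pointed at it. This file tells Ansible where to find the dependencies it needs at run time (such as roles and modules), so it is essential.
You can run ansible --version to inspect Ansible's current configuration.
The output below shows that Ansible is not yet bound to any config file.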
```bash
bill@jump-server:~/kubespray/playbooks$ ansible --version
ansible [core 2.16.14]
config file = None # no config file bound
configured module search path = ['/home/bill/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/bill/.local/lib/python3.11/site-packages/ansible
ansible collection location = /home/bill/.ansible/collections:/usr/share/ansible/collections
executable location = /home/bill/.local/bin/ansible
python version = 3.11.2 (main, Nov 30 2024, 21:22:50) [GCC 12.2.0] (/usr/bin/python3)
jinja version = 3.1.2
libyaml = True
```
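Either of the following (pick one) binds the config file for the current shell session:
```bash
export ANSIBLE_CONFIG=/home/bill/kubespray/ansible.cfg # Method 1: set the environment variable
cd /home/bill/kubespray && ansible-playbook -i inventory/mycluster/inventory.ini cluster.yml # Method 2: run from the Kubespray root, where Ansible finds ./ansible.cfg on its own
```
Note that ansible-playbook's -c option means --connection (the connection type, e.g. ssh or local), not a config file; ansible-playbook has no command-line option for choosing the config file.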
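- Ansible can run the same playbook over and over. If some tasks failed and others succeeded on the previous run, simply run the playbook again: Ansible executes every task again, but because Kubespray's tasks are written to be idempotent, tasks whose desired state is already in place report ok (unchanged) instead of making changes again, while the previously failed tasks get another attempt. At the end of each run, the PLAY RECAP lists, per host, how many tasks were ok, changed, unreachable, failed, skipped, and so on.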
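Returning to the config-file binding in the second item above: Ansible's documented search order is the ANSIBLE_CONFIG environment variable, then ansible.cfg in the current directory, then ~/.ansible.cfg, then /etc/ansible/ansible.cfg, and the first file found wins. The function below is a simplified sketch of that order for a quick pre-flight check; the name resolve_ansible_cfg is hypothetical, and it ignores edge cases such as Ansible refusing to load ansible.cfg from a world-writable directory. ansible --version remains the authoritative answer.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate Ansible's config-file search order.
# Simplified sketch; "ansible --version" is the authoritative check.
resolve_ansible_cfg() {
  local candidate
  for candidate in \
    "${ANSIBLE_CONFIG:-}" \
    "$PWD/ansible.cfg" \
    "$HOME/.ansible.cfg" \
    "/etc/ansible/ansible.cfg"; do
    # Skip empty entries (ANSIBLE_CONFIG unset) and files that do not exist
    if [ -n "$candidate" ] && [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  # Mirrors "config file = None" in the ansible --version output
  echo "None"
  return 1
}
```

For example, cd ~/kubespray && resolve_ansible_cfg should print the path of Kubespray's ansible.cfg.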
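Before rerunning the playbook, it helps to see which hosts actually failed. A minimal sketch, assuming the run's output was saved to a file with color disabled (for example ANSIBLE_NOCOLOR=1 ansible-playbook ... | tee run.log, where run.log is a hypothetical filename): the function scans the PLAY RECAP lines and prints every host whose failed or unreachable count is above zero. The function name failed_hosts is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list hosts whose PLAY RECAP shows failed>0 or unreachable>0.
# Reads ansible-playbook output (without color codes) on stdin.
failed_hosts() {
  awk '
    # Recap lines look like: "host : ok=N changed=N unreachable=N failed=N ..."
    $2 == ":" && $3 ~ /^ok=/ {
      for (i = 3; i <= NF; i++) {
        split($i, kv, "=")
        if ((kv[1] == "failed" || kv[1] == "unreachable") && kv[2] + 0 > 0) {
          print $1
          break
        }
      }
    }'
}
```

Usage: failed_hosts < run.log. An empty result means no host reported failed or unreachable tasks.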
Kubernetes troubleshooting
A single ansible command that runs the cluster.yml playbook installs the entire Kubernetes cluster. Afterward, you can ssh into each node to verify the result.
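That single command can be wrapped with a couple of pre-flight checks so that a missing config file or inventory fails fast instead of partway through. A minimal sketch, assuming the repository lives at ~/kubespray and the inventory at inventory/mycluster/inventory.ini (as in the commands earlier in this post); the --become --become-user=root flags follow the invocation shown in Kubespray's README. The function name run_kubespray and the ANSIBLE_PLAYBOOK_BIN override (there only to allow a dry run) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run Kubespray's cluster.yml from the repo root so that
# Ansible picks up ./ansible.cfg automatically.
# The body runs in a subshell "( ... )", so the cd does not leak to the caller.
run_kubespray() (
  repo="${1:-$HOME/kubespray}"
  inventory="${2:-inventory/mycluster/inventory.ini}"
  cd "$repo" || exit 1
  # Fail fast if the config file or the inventory is missing
  [ -f ansible.cfg ] || { echo "missing $repo/ansible.cfg" >&2; exit 1; }
  [ -f "$inventory" ] || { echo "missing inventory: $inventory" >&2; exit 1; }
  # ANSIBLE_PLAYBOOK_BIN lets you substitute e.g. "echo" for a dry run
  "${ANSIBLE_PLAYBOOK_BIN:-ansible-playbook}" -i "$inventory" \
    --become --become-user=root cluster.yml
)
```

Running ANSIBLE_PLAYBOOK_BIN=echo run_kubespray prints the command line that would be executed without touching any node.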
After my installation I found almost no major problems, only a few minor ones.
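- The admin.conf config file was not bound to kubectl, so the first kubectl command failed with an error. Telling kubectl which config file to use fixes it.
- On all 3 master nodes, the Kubernetes API server listens on and is exposed at 127.0.0.1:6443, which clearly does not make the API server highly available.
I don't know what the best practice is for API server high availability in a k8s cluster with 3 master nodes, so I'll leave that as an open question for now.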
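For the first issue (kubectl not bound to admin.conf), the usual fix is to copy the admin kubeconfig into the non-root user's ~/.kube/config, which kubectl reads by default. A minimal sketch, assuming the admin kubeconfig is at /etc/kubernetes/admin.conf on the control-plane node (the standard kubeadm location, which Kubespray builds on); the function name install_kubeconfig and the SUDO override are hypothetical. Pointing the KUBECONFIG environment variable at the file is an alternative, but admin.conf is normally readable only by root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: install a kubeconfig for the current (non-root) user.
# Assumption: the admin kubeconfig lives at /etc/kubernetes/admin.conf.
install_kubeconfig() {
  local src="${1:-/etc/kubernetes/admin.conf}"
  local dest="${2:-$HOME/.kube/config}"
  # admin.conf is root-owned; set SUDO= (empty) to skip sudo, e.g. for testing
  local sudo_cmd="${SUDO-sudo}"
  mkdir -p "$(dirname "$dest")"
  $sudo_cmd cp "$src" "$dest"
  # Hand the copy to the current user and keep the credentials private
  $sudo_cmd chown "$(id -u):$(id -g)" "$dest"
  chmod 600 "$dest"
}
```

After running install_kubeconfig on a master node, kubectl get nodes should work without sudo.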
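On the API server HA question, this is not a settled answer, but Kubespray's HA documentation and sample inventory describe two mechanisms worth checking. By default, nodes that are not control-plane nodes run a local nginx proxy that balances requests across all API servers; the nginx-proxy-worker-* pods in the appendix appear to be that proxy. For clients outside the cluster, the sample group_vars show an external load balancer declared with apiserver_loadbalancer_domain_name and loadbalancer_apiserver (address and port). The sketch below only prints such a snippet for inventory/mycluster/group_vars/all/all.yml; the domain name, VIP, port and the function name lb_group_vars are placeholders, the load balancer itself (for example HAProxy with keepalived) has to be built separately, and the variable names should be checked against the Kubespray version in use.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an external-LB snippet for Kubespray group_vars.
# All values are placeholders; verify variable names against your Kubespray version.
lb_group_vars() {
  local domain="$1" vip="$2" port="${3:-6443}"
  cat <<EOF
## External load balancer in front of the kube-apiservers
apiserver_loadbalancer_domain_name: "${domain}"
loadbalancer_apiserver:
  address: ${vip}
  port: ${port}
EOF
}
```

For example, lb_group_vars lb.example.internal 192.0.2.10 8443 prints a block that could be reviewed and pasted into the group_vars file before rerunning cluster.yml.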
Appendix - The Kubernetes cluster after installation
1. Overall resources
```bash
bill@master-1:~$ sudo kubectl get ns -A
NAME STATUS AGE
default Active 2d6h
kube-node-lease Active 2d6h
kube-public Active 2d6h
kube-system Active 2d6h
bill@master-1:~$ sudo kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-69d8557557-txxwk 1/1 Running 0 2d6h
kube-system calico-node-2wj9h 1/1 Running 0 2d6h
kube-system calico-node-4ntch 1/1 Running 0 2d6h
kube-system calico-node-66pd7 1/1 Running 0 2d6h
kube-system calico-node-vs699 1/1 Running 0 2d6h
kube-system calico-node-x29jb 1/1 Running 0 2d6h
kube-system coredns-5c54f84c97-5src4 1/1 Running 0 2d6h
kube-system coredns-5c54f84c97-lbl9g 1/1 Running 0 2d6h
kube-system dns-autoscaler-76ddddbbc-9c56w 1/1 Running 0 2d6h
kube-system kube-apiserver-master-1 1/1 Running 1 2d6h
kube-system kube-apiserver-master-2 1/1 Running 1 2d6h
kube-system kube-apiserver-master-3 1/1 Running 1 2d6h
kube-system kube-controller-manager-master-1 1/1 Running 3 (11h ago) 2d6h
kube-system kube-controller-manager-master-2 1/1 Running 2 2d6h
kube-system kube-controller-manager-master-3 1/1 Running 2 2d6h
kube-system kube-proxy-52rsj 1/1 Running 0 2d6h
kube-system kube-proxy-c5z9k 1/1 Running 0 2d6h
kube-system kube-proxy-hctbg 1/1 Running 0 2d6h
kube-system kube-proxy-hkth5 1/1 Running 0 2d6h
kube-system kube-proxy-rbpp2 1/1 Running 0 2d6h
kube-system kube-scheduler-master-1 1/1 Running 1 2d6h
kube-system kube-scheduler-master-2 1/1 Running 2 2d6h
kube-system kube-scheduler-master-3 1/1 Running 1 2d6h
kube-system nginx-proxy-worker-1 1/1 Running 0 2d6h
kube-system nginx-proxy-worker-2 1/1 Running 0 2d6h
kube-system nodelocaldns-7ftp2 1/1 Running 0 2d6h
kube-system nodelocaldns-ckxd4 1/1 Running 0 2d6h
kube-system nodelocaldns-kp64g 1/1 Running 0 2d6h
kube-system nodelocaldns-x6bhp 1/1 Running 0 2d6h
kube-system nodelocaldns-z5grf 1/1 Running 0 2d6h
bill@master-1:~$ sudo kubectl get nodes -A
NAME STATUS ROLES AGE VERSION
master-1 Ready control-plane 2d6h v1.32.0
master-2 Ready control-plane 2d6h v1.32.0
master-3 Ready control-plane 2d6h v1.32.0
worker-1 Ready <none> 2d6h v1.32.0
worker-2 Ready <none> 2d6h v1.32.0
bill@master-1:~$ sudo kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 2d6h
kube-system coredns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP,9153/TCP 2d6h
bill@master-1:~$ sudo kubectl get ds -A
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system calico-node 5 5 5 5 5 kubernetes.io/os=linux 2d6h
kube-system kube-proxy 5 5 5 5 5 kubernetes.io/os=linux 2d6h
kube-system nodelocaldns 5 5 5 5 5 kubernetes.io/os=linux 2d6h
```
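Rather than reading the pod list line by line, the output of kubectl get pods -A --no-headers can be filtered for pods that are not fully ready or not in the Running/Completed state. A minimal sketch, assuming the default column layout shown above (NAMESPACE NAME READY STATUS RESTARTS AGE); the function name not_ready_pods is hypothetical. It only looks at the READY and STATUS columns, so a RESTARTS value such as "3 (11h ago)" does not confuse it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print pods that are not fully ready or not Running/Completed.
# Input: "kubectl get pods -A --no-headers" (columns: NAMESPACE NAME READY STATUS ...).
not_ready_pods() {
  awk '{
    split($3, r, "/")   # READY is "ready/total", e.g. "1/1"
    healthy = (($4 == "Running" && r[1] == r[2]) || $4 == "Completed")
    if (!healthy) print $1 "/" $2 " " $3 " " $4
  }'
}
```

Usage: sudo kubectl get pods -A --no-headers | not_ready_pods. An empty result means every pod passed this basic check.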
Once everything checks out, you can use Lens to manage and monitor the new K8s cluster.
- etcd, deployed as a binary managed by systemd
```bash
bill@master-1:~$ etcdctl version
etcdctl version: 3.5.16
API version: 3.5
bill@master-1:~$ sudo systemctl status etcd
● etcd.service - etcd
Loaded: loaded (/etc/systemd/system/etcd.service; enabled; preset: enabled)
Active: active (running) since Mon 2025-01-20 12:21:29 EST; 2 days ago
Main PID: 19218 (etcd)
Tasks: 13 (limit: 4602)
Memory: 202.9M
CPU: 48min 35.517s
CGroup: /system.slice/etcd.service
└─19218 /usr/local/bin/etcd
Jan 22 18:54:59 master-1 etcd[19218]: {"level":"info","ts":"2025-01-22T18:54:59.823494-0500","caller":"mvcc/hash.go:151","msg":"storing new hash","hash":1770769020,"revision":116007,"compa>
Jan 22 18:59:59 master-1 etcd[19218]: {"level":"info","ts":"2025-01-22T18:59:59.827207-0500","caller":"mvcc/index.go:214","msg":"compact tree index","revision":116632}
```
- containerd as the CRI
```bash
bill@master-1:~$ sudo systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/etc/systemd/system/containerd.service; enabled; preset: enabled)
Active: active (running) since Mon 2025-01-20 12:14:33 EST; 2 days ago
Docs: https://containerd.io
Main PID: 16548 (containerd)
Tasks: 108
Memory: 995.8M
CPU: 12min 17.084s
CGroup: /system.slice/containerd.service
├─16548 /usr/local/bin/containerd
├─20832 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -id b8999694f0d531b9cff8a7bb5797d7e091988801296d72ba615d10851eee175d -address /run/containerd/containerd.sock
├─21418 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -id 06de62add7a4030a3af3a4f2978d0e8d88ef8af100f682b9f350f3b2bf1fbdf7 -address
```
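For the etcd item above, systemctl status only shows that the local unit is running; etcdctl endpoint health --cluster asks every member of the etcd cluster. etcd here talks TLS, so etcdctl needs the CA, a client certificate and its key. A minimal sketch, assuming Kubespray's usual certificate directory /etc/ssl/etcd/ssl with files named ca.pem, admin-<hostname>.pem and admin-<hostname>-key.pem, and a client endpoint at https://127.0.0.1:2379; verify these on your own nodes (systemctl cat etcd shows how the unit is configured) before relying on them. The function name etcd_health and the ETCD_CERT_DIR and ETCDCTL_BIN overrides are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check the health of every etcd member over TLS.
# Assumptions: Kubespray-style cert paths and a local client endpoint; verify on your nodes.
etcd_health() {
  local host="${1:-$(hostname)}"
  local certdir="${ETCD_CERT_DIR:-/etc/ssl/etcd/ssl}"
  # ETCDCTL_BIN lets you substitute "echo" to print the command instead of running it
  "${ETCDCTL_BIN:-etcdctl}" \
    --endpoints="https://127.0.0.1:2379" \
    --cacert="$certdir/ca.pem" \
    --cert="$certdir/admin-$host.pem" \
    --key="$certdir/admin-$host-key.pem" \
    endpoint health --cluster
}
```

The key files are normally readable only by root, so run it from a root shell on a master node, e.g. etcd_health master-1.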
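Because containerd is the CRI, the containers on a node can also be inspected directly, without going through the API server, by pointing crictl at the containerd socket visible in the process list above (/run/containerd/containerd.sock). A minimal sketch, assuming crictl is installed on the node (Kubespray normally installs it alongside containerd, but check); the function name cri_ps and the CONTAINERD_SOCK and CRICTL_BIN overrides are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list containers via the CRI, bypassing kube-apiserver.
# Assumption: crictl is installed and containerd listens on the socket below.
cri_ps() {
  local sock="${CONTAINERD_SOCK:-/run/containerd/containerd.sock}"
  # Extra arguments are passed through, e.g. "cri_ps -a" or "cri_ps --name kube-apiserver"
  "${CRICTL_BIN:-crictl}" --runtime-endpoint "unix://$sock" ps "$@"
}
```

The socket is root-owned, so run it from a root shell; cri_ps --name kube-apiserver on a master node should list the API server container even when kubectl cannot reach it.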
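That's it: the Kubernetes cluster above was installed and customized fully automatically with Kubespray. The next post covers which common, important settings need to be configured, and how to configure them.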