Kubernetes 集群calico网络故障排查思路

报错calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30

复制代码
\\calico未准备好,BGP协议不能与172.16.0.20,172.16.0.30内网IP地址连接
BGP协议:边界网关协议

访问k8s的dashboard界面无法访问网站,查看pod,未知原因导致calico的Pod资源重新创建后无法启动,显示的是0/1状态

复制代码
[root@k8s-master yaml]# kubectl get pod -n kube-system
NAMESPACE              NAME                                        READY   STATUS    RESTARTS   AGE
...
kube-system            calico-kube-controllers-578894d4cd-rsgqd    1/1     Running   0          115d
kube-system            calico-node-64s8s                           1/1     Running   3          127d
kube-system            calico-node-j4t7q                           1/1     Running   0          127d
kube-system            calico-node-n6vr4                           0/1     Running   0          40s

Calico的Pod报错内容

复制代码
[root@k8s-master yaml]# kubectl describe pod -n kube-system calico-node-n6vr4
Events:
  Type     Reason     Age        From                 Message
  ----     ------     ----       ----                 -------
  Normal   Scheduled  <unknown>  default-scheduler    Successfully assigned kube-system/calico-node-n6vr4 to k8s-master
  Normal   Pulled     41s        kubelet, k8s-master  Container image "calico/cni:v3.15.1" already present on machine
  Normal   Created    41s        kubelet, k8s-master  Created container upgrade-ipam
  Normal   Started    40s        kubelet, k8s-master  Started container upgrade-ipam
  Normal   Pulled     40s        kubelet, k8s-master  Container image "calico/cni:v3.15.1" already present on machine
  Normal   Started    39s        kubelet, k8s-master  Started container install-cni
  Normal   Created    39s        kubelet, k8s-master  Created container install-cni
  Normal   Pulled     39s        kubelet, k8s-master  Container image "calico/pod2daemon-flexvol:v3.15.1" already present on machine
  Normal   Pulled     38s        kubelet, k8s-master  Container image "calico/node:v3.15.1" already present on machine
  Normal   Started    38s        kubelet, k8s-master  Started container flexvol-driver
  Normal   Created    38s        kubelet, k8s-master  Created container flexvol-driver
  Normal   Created    37s        kubelet, k8s-master  Created container calico-node
  Normal   Started    37s        kubelet, k8s-master  Started container calico-node
  Warning  Unhealthy  27s        kubelet, k8s-master  Readiness probe failed: 2020-08-14 02:16:54.068 [INFO][142] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30
  Warning  Unhealthy  17s  kubelet, k8s-master  Readiness probe failed: 2020-08-14 02:17:04.059 [INFO][181] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30
  Warning  Unhealthy  7s  kubelet, k8s-master  Readiness probe failed: 2020-08-14 02:17:14.065 [INFO][207] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30

原因:calico没有发现实node节点实际的网卡名称

解决方法

调整calicao的网络插件的网卡发现机制,修改IP_AUTODETECTION_METHOD对应的value值。下载的官方提供的yaml文件中,ip识别策略(IPDETECTMETHOD)没有配置,即默认为first-found,这会导致一个网络异常的ip作为nodeIP被注册,从而影响node之间的网络连接。可以修改成can-reach或者interface的策略,尝试连接某一个Ready的node的IP,以此选择出正确的IP。

复制代码
# 修改calicao的yaml文件,添加两行配置# - name: IP_AUTODETECTION_METHOD# value: "interface=eth1"  # 根据实际网卡名称配置           [root@k8s-master yaml]# vim calico.yaml...(3546行)            # Cluster type to identify the deployment type            - name: CLUSTER_TYPE              value: "k8s,bgp"            #新添加的配置            - name: IP_AUTODETECTION_METHOD              value: "interface=eth1"            # Auto-detect the BGP IP address.            - name: IP              value: "autodetect"            # Enable IPIP            - name: CALICO_IPV4POOL_IPIP              value: "Always"            # Enable or Disable VXLAN on the default IP pool.            - name: CALICO_IPV4POOL_VXLAN              value: "Never"
#重新构建kubectl apply -f calico.yaml

修复完成

复制代码
[root@k8s-master yaml]# kubectl get pod -n kube-system 
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-578894d4cd-rsgqd   1/1     Running   0          115d
calico-node-6ktn4                          1/1     Running   0          26m
calico-node-8k5z8                          1/1     Running   0          26m
calico-node-g87hc                          1/1     Running   0          1m

再次访问集群的各种资源已经可以访问了

相关推荐
Johny_Zhao2 小时前
Docker + CentOS 部署 Zookeeper 集群 + Kubernetes Operator 自动化运维方案
linux·网络安全·docker·信息安全·zookeeper·kubernetes·云计算·系统运维
木鱼时刻1 天前
容器与 Kubernetes 基本概念与架构
容器·架构·kubernetes
chuanauc2 天前
Kubernets K8s 学习
java·学习·kubernetes
庸子2 天前
基于Jenkins和Kubernetes构建DevOps自动化运维管理平台
运维·kubernetes·jenkins
李白你好2 天前
高级运维!Kubernetes(K8S)常用命令的整理集合
运维·容器·kubernetes
Connie14512 天前
k8s多集群管理中的联邦和舰队如何理解?
云原生·容器·kubernetes
伤不起bb3 天前
Kubernetes 服务发布基础
云原生·容器·kubernetes
别骂我h3 天前
Kubernetes服务发布基础
云原生·容器·kubernetes
weixin_399380693 天前
k8s一键部署tongweb企业版7049m6(by why+lqw)
java·linux·运维·服务器·云原生·容器·kubernetes
斯普信专业组4 天前
K8s环境下基于Nginx WebDAV与TLS/SSL的文件上传下载部署指南
nginx·kubernetes·ssl