Kubernetes 集群calico网络故障排查思路

报错calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30

\\calico未准备好,BGP协议不能与172.16.0.20,172.16.0.30内网IP地址连接
BGP协议:边界网关协议

访问k8s的dashboard界面无法访问网站,查看pod,未知原因导致calico的Pod资源重新创建后无法启动,显示的是0/1状态

[root@k8s-master yaml]# kubectl get pod -n kube-system
NAMESPACE              NAME                                        READY   STATUS    RESTARTS   AGE
...
kube-system            calico-kube-controllers-578894d4cd-rsgqd    1/1     Running   0          115d
kube-system            calico-node-64s8s                           1/1     Running   3          127d
kube-system            calico-node-j4t7q                           1/1     Running   0          127d
kube-system            calico-node-n6vr4                           0/1     Running   0          40s

Calico的Pod报错内容

[root@k8s-master yaml]# kubectl describe pod -n kube-system calico-node-n6vr4
Events:
  Type     Reason     Age        From                 Message
  ----     ------     ----       ----                 -------
  Normal   Scheduled  <unknown>  default-scheduler    Successfully assigned kube-system/calico-node-n6vr4 to k8s-master
  Normal   Pulled     41s        kubelet, k8s-master  Container image "calico/cni:v3.15.1" already present on machine
  Normal   Created    41s        kubelet, k8s-master  Created container upgrade-ipam
  Normal   Started    40s        kubelet, k8s-master  Started container upgrade-ipam
  Normal   Pulled     40s        kubelet, k8s-master  Container image "calico/cni:v3.15.1" already present on machine
  Normal   Started    39s        kubelet, k8s-master  Started container install-cni
  Normal   Created    39s        kubelet, k8s-master  Created container install-cni
  Normal   Pulled     39s        kubelet, k8s-master  Container image "calico/pod2daemon-flexvol:v3.15.1" already present on machine
  Normal   Pulled     38s        kubelet, k8s-master  Container image "calico/node:v3.15.1" already present on machine
  Normal   Started    38s        kubelet, k8s-master  Started container flexvol-driver
  Normal   Created    38s        kubelet, k8s-master  Created container flexvol-driver
  Normal   Created    37s        kubelet, k8s-master  Created container calico-node
  Normal   Started    37s        kubelet, k8s-master  Started container calico-node
  Warning  Unhealthy  27s        kubelet, k8s-master  Readiness probe failed: 2020-08-14 02:16:54.068 [INFO][142] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30
  Warning  Unhealthy  17s  kubelet, k8s-master  Readiness probe failed: 2020-08-14 02:17:04.059 [INFO][181] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30
  Warning  Unhealthy  7s  kubelet, k8s-master  Readiness probe failed: 2020-08-14 02:17:14.065 [INFO][207] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30

原因:calico没有发现实node节点实际的网卡名称

解决方法

调整calicao的网络插件的网卡发现机制,修改IP_AUTODETECTION_METHOD对应的value值。下载的官方提供的yaml文件中,ip识别策略(IPDETECTMETHOD)没有配置,即默认为first-found,这会导致一个网络异常的ip作为nodeIP被注册,从而影响node之间的网络连接。可以修改成can-reach或者interface的策略,尝试连接某一个Ready的node的IP,以此选择出正确的IP。

# 修改calicao的yaml文件,添加两行配置# - name: IP_AUTODETECTION_METHOD# value: "interface=eth1"  # 根据实际网卡名称配置           [root@k8s-master yaml]# vim calico.yaml...(3546行)            # Cluster type to identify the deployment type            - name: CLUSTER_TYPE              value: "k8s,bgp"            #新添加的配置            - name: IP_AUTODETECTION_METHOD              value: "interface=eth1"            # Auto-detect the BGP IP address.            - name: IP              value: "autodetect"            # Enable IPIP            - name: CALICO_IPV4POOL_IPIP              value: "Always"            # Enable or Disable VXLAN on the default IP pool.            - name: CALICO_IPV4POOL_VXLAN              value: "Never"
#重新构建kubectl apply -f calico.yaml

修复完成

[root@k8s-master yaml]# kubectl get pod -n kube-system 
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-578894d4cd-rsgqd   1/1     Running   0          115d
calico-node-6ktn4                          1/1     Running   0          26m
calico-node-8k5z8                          1/1     Running   0          26m
calico-node-g87hc                          1/1     Running   0          1m

再次访问集群的各种资源已经可以访问了

相关推荐
景天科技苑10 小时前
【云原生开发】K8S多集群资源管理平台架构设计
云原生·容器·kubernetes·k8s·云原生开发·k8s管理系统
wclass-zhengge11 小时前
K8S篇(基本介绍)
云原生·容器·kubernetes
颜淡慕潇11 小时前
【K8S问题系列 |1 】Kubernetes 中 NodePort 类型的 Service 无法访问【已解决】
后端·云原生·容器·kubernetes·问题解决
昌sit!19 小时前
K8S node节点没有相应的pod镜像运行故障处理办法
云原生·容器·kubernetes
A ?Charis1 天前
Gitlab-runner running on Kubernetes - hostAliases
容器·kubernetes·gitlab
北漂IT民工_程序员_ZG1 天前
k8s集群安装(minikube)
云原生·容器·kubernetes
2301_806131361 天前
Kubernetes的基本构建块和最小可调度单元pod-0
云原生·容器·kubernetes
SilentCodeY1 天前
containerd配置私有仓库registry
容器·kubernetes·containerd·镜像·crictl
binqian1 天前
【k8s】ClusterIP能http访问,但是不能ping 的原因
http·容器·kubernetes
探索云原生2 天前
GPU 环境搭建指南:如何在裸机、Docker、K8s 等环境中使用 GPU
ai·云原生·kubernetes·go·gpu