Kubernetes 集群calico网络故障排查思路

报错calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30

复制代码
\\calico未准备好,BGP协议不能与172.16.0.20,172.16.0.30内网IP地址连接
BGP协议:边界网关协议

访问k8s的dashboard界面无法访问网站,查看pod,未知原因导致calico的Pod资源重新创建后无法启动,显示的是0/1状态

复制代码
[root@k8s-master yaml]# kubectl get pod -n kube-system
NAMESPACE              NAME                                        READY   STATUS    RESTARTS   AGE
...
kube-system            calico-kube-controllers-578894d4cd-rsgqd    1/1     Running   0          115d
kube-system            calico-node-64s8s                           1/1     Running   3          127d
kube-system            calico-node-j4t7q                           1/1     Running   0          127d
kube-system            calico-node-n6vr4                           0/1     Running   0          40s

Calico的Pod报错内容

复制代码
[root@k8s-master yaml]# kubectl describe pod -n kube-system calico-node-n6vr4
Events:
  Type     Reason     Age        From                 Message
  ----     ------     ----       ----                 -------
  Normal   Scheduled  <unknown>  default-scheduler    Successfully assigned kube-system/calico-node-n6vr4 to k8s-master
  Normal   Pulled     41s        kubelet, k8s-master  Container image "calico/cni:v3.15.1" already present on machine
  Normal   Created    41s        kubelet, k8s-master  Created container upgrade-ipam
  Normal   Started    40s        kubelet, k8s-master  Started container upgrade-ipam
  Normal   Pulled     40s        kubelet, k8s-master  Container image "calico/cni:v3.15.1" already present on machine
  Normal   Started    39s        kubelet, k8s-master  Started container install-cni
  Normal   Created    39s        kubelet, k8s-master  Created container install-cni
  Normal   Pulled     39s        kubelet, k8s-master  Container image "calico/pod2daemon-flexvol:v3.15.1" already present on machine
  Normal   Pulled     38s        kubelet, k8s-master  Container image "calico/node:v3.15.1" already present on machine
  Normal   Started    38s        kubelet, k8s-master  Started container flexvol-driver
  Normal   Created    38s        kubelet, k8s-master  Created container flexvol-driver
  Normal   Created    37s        kubelet, k8s-master  Created container calico-node
  Normal   Started    37s        kubelet, k8s-master  Started container calico-node
  Warning  Unhealthy  27s        kubelet, k8s-master  Readiness probe failed: 2020-08-14 02:16:54.068 [INFO][142] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30
  Warning  Unhealthy  17s  kubelet, k8s-master  Readiness probe failed: 2020-08-14 02:17:04.059 [INFO][181] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30
  Warning  Unhealthy  7s  kubelet, k8s-master  Readiness probe failed: 2020-08-14 02:17:14.065 [INFO][207] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.16.0.20,172.16.0.30

原因:calico没有发现实node节点实际的网卡名称

解决方法

调整calicao的网络插件的网卡发现机制,修改IP_AUTODETECTION_METHOD对应的value值。下载的官方提供的yaml文件中,ip识别策略(IPDETECTMETHOD)没有配置,即默认为first-found,这会导致一个网络异常的ip作为nodeIP被注册,从而影响node之间的网络连接。可以修改成can-reach或者interface的策略,尝试连接某一个Ready的node的IP,以此选择出正确的IP。

复制代码
# 修改calicao的yaml文件,添加两行配置# - name: IP_AUTODETECTION_METHOD# value: "interface=eth1"  # 根据实际网卡名称配置           [root@k8s-master yaml]# vim calico.yaml...(3546行)            # Cluster type to identify the deployment type            - name: CLUSTER_TYPE              value: "k8s,bgp"            #新添加的配置            - name: IP_AUTODETECTION_METHOD              value: "interface=eth1"            # Auto-detect the BGP IP address.            - name: IP              value: "autodetect"            # Enable IPIP            - name: CALICO_IPV4POOL_IPIP              value: "Always"            # Enable or Disable VXLAN on the default IP pool.            - name: CALICO_IPV4POOL_VXLAN              value: "Never"
#重新构建kubectl apply -f calico.yaml

修复完成

复制代码
[root@k8s-master yaml]# kubectl get pod -n kube-system 
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-578894d4cd-rsgqd   1/1     Running   0          115d
calico-node-6ktn4                          1/1     Running   0          26m
calico-node-8k5z8                          1/1     Running   0          26m
calico-node-g87hc                          1/1     Running   0          1m

再次访问集群的各种资源已经可以访问了

相关推荐
开发者联盟league7 小时前
使用k8s安装Sonarqube
云原生·容器·kubernetes
松岩11 小时前
网络问题导致 Pod Pending
kubernetes·aiops
运维老郭15 小时前
Kubernetes 二进制部署完全指南:从零搭建生产级HA集群
运维·云原生·kubernetes
成为你的宁宁16 小时前
【K8S黑盒监控实践:Probe配置、Prometheus验证与Grafana可视化】
kubernetes·grafana·prometheus
成为你的宁宁16 小时前
【Prometheus Operator监控K8S Nginx】
nginx·kubernetes·prometheus
宇明一不急17 小时前
k8s headless svc
云原生·容器·kubernetes
成为你的宁宁18 小时前
【K8S使用Helm部署MySQL一主多从并集成Prometheus监控】
mysql·kubernetes·prometheus
openFuyao19 小时前
openFuyao使能灵衢超节点::让容器业务丝滑释放节点能力
容器·kubernetes·ai原生·openfuyao·多样化算力·超节点·集群软件
无聊的老谢19 小时前
Spring Cloud Alibaba 应用的容器化部署与 K8s 编排
云原生·容器·kubernetes
liux352819 小时前
Namespace 多租户隔离:K8s 资源管理的基石
docker·容器·kubernetes