Troubleshooting a Rancher (RKE) Kubernetes cluster install on multi-NIC Huawei Cloud servers

Error:

[network] Host [192.168.0.213] is not able to connect to the following ports: [192.168.0.213:2379]. Please check network policies and firewall rules

Investigation on the node:
root@hwy-isms-210-66:~# gotelnet 172.17.210.66 2379
map[2379:failed]
root@hwy-isms-210-66:~# gotelnet 127.0.0.1 2379
map[2379:success]
root@hwy-isms-210-66:~# docker ps
CONTAINER ID   IMAGE                          COMMAND                  CREATED       STATUS       PORTS                                                    NAMES
b6f75ff566d5   rancher/rke-tools:v0.1.96      "/docker-entrypoint...."   6 hours ago   Up 6 hours   80/tcp, 0.0.0.0:10250->1337/tcp                          rke-worker-port-listener
ac3e20c949df   rancher/rke-tools:v0.1.96      "/docker-entrypoint...."   6 hours ago   Up 6 hours   80/tcp, 0.0.0.0:6443->1337/tcp                           rke-cp-port-listener
e106814143a3   rancher/rke-tools:v0.1.96      "/docker-entrypoint...."   6 hours ago   Up 6 hours   80/tcp, 0.0.0.0:2379->1337/tcp, 0.0.0.0:2380->1337/tcp   rke-etcd-port-listener
6a866546f8bb   rancher/rancher-agent:v2.8.5   "run.sh --server htt..."   6 hours ago   Up 6 hours                                                            peaceful_albattani
9bbffd35d9a4   rancher/rancher-agent:v2.8.5   "run.sh --server htt..."   6 hours ago   Up 6 hours                                                            confident_fermi
root@hwy-isms-210-66:~# ifconfig 
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.0.1  netmask 255.255.0.0  broadcast 172.18.255.255
        ether a6:c3:99:d0:cf:03  txqueuelen 0  (Ethernet)
        RX packets 3547  bytes 100789 (98.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 86  bytes 5196 (5.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.210.66  netmask 255.255.255.0  broadcast 172.17.210.255
        ether fa:16:3e:40:01:71  txqueuelen 1000  (Ethernet)
        RX packets 122941811  bytes 23935288095 (22.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 127262310  bytes 14351697946 (13.3 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.210.67  netmask 255.255.255.0  broadcast 172.17.210.255
        ether fa:16:3e:40:01:72  txqueuelen 1000  (Ethernet)
        RX packets 207177  bytes 17420004 (16.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 202098  bytes 20182560 (19.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.210.68  netmask 255.255.255.0  broadcast 172.17.210.255
        ether fa:16:3e:40:01:73  txqueuelen 1000  (Ethernet)
        RX packets 180108  bytes 15241156 (14.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 248119  bytes 22751922 (21.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 1352589  bytes 102392483 (97.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1352589  bytes 102392483 (97.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth13ea56c: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 7a:fc:db:8f:3c:0f  txqueuelen 0  (Ethernet)
        RX packets 59  bytes 3636 (3.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 73  bytes 4338 (4.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth6b767de: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 7e:17:74:fd:a7:27  txqueuelen 0  (Ethernet)
        RX packets 3  bytes 126 (126.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6  bytes 412 (412.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vethf9165ed: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether f6:46:67:c2:93:2e  txqueuelen 0  (Ethernet)
        RX packets 3  bytes 126 (126.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 9  bytes 538 (538.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

root@hwy-isms-210-66:~# cat /etc/rc.local 
#!/bin/sh -e
# rc.local
# Policy-routing commands executed at boot
ip route add default via 172.17.210.1 dev eth0 table 10
ip route add 172.17.210.0/24 dev eth0 table 10
ip rule add from 172.17.210.66 table 10
ip route add default via 172.17.210.1 dev eth1 table 20
ip route add 172.17.210.0/24 dev eth1 table 20
ip rule add from 172.17.210.67 table 20
ip route add default via 172.17.210.1 dev eth2 table 30
ip route add 172.17.210.0/24 dev eth2 table 30
ip rule add from 172.17.210.68 table 30
exit 0
root@hwy-isms-210-66:~# 
Why can port 2379 be reached on 127.0.0.1 but not on 172.17.210.66, when servers on the same subnet can telnet to 172.17.210.66:2379 without problems?

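The root cause is the host's policy routing:

/etc/rc.local sets up per-NIC policy routing that forces traffic from each source IP into its own route table.

Traffic sent from 172.17.210.66 is forced into table 10 by `ip rule add from 172.17.210.66 table 10`.

Table 10 has no route to the docker0 bridge network (172.18.0.0/16), and this is most likely why the connection fails. Port 2379 is published by a Docker container (`rke-etcd-port-listener`), so a connection from the host to 172.17.210.66:2379 is DNATed to the container's address on docker0. The kernel then routes that address with source 172.17.210.66, matches the table 10 rule, and sends the packet to the default gateway via eth0 instead of into docker0. With Docker's default userland proxy, connections to 127.0.0.1 are not DNATed and are relayed by docker-proxy instead. Connections from other servers carry a source address that matches none of the custom rules. Both cases therefore use the main table, which does have the docker0 route.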

root@hwy-isms-210-66:~# ip route list
default via 172.17.210.1 dev eth0 proto dhcp metric 100 
169.254.169.254 via 172.17.210.1 dev eth0 proto dhcp metric 100 
172.17.210.0/24 dev eth0 proto kernel scope link src 172.17.210.66 metric 100 
172.17.210.0/24 dev eth1 proto kernel scope link src 172.17.210.67 metric 101 
172.17.210.0/24 dev eth2 proto kernel scope link src 172.17.210.68 metric 102 
172.18.0.0/16 dev docker0 proto kernel scope link src 172.18.0.1 
root@hwy-isms-210-66:~# ip rule list
0:	from all lookup local
32763:	from 172.17.210.68 lookup 30
32764:	from 172.17.210.67 lookup 20
32765:	from 172.17.210.66 lookup 10
32766:	from all lookup main
32767:	from all lookup default
root@hwy-isms-210-66:~# 
root@hwy-isms-210-66:~# ip route show table 10
default via 172.17.210.1 dev eth0 
172.17.210.0/24 dev eth0 scope link 
root@hwy-isms-210-66:~# 
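This explanation can be checked directly. `ip route get` resolves a destination for a given source address using the same policy rules the kernel applies. The sketch below is minimal, and the function names are hypothetical. `172.18.0.2` stands in for the container's docker0 address; you can look up the real one with `docker inspect -f '{{.NetworkSettings.IPAddress}}' rke-etcd-port-listener`.

```bash
# Extract the outgoing interface ("dev <name>") from `ip route get` output on stdin.
route_dev() {
  sed -n 's/.* dev \([^ ]*\).*/\1/p' | head -n 1
}

# Show which interface the kernel picks for a container address: once with the
# host IP as source (the failing case) and once without a source address.
# 172.18.0.2 is a hypothetical container IP on docker0; replace it with yours.
check_path() {
  local dst=${1:-172.18.0.2} src=${2:-172.17.210.66}
  echo "from $src -> $dst via $(ip route get "$dst" from "$src" | route_dev)"
  echo "no source -> $dst via $(ip route get "$dst" | route_dev)"
}
```

Run `check_path` on the node. With the rc.local rules in place, the first line is expected to show `eth0` (the default route of table 10) and the second `docker0`. Once the docker0 route is in table 10, both lines should show `docker0`.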

Solution:

Option 1: add a route for the docker0 network to table 10

ip route add 172.18.0.0/16 dev docker0 table 10
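A route added with `ip route add` disappears on reboot, so the docker0 route also belongs next to the existing rules in `/etc/rc.local`. The hypothetical helper below generates the same commands as that rc.local, plus the docker0 route, for every NIC/table pair. By default it only prints the commands so you can review them first. It assumes docker0 stays on 172.18.0.0/16. It also assumes tables 20 and 30 need the same route in case anything ever connects from 172.17.210.67 or .68.

There is one caveat. rc.local runs with `sh -e`, and docker0 only exists after dockerd has started. If rc.local runs first, the docker0 line fails and aborts the rest of the script. Running that line from a unit ordered after `docker.service` may be more robust.

```bash
# Hypothetical generator for this host's per-NIC policy routing.
# Prints the commands by default; set APPLY=1 to execute them instead.
GATEWAY=172.17.210.1
SUBNET=172.17.210.0/24
DOCKER_NET=172.18.0.0/16   # docker0 network, from `ip route list`

emit() {
  if [[ "${APPLY:-0}" == 1 ]]; then "$@"; else echo "$*"; fi
}

# Usage: policy_routes <source-ip> <dev> <table>
policy_routes() {
  local src=$1 dev=$2 table=$3
  emit ip route add default via "$GATEWAY" dev "$dev" table "$table"
  emit ip route add "$SUBNET" dev "$dev" table "$table"
  # The fix: let host-originated traffic reach containers on docker0.
  emit ip route add "$DOCKER_NET" dev docker0 table "$table"
  emit ip rule add from "$src" table "$table"
}

policy_routes 172.17.210.66 eth0 10
policy_routes 172.17.210.67 eth1 20
policy_routes 172.17.210.68 eth2 30
```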

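If the node also runs Calico, you can go further and direct the routes Calico maintains **only** into `table 10` instead of the `main` table. This requires changing Calico's configuration, as follows: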


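**1. Modify Calico's `FelixConfiguration`**

By default Calico writes some of its routes to the `main` table. The `RouteTableRange` setting restricts which route-table indices Calico may use.

**Steps:**

  1. **Check the current `FelixConfiguration`:**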

```bash
kubectl get felixconfiguration default -o yaml
```

  2. **Change the `RouteTableRange` setting:**

```bash
kubectl patch felixconfiguration default \
  --type='merge' \
  -p '{"spec":{"routeTableRange":{"min":10,"max":10}}}'
```

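  • With this setting, Calico confines the extra route tables it manages to `table 10`. However, Calico's documentation describes `routeTableRange` as the range for the *additional* route tables it programs (for example for WireGuard). Per-Pod routes may therefore still be written to `main`, so confirm the result with the check below rather than assuming it.
  3. **Verify that the change took effect:**
  • Wait for Calico to apply the new configuration (the `calico-node` Pods may need a restart).

  • Check the route tables: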

```bash
ip route show table 10    # should contain all Calico routes
ip route show table main  # should no longer contain Calico-managed routes
```
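The two checks above can be turned into a quick count per table. This is a hedged sketch with hypothetical function names. It assumes Calico routes can be recognised by `proto bird` or by a `cali*` / `vxlan.calico` device, as in the example routes in step 5. Adjust the pattern if your setup labels routes differently.

```bash
# Count Calico-looking routes in `ip route show` output read from stdin.
# Assumption: Calico routes carry "proto bird" or use a cali*/vxlan.calico device.
count_calico_routes() {
  grep -cE 'proto bird| dev cali| dev vxlan\.calico' || true
}

# Print the count for each given table (needs iproute2 on the node).
report_tables() {
  local t
  for t in "$@"; do
    printf 'table %-5s %s calico-like routes\n' "$t" \
      "$(ip route show table "$t" | count_calico_routes)"
  done
}
```

For example, `report_tables 10 main` should show a non-zero count for table 10 and zero for `main` if the change worked as intended.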


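**2. Make sure an `ip rule` sends Pod traffic to `table 10`**

Routes in `table 10` are only consulted when an `ip rule` points traffic there, and Calico does not necessarily create such a rule itself. Check: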

```bash
ip rule list
```

Expected output, roughly:

```
0:	from all lookup local
32765:	from 10.42.213.64/26 lookup 10   # rule for this node's Pod CIDR
32766:	from all lookup main
```

  • If the `table 10` rule is missing, adjust Calico's configuration or add the rule manually.
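If you add the rule by hand, it helps to make the step idempotent so that re-running it does not stack duplicate rules. This is a minimal sketch with hypothetical function names. It takes the sample Pod CIDR `10.42.213.64/26` and table 10 as defaults, but each node has its own Pod CIDR. It also assumes `ip rule list` prints the table as a number, which would not be the case if `/etc/iproute2/rt_tables` gives it a name.

```bash
# Succeeds if `ip rule list` output on stdin has "from <cidr> lookup <table>".
rule_present() {
  awk -v f="$1" -v t="$2" \
    '$2 == "from" && $3 == f && $4 == "lookup" && $5 == t { found = 1 }
     END { exit !found }'
}

# Add the rule only when it is missing, so repeated runs do not add duplicates.
# Defaults are this article's sample values; substitute your node's Pod CIDR.
ensure_rule() {
  local from=${1:-10.42.213.64/26} table=${2:-10}
  if ip rule list | rule_present "$from" "$table"; then
    echo "rule 'from $from lookup $table' already present"
  else
    ip rule add from "$from" table "$table"
  fi
}
```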

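**3. Adjust the BIRD configuration (if BGP is used)**

If Calico distributes routes over BGP, BIRD's configuration may also need to change so that BIRD only operates on `table 10`.

**Edit the `ConfigMap` used by `calico-node`:**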

```bash
kubectl -n kube-system edit configmap calico-config
```

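In the `bird_template` section, make sure the configuration looks roughly like the following. If your deployment has no such section, the BIRD template is probably built into the `calico-node` image instead.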

```conf
protocol kernel {
  learn;
  scan time 10;
  import all;
  export all;
  kernel table 10;   # explicitly use kernel table 10
}
```


**4. Restart the `calico-node` Pods to apply the configuration**

```bash
kubectl -n kube-system rollout restart daemonset calico-node
```


**5. Remove leftover Calico routes from the `main` table**

If Calico routes remain in the `main` table, you can delete them by hand (carefully):

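```bash
# Example routes from one node; substitute the entries from your own `ip route` output.
ip route del blackhole 10.42.213.64/26 proto bird
ip route del 10.42.213.65 dev calid6b141b5a7c scope link
# ...delete any other unneeded Calico routes the same way
```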
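Rather than typing each deletion by hand, you can turn the candidate routes into `ip route del` commands for review. This hedged sketch uses a hypothetical function name. It assumes that `proto bird` or a `cali*` / `vxlan.calico` device identifies a Calico route. It only prints commands and deletes nothing. Deleting the route of a running Pod cuts that Pod off, so check every line before running it.

```bash
# Turn Calico-looking routes from `ip route show table main` (stdin) into
# `ip route del` commands. Prints only; review the output before running it.
calico_del_cmds() {
  local route
  # Default IFS trims the trailing space that `ip route` prints on each line.
  while read -r route; do
    case $route in
      *"proto bird"*|*" dev cali"*|*" dev vxlan.calico"*)
        echo "ip route del $route" ;;
    esac
  done
}
```

Typical use: `ip route show table main | calico_del_cmds > /tmp/calico-del.sh`. Inspect the file, then run only the lines you are sure about.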


**Verification**

  • Check that `table 10` contains all Calico routes:

```bash
ip route show table 10
```

  • Check that the `main` table no longer has Calico routes:

```bash
ip route show table main
```

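  • Confirm that Pod networking still works.

**Caveats**

  • **Risk of network disruption**: changing route tables can briefly interrupt traffic, so make these changes during a maintenance window.

  • **CNI compatibility**: some CNI plugins or network policies may depend on the `main` table; test for compatibility.

  • **Back up the routing state**: before making changes, back up the current routes and rules: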

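```bash
ip route save table all > ip_route_backup.bin   # binary dump of every table, for `ip route restore`
ip rule save > ip_rule_backup.bin               # binary dump of the rules, for `ip rule restore`
ip route show table all > ip_route_backup.txt   # human-readable copy
```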

If problems persist, share the output of `ip rule list` and the current route tables for further troubleshooting.
