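I. Scenario analysis
1. Pod and node relationships
- Actual data path for Pod ↔ Pod traffic on the same node (direct PodIP)
- Actual data path for Pod ↔ Pod traffic across nodes (direct PodIP; this environment uses Calico IPIP)
2. Environment summary
- CNI: Calico
- Encapsulation: IPIP (tunl0 is visible on the nodes, and proto 4 traffic can be captured on the physical NIC)
- Namespace: net-debug
3. Current Pod distribution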
NAME                  READY   STATUS    RESTARTS   AGE     IP                NODE       NOMINATED NODE   READINESS GATES
hostnet-master01      1/1     Running   0          102m    192.168.48.80     master01   <none>           <none>
hostnet-node01        1/1     Running   0          43m     192.168.48.81     node01     <none>           <none>
hostnet-node02        1/1     Running   0          6m35s   192.168.48.82     node02     <none>           <none>
net-a                 1/1     Running   0          102m    100.117.144.166   node01     <none>           <none>
net-b                 1/1     Running   0          102m    100.117.144.161   node01     <none>           <none>
network-tools-ls6gl   1/1     Running   0          108m    100.85.170.174    master01   <none>           <none>
network-tools-pzq2w   1/1     Running   0          108m    100.95.185.247    node02     <none>           <none>
network-tools-z4dxk   1/1     Running   0          108m    100.117.144.163   node01     <none>           <none>
Key IPs used in this demonstration (sample output):
net-a IP=100.117.144.166
net-b IP=100.117.144.161
src pod=network-tools-ls6gl ip=100.85.170.174
dst pod=network-tools-pzq2w ip=100.95.185.247
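Whether two Pods share a node decides which of the two data paths applies, so it helps to read the NODE column from the Pod listing above programmatically. A minimal sketch, assuming the `kubectl get pods -o wide` column layout shown above with a plain numeric RESTARTS value; `pod_node` is a hypothetical helper, not a kubectl feature.

```shell
# Hypothetical helper: read `kubectl get pods -o wide` output on stdin and
# print the NODE column (7th whitespace-separated field) for the Pod named $1.
# Assumes RESTARTS is a plain number; a value such as "1 (5m ago)" would
# shift the columns and break this simple field-based parsing.
pod_node() {
  awk -v pod="$1" 'NR > 1 && $1 == pod { print $7; exit }'
}

# Example use against a live cluster (not executed here):
#   kubectl -n net-debug get pods -o wide | pod_node net-a
```

If `pod_node` returns the same node for both Pods, the same-node path applies; otherwise the cross-node path does.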
II. Walkthrough and conclusions
Same node: net-a → net-b (both on node01)
1) Routes inside the Pod (net-a)
Command:
kubectl -n net-debug exec net-a -- bash -lc 'ip -br addr; echo; ip route; echo; ip route get <net-b-podIP>'
Output (sample):
lo UNKNOWN 127.0.0.1/8 ::1/128
tunl0@NONE DOWN
eth0@if18 UP 100.117.144.166/32 fe80::a0c7:1bff:fe6a:631/64
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
100.117.144.161 via 169.254.1.1 dev eth0 src 100.117.144.166 uid 0
cache
Key points:
- The Pod's eth0 has a /32 address (100.117.144.166/32)
- The default route points to 169.254.1.1 (a link-local "gateway" that Calico injects into the Pod)
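Both key points can be checked mechanically from the command output. A minimal sketch, assuming the `ip route` and `ip -br addr` output formats shown above; `default_gw` and `eth0_prefix_len` are hypothetical helper names introduced only for illustration.

```shell
# Hypothetical helpers: both read command output on stdin.

# Print the gateway of the default route from `ip route` output.
default_gw() {
  awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Print the IPv4 prefix length of eth0 from `ip -br addr` output,
# where the third field looks like 100.117.144.166/32.
eth0_prefix_len() {
  awk '$1 ~ /^eth0(@|$)/ { split($3, a, "/"); print a[2]; exit }'
}

# Example use against a live cluster (not executed here):
#   kubectl -n net-debug exec net-a -- ip route      | default_gw
#   kubectl -n net-debug exec net-a -- ip -br addr   | eth0_prefix_len
```

With the sample output above, the two helpers print 169.254.1.1 and 32 respectively.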
2) Find the host-side cali* veth endpoints (node01)
Commands (inside each Pod, read the eth0@ifX name; X is the ifindex of the host-side peer):
kubectl -n net-debug exec net-a -- ip -o link show eth0
kubectl -n net-debug exec net-b -- ip -o link show eth0
Command (in the hostNetwork debug Pod on node01, map ifindex → interface name):
kubectl -n net-debug exec hostnet-node01 -- bash -lc 'ip -o link show | grep -E "^[0-9]+: cali"'
Output (sample):
net-a eth0@if18
net-b eth0@if17
---
18: cali3c6889ba186@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP mode DEFAULT group default qlen 1000\ link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 4
17: cali8712894f3f8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP mode DEFAULT group default qlen 1000\ link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 3
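The lookup above (take the number after `@if` in the Pod, then find the host interface with that ifindex) can be scripted. A minimal sketch, assuming the `ip -o link show` output format shown above; `peer_ifindex` and `ifname_by_index` are hypothetical helper names.

```shell
# Hypothetical helpers for the ifindex → interface-name lookup.

# "eth0@if18" -> "18": the number after "@if" is the peer's ifindex on the host.
peer_ifindex() {
  printf '%s\n' "$1" | sed -n 's/.*@if\([0-9][0-9]*\).*/\1/p'
}

# Read `ip -o link show` output on stdin and print the interface name
# (without its "@ifN" suffix) whose ifindex equals $1.
ifname_by_index() {
  awk -v idx="$1" -F': ' '$1 == idx { sub(/@.*/, "", $2); print $2; exit }'
}

# Example use against a live cluster (not executed here):
#   idx=$(peer_ifindex "eth0@if18")
#   kubectl -n net-debug exec hostnet-node01 -- ip -o link show | ifname_by_index "$idx"
```

For the sample output, ifindex 18 resolves to cali3c6889ba186 (net-a) and ifindex 17 to cali8712894f3f8 (net-b).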
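3) Packet capture + ping to show that same-node traffic is forwarded locally
Commands (start the two captures in separate terminals first, then send the ping):
# Capture in the destination Pod (net-b)
kubectl -n net-debug exec net-b -- bash -lc 'timeout 8 tcpdump -ni eth0 icmp -vv'
# Capture on the node01 host (hostNetwork Pod)
kubectl -n net-debug exec hostnet-node01 -- bash -lc 'timeout 8 tcpdump -ni any icmp -vv'
# Send the ping (net-a -> net-b)
kubectl -n net-debug exec net-a -- bash -lc 'ping -c 3 -W 1 <net-b-podIP>'
Output (sample: ping):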
PING 100.117.144.161 (100.117.144.161) 56(84) bytes of data.
64 bytes from 100.117.144.161: icmp_seq=1 ttl=63 time=0.123 ms
64 bytes from 100.117.144.161: icmp_seq=2 ttl=63 time=0.066 ms
64 bytes from 100.117.144.161: icmp_seq=3 ttl=63 time=0.062 ms
--- 100.117.144.161 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2029ms
rtt min/avg/max/mdev = 0.062/0.083/0.123/0.027 ms
Output (sample: capture on the node01 host; note the In/Out direction next to each cali* interface name):
tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
12 packets captured
12 packets received by filter
0 packets dropped by kernel
10:20:13.689239 cali3c6889ba186 In IP (tos 0x0, ttl 64, id 60148, offset 0, flags [DF], proto ICMP (1), length 84)
100.117.144.166 > 100.117.144.161: ICMP echo request, id 4, seq 1, length 64
10:20:13.689318 cali8712894f3f8 Out IP (tos 0x0, ttl 63, id 60148, offset 0, flags [DF], proto ICMP (1), length 84)
100.117.144.166 > 100.117.144.161: ICMP echo request, id 4, seq 1, length 64
10:20:13.689329 cali8712894f3f8 In IP (tos 0x0, ttl 64, id 10024, offset 0, flags [none], proto ICMP (1), length 84)
100.117.144.161 > 100.117.144.166: ICMP echo reply, id 4, seq 1, length 64
10:20:13.689335 cali3c6889ba186 Out IP (tos 0x0, ttl 63, id 10024, offset 0, flags [none], proto ICMP (1), length 84)
100.117.144.161 > 100.117.144.166: ICMP echo reply, id 4, seq 1, length 64
10:20:14.694480 cali3c6889ba186 In IP (tos 0x0, ttl 64, id 60154, offset 0, flags [DF], proto ICMP (1), length 84)
100.117.144.166 > 100.117.144.161: ICMP echo request, id 4, seq 2, length 64
10:20:14.694511 cali8712894f3f8 Out IP (tos 0x0, ttl 63, id 60154, offset 0, flags [DF], proto ICMP (1), length 84)
100.117.144.166 > 100.117.144.161: ICMP echo request, id 4, seq 2, length 64
10:20:14.694521 cali8712894f3f8 In IP (tos 0x0, ttl 64, id 10616, offset 0, flags [none], proto ICMP (1), length 84)
100.117.144.161 > 100.117.144.166: ICMP echo reply, id 4, seq 2, length 64
10:20:14.694525 cali3c6889ba186 Out IP (tos 0x0, ttl 63, id 10616, offset 0, flags [none], proto ICMP (1), length 84)
100.117.144.161 > 100.117.144.166: ICMP echo reply, id 4, seq 2, length 64
10:20:15.718477 cali3c6889ba186 In IP (tos 0x0, ttl 64, id 60848, offset 0, flags [DF], proto ICMP (1), length 84)
100.117.144.166 > 100.117.144.161: ICMP echo request, id 4, seq 3, length 64
10:20:15.718505 cali8712894f3f8 Out IP (tos 0x0, ttl 63, id 60848, offset 0, flags [DF], proto ICMP (1), length 84)
100.117.144.166 > 100.117.144.161: ICMP echo request, id 4, seq 3, length 64
10:20:15.718514 cali8712894f3f8 In IP (tos 0x0, ttl 64, id 10666, offset 0, flags [none], proto ICMP (1), length 84)
100.117.144.161 > 100.117.144.166: ICMP echo reply, id 4, seq 3, length 64
10:20:15.718518 cali3c6889ba186 Out IP (tos 0x0, ttl 63, id 10666, offset 0, flags [none], proto ICMP (1), length 84)
100.117.144.161 > 100.117.144.166: ICMP echo reply, id 4, seq 3, length 64
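Conclusion (same node):
- No encapsulated packets (IPIP, VXLAN, etc.) appear
- On node01, each ICMP packet enters on the source Pod's cali* interface and leaves on the destination Pod's cali* interface
- The TTL drops from 64 to 63 between In and Out, so node01 routes the packet at layer 3 rather than bridging it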
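The In/Out sequence in the host capture can be extracted per ICMP sequence number to confirm the path. A minimal sketch, assuming the two-line-per-packet layout that `tcpdump -ni any icmp -vv` produced above; `icmp_request_path` is a hypothetical helper name.

```shell
# Hypothetical helper: read `tcpdump -ni any icmp -vv` output on stdin and
# print "<interface> <In|Out>" for every sighting of the echo request with
# sequence number $1. With -vv each packet spans two lines: a header line
# (timestamp, interface, direction) and a line with addresses and ICMP details.
icmp_request_path() {
  awk -v seq="$1" '
    $3 == "In" || $3 == "Out" { hop = $2 " " $3; next }
    /ICMP echo request/ && $0 ~ ("seq " seq ",") {
      if (hop != "") print hop
      hop = ""
    }
  '
}

# Example use against a live cluster (not executed here):
#   kubectl -n net-debug exec hostnet-node01 -- \
#     bash -lc 'timeout 8 tcpdump -ni any icmp -vv' | icmp_request_path 1
```

For the sample capture, sequence 1 yields cali3c6889ba186 In followed by cali8712894f3f8 Out, with no physical NIC or tunl0 in between.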
Different nodes: Pod on master01 → Pod on node02 (Calico IPIP)
1) Routes inside the source Pod (network-tools-ls6gl on master01)
Command:
kubectl -n net-debug exec network-tools-ls6gl -- bash -lc 'ip route get <dst-podIP>'
(The output again shows via 169.254.1.1 dev eth0.)
2) Packet capture: the destination Pod sees the "original PodIP packet"
Commands:
kubectl -n net-debug exec network-tools-pzq2w -- bash -lc 'timeout 8 tcpdump -ni eth0 icmp -vv'
kubectl -n net-debug exec network-tools-ls6gl -- bash -lc 'ping -c 3 -W 1 <dst-podIP>'
Output (sample: capture inside the destination Pod):
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
10:20:21.731095 IP (tos 0x0, ttl 62, id 61954, offset 0, flags [DF], proto ICMP (1), length 84)
100.85.170.174 > 100.95.185.247: ICMP echo request, id 5, seq 1, length 64
10:20:21.731109 IP (tos 0x0, ttl 64, id 34636, offset 0, flags [none], proto ICMP (1), length 84)
100.95.185.247 > 100.85.170.174: ICMP echo reply, id 5, seq 1, length 64
10:20:22.794857 IP (tos 0x0, ttl 62, id 62874, offset 0, flags [DF], proto ICMP (1), length 84)
100.85.170.174 > 100.95.185.247: ICMP echo request, id 5, seq 2, length 64
10:20:22.794870 IP (tos 0x0, ttl 64, id 35413, offset 0, flags [none], proto ICMP (1), length 84)
100.95.185.247 > 100.85.170.174: ICMP echo reply, id 5, seq 2, length 64
10:20:23.818819 IP (tos 0x0, ttl 62, id 63564, offset 0, flags [DF], proto ICMP (1), length 84)
100.85.170.174 > 100.95.185.247: ICMP echo request, id 5, seq 3, length 64
10:20:23.818832 IP (tos 0x0, ttl 64, id 35963, offset 0, flags [none], proto ICMP (1), length 84)
100.95.185.247 > 100.85.170.174: ICMP echo reply, id 5, seq 3, length 64
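The TTL values in the captures can be read as a hop count. In the same-node capture, packets enter node01 with ttl 64 and leave with ttl 63 (one routing hop); here the request arrives with ttl 62, which suggests two routing hops (presumably master01 and node02). A small sketch of that arithmetic, assuming the sender's initial TTL is 64 as seen on cali* ingress earlier; `hops_from_ttl` is a hypothetical helper.

```shell
# Hypothetical helper: number of routing hops implied by an observed TTL.
# $1 = TTL seen at the capture point
# $2 = initial TTL used by the sender (assumed 64 when omitted)
hops_from_ttl() {
  echo $(( ${2:-64} - $1 ))
}

# Same-node capture:  hops_from_ttl 63   (leaving node01 towards net-b)
# Cross-node capture: hops_from_ttl 62   (arriving in the destination Pod)
```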
3) Packet capture: the node's physical NIC sees "IPIP outer header + PodIP inner header"
Command (capture proto 4 on the physical NIC of master01):
kubectl -n net-debug exec hostnet-master01 -- bash -lc 'timeout 8 tcpdump -ni ens33 proto 4 -vv'
Output (sample):
tcpdump: listening on ens33, link-type EN10MB (Ethernet), snapshot length 262144 bytes
10:20:21.731193 IP (tos 0x0, ttl 63, id 19778, offset 0, flags [DF], proto IPIP (4), length 104)
192.168.48.80 > 192.168.48.82: IP (tos 0x0, ttl 63, id 61954, offset 0, flags [DF], proto ICMP (1), length 84)
100.85.170.174 > 100.95.185.247: ICMP echo request, id 5, seq 1, length 64
10:20:21.731585 IP (tos 0x0, ttl 63, id 26279, offset 0, flags [none], proto IPIP (4), length 104)
192.168.48.82 > 192.168.48.80: IP (tos 0x0, ttl 63, id 34636, offset 0, flags [none], proto ICMP (1), length 84)
100.95.185.247 > 100.85.170.174: ICMP echo reply, id 5, seq 1, length 64
10:20:22.795002 IP (tos 0x0, ttl 63, id 20812, offset 0, flags [DF], proto IPIP (4), length 104)
192.168.48.80 > 192.168.48.82: IP (tos 0x0, ttl 63, id 62874, offset 0, flags [DF], proto ICMP (1), length 84)
100.85.170.174 > 100.95.185.247: ICMP echo request, id 5, seq 2, length 64
10:20:22.795305 IP (tos 0x0, ttl 63, id 26393, offset 0, flags [none], proto IPIP (4), length 104)
192.168.48.82 > 192.168.48.80: IP (tos 0x0, ttl 63, id 35413, offset 0, flags [none], proto ICMP (1), length 84)
100.95.185.247 > 100.85.170.174: ICMP echo reply, id 5, seq 2, length 64
10:20:23.818991 IP (tos 0x0, ttl 63, id 21386, offset 0, flags [DF], proto IPIP (4), length 104)
192.168.48.80 > 192.168.48.82: IP (tos 0x0, ttl 63, id 63564, offset 0, flags [DF], proto ICMP (1), length 84)
100.85.170.174 > 100.95.185.247: ICMP echo request, id 5, seq 3, length 64
10:20:23.819309 IP (tos 0x0, ttl 63, id 26854, offset 0, flags [none], proto IPIP (4), length 104)
192.168.48.82 > 192.168.48.80: IP (tos 0x0, ttl 63, id 35963, offset 0, flags [none], proto ICMP (1), length 84)
100.95.185.247 > 100.85.170.174: ICMP echo reply, id 5, seq 3, length 64
Two layers of addresses are visible:
- Outer (node IPs): 192.168.48.80 -> 192.168.48.82, proto IPIP (4)
- Inner (Pod IPs): 100.85.170.174 -> 100.95.185.247 (ICMP)
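The packet lengths in the capture are consistent with a 20-byte IPIP overhead: the outer length 104 is the inner length 84 plus one IPv4 header without options, and the cali* MTU of 1480 seen earlier matches a 1500-byte underlay. A minimal sketch of that arithmetic; the 1500-byte MTU on ens33 is an assumption (it is not shown in the output above), and the function names are hypothetical.

```shell
# IPIP adds one outer IPv4 header without options: 20 bytes.
IPIP_OVERHEAD=20

# Outer packet length for a given inner packet length ($1, in bytes).
ipip_outer_len() {
  echo $(( $1 + IPIP_OVERHEAD ))
}

# Largest inner packet that fits in a given underlay MTU ($1, in bytes).
ipip_inner_mtu() {
  echo $(( $1 - IPIP_OVERHEAD ))
}

# ipip_outer_len 84    # inner ICMP packet of 84 bytes, as in the capture
# ipip_inner_mtu 1500  # assumed underlay MTU on ens33
```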
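Conclusion (cross-node, IPIP mode):
- The Pod still sends plain PodIP → PodIP packets
- On leaving the node, the packet is encapsulated as IPIP (proto 4) through tunl0, following the routes Calico programs, and carried over the node network
- The destination node decapsulates it and hands the inner packet to the destination Pod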