Calico IPIP CrossSubnet vs. Default IPIP Mode

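Mode Overview

Project documentation: https://docs.tigera.io/calico/latest/networking/configuring/vxlan-ipip#configure-ip-in-ip-encapsulation-for-only-cross-subnet-traffic

When Calico runs in IPIP mode, CALICO_IPV4POOL_IPIP defaults to Always: all cross-node traffic is IPIP-encapsulated, even when the two nodes are on the same subnet.

Calico also provides an option to encapsulate only traffic that crosses subnet boundaries. Combining the cross-subnet option with IPIP is recommended, because it keeps encapsulation overhead to a minimum.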
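
The cross-subnet decision comes down to whether two node addresses share a network prefix. The following is a minimal shell sketch of that check, for illustration only; it is not Calico's actual implementation. The helper names `ip_to_int` and `encap_needed` are hypothetical, and the /24 prefix length matches the node subnets used later in this article.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  echo "$1" | awk -F. '{ print $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 }'
}

# Print "ipip" if the two node IPs are in different subnets, else "none".
# Usage: encap_needed <local-node-ip> <remote-node-ip> <prefix-length>
encap_needed() {
  local mask local_net remote_net
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  local_net=$(( $(ip_to_int "$1") & mask ))
  remote_net=$(( $(ip_to_int "$2") & mask ))
  if [ "$local_net" -eq "$remote_net" ]; then
    echo "none"
  else
    echo "ipip"
  fi
}

encap_needed 10.1.5.10 10.1.5.11 24   # same subnet
encap_needed 10.1.5.10 10.1.8.10 24   # different subnets
```

With the node addresses used in this article, the first call prints `none` and the second prints `ipip`, which is the behavior the CrossSubnet mode is expected to show in the captures.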

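Use Cases

See the official documentation.

Deployment

This article deploys the default IPIP mode and the IPIP CrossSubnet mode, then compares packet captures for same-subnet and cross-subnet requests in each.

1. Create the default IPIP mode cluster with a script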

#!/bin/bash

set -v

# 1. Prepare NoCNI environment
cat <<EOF | HTTP_PROXY= HTTPS_PROXY= http_proxy= https_proxy= kind create cluster --name=calico-ipip --image=burlyluo/kindest:v1.27.3 --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "10.244.0.0/16"

nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.5.10

- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.5.11

- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.8.10

- role: worker
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-ip: 10.1.8.11
EOF

# 2. Remove taints
controller_node_ip=`kubectl get node -o wide --no-headers | grep -E "control-plane|bpf1" | awk -F " " '{print $6}'`
kubectl taint nodes $(kubectl get nodes -o name | grep control-plane) node-role.kubernetes.io/control-plane:NoSchedule-
kubectl get nodes -o wide

./2-setup-clab.sh

# 3. Collect startup message
controller_node_name=$(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | grep control-plane)
if [ -n "$controller_node_name" ]; then
  timeout 1 docker exec -t $controller_node_name bash -c 'cat << EOF > /root/monitor_startup.sh
#!/bin/bash
ip -ts monitor all > /root/startup_monitor.txt 2>&1
EOF
chmod +x /root/monitor_startup.sh && /root/monitor_startup.sh'
else
  echo "No such controller_node!"
fi

# 4. Install CNI[Calico v3.23.2]
kubectl apply -f calico.yaml

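The 2-setup-clab.sh script uses containerlab to create four containers, assigns each an IP address, and makes each one share a network namespace with one of the four node containers created by kind. This lets the Kubernetes cluster use the node-ip values passed to kind: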

#!/bin/bash

set -v

for br in br-pool0 br-pool1; do
    ip link set $br down > /dev/null 2>&1
    ip link delete $br
    ip link add $br type bridge
    ip link set $br up
done

# Write the topology file, then deploy the lab from it
cat << EOF > clab.yaml && containerlab deploy -t clab.yaml
name: calico-ipip
topology:
  nodes:
    gw0:
      kind: linux
      image: hub.deepflow.yunshan.net/network-demo/vyos:1.4.9
      cmd: /sbin/init
      binds:
        - /lib/modules:/lib/modules
        - ./startup-conf/gw0-boot.cfg:/opt/vyatta/etc/config/config.boot
 
    br-pool0:
      kind: bridge
  
    br-pool1:
      kind: bridge

    server1:
      kind: linux
      image: hub.deepflow.yunshan.net/network-demo/nettool
      network-mode: container:calico-ipip-control-plane
      exec:
      - ip addr add 10.1.5.10/24 dev net0
      - ip route replace default via 10.1.5.1

    server2:
      kind: linux
      image: hub.deepflow.yunshan.net/network-demo/nettool
      network-mode: container:calico-ipip-worker
      exec:
      - ip addr add 10.1.5.11/24 dev net0
      - ip route replace default via 10.1.5.1

    server3:
      kind: linux
      image: hub.deepflow.yunshan.net/network-demo/nettool
      network-mode: container:calico-ipip-worker2
      exec:
      - ip addr add 10.1.8.10/24 dev net0
      - ip route replace default via 10.1.8.1

    server4:
      kind: linux
      image: hub.deepflow.yunshan.net/network-demo/nettool
      network-mode: container:calico-ipip-worker3
      exec:
      - ip addr add 10.1.8.11/24 dev net0
      - ip route replace default via 10.1.8.1

  links:
    - endpoints: ["br-pool0:br-pool0-net0", "server1:net0"]
      mtu: 1500
    - endpoints: ["br-pool0:br-pool0-net1", "server2:net0"]
      mtu: 1500
    - endpoints: ["br-pool1:br-pool1-net0", "server3:net0"]
      mtu: 1500
    - endpoints: ["br-pool1:br-pool1-net1", "server4:net0"]
      mtu: 1500

    - endpoints: ["gw0:eth1", "br-pool0:br-pool0-net2"]
      mtu: 1500
    - endpoints: ["gw0:eth2", "br-pool1:br-pool1-net2"]
      mtu: 1500
EOF

The startup-conf/gw0-boot.cfg file on gw0 lets the 10.1.5.0/24 and 10.1.8.0/24 subnets reach each other (the default gateways of both subnets live on gw0, so gw0 simply forwards between them):

interfaces {
    ethernet eth1 {
        address "10.1.5.1/24"
        duplex "auto"
        speed "auto"
    }
    ethernet eth2 {
        address "10.1.8.1/24"
        duplex "auto"
        speed "auto"
    }
    loopback lo {
    }
}
nat {
    source {
        rule 100 {
            outbound-interface {
                name "eth0"
            }
            source {
                address "10.1.0.0/16"
            }
            translation {
                address "masquerade"
            }
        }
    }
}
system {
    config-management {
        commit-revisions "100"
    }
    console {
        device ttyS0 {
            speed "9600"
        }
    }
    host-name "gw0"
    login {
        user vyos {
            authentication {
                encrypted-password "$6$QxPS.uk6mfo$9QBSo8u1FkH16gMyAVhus6fU3LOzvLR9Z9.82m3tiHFAxTtIkhaZSWssSgzt4v4dGAL8rhVQxTg0oAG9/q11h/"
                plaintext-password ""
            }
        }
    }
    time-zone "UTC"
}

The relevant environment variables in calico.yaml for the default IPIP mode:

## calico yaml
            # Auto-detect the BGP IP address.
            - name: IP
              value: "autodetect"
            # Enable IPIP
            - name: CALICO_IPV4POOL_IPIP
              value: "Always"
            # Enable or Disable VXLAN on the default IP pool.
            - name: CALICO_IPV4POOL_VXLAN
              value: "Never"
            # Enable or Disable VXLAN on the default IPv6 IP pool.
            - name: CALICO_IPV6POOL_VXLAN
              value: "Never"

2. Create the IPIP CrossSubnet mode cluster with a script

The rest of the deployment scripts are identical; only the CALICO_IPV4POOL_IPIP value in calico.yaml differs:

## calico yaml
            # Auto-detect the BGP IP address.
            - name: IP
              value: "autodetect"
            # Enable IPIP
            - name: CALICO_IPV4POOL_IPIP
              value: "CrossSubnet"
            # Enable or Disable VXLAN on the default IP pool.
            - name: CALICO_IPV4POOL_VXLAN
              value: "Never"
            # Enable or Disable VXLAN on the default IPv6 IP pool.
            - name: CALICO_IPV6POOL_VXLAN
              value: "Never"
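
Because the two manifests differ only in this one value, the CrossSubnet variant can be derived from the default manifest rather than maintained by hand. Below is a minimal sketch using sed. The function name `set_ipip_mode` and the output file name in the usage note are hypothetical, and the sketch assumes the `value:` line directly follows the `name: CALICO_IPV4POOL_IPIP` line, as in the snippet above.

```shell
#!/bin/bash
# Read a calico.yaml manifest on stdin and write it to stdout with the
# CALICO_IPV4POOL_IPIP value replaced by the given mode.
# Usage: set_ipip_mode <Always|CrossSubnet|Never> < calico.yaml
set_ipip_mode() {
  # On the line after the variable name, rewrite the quoted value.
  sed '/name: CALICO_IPV4POOL_IPIP/{
n
s/value: "[^"]*"/value: "'"$1"'"/
}'
}

# Demo on a fragment shaped like the manifest above.
printf '%s\n' \
  '            - name: CALICO_IPV4POOL_IPIP' \
  '              value: "Always"' \
  '            - name: CALICO_IPV4POOL_VXLAN' \
  '              value: "Never"' | set_ipip_mode CrossSubnet
```

For a fresh cluster like the ones built here, a possible use is `set_ipip_mode CrossSubnet < calico.yaml > calico-crosssubnet.yaml`, followed by `kubectl apply -f calico-crosssubnet.yaml`.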

Create Test Pods

The Pods are essentially Nginx servers and serve as targets for the requests captured below.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: nginx
  name: pod
spec:
  replicas: 4
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: burlyluo/nettool:latest
        name: nettoolbox
        env:
          - name: NETTOOL_NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
        securityContext:
          privileged: true
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: nginx
            topologyKey: kubernetes.io/hostname

Check the Deployment

1. Check the default IPIP mode deployment

root@network-demo:~# docker ps --format '{{.Names}}'
clab-calico-ipip-server2
clab-calico-ipip-server4
clab-calico-ipip-server1
clab-calico-ipip-server3
clab-calico-ipip-gw0
calico-ipip-worker
calico-ipip-worker2
calico-ipip-control-plane
calico-ipip-worker3

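On the host, the br-pool0-net0 interface is the veth peer of the net0 interface inside the containerlab container. The same net0 interface is also visible in the Docker container created by kind, which shows that the two containers share one network namespace: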

root@network-demo:~# ip -d link show br-pool0-net0
198: br-pool0-net0@if197: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-pool0 state UP mode DEFAULT group default 
    link/ether aa:c1:ab:1c:c9:1c brd ff:ff:ff:ff:ff:ff link-netns clab-calico-ipip-server1 promiscuity 1  allmulti 1 minmtu 68 maxmtu 65535 
    veth 
    bridge_slave state forwarding priority 32 cost 2 hairpin off guard off root_block off fastleave off learning on flood on port_id 0x8001 port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge 8000.c6:58:98:9d:5f:ea designated_root 8000.c6:58:98:9d:5f:ea hold_timer    0.00 message_age_timer    0.00 forward_delay_timer    0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on bcast_flood on mcast_to_unicast off neigh_suppress off group_fwd_mask 0 group_fwd_mask_str 0x0 vlan_tunnel off isolated off locked off addrgenmode eui64 numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536

root@network-demo:~# docker exec -it clab-calico-ipip-server1 ip -d link show net0
197: net0@if198: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether aa:c1:ab:bd:45:17 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535 
    veth addrgenmode eui64 numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535 

root@network-demo:~# docker exec -it calico-ipip-control-plane ip -d link show net0
197: net0@if198: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether aa:c1:ab:bd:45:17 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535 
    veth addrgenmode eui64 numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535
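
The two `ip -d link show net0` outputs above agree in interface index and MAC address, which is a strong hint that both containers sit in the same network namespace. Below is a small sketch of that comparison. The helper names `link_id` and `check_shared_netns` are hypothetical, and the parser only targets the text format shown above.

```shell
#!/bin/bash
# Extract "<ifindex> <mac>" from `ip link show` output read on stdin.
link_id() {
  awk '
    NR == 1 { sub(":", "", $1); idx = $1 }     # first field "197:" becomes "197"
    $1 == "link/ether" { print idx, $2; exit }
  '
}

# Compare net0 as seen from two containers.
# Wrapped in a function so nothing runs unless it is called explicitly.
# Usage: check_shared_netns <container-a> <container-b>
check_shared_netns() {
  local a b
  a=$(docker exec "$1" ip link show net0 | link_id)
  b=$(docker exec "$2" ip link show net0 | link_id)
  if [ -n "$a" ] && [ "$a" = "$b" ]; then
    echo "shared: $a"
  else
    echo "different: '$a' vs '$b'"
  fi
}
```

With the containers from this article, the call would be `check_shared_netns clab-calico-ipip-server1 calico-ipip-control-plane`.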

Pods, calico-node settings, and nodes in the cluster:

root@network-demo:~# kubectl get pods -A -o wide
NAMESPACE            NAME                                  READY   STATUS    RESTARTS   AGE   IP              NODE
kube-system          calico-kube-controllers               1/1     Running   0          16m   10.244.51.196   calico-ipip-control-plane
kube-system          calico-node-64f6p                     1/1     Running   0          16m   10.1.5.10       calico-ipip-control-plane
kube-system          calico-node-p4ks7                     1/1     Running   0          16m   10.1.5.11       calico-ipip-worker
kube-system          calico-node-pjbc7                     1/1     Running   0          16m   10.1.8.11       calico-ipip-worker3
kube-system          calico-node-r6rk2                     1/1     Running   0          16m   10.1.8.10       calico-ipip-worker2
kube-system          coredns-5d78c9869d-jx4lx              1/1     Running   0          17m   10.244.51.194   calico-ipip-control-plane
kube-system          coredns-5d78c9869d-mrf2d              1/1     Running   0          17m   10.244.51.195   calico-ipip-control-plane
kube-system          etcd-calico-ipip                      1/1     Running   0          17m   10.1.5.10       calico-ipip-control-plane
kube-system          kube-apiserver-calico-ipip            1/1     Running   0          17m   10.1.5.10       calico-ipip-control-plane
kube-system          kube-controller-manager-calico-ipip   1/1     Running   0          17m   10.1.5.10       calico-ipip-control-plane
kube-system          kube-proxy-4svbw                      1/1     Running   0          17m   10.1.8.10       calico-ipip-worker2
kube-system          kube-proxy-4zw9q                      1/1     Running   0          17m   10.1.5.10       calico-ipip-control-plane
kube-system          kube-proxy-5nnfn                      1/1     Running   0          17m   10.1.8.11       calico-ipip-worker3
kube-system          kube-proxy-b69xp                      1/1     Running   0          17m   10.1.5.11       calico-ipip-worker
kube-system          kube-scheduler-calico-ipip            1/1     Running   0          17m   10.1.5.10       calico-ipip-control-plane

root@network-demo:~# kubectl describe pods -n kube-system calico-node-64f6p | grep 'CALICO_IPV4POOL'
      CALICO_IPV4POOL_IPIP:               Always
      CALICO_IPV4POOL_VXLAN:              Never

root@network-demo:~# kubectl get node -o wide
NAME                        STATUS   ROLES           AGE   VERSION   INTERNAL-IP
calico-ipip-control-plane   Ready    control-plane   19m   v1.27.3   10.1.5.10
calico-ipip-worker          Ready    <none>          19m   v1.27.3   10.1.5.11
calico-ipip-worker2         Ready    <none>          19m   v1.27.3   10.1.8.10
calico-ipip-worker3         Ready    <none>          19m   v1.27.3   10.1.8.11

2. Check the IPIP CrossSubnet deployment

root@network-demo:~# docker ps --format '{{.Names}}'
clab-calico-ipip-crosssubnet-server2
clab-calico-ipip-crosssubnet-server3
clab-calico-ipip-crosssubnet-server1
clab-calico-ipip-crosssubnet-server4
clab-calico-ipip-crosssubnet-gw0
calico-ipip-crosssubnet-control-plane
calico-ipip-crosssubnet-worker
calico-ipip-crosssubnet-worker2
calico-ipip-crosssubnet-worker3

Pods, calico-node settings, and nodes in the CrossSubnet cluster:

root@network-demo:~# kubectl get pods -A -o wide
NAMESPACE            NAME                                              READY   STATUS    RESTARTS   AGE   IP               NODE
default              pod-0                                             1/1     Running   0          29s   10.244.85.129    calico-ipip-crosssubnet-worker
default              pod-1                                             1/1     Running   0          22s   10.244.241.130   calico-ipip-crosssubnet-worker3
default              pod-2                                             1/1     Running   0          16s   10.244.193.197   calico-ipip-crosssubnet-worker2
default              pod-3                                             1/1     Running   0          10s   10.244.81.1      calico-ipip-crosssubnet-control-plane
kube-system          calico-kube-controllers-7bdccfc7d8-lgmf8          1/1     Running   0          33m   10.244.193.195   calico-ipip-crosssubnet-worker2
kube-system          calico-node-b22wn                                 1/1     Running   0          33m   10.1.8.11        calico-ipip-crosssubnet-worker3
kube-system          calico-node-h7tds                                 1/1     Running   0          33m   10.1.5.11        calico-ipip-crosssubnet-worker
kube-system          calico-node-tthgb                                 1/1     Running   0          33m   10.1.8.10        calico-ipip-crosssubnet-worker2
kube-system          calico-node-wf2g8                                 1/1     Running   0          33m   10.1.5.10        calico-ipip-crosssubnet-control-plane
kube-system          coredns-5d78c9869d-26vp9                          1/1     Running   0          33m   10.244.193.194   calico-ipip-crosssubnet-worker2
kube-system          coredns-5d78c9869d-qd44j                          1/1     Running   0          33m   10.244.193.193   calico-ipip-crosssubnet-worker2
kube-system          etcd-calico-ipip-crosssubnet                      1/1     Running   0          33m   10.1.5.10        calico-ipip-crosssubnet-control-plane
kube-system          kube-apiserver-calico-ipip-crosssubnet            1/1     Running   0          33m   10.1.5.10        calico-ipip-crosssubnet-control-plane
kube-system          kube-controller-manager-calico-ipip-crosssubnet   1/1     Running   0          33m   10.1.5.10        calico-ipip-crosssubnet-control-plane
kube-system          kube-proxy-4rkfq                                  1/1     Running   0          33m   10.1.5.11        calico-ipip-crosssubnet-worker
kube-system          kube-proxy-5xblr                                  1/1     Running   0          33m   10.1.5.10        calico-ipip-crosssubnet-control-plane
kube-system          kube-proxy-j7cfk                                  1/1     Running   0          33m   10.1.8.10        calico-ipip-crosssubnet-worker2
kube-system          kube-proxy-tlj5m                                  1/1     Running   0          33m   10.1.8.11        calico-ipip-crosssubnet-worker3
kube-system          kube-scheduler-calico-ipip-crosssubnet            1/1     Running   0          33m   10.1.5.10        calico-ipip-crosssubnet-control-plane

root@network-demo:~# kubectl describe pods -n kube-system calico-node-wf2g8 | grep 'CALICO_IPV4POOL'
      CALICO_IPV4POOL_IPIP:               CrossSubnet
      CALICO_IPV4POOL_VXLAN:              Never

root@network-demo:~# kubectl get node -o wide
NAME                                    STATUS   ROLES           AGE   VERSION   INTERNAL-IP
calico-ipip-crosssubnet-control-plane   Ready    control-plane   32m   v1.27.3   10.1.5.10
calico-ipip-crosssubnet-worker          Ready    <none>          32m   v1.27.3   10.1.5.11
calico-ipip-crosssubnet-worker2         Ready    <none>          32m   v1.27.3   10.1.8.10
calico-ipip-crosssubnet-worker3         Ready    <none>          32m   v1.27.3   10.1.8.11

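Verification

1. Verify the default IPIP mode

For the detailed logic, see the Calico IPIP article, which covers BGP and the routing path in depth. This article only compares where the two modes differ.

1.1. Cross-subnet Pod request

1.1.1. Query the control-plane host routing table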

root@network-demo:~# docker exec -it calico-ipip-control-plane ip route show
default via 10.1.5.1 dev net0 
10.1.5.0/24 dev net0 proto kernel scope link src 10.1.5.10 
blackhole 10.244.51.192/26 proto bird 
10.244.51.193 dev calid7e32e8230e scope link 
10.244.51.194 dev calie67bc01f3de scope link 
10.244.51.195 dev cali6f867153050 scope link 
10.244.51.196 dev cali5d8decaab2b scope link 
10.244.51.197 dev cali87081bf6f89 scope link 
10.244.54.128/26 via 10.1.8.11 dev tunl0 proto bird onlink 
10.244.79.0/26 via 10.1.5.11 dev tunl0 proto bird onlink 
10.244.244.64/26 via 10.1.8.10 dev tunl0 proto bird onlink 
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.3

root@network-demo:~# docker exec -it calico-ipip-control-plane ip route show proto bird
blackhole 10.244.51.192/26 
10.244.54.128/26 via 10.1.8.11 dev tunl0 onlink 
10.244.79.0/26 via 10.1.5.11 dev tunl0 onlink 
10.244.244.64/26 via 10.1.8.10 dev tunl0 onlink

root@network-demo:~# docker exec -it calico-ipip-control-plane ip neighbor show 
10.244.51.194 dev calie67bc01f3de lladdr b2:df:0d:1f:68:0f REACHABLE
172.18.0.4 dev eth0 lladdr 62:fe:7e:39:f7:13 REACHABLE
10.244.51.195 dev cali6f867153050 lladdr 72:50:a4:df:7e:08 REACHABLE
172.18.0.1 dev eth0 lladdr d2:6a:15:c7:e3:41 STALE
10.244.51.196 dev cali5d8decaab2b lladdr 06:11:33:a2:c0:b6 REACHABLE
10.1.5.1 dev net0 lladdr aa:c1:ab:eb:cb:6f REACHABLE
10.244.51.193 dev calid7e32e8230e lladdr 8a:9c:24:95:38:db REACHABLE
172.18.0.2 dev eth0 lladdr ee:f7:6a:f4:71:dd REACHABLE
10.244.51.197 dev cali87081bf6f89 lladdr c2:7f:e0:da:10:e1 STALE
10.1.5.11 dev net0 lladdr aa:c1:ab:2a:5a:0c REACHABLE
172.18.0.5 dev eth0 lladdr 32:a4:f7:ab:a8:9d REACHABLE
172:18:0:1::2 dev eth0 lladdr ee:f7:6a:f4:71:dd REACHABLE
fe80::60fe:7eff:fe39:f713 dev eth0 lladdr 62:fe:7e:39:f7:13 STALE
172:18:0:1::4 dev eth0 lladdr 62:fe:7e:39:f7:13 REACHABLE
fe80::30a4:f7ff:feab:a89d dev eth0 lladdr 32:a4:f7:ab:a8:9d STALE
172:18:0:1::5 dev eth0 lladdr 32:a4:f7:ab:a8:9d REACHABLE

1.1.2. Cross-subnet Pod request capture

A Pod on the control-plane node (10.1.5.x subnet) requests a Pod on the worker2 node (10.1.8.x subnet):

root@network-demo:~# kubectl get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE     IP              NODE
pod-0   1/1     Running   0          9m10s   10.244.79.1     calico-ipip-worker
pod-1   1/1     Running   0          9m3s    10.244.54.129   calico-ipip-worker3
pod-2   1/1     Running   0          8m54s   10.244.244.65   calico-ipip-worker2
pod-3   1/1     Running   0          8m46s   10.244.51.197   calico-ipip-control-plane
root@network-demo:~# kubectl exec -it pod-3 -- curl -s 10.244.244.65
PodName: pod-2 | PodIP: eth0 10.244.244.65/32

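Based on the routing table, the flow is roughly as follows:

  1. The request to 10.244.244.65 reaches the client node's host network stack and matches the route 10.244.244.64/26 via 10.1.8.10 dev tunl0 proto bird onlink;
  2. The kernel hands the packet to the tunl0 device, which applies IPIP encapsulation, and then performs a second route lookup for the outer packet;
  3. The outer destination IP is set to the next hop 10.1.8.10, and reaching 10.1.8.10 requires the route default via 10.1.5.1 dev net0;
  4. The gateway 10.1.5.1 in turn matches the route 10.1.5.0/24 dev net0 proto kernel scope link src 10.1.5.10;
  5. Because that route is scope link (directly connected), the packet leaves net0 with source 10.1.5.10. The ARP table resolves 10.1.5.1 to aa:c1:ab:eb:cb:6f, and the frame is sent to the gateway.

Capture on net0 of the control-plane node:
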
root@network-demo:~# docker exec -it calico-ipip-control-plane tcpdump -pnei net0

16:22:36.035362 aa:c1:ab:bd:45:17 > aa:c1:ab:eb:cb:6f, ethertype IPv4 (0x0800), length 94: 10.1.5.10 > 10.1.8.10: 10.244.51.197.60936 > 10.244.244.65.80: Flags [S], seq 4172879107, win 64800, options [mss 1440,sackOK,TS val 1222065392 ecr 0,nop,wscale 7], length 0
16:22:36.035506 aa:c1:ab:eb:cb:6f > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 94: 10.1.8.10 > 10.1.5.10: 10.244.244.65.80 > 10.244.51.197.60936: Flags [S.], seq 3646446642, ack 4172879108, win 64260, options [mss 1440,sackOK,TS val 2658799917 ecr 1222065392,nop,wscale 7], length 0
16:22:36.035539 aa:c1:ab:bd:45:17 > aa:c1:ab:eb:cb:6f, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.51.197.60936 > 10.244.244.65.80: Flags [.], ack 1, win 507, options [nop,nop,TS val 1222065392 ecr 2658799917], length 0
16:22:36.035607 aa:c1:ab:bd:45:17 > aa:c1:ab:eb:cb:6f, ethertype IPv4 (0x0800), length 163: 10.1.5.10 > 10.1.8.10: 10.244.51.197.60936 > 10.244.244.65.80: Flags [P.], seq 1:78, ack 1, win 507, options [nop,nop,TS val 1222065392 ecr 2658799917], length 77: HTTP: GET / HTTP/1.1
16:22:36.035646 aa:c1:ab:eb:cb:6f > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 86: 10.1.8.10 > 10.1.5.10: 10.244.244.65.80 > 10.244.51.197.60936: Flags [.], ack 78, win 502, options [nop,nop,TS val 2658799917 ecr 1222065392], length 0
16:22:36.035764 aa:c1:ab:eb:cb:6f > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 322: 10.1.8.10 > 10.1.5.10: 10.244.244.65.80 > 10.244.51.197.60936: Flags [P.], seq 1:237, ack 78, win 502, options [nop,nop,TS val 2658799917 ecr 1222065392], length 236: HTTP: HTTP/1.1 200 OK
16:22:36.035817 aa:c1:ab:bd:45:17 > aa:c1:ab:eb:cb:6f, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.51.197.60936 > 10.244.244.65.80: Flags [.], ack 237, win 506, options [nop,nop,TS val 1222065392 ecr 2658799917], length 0
16:22:36.035867 aa:c1:ab:eb:cb:6f > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 132: 10.1.8.10 > 10.1.5.10: 10.244.244.65.80 > 10.244.51.197.60936: Flags [P.], seq 237:283, ack 78, win 502, options [nop,nop,TS val 2658799917 ecr 1222065392], length 46: HTTP
16:22:36.035887 aa:c1:ab:bd:45:17 > aa:c1:ab:eb:cb:6f, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.51.197.60936 > 10.244.244.65.80: Flags [.], ack 283, win 506, options [nop,nop,TS val 1222065392 ecr 2658799917], length 0
16:22:36.035983 aa:c1:ab:bd:45:17 > aa:c1:ab:eb:cb:6f, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.51.197.60936 > 10.244.244.65.80: Flags [F.], seq 78, ack 283, win 506, options [nop,nop,TS val 1222065392 ecr 2658799917], length 0
16:22:36.036057 aa:c1:ab:eb:cb:6f > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 86: 10.1.8.10 > 10.1.5.10: 10.244.244.65.80 > 10.244.51.197.60936: Flags [F.], seq 283, ack 79, win 502, options [nop,nop,TS val 2658799917 ecr 1222065392], length 0
16:22:36.036096 aa:c1:ab:bd:45:17 > aa:c1:ab:eb:cb:6f, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.51.197.60936 > 10.244.244.65.80: Flags [.], ack 284, win 506, options [nop,nop,TS val 1222065393 ecr 2658799917], length 0

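1.2. Same-subnet Pod request

1.2.1. Query the control-plane host routing table

See section 1.1.1 above; the routing table is not repeated here.

1.2.2. Same-subnet Pod request capture

A Pod on the control-plane node (10.1.5.x subnet) requests a Pod on the worker node (10.1.5.x subnet):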

root@network-demo:~# kubectl exec -it pod-3 -- curl -s 10.244.79.1
PodName: pod-0 | PodIP: eth0 10.244.79.1/32

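Based on the routing table, the flow is roughly as follows:

  1. The request to 10.244.79.1 reaches the client node's host network stack and matches the route 10.244.79.0/26 via 10.1.5.11 dev tunl0 proto bird onlink;
  2. The kernel hands the packet to the tunl0 device, which applies IPIP encapsulation, and then performs a second route lookup for the outer packet;
  3. The outer destination IP is set to the next hop 10.1.5.11, which matches the route 10.1.5.0/24 dev net0 proto kernel scope link src 10.1.5.10;
  4. Because that route is scope link (directly connected), the ARP table resolves 10.1.5.11 to the MAC aa:c1:ab:2a:5a:0c, and the frame goes straight from net0 to the worker without passing through the gateway.

Capture on net0 of the control-plane node:
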
root@network-demo:~# docker exec -it calico-ipip-control-plane tcpdump -pnei net0

17:02:39.493480 aa:c1:ab:bd:45:17 > aa:c1:ab:2a:5a:0c, ethertype IPv4 (0x0800), length 94: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [S], seq 3200333625, win 64800, options [mss 1440,sackOK,TS val 2011167947 ecr 0,nop,wscale 7], length 0
17:02:39.493608 aa:c1:ab:2a:5a:0c > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 94: 10.1.5.11 > 10.1.5.10: 10.244.79.1.80 > 10.244.51.197.45792: Flags [S.], seq 3446311928, ack 3200333626, win 64260, options [mss 1440,sackOK,TS val 2306157208 ecr 2011167947,nop,wscale 7], length 0
17:02:39.493650 aa:c1:ab:bd:45:17 > aa:c1:ab:2a:5a:0c, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [.], ack 1, win 507, options [nop,nop,TS val 2011167947 ecr 2306157208], length 0
17:02:39.493741 aa:c1:ab:bd:45:17 > aa:c1:ab:2a:5a:0c, ethertype IPv4 (0x0800), length 161: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [P.], seq 1:76, ack 1, win 507, options [nop,nop,TS val 2011167947 ecr 2306157208], length 75: HTTP: GET / HTTP/1.1
17:02:39.493790 aa:c1:ab:2a:5a:0c > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 86: 10.1.5.11 > 10.1.5.10: 10.244.79.1.80 > 10.244.51.197.45792: Flags [.], ack 76, win 502, options [nop,nop,TS val 2306157208 ecr 2011167947], length 0
17:02:39.493900 aa:c1:ab:2a:5a:0c > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 322: 10.1.5.11 > 10.1.5.10: 10.244.79.1.80 > 10.244.51.197.45792: Flags [P.], seq 1:237, ack 76, win 502, options [nop,nop,TS val 2306157208 ecr 2011167947], length 236: HTTP: HTTP/1.1 200 OK
17:02:39.493957 aa:c1:ab:bd:45:17 > aa:c1:ab:2a:5a:0c, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [.], ack 237, win 506, options [nop,nop,TS val 2011167947 ecr 2306157208], length 0
17:02:39.494011 aa:c1:ab:2a:5a:0c > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 130: 10.1.5.11 > 10.1.5.10: 10.244.79.1.80 > 10.244.51.197.45792: Flags [P.], seq 237:281, ack 76, win 502, options [nop,nop,TS val 2306157208 ecr 2011167947], length 44: HTTP
17:02:39.494033 aa:c1:ab:bd:45:17 > aa:c1:ab:2a:5a:0c, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [.], ack 281, win 506, options [nop,nop,TS val 2011167947 ecr 2306157208], length 0
17:02:39.494160 aa:c1:ab:bd:45:17 > aa:c1:ab:2a:5a:0c, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [F.], seq 76, ack 281, win 506, options [nop,nop,TS val 2011167948 ecr 2306157208], length 0
17:02:39.494275 aa:c1:ab:2a:5a:0c > aa:c1:ab:bd:45:17, ethertype IPv4 (0x0800), length 86: 10.1.5.11 > 10.1.5.10: 10.244.79.1.80 > 10.244.51.197.45792: Flags [F.], seq 281, ack 77, win 502, options [nop,nop,TS val 2306157209 ecr 2011167948], length 0
17:02:39.494324 aa:c1:ab:bd:45:17 > aa:c1:ab:2a:5a:0c, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [.], ack 282, win 506, options [nop,nop,TS val 2011167948 ecr 2306157209], length 0
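
In the capture above, every line carries two address pairs: the outer node IPs followed by the inner Pod IPs. That double pair is how tcpdump prints IP-in-IP, so in the default mode even same-subnet traffic is encapsulated. Below is a rough sketch that classifies a tcpdump text line this way. The function name `is_ipip_line` is hypothetical, and the regular expression only targets the `-n` output format shown here.

```shell
#!/bin/bash
# Print "ipip" if a tcpdump line shows an outer IPv4 pair followed by an
# inner IPv4.port pair, otherwise "plain".
is_ipip_line() {
  # Outer "A.B.C.D > A.B.C.D: " followed by inner "A.B.C.D.port > ".
  if echo "$1" | grep -Eq '([0-9]+\.){3}[0-9]+ > ([0-9]+\.){3}[0-9]+: ([0-9]+\.){4}[0-9]+ > '; then
    echo "ipip"
  else
    echo "plain"
  fi
}

is_ipip_line 'length 94: 10.1.5.10 > 10.1.5.11: 10.244.51.197.45792 > 10.244.79.1.80: Flags [S]'
```

A capture filter can serve the same purpose at the source: IPIP is IP protocol number 4, so appending `ip proto 4` to the tcpdump command (for example `tcpdump -pnei net0 ip proto 4`) should show only encapsulated packets.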

2. Verify the IPIP CrossSubnet mode

2.1. Cross-subnet Pod request

2.1.1. Query the control-plane host routing table

root@network-demo:~# docker exec -it calico-ipip-crosssubnet-control-plane ip route show
default via 10.1.5.1 dev net0
10.1.5.0/24 dev net0 proto kernel scope link src 10.1.5.10
blackhole 10.244.81.0/26 proto bird
10.244.81.1 dev cali87081bf6f89 scope link
10.244.85.128/26 via 10.1.5.11 dev net0 proto bird
10.244.193.192/26 via 10.1.8.10 dev tunl0 proto bird onlink
10.244.241.128/26 via 10.1.8.11 dev tunl0 proto bird onlink
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.3

root@network-demo:~# docker exec -it calico-ipip-crosssubnet-control-plane ip route show proto bird
blackhole 10.244.81.0/26
10.244.85.128/26 via 10.1.5.11 dev net0
10.244.193.192/26 via 10.1.8.10 dev tunl0 onlink
10.244.241.128/26 via 10.1.8.11 dev tunl0 onlink

root@network-demo:~# docker exec -it calico-ipip-crosssubnet-control-plane ip neighbor show 
10.244.81.1 dev cali87081bf6f89 lladdr c6:27:94:49:93:c3 STALE
172.18.0.1 dev eth0 lladdr d2:6a:15:c7:e3:41 STALE
172.18.0.4 dev eth0 lladdr 82:92:99:ed:bf:60 REACHABLE
10.1.5.11 dev net0 lladdr aa:c1:ab:91:69:5b STALE
10.1.5.1 dev net0 lladdr aa:c1:ab:8f:b5:3b REACHABLE
172.18.0.2 dev eth0 lladdr aa:7e:87:80:90:17 REACHABLE
172.18.0.5 dev eth0 lladdr 16:c2:d8:16:24:e5 REACHABLE
fe80::8092:99ff:feed:bf60 dev eth0 lladdr 82:92:99:ed:bf:60 STALE
172:18:0:1::4 dev eth0 lladdr 82:92:99:ed:bf:60 REACHABLE
fe80::14c2:d8ff:fe16:24e5 dev eth0 lladdr 16:c2:d8:16:24:e5 STALE
172:18:0:1::5 dev eth0 lladdr 16:c2:d8:16:24:e5 REACHABLE
fe80::a87e:87ff:fe80:9017 dev eth0 lladdr aa:7e:87:80:90:17 STALE
172:18:0:1::2 dev eth0 lladdr aa:7e:87:80:90:17 REACHABLE
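
The routing table above is where the two modes visibly diverge: the same-subnet block 10.244.85.128/26 goes out net0 directly, while the cross-subnet blocks still go through tunl0. Below is a short sketch that summarizes this per route. The function name `classify_routes` is hypothetical, and it expects the output format of `ip route show proto bird`, not the full routing table.

```shell
#!/bin/bash
# Read `ip route show proto bird` output on stdin and print
# "<prefix> <next-hop> <ipip|native>" for each remote Pod block.
classify_routes() {
  awk '
    $1 == "blackhole" { next }                 # local block, not a remote route
    {
      dev = ""
      for (i = 1; i < NF; i++) if ($i == "dev") dev = $(i + 1)
      mode = (dev == "tunl0") ? "ipip" : "native"
      print $1, $3, mode
    }
  '
}

printf '%s\n' \
  'blackhole 10.244.81.0/26' \
  '10.244.85.128/26 via 10.1.5.11 dev net0' \
  '10.244.193.192/26 via 10.1.8.10 dev tunl0 onlink' \
  '10.244.241.128/26 via 10.1.8.11 dev tunl0 onlink' | classify_routes
```

Against the live cluster, the input would come from `docker exec calico-ipip-crosssubnet-control-plane ip route show proto bird | classify_routes`. In the default IPIP mode every remote block would be reported as `ipip`.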

2.1.2. Cross-subnet Pod request capture

A Pod on the control-plane node (10.1.5.x subnet) requests a Pod on the worker2 node (10.1.8.x subnet):

root@network-demo:~# kubectl get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE     IP               NODE
pod-0   1/1     Running   0          3m59s   10.244.85.129    calico-ipip-crosssubnet-worker
pod-1   1/1     Running   0          3m52s   10.244.241.130   calico-ipip-crosssubnet-worker3
pod-2   1/1     Running   0          3m46s   10.244.193.197   calico-ipip-crosssubnet-worker2
pod-3   1/1     Running   0          3m40s   10.244.81.1      calico-ipip-crosssubnet-control-plane
root@network-demo:~# kubectl exec -it pod-3 -- curl -s 10.244.193.197
PodName: pod-2 | PodIP: eth0 10.244.193.197/32

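Based on the routing table, the flow is roughly as follows:

  1. The request to 10.244.193.197 reaches the client node's host network stack and matches the route 10.244.193.192/26 via 10.1.8.10 dev tunl0 proto bird onlink;
  2. The kernel hands the packet to the tunl0 device, which applies IPIP encapsulation, and then performs a second route lookup for the outer packet;
  3. The outer destination IP is set to the next hop 10.1.8.10, and reaching 10.1.8.10 requires the route default via 10.1.5.1 dev net0;
  4. The gateway 10.1.5.1 in turn matches the route 10.1.5.0/24 dev net0 proto kernel scope link src 10.1.5.10;
  5. Because that route is scope link (directly connected), the packet leaves net0 with source 10.1.5.10. The ARP table resolves 10.1.5.1 to aa:c1:ab:8f:b5:3b, and the frame is sent to the gateway.

Capture on net0 of the control-plane node:
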
root@network-demo:~# docker exec -it calico-ipip-crosssubnet-control-plane tcpdump -pnei net0

14:10:00.102447 aa:c1:ab:22:9e:a1 > aa:c1:ab:8f:b5:3b, ethertype IPv4 (0x0800), length 94: 10.1.5.10 > 10.1.8.10: 10.244.81.1.44624 > 10.244.193.197.80: Flags [S], seq 3233989932, win 64800, options [mss 1440,sackOK,TS val 128566485 ecr 0,nop,wscale 7], length 0
14:10:00.102586 aa:c1:ab:8f:b5:3b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 94: 10.1.8.10 > 10.1.5.10: 10.244.193.197.80 > 10.244.81.1.44624: Flags [S.], seq 2286706233, ack 3233989933, win 64260, options [mss 1440,sackOK,TS val 4272961461 ecr 128566485,nop,wscale 7], length 0
14:10:00.102617 aa:c1:ab:22:9e:a1 > aa:c1:ab:8f:b5:3b, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.81.1.44624 > 10.244.193.197.80: Flags [.], ack 1, win 507, options [nop,nop,TS val 128566485 ecr 4272961461], length 0
14:10:00.102698 aa:c1:ab:22:9e:a1 > aa:c1:ab:8f:b5:3b, ethertype IPv4 (0x0800), length 164: 10.1.5.10 > 10.1.8.10: 10.244.81.1.44624 > 10.244.193.197.80: Flags [P.], seq 1:79, ack 1, win 507, options [nop,nop,TS val 128566485 ecr 4272961461], length 78: HTTP: GET / HTTP/1.1
14:10:00.102747 aa:c1:ab:8f:b5:3b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 86: 10.1.8.10 > 10.1.5.10: 10.244.193.197.80 > 10.244.81.1.44624: Flags [.], ack 79, win 502, options [nop,nop,TS val 4272961461 ecr 128566485], length 0
14:10:00.102828 aa:c1:ab:8f:b5:3b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 322: 10.1.8.10 > 10.1.5.10: 10.244.193.197.80 > 10.244.81.1.44624: Flags [P.], seq 1:237, ack 79, win 502, options [nop,nop,TS val 4272961461 ecr 128566485], length 236: HTTP: HTTP/1.1 200 OK
14:10:00.102866 aa:c1:ab:22:9e:a1 > aa:c1:ab:8f:b5:3b, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.81.1.44624 > 10.244.193.197.80: Flags [.], ack 237, win 506, options [nop,nop,TS val 128566485 ecr 4272961461], length 0
14:10:00.102929 aa:c1:ab:8f:b5:3b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 133: 10.1.8.10 > 10.1.5.10: 10.244.193.197.80 > 10.244.81.1.44624: Flags [P.], seq 237:284, ack 79, win 502, options [nop,nop,TS val 4272961461 ecr 128566485], length 47: HTTP
14:10:00.102959 aa:c1:ab:22:9e:a1 > aa:c1:ab:8f:b5:3b, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.81.1.44624 > 10.244.193.197.80: Flags [.], ack 284, win 506, options [nop,nop,TS val 128566485 ecr 4272961461], length 0
14:10:00.103171 aa:c1:ab:22:9e:a1 > aa:c1:ab:8f:b5:3b, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.81.1.44624 > 10.244.193.197.80: Flags [F.], seq 79, ack 284, win 506, options [nop,nop,TS val 128566486 ecr 4272961461], length 0
14:10:00.103349 aa:c1:ab:8f:b5:3b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 86: 10.1.8.10 > 10.1.5.10: 10.244.193.197.80 > 10.244.81.1.44624: Flags [F.], seq 284, ack 80, win 502, options [nop,nop,TS val 4272961462 ecr 128566486], length 0
14:10:00.103404 aa:c1:ab:22:9e:a1 > aa:c1:ab:8f:b5:3b, ethertype IPv4 (0x0800), length 86: 10.1.5.10 > 10.1.8.10: 10.244.81.1.44624 > 10.244.193.197.80: Flags [.], ack 285, win 506, options [nop,nop,TS val 128566486 ecr 4272961462], length 0
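
Each line of the capture above carries two address pairs: the outer node addresses (10.1.5.10 and 10.1.8.10) followed by the inner Pod addresses with ports. That double pair is what marks a frame as IPIP-encapsulated. A minimal sketch of the check, assuming tcpdump text output in the format shown here; `classify_line` is a hypothetical helper name, and the sample lines are trimmed from the captures in this article.

```bash
#!/bin/bash

# classify_line (hypothetical helper): print "ipip" when a tcpdump line shows
# an outer "IP > IP:" pair followed by an inner "IP.port > IP.port:" pair,
# otherwise print "plain".
classify_line() {
  local ip='[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'
  if grep -Eq "length [0-9]+: ${ip} > ${ip}: ${ip}\.[0-9]+ > ${ip}\.[0-9]+:" <<<"$1"; then
    echo "ipip"
  else
    echo "plain"
  fi
}

# Trimmed cross-subnet line (outer + inner addresses)
classify_line 'ethertype IPv4 (0x0800), length 94: 10.1.8.10 > 10.1.5.10: 10.244.193.197.80 > 10.244.81.1.44624: Flags [S.]'
# Trimmed same-subnet line (Pod addresses only)
classify_line 'ethertype IPv4 (0x0800), length 74: 10.244.85.129.80 > 10.244.81.1.47978: Flags [S.]'
```

The first call prints `ipip` and the second prints `plain`.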

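2.2. Same-subnet Pod request verification

2.2.1. Inspect the control-plane host routing table

See 2.1.1 (control-plane host routing table); it is not repeated here.

2.2.2. Same-subnet Pod request capture

A Pod on the control-plane node (10.1.5.x subnet) requests a Pod on a worker node in the same 10.1.5.x subnet: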

bash
root@network-demo:~# kubectl exec -it pod-3 -- curl -s 10.244.85.129
PodName: pod-0 | PodIP: eth0 10.244.85.129/32
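  1. The request to the same-subnet Pod 10.244.85.129 matches the route 10.244.85.128/26 via 10.1.5.11 dev net0 proto bird. Note that the device is net0, not tunl0, so no IPIP encapsulation takes place.
  2. The next hop 10.1.5.11 is on the same subnet and matches the route 10.1.5.0/24 dev net0 proto kernel scope link src 10.1.5.10.
  3. scope link means the destination is directly connected, so the ARP table is consulted: 10.1.5.11 dev net0 lladdr aa:c1:ab:91:69:5b REACHABLE.
  4. The destination MAC found is the net0 address of the server node, so the packet is sent out through the local net0.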
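
The decision in step 1 depends only on the output device of the matched route. A minimal sketch of that rule, assuming route lines in the `ip route` format quoted above; `route_dev` and `route_encap` are hypothetical helper names, and the tunl0 route line is an illustrative assumption rather than output from this cluster.

```bash
#!/bin/bash

# route_dev (hypothetical helper): print the token that follows "dev" in an
# `ip route` style line.
route_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }' <<<"$1"
}

# route_encap (hypothetical helper): "ipip" when the route leaves via tunl0,
# "direct" for any other device.
route_encap() {
  if [ "$(route_dev "$1")" = "tunl0" ]; then
    echo "ipip"
  else
    echo "direct"
  fi
}

# Route quoted in step 1: leaves via net0, so no encapsulation
route_encap "10.244.85.128/26 via 10.1.5.11 dev net0 proto bird"
# Illustrative tunnel route (assumed, not taken from this cluster)
route_encap "10.244.85.128/26 via 10.1.5.11 dev tunl0 proto bird onlink"
```

The first call prints `direct` and the second prints `ipip`. On a live node, `ip route get 10.244.85.129` and `ip neigh show 10.1.5.11` should show the route and neighbor entries used in steps 1 to 3.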
bash
root@network-demo:~# docker exec -it calico-ipip-crosssubnet-control-plane tcpdump -pnei net0

14:45:28.324182 aa:c1:ab:22:9e:a1 > aa:c1:ab:91:69:5b, ethertype IPv4 (0x0800), length 74: 10.244.81.1.47978 > 10.244.85.129.80: Flags [S], seq 980755404, win 64800, options [mss 1440,sackOK,TS val 3053371879 ecr 0,nop,wscale 7], length 0
14:45:28.324276 aa:c1:ab:91:69:5b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 74: 10.244.85.129.80 > 10.244.81.1.47978: Flags [S.], seq 295421793, ack 980755405, win 64260, options [mss 1440,sackOK,TS val 1697046978 ecr 3053371879,nop,wscale 7], length 0
14:45:28.324297 aa:c1:ab:22:9e:a1 > aa:c1:ab:91:69:5b, ethertype IPv4 (0x0800), length 66: 10.244.81.1.47978 > 10.244.85.129.80: Flags [.], ack 1, win 507, options [nop,nop,TS val 3053371879 ecr 1697046978], length 0
14:45:28.324355 aa:c1:ab:22:9e:a1 > aa:c1:ab:91:69:5b, ethertype IPv4 (0x0800), length 143: 10.244.81.1.47978 > 10.244.85.129.80: Flags [P.], seq 1:78, ack 1, win 507, options [nop,nop,TS val 3053371879 ecr 1697046978], length 77: HTTP: GET / HTTP/1.1
14:45:28.324376 aa:c1:ab:91:69:5b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 66: 10.244.85.129.80 > 10.244.81.1.47978: Flags [.], ack 78, win 502, options [nop,nop,TS val 1697046978 ecr 3053371879], length 0
14:45:28.324474 aa:c1:ab:91:69:5b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 302: 10.244.85.129.80 > 10.244.81.1.47978: Flags [P.], seq 1:237, ack 78, win 502, options [nop,nop,TS val 1697046978 ecr 3053371879], length 236: HTTP: HTTP/1.1 200 OK
14:45:28.324508 aa:c1:ab:22:9e:a1 > aa:c1:ab:91:69:5b, ethertype IPv4 (0x0800), length 66: 10.244.81.1.47978 > 10.244.85.129.80: Flags [.], ack 237, win 506, options [nop,nop,TS val 3053371879 ecr 1697046978], length 0
14:45:28.324541 aa:c1:ab:91:69:5b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 112: 10.244.85.129.80 > 10.244.81.1.47978: Flags [P.], seq 237:283, ack 78, win 502, options [nop,nop,TS val 1697046978 ecr 3053371879], length 46: HTTP
14:45:28.324554 aa:c1:ab:22:9e:a1 > aa:c1:ab:91:69:5b, ethertype IPv4 (0x0800), length 66: 10.244.81.1.47978 > 10.244.85.129.80: Flags [.], ack 283, win 506, options [nop,nop,TS val 3053371879 ecr 1697046978], length 0
14:45:28.324652 aa:c1:ab:22:9e:a1 > aa:c1:ab:91:69:5b, ethertype IPv4 (0x0800), length 66: 10.244.81.1.47978 > 10.244.85.129.80: Flags [F.], seq 78, ack 283, win 506, options [nop,nop,TS val 3053371879 ecr 1697046978], length 0
14:45:28.324741 aa:c1:ab:91:69:5b > aa:c1:ab:22:9e:a1, ethertype IPv4 (0x0800), length 66: 10.244.85.129.80 > 10.244.81.1.47978: Flags [F.], seq 283, ack 79, win 502, options [nop,nop,TS val 1697046978 ecr 3053371879], length 0
14:45:28.324771 aa:c1:ab:22:9e:a1 > aa:c1:ab:91:69:5b, ethertype IPv4 (0x0800), length 66: 10.244.81.1.47978 > 10.244.85.129.80: Flags [.], ack 284, win 506, options [nop,nop,TS val 3053371879 ecr 1697046978], length 0
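
Comparing this capture with the preceding cross-subnet capture makes the encapsulation cost visible: equivalent frames are 20 bytes shorter here, which is the size of one outer IPv4 header without options. A minimal sketch of the arithmetic, using frame and payload lengths read from the two captures; `header_bytes` is a hypothetical helper name.

```bash
#!/bin/bash

# header_bytes (hypothetical helper): frame length minus TCP payload length,
# i.e. the bytes spent on Ethernet, IP and TCP headers.
header_bytes() {
  echo $(( $1 - $2 ))
}

# SYN-ACK frames: "length 94" (cross-subnet) vs "length 74" (same subnet), no payload
cross_syn_ack=$(header_bytes 94 0)
same_syn_ack=$(header_bytes 74 0)

# HTTP GET frames: the payloads differ (78 vs 77 bytes), so subtract them first
cross_get=$(header_bytes 164 78)
same_get=$(header_bytes 143 77)

echo "SYN-ACK overhead: $(( cross_syn_ack - same_syn_ack )) bytes"
echo "GET overhead: $(( cross_get - same_get )) bytes"
```

Both lines report 20 bytes, so with CrossSubnet mode same-subnet traffic avoids that per-packet cost.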