Kubernetes Network Performance Testing

Based on the Kubernetes environment set up earlier, this article tests its network performance.

1. Test Preparation

1.1 Test Environment

The test environment is a Kubernetes 1.19 cluster built on VMware Workstation virtual machines, with flannel as the network plugin.

hostname     ip             role
k8s-master   192.168.0.51   master
k8s-node1    192.168.0.52   worker
k8s-node2    192.168.0.53   worker

The test application sample-webapp has already been deployed with three replicas:

shell
[root@k8s-master ~]# kubectl get pod -n ingress-traefik | grep sample-webapp
sample-webapp-4wf7c                1/1     Running   0          46s
sample-webapp-jvdpv                1/1     Running   10         7d23h
sample-webapp-kdk9k                1/1     Running   0          46s

[root@k8s-master ~]# kubectl get svc -n ingress-traefik | grep sample-webapp
sample-webapp   ClusterIP   10.98.210.117   <none>        8000/TCP                     10d

Note:

In this test environment, the NoSchedule taint was removed from the master node so that it can also schedule workload pods.

1.2 Test Scenarios

  • Access via Cluster IP from a Kubernetes cluster node
  • Access via the service from inside the Kubernetes cluster
  • Access from outside the cluster via the address exposed by traefik ingress

1.3 Test Tools

  • Locust: a simple, easy-to-use load testing tool for measuring how many concurrent users a web application (or other system) can handle.
  • curl
  • Test application: sample-webapp; source available on GitHub in the "Distributed Load Testing Using Kubernetes" project

1.4 Test Method

Response times are measured by sending curl requests to sample-webapp. A plain curl returns:

shell
[root@k8s-master ~]# curl "http://10.98.210.117:8000"
Welcome to the "Distributed Load Testing Using Kubernetes" sample web app

2. Network Latency Tests

2.1 Access via Cluster IP from a Kubernetes cluster node

Test command:

curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}' "http://10.98.210.117:8000/"

Run 10 samples and average the results:

shell
[root@k8s-node1 ~]# echo "time_connect  time_starttransfer time_total"; for i in {1..10}; do curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}\n' "http://10.98.210.117:8000/"; done
time_connect  time_starttransfer time_total
0.000 0.002 0.002
0.000 0.001 0.001
0.001 0.003 0.003
0.001 0.003 0.003
0.001 0.006 0.006
0.001 0.003 0.003
0.001 0.004 0.004
0.001 0.003 0.003
0.001 0.004 0.004
0.000 0.002 0.002

Average response time: 3.1 ms

Metric definitions:

  • time_connect: time to establish the TCP connection to the server

  • time_starttransfer: time from issuing the request until the web server returns the first byte of the response

  • time_total: total time taken to complete the request
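
The 3.1 ms figure above is simply the mean of the time_total column; a small awk pipeline (a sketch, fed with the samples recorded above) reproduces it:

```shell
# Average the time_total samples (seconds) and report the mean in milliseconds.
samples='0.002 0.001 0.003 0.003 0.006 0.003 0.004 0.003 0.004 0.002'
avg_ms=$(printf '%s\n' $samples | awk '{s += $1} END {printf "%.1f", s / NR * 1000}')
echo "average: ${avg_ms} ms"   # average: 3.1 ms
```

The same pipeline can be appended (with `| awk ...`) directly to the 10-iteration curl loop instead of averaging by hand.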

2.2 Access via the service from inside the cluster

shell
# Enter the test client pod
[root@k8s-node1 ~]# kubectl exec -it locust-master-vljrx -n ingress-traefik /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
# Run the test
root@locust-master-vljrx:/# echo "time_connect  time_starttransfer time_total"; for i in {1..10}; do curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}\n' "http://sample-webapp:8000/"; done
time_connect  time_starttransfer time_total
0.001326 0.002542 0.002715
0.001444 0.003264 0.003675
0.002193 0.004262 0.004889
0.002066 0.003664 0.003876
0.001739 0.004095 0.004432
0.002339 0.004536 0.004647
0.001649 0.003288 0.003628
0.001794 0.003373 0.003911
0.001492 0.003201 0.003581
0.002036 0.003712 0.004109

Average response time: ~4 ms

2.3 External access via traefik ingress

This setup uses traefik as the cluster's gateway entry point; see my separate article on deploying traefik 2.0. Alternatively, change the service to type NodePort for this test.

shell
# Add a routing rule so that requests for sample-webapp.test.com are routed to the sample-webapp service
[root@k8s-master ~]# kubectl get ingressroute -n ingress-traefik traefik-dashboard-route -o yaml
...
  resourceVersion: "975669"
  selfLink: /apis/traefik.containo.us/v1alpha1/namespaces/ingress-traefik/ingressroutes/traefik-dashboard-route
  uid: 4faeecab-cd87-406f-9d50-3d507a6b73ff
spec:
  entryPoints:
  - web
  routes:
  - kind: Rule
    match: Host(`traefik.test.com`)
    services:
    - name: traefik
      port: 8080
  - kind: Rule
    match: Host(`locust.test.com`)
    services:
    - name: locust-master
      port: 8089
  - kind: Rule
    match: Host(`sample-webapp.test.com`)
    services:
    - name: sample-webapp
      port: 8000

After adding name resolution for the test domain on the external client, run the access test:
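
How the external client resolves the test domains is up to you; on a Linux client the quickest option is an /etc/hosts entry. (The node IP below is an assumption — point it at whichever node the traefik web entrypoint is actually exposed on.)

```
# /etc/hosts on the external test client (hypothetical entrypoint on 192.168.0.51)
192.168.0.51  sample-webapp.test.com locust.test.com traefik.test.com
```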

shell
[root@bogon ~]# echo "time_connect  time_starttransfer time_total"; for i in {1..10}; do curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}\n' "http://sample-webapp.test.com/"; done
time_connect  time_starttransfer time_total
0.048 0.056 0.062
0.030 0.108 0.115
0.029 0.036 0.046
0.048 0.111 0.119
0.021 0.030 0.030
0.017 0.022 0.022
0.025 0.031 0.036
0.021 0.028 0.028
0.039 0.045 0.045
0.020 0.025 0.029

Average response time: 53.2 ms

2.4 Latency Results

Response times for the three scenarios:

Access via Cluster IP from a Kubernetes cluster node: 3.1 ms

Access via the service from inside the cluster: ~4 ms

External access via the address exposed by traefik ingress: 53.2 ms

Notes:

  1. Whether the node or pod issuing the requests sits on the same host as the pod backing the service may have some impact on the first two scenarios.
  2. These results are for reference only; they depend on the specific resource configuration, network environment, and other factors.

3. Network Performance Tests

The cluster network starts in flannel's vxlan mode; iperf3 is used for the measurements.

Server command:

iperf3 -s -p 12345 -i 1

Client command:

iperf3 -c ${server-ip} -p 12345 -i 1 -t 10
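
When scripting repeated runs, the summary bandwidth can be pulled out of iperf3's plain-text output with awk; a sketch against the sender summary-line format iperf3 prints (running with `iperf3 -J` and parsing the JSON is an alternative):

```shell
# Extract the bandwidth figure (the field immediately before "Gbits/sec")
# from an iperf3 summary line.
summary='[  4]   0.00-10.00  sec  3.75 GBytes  3.22 Gbits/sec    5             sender'
bw=$(printf '%s\n' "$summary" | awk '{for (i = 2; i <= NF; i++) if ($i == "Gbits/sec") print $(i - 1)}')
echo "sender bandwidth: ${bw} Gbits/sec"   # sender bandwidth: 3.22 Gbits/sec
```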

3.1 Between nodes

shell
# Start the iperf server on node1
[root@k8s-node1 ~]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345

# Start the iperf client on node2 and test against node1
[root@k8s-node2 ~]# iperf3 -c 192.168.0.52 -p 12345 -i 1 -t 10
Connecting to host 192.168.0.52, port 12345
[  4] local 192.168.0.53 port 52106 connected to 192.168.0.52 port 12345
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   313 MBytes  2.62 Gbits/sec    0   1.38 MBytes
[  4]   1.00-2.00   sec   379 MBytes  3.18 Gbits/sec    5   1.36 MBytes
[  4]   2.00-3.00   sec   366 MBytes  3.06 Gbits/sec    0   1.47 MBytes
[  4]   3.00-4.00   sec   360 MBytes  3.02 Gbits/sec    0   1.57 MBytes
[  4]   4.00-5.00   sec   431 MBytes  3.62 Gbits/sec    0   1.65 MBytes
[  4]   5.00-6.00   sec   391 MBytes  3.27 Gbits/sec    0   1.71 MBytes
[  4]   6.00-7.00   sec   404 MBytes  3.39 Gbits/sec    0   1.76 MBytes
[  4]   7.00-8.00   sec   378 MBytes  3.18 Gbits/sec    0   1.78 MBytes
[  4]   8.00-9.00   sec   411 MBytes  3.44 Gbits/sec    0   1.80 MBytes
[  4]   9.00-10.00  sec   410 MBytes  3.43 Gbits/sec    0   1.81 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  3.75 GBytes  3.22 Gbits/sec    5             sender
[  4]   0.00-10.00  sec  3.75 GBytes  3.22 Gbits/sec                  receiver

3.2 Between pods on different nodes (flannel vxlan mode)

shell
# Deploy two pods, one on each worker node
[root@k8s-master ~]# cat perf-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: centos-perf
  labels:
    app: perf
spec:
  replicas: 2
  selector:
    matchLabels:
      app: perf
  template:
    metadata:
      labels:
        app: perf
    spec:
      containers:
      - name: perf
        image: centos79-perftools:20230425
        command: ["/bin/bash", "-c", "while true; do sleep 10000; done"]
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "2048Mi"
            cpu: "2000m"

[root@k8s-master ~]# kubectl apply -f perf-deploy.yaml
deployment.apps/centos-perf created
[root@k8s-master ~]# kubectl get pod -o wide
NAME               READY   STATUS    RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES
centos-perf-5b897965bc-cjqwt 1/1     Running   0          8m49s   10.244.2.148   k8s-node2    <none>           <none>
centos-perf-5b897965bc-vbqdg 1/1     Running   0          8m47s   10.244.1.137   k8s-node1    <none>           <none>
...

# Start the iperf server in the pod on node1
[root@k8s-master ~]# kubectl exec -it centos-perf-5b897965bc-vbqdg /bin/bash
[root@centos-perf-5b897965bc-vbqdg /]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345
-----------------------------------------------------------
Accepted connection from 10.244.2.148, port 33778
[  5] local 10.244.1.137 port 12345 connected to 10.244.2.148 port 33780
[ ID] Interval           Transfer     Bandwidth

# Start the iperf client in the pod on node2
[root@k8s-master ~]# kubectl exec -it centos-perf-5b897965bc-cjqwt /bin/bash
[root@centos-perf-5b897965bc-cjqwt /]# iperf3 -c 10.244.1.137 -p 12345 -i 1 -t 10
Connecting to host 10.244.1.137, port 12345
[  4] local 10.244.2.148 port 33780 connected to 10.244.1.137 port 12345
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   196 MBytes  1.64 Gbits/sec  741    584 KBytes
[  4]   1.00-2.00   sec   301 MBytes  2.53 Gbits/sec  2212    771 KBytes
[  4]   2.00-3.00   sec   199 MBytes  1.67 Gbits/sec  1147    912 KBytes
[  4]   3.00-4.00   sec   189 MBytes  1.59 Gbits/sec  387   1.01 MBytes
[  4]   4.00-5.00   sec   209 MBytes  1.75 Gbits/sec  138   1.14 MBytes
[  4]   5.00-6.00   sec   218 MBytes  1.83 Gbits/sec   92   1.26 MBytes
[  4]   6.00-7.00   sec   195 MBytes  1.64 Gbits/sec    0   1.36 MBytes
[  4]   7.00-8.00   sec   235 MBytes  1.97 Gbits/sec   33   1.46 MBytes
[  4]   8.00-9.00   sec   210 MBytes  1.76 Gbits/sec   46   1.55 MBytes
[  4]   9.00-10.01  sec   246 MBytes  2.06 Gbits/sec  171   1.65 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.01  sec  2.15 GBytes  1.84 Gbits/sec  4967             sender
[  4]   0.00-10.01  sec  2.14 GBytes  1.84 Gbits/sec                  receiver
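
One caveat on the Deployment above: replicas: 2 does not by itself guarantee that the two pods land on different nodes — the default scheduler usually spreads them, but it is not a contract. A topologySpreadConstraints stanza (available as a default-enabled beta on Kubernetes 1.19) makes the spread explicit; a minimal sketch to add under the pod template spec:

```yaml
# Hypothetical addition under spec.template.spec of the centos-perf Deployment:
topologySpreadConstraints:
- maxSkew: 1                           # pod counts per node may differ by at most 1
  topologyKey: kubernetes.io/hostname  # spread across node hostnames
  whenUnsatisfiable: DoNotSchedule     # hard constraint rather than best-effort
  labelSelector:
    matchLabels:
      app: perf
```

On older clusters, a required podAntiAffinity rule on the app=perf label achieves the same effect.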

3.3 Between a node and a pod on a different node (flannel vxlan mode)

shell
# Start the iperf server on node1
[root@k8s-node1 ~]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345

# Start the iperf client from the pod on node2
[root@k8s-master ~]# kubectl get pod -o wide
NAME                        READY   STATUS    RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES
centos-perf-5b897965bc-cjqwt 1/1     Running   0          8m49s   10.244.2.148   k8s-node2    <none>           <none>
centos-perf-5b897965bc-vbqdg 1/1     Running   0          8m47s   10.244.1.137   k8s-node1    <none>           <none>
[root@k8s-master ~]# kubectl exec -it centos-perf-5b897965bc-cjqwt /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
[root@centos-perf-5b897965bc-cjqwt /]# iperf3 -c 192.168.0.52 -p 12345 -i 1 -t 10
Connecting to host 192.168.0.52, port 12345
[  4] local 10.244.2.148 port 52528 connected to 192.168.0.52 port 12345
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.02   sec   173 MBytes  1.43 Gbits/sec   51    515 KBytes
[  4]   1.02-2.01   sec   219 MBytes  1.84 Gbits/sec  330    603 KBytes
[  4]   2.01-3.01   sec   309 MBytes  2.60 Gbits/sec   47    875 KBytes
[  4]   3.01-4.00   sec   270 MBytes  2.28 Gbits/sec  249    838 KBytes
[  4]   4.00-5.00   sec   262 MBytes  2.20 Gbits/sec  140    997 KBytes
[  4]   5.00-6.00   sec   301 MBytes  2.53 Gbits/sec   60   1.11 MBytes
[  4]   6.00-7.00   sec   302 MBytes  2.54 Gbits/sec   64   1.23 MBytes
[  4]   7.00-8.00   sec   349 MBytes  2.92 Gbits/sec    0   1.33 MBytes
[  4]   8.00-9.00   sec   321 MBytes  2.70 Gbits/sec  159   1.42 MBytes
[  4]   9.00-10.00  sec   312 MBytes  2.62 Gbits/sec   19   1.50 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  2.75 GBytes  2.36 Gbits/sec  1119             sender
[  4]   0.00-10.00  sec  2.75 GBytes  2.36 Gbits/sec                  receiver

3.4 Between pods on different nodes (flannel host-gw mode)

Change the flannel network mode to host-gw:

shell
# Change the flannel backend to host-gw
[root@k8s-master ~]# kubectl get configmap -n kube-flannel kube-flannel-cfg -o yaml > kube-flannel-cfg.yaml
[root@k8s-master ~]# vim kube-flannel-cfg.yaml
...
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "EnableNFTables": false,
      "Backend": {
        "Type": "host-gw"		# 默认为vxlan,修改为host-gw
      }
    }
...

# Apply the updated ConfigMap, then restart the flannel pods so the new backend takes effect
[root@k8s-master ~]# kubectl apply -f kube-flannel-cfg.yaml
[root@k8s-master kube-flannel]# kubectl get pod -n kube-flannel
NAME                    READY   STATUS    RESTARTS   AGE
kube-flannel-ds-jvff5   1/1     Running   13         13d
kube-flannel-ds-n5fqt   1/1     Running   13         13d
kube-flannel-ds-wwfmk   1/1     Running   14         13d
[root@k8s-master kube-flannel]# kubectl delete pod -n kube-flannel kube-flannel-ds-jvff5 kube-flannel-ds-n5fqt kube-flannel-ds-wwfmk
pod "kube-flannel-ds-jvff5" deleted
pod "kube-flannel-ds-n5fqt" deleted
pod "kube-flannel-ds-wwfmk" deleted
[root@k8s-master kube-flannel]# kubectl get pod -n kube-flannel
NAME                    READY   STATUS    RESTARTS   AGE
kube-flannel-ds-2p9gp   1/1     Running   0          13s
kube-flannel-ds-cn9x4   1/1     Running   0          23s
kube-flannel-ds-t7zjj   1/1     Running   0          18s
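
Whether host-gw actually took effect can be confirmed from a node's routing table (`ip route | grep 10.244` on any node): under vxlan, routes to remote pod CIDRs go via the flannel.1 device, while under host-gw they point straight at the peer node IP on the physical NIC. A self-contained sketch over sample host-gw routes (the ens33 interface name is an assumption for a typical VMware guest):

```shell
# Sample `ip route` entries as they should look in host-gw mode; in vxlan mode
# these lines would reference "dev flannel.1" instead of the physical NIC.
routes='10.244.1.0/24 via 192.168.0.52 dev ens33
10.244.2.0/24 via 192.168.0.53 dev ens33'
vxlan_left=$(printf '%s\n' "$routes" | grep -c 'dev flannel.1' || true)
echo "pod routes still via flannel.1: ${vxlan_left}"   # prints 0 once host-gw is active
```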

Test the network performance:

shell
[root@k8s-master ~]# kubectl get pod -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
centos-perf-5b897965bc-cjqwt   1/1     Running   0          35m   10.244.2.148   k8s-node2    <none>      <none>
centos-perf-5b897965bc-vbqdg   1/1     Running   0          35m   10.244.1.137   k8s-node1    <none>      <none>

# Enter the pod on node1 and start the iperf server
[root@k8s-master ~]# kubectl exec -it centos-perf-5b897965bc-vbqdg /bin/bash
[root@centos-perf-5b897965bc-vbqdg /]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345
-----------------------------------------------------------
Accepted connection from 10.244.2.148, port 33778
[  5] local 10.244.1.137 port 12345 connected to 10.244.2.148 port 33780
[ ID] Interval           Transfer     Bandwidth

# Enter the pod on node2 and run the iperf client against the pod on node1
[root@k8s-master ~]# kubectl exec -it centos-perf-5b897965bc-cjqwt /bin/bash
[root@centos-perf-5b897965bc-cjqwt /]# iperf3 -c 10.244.1.137 -p 12345 -i 1 -t 10
Connecting to host 10.244.1.137, port 12345
[  4] local 10.244.2.148 port 55200 connected to 10.244.1.137 port 12345
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   225 MBytes  1.88 Gbits/sec  1371    401 KBytes
[  4]   1.00-2.00   sec   242 MBytes  2.03 Gbits/sec  905    527 KBytes
[  4]   2.00-3.00   sec   244 MBytes  2.05 Gbits/sec  589    528 KBytes
[  4]   3.00-4.01   sec   292 MBytes  2.44 Gbits/sec  462    460 KBytes
[  4]   4.01-5.00   sec   242 MBytes  2.04 Gbits/sec  557    486 KBytes
[  4]   5.00-6.00   sec   287 MBytes  2.41 Gbits/sec  551    418 KBytes
[  4]   6.00-7.00   sec   314 MBytes  2.63 Gbits/sec  519    404 KBytes
[  4]   7.00-8.01   sec   313 MBytes  2.60 Gbits/sec  798    522 KBytes
[  4]   8.01-9.01   sec   297 MBytes  2.49 Gbits/sec  1902    478 KBytes
[  4]   9.01-10.00  sec   337 MBytes  2.86 Gbits/sec  1301    679 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  2.73 GBytes  2.34 Gbits/sec  8955             sender
[  4]   0.00-10.00  sec  2.73 GBytes  2.34 Gbits/sec                  receiver

3.5 Between a node and a pod on a different node (flannel host-gw mode)

shell
# From the pod on node2, run the iperf client against node1
[root@centos-perf-5b897965bc-cjqwt /]# iperf3 -c 192.168.0.52 -p 12345 -i 1 -t 10
Connecting to host 192.168.0.52, port 12345
[  4] local 10.244.2.148 port 53868 connected to 192.168.0.52 port 12345
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.01   sec   185 MBytes  1.54 Gbits/sec  171    453 KBytes
[  4]   1.01-2.00   sec   221 MBytes  1.86 Gbits/sec   84    728 KBytes
[  4]   2.00-3.00   sec   264 MBytes  2.21 Gbits/sec    0    927 KBytes
[  4]   3.00-4.00   sec   271 MBytes  2.28 Gbits/sec   33   1.06 MBytes
[  4]   4.00-5.00   sec   376 MBytes  3.16 Gbits/sec  368   1.22 MBytes
[  4]   5.00-6.00   sec   314 MBytes  2.63 Gbits/sec  138   1.35 MBytes
[  4]   6.00-7.00   sec   368 MBytes  3.08 Gbits/sec    0   1.49 MBytes
[  4]   7.00-8.00   sec   406 MBytes  3.41 Gbits/sec   65   1.57 MBytes
[  4]   8.00-9.00   sec   340 MBytes  2.85 Gbits/sec  355   1.63 MBytes
[  4]   9.00-10.00  sec   358 MBytes  3.00 Gbits/sec    0   1.68 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  3.03 GBytes  2.60 Gbits/sec  1214             sender
[  4]   0.00-10.00  sec  3.03 GBytes  2.60 Gbits/sec                  receiver

3.6 Network Performance Summary

Scenario                          flannel mode   Bandwidth (Gbits/sec)
node to node                      n/a            3.22
pod to pod, different nodes       vxlan          1.84
node to pod on a different node   vxlan          2.36
pod to pod, different nodes       host-gw        2.34
node to pod on a different node   host-gw        2.60

Conclusions from the data above:

  1. Flannel's vxlan mode costs about 43% of pod-to-pod bandwidth compared with direct host-to-host traffic, roughly matching commonly cited figures of a 30%-40% loss.
  2. Flannel's host-gw mode costs about 27% of pod-to-pod bandwidth compared with host-to-host traffic.
  3. vxlan encapsulates and decapsulates every packet, which incurs a significant throughput penalty, whereas host-gw forwards via plain routing-table entries, so its overhead is much smaller.
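
The loss percentages in points 1 and 2 follow directly from the summary table:

```shell
# Bandwidth loss relative to the node-to-node baseline (values from the table above).
node_bw=3.22; vxlan_bw=1.84; hostgw_bw=2.34
awk -v base="$node_bw" -v v="$vxlan_bw" -v h="$hostgw_bw" 'BEGIN {
    printf "vxlan loss:   %.0f%%\n", (base - v) / base * 100
    printf "host-gw loss: %.0f%%\n", (base - h) / base * 100
}'
# vxlan loss:   43%
# host-gw loss: 27%
```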