Kubernetes Network Performance Testing
This article measures the network performance of an existing Kubernetes environment.
1. Test Preparation
1.1 Test Environment
The test environment is a Kubernetes 1.19 cluster built on VMware Workstation virtual machines, using flannel as the network plugin.
hostname | ip | role |
---|---|---|
k8s-master | 192.168.0.51 | master |
k8s-node1 | 192.168.0.52 | worker |
k8s-node2 | 192.168.0.53 | worker |
The test application sample-webapp has already been deployed with three replicas:
```shell
[root@k8s-master ~]# kubectl get pod -n ingress-traefik | grep sample-webapp
sample-webapp-4wf7c 1/1 Running 0 46s
sample-webapp-jvdpv 1/1 Running 10 7d23h
sample-webapp-kdk9k 1/1 Running 0 46s
[root@k8s-master ~]# kubectl get svc -n ingress-traefik | grep sample-webapp
sample-webapp ClusterIP 10.98.210.117 <none> 8000/TCP 10d
```
Note: in this test environment, the NoSchedule taint was removed from the master node so that workload pods can also be scheduled onto the master.
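For reference, a typical way to remove that taint (assuming the default kubeadm taint key on Kubernetes 1.19):
```shell
# The trailing "-" removes the taint from the node
kubectl taint nodes k8s-master node-role.kubernetes.io/master:NoSchedule-
```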
1.2 Test Scenarios
- Access via the Cluster IP from a Kubernetes cluster node
- Access via the service from inside the Kubernetes cluster
- Access from outside the cluster via the address exposed by the traefik ingress
1.3 Test Tools
- Locust: a simple, easy-to-use load testing tool for measuring how many concurrent users a web or other system can handle.
- curl
- Test application: sample-webapp; the source code is in the "Distributed Load Testing Using Kubernetes" example on GitHub.
1.4 Test Description
Response times are measured by sending curl requests to sample-webapp. A plain curl request returns:
```shell
[root@k8s-master ~]# curl "http://10.98.210.117:8000"
Welcome to the "Distributed Load Testing Using Kubernetes" sample web app
```
2. Network Latency Testing
2.1 Access via the Cluster IP from a Kubernetes cluster node
Test command:
curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}' "http://10.98.210.117:8000/"
Collect 10 samples and take the average:
```shell
[root@k8s-node1 ~]# echo "time_connect time_starttransfer time_total"; for i in {1..10}; do curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}\n' "http://10.98.210.117:8000/"; done
time_connect time_starttransfer time_total
0.000 0.002 0.002
0.000 0.001 0.001
0.001 0.003 0.003
0.001 0.003 0.003
0.001 0.006 0.006
0.001 0.003 0.003
0.001 0.004 0.004
0.001 0.003 0.003
0.001 0.004 0.004
0.000 0.002 0.002
```
Average response time: 3.1 ms
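The average above was worked out by hand from the ten samples; as a convenience, a small awk pipeline (a sketch, not part of the original test) computes the three averages directly:
```shell
# Run 10 requests and average the three timing columns with awk
for i in {1..10}; do
  curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}\n' \
    "http://10.98.210.117:8000/"
done | awk '{c+=$1; s+=$2; t+=$3} END {printf "avg: connect=%.4fs starttransfer=%.4fs total=%.4fs\n", c/NR, s/NR, t/NR}'
```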
Metric descriptions:
- time_connect: time taken to establish the TCP connection to the server
- time_starttransfer: time from issuing the request until the web server returns the first byte of the response
- time_total: total time taken to complete the request
2.2 Access via the service from inside the Kubernetes cluster
```shell
# Enter the test client pod
[root@k8s-node1 ~]# kubectl exec -it locust-master-vljrx -n ingress-traefik /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
# Run the test
root@locust-master-vljrx:/# echo "time_connect time_starttransfer time_total"; for i in {1..10}; do curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}\n' "http://sample-webapp:8000/"; done
time_connect time_starttransfer time_total
0.001326 0.002542 0.002715
0.001444 0.003264 0.003675
0.002193 0.004262 0.004889
0.002066 0.003664 0.003876
0.001739 0.004095 0.004432
0.002339 0.004536 0.004647
0.001649 0.003288 0.003628
0.001794 0.003373 0.003911
0.001492 0.003201 0.003581
0.002036 0.003712 0.004109
```
Average response time: approximately 4 ms
2.3 Access from outside the cluster via traefik ingress
This article uses traefik as the gateway into the Kubernetes cluster; see my other post on deploying traefik 2.0. Alternatively, the service could be changed to type NodePort for this test, as sketched below.
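A minimal sketch of the NodePort alternative (not used in the tests that follow; the node port is allocated by Kubernetes):
```shell
# Switch sample-webapp to a NodePort service and look up the allocated port
kubectl -n ingress-traefik patch svc sample-webapp -p '{"spec":{"type":"NodePort"}}'
kubectl -n ingress-traefik get svc sample-webapp
# The service is then reachable at http://<any-node-ip>:<nodePort>
```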
```shell
# Add a routing rule so that requests to sample-webapp.test.com are routed to the sample-webapp service
[root@k8s-master ~]# kubectl get ingressroute -n ingress-traefik traefik-dashboard-route -o yaml
...
  resourceVersion: "975669"
  selfLink: /apis/traefik.containo.us/v1alpha1/namespaces/ingress-traefik/ingressroutes/traefik-dashboard-route
  uid: 4faeecab-cd87-406f-9d50-3d507a6b73ff
spec:
  entryPoints:
  - web
  routes:
  - kind: Rule
    match: Host(`traefik.test.com`)
    services:
    - name: traefik
      port: 8080
  - kind: Rule
    match: Host(`locust.test.com`)
    services:
    - name: locust-master
      port: 8089
  - kind: Rule
    match: Host(`sample-webapp.test.com`)
    services:
    - name: sample-webapp
      port: 8000
```
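The external client needs name resolution for the test domain; a minimal sketch, assuming traefik's web entrypoint is reachable on the node IP 192.168.0.51 (adjust to wherever your entrypoint actually listens):
```shell
# Point the test domain at the traefik entrypoint on the external client
echo "192.168.0.51 sample-webapp.test.com" >> /etc/hosts
```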
After adding the name resolution entry on the external client, run the test:
```shell
[root@bogon ~]# echo "time_connect time_starttransfer time_total"; for i in {1..10}; do curl -o /dev/null -s -w '%{time_connect} %{time_starttransfer} %{time_total}\n' "http://sample-webapp.test.com/"; done
time_connect time_starttransfer time_total
0.048 0.056 0.062
0.030 0.108 0.115
0.029 0.036 0.046
0.048 0.111 0.119
0.021 0.030 0.030
0.017 0.022 0.022
0.025 0.031 0.036
0.021 0.028 0.028
0.039 0.045 0.045
0.020 0.025 0.029
```
Average response time: 53.2 ms
2.4 Test Results
The response times for the three scenarios are:
- Access via the Cluster IP from a cluster node: 3.1 ms
- Access via the service from inside the cluster: ~4 ms
- Access from outside the cluster via the traefik ingress address: 53.2 ms
Notes:
- Whether the node/Pod running the test is on the same host as the pod backing the service may have some influence on the first two scenarios.
- The results are for reference only; they depend on the specific resource configuration, network environment, and other factors.
3. Network Performance Testing
The cluster network uses flannel in vxlan mode; iperf3 is used for the bandwidth tests.
Server command:
iperf3 -s -p 12345 -i 1
Client command:
iperf3 -c ${server-ip} -p 12345 -i 1 -t 10
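Before testing, you can confirm which backend flannel is currently running (a convenience check; the namespace and ConfigMap name assume the standard flannel manifest used later in this article):
```shell
# Show flannel's backend type from its ConfigMap
kubectl -n kube-flannel get configmap kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'
# With the vxlan backend, each node also has a flannel.1 vxlan interface
ip -d link show flannel.1
```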
3.1 Between nodes
```shell
# Start the iperf3 server on node1
[root@k8s-node1 ~]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345
# Run the iperf3 client from node2
[root@k8s-node2 ~]# iperf3 -c 192.168.0.52 -p 12345 -i 1 -t 10
Connecting to host 192.168.0.52, port 12345
[ 4] local 192.168.0.53 port 52106 connected to 192.168.0.52 port 12345
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 313 MBytes 2.62 Gbits/sec 0 1.38 MBytes
[ 4] 1.00-2.00 sec 379 MBytes 3.18 Gbits/sec 5 1.36 MBytes
[ 4] 2.00-3.00 sec 366 MBytes 3.06 Gbits/sec 0 1.47 MBytes
[ 4] 3.00-4.00 sec 360 MBytes 3.02 Gbits/sec 0 1.57 MBytes
[ 4] 4.00-5.00 sec 431 MBytes 3.62 Gbits/sec 0 1.65 MBytes
[ 4] 5.00-6.00 sec 391 MBytes 3.27 Gbits/sec 0 1.71 MBytes
[ 4] 6.00-7.00 sec 404 MBytes 3.39 Gbits/sec 0 1.76 MBytes
[ 4] 7.00-8.00 sec 378 MBytes 3.18 Gbits/sec 0 1.78 MBytes
[ 4] 8.00-9.00 sec 411 MBytes 3.44 Gbits/sec 0 1.80 MBytes
[ 4] 9.00-10.00 sec 410 MBytes 3.43 Gbits/sec 0 1.81 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 3.75 GBytes 3.22 Gbits/sec 5 sender
[ 4] 0.00-10.00 sec 3.75 GBytes 3.22 Gbits/sec receiver
```
3.2 Between Pods on different nodes (flannel vxlan mode)
```shell
# Deploy two pods, one on each of the two worker nodes
[root@k8s-master ~]# cat perf-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: centos-perf
  labels:
    app: perf
spec:
  replicas: 2
  selector:
    matchLabels:
      app: perf
  template:
    metadata:
      labels:
        app: perf
    spec:
      containers:
      - name: perf
        image: centos79-perftools:20230425
        command: ["/bin/bash", "-c", "while true; do sleep 10000; done"]
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "2048Mi"
            cpu: "2000m"
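# Note (a sketch, not part of the original manifest): with replicas: 2 the default
# scheduler happened to place one pod on each worker node; to guarantee that spread,
# a podAntiAffinity rule could be added under spec.template.spec, for example:
#   affinity:
#     podAntiAffinity:
#       requiredDuringSchedulingIgnoredDuringExecution:
#       - labelSelector:
#           matchLabels:
#             app: perf
#         topologyKey: kubernetes.io/hostname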
[root@k8s-master ~]# kubectl apply -f perf-deploy.yaml
deployment.apps/centos-perf created
[root@k8s-master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
centos-perf-5b897965bc-cjqwt 1/1 Running 0 8m49s 10.244.2.148 k8s-node2 <none> <none>
centos-perf-5b897965bc-vbqdg 1/1 Running 0 8m47s 10.244.1.137 k8s-node1 <none> <none>
...
# Start the iperf3 server in the pod on node1
[root@k8s-master ~]# kubectl exec -it centos-perf-5b897965bc-vbqdg /bin/bash
[root@centos-perf-5b897965bc-vbqdg /]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345
-----------------------------------------------------------
Accepted connection from 10.244.2.148, port 33778
[ 5] local 10.244.1.136 port 12345 connected to 10.244.2.148 port 33780
[ ID] Interval Transfer Bandwidth
# Run the iperf3 client from the pod on node2
[root@k8s-master ~]# kubectl exec -it centos-perf-5b897965bc-cjqwt /bin/bash
[root@centos-perf-5b897965bc-cjqwt /]# iperf3 -c 10.244.1.137 -p 12345 -i 1 -t 10
Connecting to host 10.244.1.136, port 12345
[ 4] local 10.244.2.147 port 33780 connected to 10.244.1.137 port 12345
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 196 MBytes 1.64 Gbits/sec 741 584 KBytes
[ 4] 1.00-2.00 sec 301 MBytes 2.53 Gbits/sec 2212 771 KBytes
[ 4] 2.00-3.00 sec 199 MBytes 1.67 Gbits/sec 1147 912 KBytes
[ 4] 3.00-4.00 sec 189 MBytes 1.59 Gbits/sec 387 1.01 MBytes
[ 4] 4.00-5.00 sec 209 MBytes 1.75 Gbits/sec 138 1.14 MBytes
[ 4] 5.00-6.00 sec 218 MBytes 1.83 Gbits/sec 92 1.26 MBytes
[ 4] 6.00-7.00 sec 195 MBytes 1.64 Gbits/sec 0 1.36 MBytes
[ 4] 7.00-8.00 sec 235 MBytes 1.97 Gbits/sec 33 1.46 MBytes
[ 4] 8.00-9.00 sec 210 MBytes 1.76 Gbits/sec 46 1.55 MBytes
[ 4] 9.00-10.01 sec 246 MBytes 2.06 Gbits/sec 171 1.65 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.01 sec 2.15 GBytes 1.84 Gbits/sec 4967 sender
[ 4] 0.00-10.01 sec 2.14 GBytes 1.84 Gbits/sec receiver
```
3.3 Between a node and a Pod on a different node (flannel vxlan mode)
```shell
# Start the iperf3 server on node1
[root@k8s-node1 ~]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345
# Run the iperf3 client from the pod on node2
[root@k8s-master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
centos-perf-5b897965bc-cjqwt 1/1 Running 0 8m49s 10.244.2.148 k8s-node2 <none> <none>
centos-perf-5b897965bc-vbqdg 1/1 Running 0 8m47s 10.244.1.137 k8s-node1 <none> <none>
[root@k8s-master ~]# kubectl exec -it centos-perf-5b897965bc-cjqwt bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
[root@centos-perf-5b897965bc-cjqwt /]# iperf3 -c 192.168.0.52 -p 12345 -i 1 -t 10
Connecting to host 192.168.0.52, port 12345
[ 4] local 10.244.2.148 port 52528 connected to 192.168.0.52 port 12345
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.02 sec 173 MBytes 1.43 Gbits/sec 51 515 KBytes
[ 4] 1.02-2.01 sec 219 MBytes 1.84 Gbits/sec 330 603 KBytes
[ 4] 2.01-3.01 sec 309 MBytes 2.60 Gbits/sec 47 875 KBytes
[ 4] 3.01-4.00 sec 270 MBytes 2.28 Gbits/sec 249 838 KBytes
[ 4] 4.00-5.00 sec 262 MBytes 2.20 Gbits/sec 140 997 KBytes
[ 4] 5.00-6.00 sec 301 MBytes 2.53 Gbits/sec 60 1.11 MBytes
[ 4] 6.00-7.00 sec 302 MBytes 2.54 Gbits/sec 64 1.23 MBytes
[ 4] 7.00-8.00 sec 349 MBytes 2.92 Gbits/sec 0 1.33 MBytes
[ 4] 8.00-9.00 sec 321 MBytes 2.70 Gbits/sec 159 1.42 MBytes
[ 4] 9.00-10.00 sec 312 MBytes 2.62 Gbits/sec 19 1.50 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 2.75 GBytes 2.36 Gbits/sec 1119 sender
[ 4] 0.00-10.00 sec 2.75 GBytes 2.36 Gbits/sec receiver
```
3.4 Between Pods on different nodes (flannel host-gw mode)
Change the flannel network mode to host-gw:
```shell
# Change flannel's backend to host-gw
[root@k8s-master ~]# kubectl get configmap -n kube-flannel kube-flannel-cfg -o yaml > kube-flannel-cfg.yaml
[root@k8s-master ~]# vim kube-flannel-cfg.yaml
...
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "EnableNFTables": false,
      "Backend": {
        "Type": "host-gw"    # default is vxlan; changed to host-gw
      }
    }
...
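# Note (assumption): the edited ConfigMap has to be applied back to the cluster before
# restarting the flannel pods for the change to take effect, for example:
#   kubectl apply -f kube-flannel-cfg.yaml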
# Restart the flannel DaemonSet pods
[root@k8s-master kube-flannel]# kubectl get pod -n kube-flannel
NAME READY STATUS RESTARTS AGE
kube-flannel-ds-jvff5 1/1 Running 13 13d
kube-flannel-ds-n5fqt 1/1 Running 13 13d
kube-flannel-ds-wwfmk 1/1 Running 14 13d
[root@k8s-master kube-flannel]# kubectl delete pod -n kube-flannel kube-flannel-ds-jvff5 kube-flannel-ds-n5fqt kube-flannel-ds-wwfmk
pod "kube-flannel-ds-jvff5" deleted
pod "kube-flannel-ds-n5fqt" deleted
pod "kube-flannel-ds-wwfmk" deleted
[root@k8s-master kube-flannel]# kubectl get pod -n kube-flannel
NAME READY STATUS RESTARTS AGE
kube-flannel-ds-2p9gp 1/1 Running 0 13s
kube-flannel-ds-cn9x4 1/1 Running 0 23s
kube-flannel-ds-t7zjj 1/1 Running 0 18s
```
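After the flannel pods restart, you can verify on a node that the backend actually switched (a quick check; with host-gw, routes to other nodes' pod subnets point directly at the peer node IPs, while with vxlan they go via the flannel.1 device):
```shell
# Inspect the routes for the pod network
ip route | grep 10.244
# vxlan:   10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
# host-gw: 10.244.2.0/24 via 192.168.0.53 dev <node-interface>
```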
Test the network performance:
```shell
[root@k8s-master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
centos-perf-5b897965bc-cjqwt 1/1 Running 0 35m 10.244.2.148 k8s-node2 <none> <none>
centos-perf-5b897965bc-vbqdg 1/1 Running 0 35m 10.244.1.137 k8s-node1 <none> <none>
# Enter the pod on node1 and start the iperf3 server
[root@k8s-master ~]# kubectl exec -it centos-perf-5b897965bc-vbqdg /bin/bash
[root@centos-perf-5b897965bc-vbqdg /]# iperf3 -s -p 12345 -i 1
-----------------------------------------------------------
Server listening on 12345
-----------------------------------------------------------
Accepted connection from 10.244.2.148, port 33778
[ 5] local 10.244.1.136 port 12345 connected to 10.244.2.148 port 33780
[ ID] Interval Transfer Bandwidth
# Enter the pod on node2 and run the iperf3 client against the pod on node1
[root@k8s-master ~]# kubectl exec -it centos-perf-5b897965bc-cjqwt /bin/bash
[root@centos-perf-5b897965bc-cjqwt /]# iperf3 -c 10.244.1.137 -p 12345 -i 1 -t 10
Connecting to host 10.244.1.137, port 12345
[ 4] local 10.244.2.148 port 55200 connected to 10.244.1.137 port 12345
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 225 MBytes 1.88 Gbits/sec 1371 401 KBytes
[ 4] 1.00-2.00 sec 242 MBytes 2.03 Gbits/sec 905 527 KBytes
[ 4] 2.00-3.00 sec 244 MBytes 2.05 Gbits/sec 589 528 KBytes
[ 4] 3.00-4.01 sec 292 MBytes 2.44 Gbits/sec 462 460 KBytes
[ 4] 4.01-5.00 sec 242 MBytes 2.04 Gbits/sec 557 486 KBytes
[ 4] 5.00-6.00 sec 287 MBytes 2.41 Gbits/sec 551 418 KBytes
[ 4] 6.00-7.00 sec 314 MBytes 2.63 Gbits/sec 519 404 KBytes
[ 4] 7.00-8.01 sec 313 MBytes 2.60 Gbits/sec 798 522 KBytes
[ 4] 8.01-9.01 sec 297 MBytes 2.49 Gbits/sec 1902 478 KBytes
[ 4] 9.01-10.00 sec 337 MBytes 2.86 Gbits/sec 1301 679 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 2.73 GBytes 2.34 Gbits/sec 8955 sender
[ 4] 0.00-10.00 sec 2.73 GBytes 2.34 Gbits/sec receiver
```
3.5 Between a node and a Pod on a different node (flannel host-gw mode)
```shell
# From the pod on node2, run the iperf3 client against node1
[root@centos-perf-5b897965bc-cjqwt /]# iperf3 -c 192.168.0.52 -p 12345 -i 1 -t 10
Connecting to host 192.168.0.52, port 12345
[ 4] local 10.244.2.148 port 53868 connected to 192.168.0.52 port 12345
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.01 sec 185 MBytes 1.54 Gbits/sec 171 453 KBytes
[ 4] 1.01-2.00 sec 221 MBytes 1.86 Gbits/sec 84 728 KBytes
[ 4] 2.00-3.00 sec 264 MBytes 2.21 Gbits/sec 0 927 KBytes
[ 4] 3.00-4.00 sec 271 MBytes 2.28 Gbits/sec 33 1.06 MBytes
[ 4] 4.00-5.00 sec 376 MBytes 3.16 Gbits/sec 368 1.22 MBytes
[ 4] 5.00-6.00 sec 314 MBytes 2.63 Gbits/sec 138 1.35 MBytes
[ 4] 6.00-7.00 sec 368 MBytes 3.08 Gbits/sec 0 1.49 MBytes
[ 4] 7.00-8.00 sec 406 MBytes 3.41 Gbits/sec 65 1.57 MBytes
[ 4] 8.00-9.00 sec 340 MBytes 2.85 Gbits/sec 355 1.63 MBytes
[ 4] 9.00-10.00 sec 358 MBytes 3.00 Gbits/sec 0 1.68 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 3.03 GBytes 2.60 Gbits/sec 1214 sender
[ 4] 0.00-10.00 sec 3.03 GBytes 2.60 Gbits/sec receiver
```
3.6 Network Performance Comparison Summary
Scenario | flannel mode | Bandwidth (Gbits/sec) |
---|---|---|
Between nodes | n/a | 3.22 |
Between pods on different nodes | vxlan | 1.84 |
Between a node and a pod on a different node | vxlan | 2.36 |
Between pods on different nodes | host-gw | 2.34 |
Between a node and a pod on a different node | host-gw | 2.60 |
Conclusions from the data above:
- With flannel's vxlan mode, pod-to-pod throughput loses about 43% compared with direct host-to-host traffic ((3.22 - 1.84) / 3.22 ≈ 43%), broadly in line with results commonly reported online (30%-40% loss).
- With flannel's host-gw mode, the pod-to-pod loss is roughly 27% ((3.22 - 2.34) / 3.22 ≈ 27%).
- vxlan encapsulates and decapsulates every packet, which costs a noticeable amount of performance, while host-gw forwards traffic directly using host routes, so its overhead is much smaller.
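As a side note on the vxlan overhead: when flannel runs the vxlan backend, the 50-byte vxlan header also shows up in the interface MTUs (a quick check on a node; flannel.1 and the cni0 bridge typically report 1450 while the physical NIC reports 1500):
```shell
# Compare MTUs on a node while the vxlan backend is active
ip link show flannel.1 | grep -o 'mtu [0-9]*'
ip link show cni0 | grep -o 'mtu [0-9]*'
```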