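Docker Swarm has two built-in load-balancing mechanisms: Service VIP and Routing Mesh.
- Service VIP (Virtual IP): each service is assigned a virtual IP address on every overlay network it is attached to. When a container on that network sends a request to the service name, Swarm's embedded DNS resolves the name to the VIP, and the request is then forwarded to one of the service's tasks, which may be running on any node. This is the mechanism used for load balancing between services inside the cluster.
- Routing Mesh: Routing Mesh handles traffic that reaches a published port (-p) from outside the cluster. Every node in the swarm listens on the published port, even nodes that run no task of the service, and routes incoming requests over the ingress network to a task of the service. It can serve multiple published ports and multiple services, and, like Service VIP, it is implemented with IPVS.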
Host plan:
- node1: 172.19.177.14, role: manager (Leader)
- node2: 172.19.188.123, role: worker
```shell
$ sudo docker node ls
ID                            HOSTNAME   STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
r0tfqjhih73qyy43ja34501mf *   node1      Ready     Active         Leader           24.0.2
8w00ahq3ltzcmfejx7j6d755p     node2      Ready     Active                          24.0.2
```
Create an overlay network
On node1, create an overlay network named mynet:
```shell
$ sudo docker network create -d overlay mynet
obrrvzltlp46ydu4zqq56md6a
```
Only user-defined networks support resolving a service name to its VIP (a DNS-like service provided by Docker's embedded DNS server).
Create the service
Create a service named web with two replicas, publishing its port 80 as port 8080 with -p:
```shell
$ sudo docker service create --name web --network mynet -p 8080:80 --replicas 2 containous/whoami
128itvcw5ejqv832g9hw791a6
overall progress: 2 out of 2 tasks
1/2: running   [==================================================>]
2/2: running   [==================================================>]
verify: Service converged
```
The image we use, `containous/whoami`, is a simple web service that returns the server's hostname and basic network information such as its IP addresses.
Check the service's tasks:
```shell
$ sudo docker service ps web
ID             NAME      IMAGE                      NODE      DESIRED STATE   CURRENT STATE            ERROR     PORTS
dvihg8lgiw03   web.1     containous/whoami:latest   node2     Running         Running 23 seconds ago
fb45l99kx767   web.2     containous/whoami:latest   node1     Running         Running 23 seconds ago
```
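Create a client
Create a client service on the same network to access web (the ping 8.8.8.8 command just keeps the container running):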
```shell
$ sudo docker service create --name client --network mynet centos:7 ping 8.8.8.8
ohgc2y9t9ztb086j1e5n777b8
overall progress: 1 out of 1 tasks
1/1: running   [==================================================>]
verify: Service converged
```
Check the client service:
```shell
$ sudo docker service ls
ID             NAME      MODE         REPLICAS   IMAGE                      PORTS
ohgc2y9t9ztb   client    replicated   1/1        centos:7
128itvcw5ejq   web       replicated   2/2        containous/whoami:latest   *:8080->80/tcp
$ sudo docker service ps client
ID             NAME       IMAGE      NODE      DESIRED STATE   CURRENT STATE            ERROR     PORTS
ebrpzyvsz7jm   client.1   centos:7   node1     Running         Running 16 seconds ago
```
Access the service from the client
On node1, where the client task runs, look up the client container's ID, `d200af28b981`:
```shell
$ sudo docker container ps
CONTAINER ID   IMAGE                      COMMAND          CREATED              STATUS              PORTS     NAMES
d200af28b981   centos:7                   "ping 8.8.8.8"   About a minute ago   Up About a minute             client.1.ebrpzyvsz7jmszwd5dmnihfyw
694fd2d1f0a4   containous/whoami:latest   "/whoami"        About a minute ago   Up About a minute   80/tcp    web.2.fb45l99kx7674pvzd44ow3ubf
```
From inside the client container, access the service by its name:
```shell
$ sudo docker container exec -it d200af28b981 curl web
Hostname: 0914dead5a9c
IP: 127.0.0.1
IP: 10.0.1.3
IP: 172.20.0.3
IP: 10.0.0.5
RemoteAddr: 10.0.1.5:47794
GET / HTTP/1.1
Host: web
User-Agent: curl/7.29.0
Accept: */*
$ sudo docker container exec -it d200af28b981 curl web
Hostname: 694fd2d1f0a4
IP: 127.0.0.1
IP: 10.0.0.6
IP: 172.18.0.3
IP: 10.0.1.4
RemoteAddr: 10.0.1.5:36370
GET / HTTP/1.1
Host: web
User-Agent: curl/7.29.0
Accept: */*
```
The two requests were answered by different web containers, whose addresses on mynet are `10.0.1.3` and `10.0.1.4`.
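To watch the balancing over more than two requests, you can repeat the curl and tally the Hostname lines of the whoami responses. The following is a minimal POSIX shell sketch; the helper name tally_hostnames is hypothetical, and the container ID d200af28b981 is the one from this walkthrough, so substitute your own. The example drops -it because a TTY would add carriage returns to the piped output.

```shell
#!/bin/sh
# tally_hostnames (hypothetical helper): read concatenated whoami responses
# on stdin and print "<count> <hostname>" for each backend container,
# sorted by hostname.
tally_hostnames() {
    awk '/^Hostname:/ { n[$2]++ } END { for (h in n) print n[h], h }' | sort -k2
}

# Example (run on node1, where the client task lives):
#   for i in 1 2 3 4 5 6; do
#       sudo docker container exec d200af28b981 curl -s web
#   done | tally_hostnames
```

With two replicas you would expect the counts to come out roughly even, although other clients hitting the same service at the same time can shift them.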
From the client container, ping the service name web:
```shell
$ sudo docker container exec -it d200af28b981 ping web -c 3
PING web (10.0.1.2) 56(84) bytes of data.
64 bytes from 10.0.1.2 (10.0.1.2): icmp_seq=1 ttl=64 time=0.032 ms
64 bytes from 10.0.1.2 (10.0.1.2): icmp_seq=2 ttl=64 time=0.047 ms
64 bytes from 10.0.1.2 (10.0.1.2): icmp_seq=3 ttl=64 time=0.062 ms

--- web ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2022ms
rtt min/avg/max/mdev = 0.032/0.047/0.062/0.012 ms
```
The IP returned by ping is `10.0.1.2`, which is not the actual IP of either web container. So whose address is it?
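To compare what a name resolves to, it can help to pull the resolved address out of the first line of ping's output. Below is a small POSIX shell sketch; ping_target_ip is a hypothetical helper name. The second example line uses tasks.web, Swarm's per-task DNS name, which resolves to the address of an individual task instead of the VIP.

```shell
#!/bin/sh
# ping_target_ip (hypothetical helper): read ping output on stdin and print
# the address the name resolved to, taken from the first line, which has the
# form "PING <name> (<ip>) ...".
ping_target_ip() {
    sed -n '1s/^PING [^ ]* (\([0-9.]*\)).*/\1/p'
}

# Example (run on node1):
#   sudo docker container exec d200af28b981 ping -c 1 web | ping_target_ip
#   sudo docker container exec d200af28b981 ping -c 1 tasks.web | ping_target_ip
```

For the first example the expected result here is the VIP 10.0.1.2; for the second it should be one of the task addresses, 10.0.1.3 or 10.0.1.4.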
Inspect the mynet network
`10.0.1.0/24` is mynet's subnet, so this address must belong to something attached to mynet. Let's inspect mynet:
```shell
$ sudo docker network inspect mynet
[
    {
        "Name": "mynet",
        "Id": "obrrvzltlp46ydu4zqq56md6a",
        "Created": "2023-11-29T11:02:29.924112625Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.1.0/24",
                    "Gateway": "10.0.1.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "694fd2d1f0a471630d755743a39ae28ecf4c38d20a30b45f145d55110d038882": {
                "Name": "web.2.fb45l99kx7674pvzd44ow3ubf",
                "EndpointID": "60ffd2a92a9b8e6603668c1922774d572512c88ecc1899e8bb2a6890a0569998",
                "MacAddress": "02:42:0a:00:01:04",
                "IPv4Address": "10.0.1.4/24",
                "IPv6Address": ""
            },
            "d200af28b98158a90a2ba110477e6cff17170b059c89f07b389f8a40a9debca6": {
                "Name": "client.1.ebrpzyvsz7jmszwd5dmnihfyw",
                "EndpointID": "05cb7d2aecfb35c5aa38a0713cb3c989290f9dcdb6db819bc3e6bdd57731ae5d",
                "MacAddress": "02:42:0a:00:01:08",
                "IPv4Address": "10.0.1.8/24",
                "IPv6Address": ""
            },
            "lb-mynet": {
                "Name": "mynet-endpoint",
                "EndpointID": "7bf814a43e5c733914694cddeaed0ec91ff2522b430a2e75567b0f5d5e7a09bc",
                "MacAddress": "02:42:0a:00:01:05",
                "IPv4Address": "10.0.1.5/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "de473f8d81e5",
                "IP": "172.19.177.14"
            },
            {
                "Name": "138b78fdca53",
                "IP": "172.19.188.123"
            }
        ]
    }
]
```
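Besides web.2 and the client container, an endpoint named lb-mynet, with address 10.0.1.5, is attached to mynet. (web.1 at 10.0.1.3 does not appear because, for a swarm-scoped overlay network, docker network inspect lists only the endpoints on the local node, here node1.)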
The lb-mynet network namespace
lb-mynet is not actually a container but a network namespace, which you can enter as follows.
First, find the ID of mynet, `obrrvzltlp46`:
```shell
$ sudo docker network ls | grep mynet
obrrvzltlp46   mynet             overlay   swarm
```
Then look in `/run/docker/netns/` for a file whose name starts with `lb_` followed by a prefix of that ID (`obrrvzltlp46`):
```shell
$ sudo ls /run/docker/netns/
05bd5279bd2a  1-lly92pzjok  1-obrrvzltlp  c035f61daf3b  ingress_sbox  lb_obrrvzltl
```
The namespace file for lb-mynet is `lb_obrrvzltl`.
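The lookup above can be scripted so you do not have to match the prefix by eye. This is a minimal POSIX shell sketch under one assumption taken from the listing above: the part after lb_ is a prefix of the full network ID, so the helper matches by prefix rather than assuming a fixed length. The name find_lb_netns is hypothetical.

```shell
#!/bin/sh
# find_lb_netns (hypothetical helper): print every lb_* file in DIR whose
# suffix is a prefix of the full network ID NET_ID.
find_lb_netns() {
    dir=$1 net_id=$2
    for f in "$dir"/lb_*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        suffix=${f##*/lb_}               # strip the directory and "lb_"
        [ -n "$suffix" ] || continue     # ignore a bare "lb_" file
        case $net_id in
            "$suffix"*) printf '%s\n' "$f" ;;
        esac
    done
}

# Example (run on the node, as root):
#   id=$(docker network inspect -f '{{.Id}}' mynet)
#   find_lb_netns /run/docker/netns "$id"
```

Passing the full ID from docker network inspect keeps the match unambiguous even when other networks' namespace files sit in the same directory.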
Use nsenter to enter the `lb_obrrvzltl` namespace and check its IP addresses:
```shell
$ sudo nsenter --net="/run/docker/netns/lb_obrrvzltl" ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
157: eth0@if158: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:00:01:05 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.1.5/24 brd 10.0.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.0.1.2/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.0.1.7/32 scope global eth0
       valid_lft forever preferred_lft forever
```
Besides its own address 10.0.1.5/24, eth0 carries `10.0.1.2` as a /32 address. This address is the service's virtual IP (VIP).
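In this walkthrough the VIPs show up as the /32 addresses on eth0, next to the namespace's own /24 address, so they can be listed by filtering the ip addr output. A minimal POSIX shell sketch, assuming that /32 pattern holds in your setup too; list_vips is a hypothetical helper name.

```shell
#!/bin/sh
# list_vips (hypothetical helper): read `ip addr` output on stdin and print
# each IPv4 address that has a /32 prefix, without the prefix length.
list_vips() {
    awk '$1 == "inet" && $2 ~ /\/32$/ { sub(/\/32$/, "", $2); print $2 }'
}

# Example (run on node1, as root):
#   nsenter --net=/run/docker/netns/lb_obrrvzltl ip addr show eth0 | list_vips
```

On node1 this would list 10.0.1.2 and 10.0.1.7; which service the second address belongs to becomes clear from the IPVS rules.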
How the lb-mynet namespace handles traffic
When the client container accesses the web service by name, Docker Swarm's built-in DNS resolver resolves web to `10.0.1.2`, so the traffic enters the lb-mynet network namespace. Let's look at how lb-mynet handles that traffic.
As with the ingress network, start with the iptables rules, here in the mangle table:
```shell
$ sudo nsenter --net="/run/docker/netns/lb_obrrvzltl" iptables -nvL -t mangle
Chain PREROUTING (policy ACCEPT 43 packets, 3771 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain INPUT (policy ACCEPT 31 packets, 2253 bytes)
 pkts bytes target     prot opt in     out     source               destination
   31  2253 MARK       all  --  *      *       0.0.0.0/0            10.0.1.2             MARK set 0x111
    0     0 MARK       all  --  *      *       0.0.0.0/0            10.0.1.7             MARK set 0x112

Chain FORWARD (policy ACCEPT 12 packets, 1518 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 34 packets, 2517 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 43 packets, 3771 bytes)
 pkts bytes target     prot opt in     out     source               destination
```
Traffic destined for `10.0.1.2` is marked 0x111, which is 273 in decimal; traffic destined for `10.0.1.7` is marked 0x112, which is 274.
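Because iptables prints the marks in hex while ipvsadm lists firewall-mark services as "FWM <decimal>", converting them makes the two outputs easy to match up. A minimal POSIX shell sketch, assuming the "MARK set 0x…" rule format shown above (other iptables versions may print the target differently); vip_marks is a hypothetical helper name.

```shell
#!/bin/sh
# vip_marks (hypothetical helper): read `iptables -nvL -t mangle` output on
# stdin and print "<destination VIP> <mark in decimal>" for each MARK rule.
vip_marks() {
    awk '$3 == "MARK" && $(NF-1) == "set" { print $9, $NF }' |
    while read -r vip hex; do
        printf '%s %d\n' "$vip" "$hex"   # printf %d accepts 0x-prefixed hex
    done
}

# Example (run on node1, as root):
#   nsenter --net=/run/docker/netns/lb_obrrvzltl iptables -nvL -t mangle | vip_marks
```

For the rules above this yields 10.0.1.2 273 and 10.0.1.7 274.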
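Next, look at the IPVS load-balancing rules (ipvsadm must be installed on the host, since nsenter runs the host's binary inside the namespace):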
```shell
$ sudo nsenter --net="/run/docker/netns/lb_obrrvzltl" ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  273 rr
  -> 10.0.1.3:0                   Masq    1      0          0
  -> 10.0.1.4:0                   Masq    1      0          0
FWM  274 rr
  -> 10.0.1.8:0                   Masq    1      0          0
```
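IPVS forwards traffic marked 273 to 10.0.1.3 and 10.0.1.4 in round-robin (rr) order, which is how the load balancing is achieved. Likewise, FWM 274 points at 10.0.1.8, the client task, so 10.0.1.7 is the client service's VIP. This also fits the RemoteAddr: 10.0.1.5 in both curl responses: the web containers see the requests arriving from lb-mynet's own address rather than from the client's 10.0.1.8.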
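As an illustration of what "rr" means here, the toy model below picks a backend for the N-th new connection as N mod the number of backends. It is only a sketch of the selection order, not how IPVS is implemented: the kernel keeps its own per-service pointer, and which backend it starts from is not guaranteed. rr_pick is a hypothetical helper name.

```shell
#!/bin/sh
# rr_pick (hypothetical helper): toy round-robin scheduler.
# Usage: rr_pick N backend1 backend2 ...
# Prints the backend chosen for connection number N (counting from 0).
rr_pick() {
    n=$1; shift
    [ $# -gt 0 ] || return 1             # no backends to choose from
    shift $(( n % $# ))                  # skip N mod count backends
    printf '%s\n' "$1"
}

# Example: four connections to the web VIP alternate between the two tasks.
#   for i in 0 1 2 3; do rr_pick "$i" 10.0.1.3 10.0.1.4; done
```

With equal weights, as in the ipvsadm output above, rr simply cycles through the backends, which matches the alternating Hostname values seen from curl.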