Discovering the problem
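I wrote an HTTP server in Go whose job is to change the IP address of the wlan0 interface. During testing, I found that the client sent its request but never received a response: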
```console
curl -v --interface lo -X PUT -H "Content-Type: application/json" -d '{"mode":"Station","config":{"SSID":"GL-5G","Password":"auto3d123"}}' http://127.0.0.1:8080/interfaces/wlan0/mode
> PUT /interfaces/wlan0/mode HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/8.6.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 67
>
```
Yet the server's log clearly shows that the response was sent successfully.
Server code:
```go
func (s *Server) sendJSON(w http.ResponseWriter, statusCode int, data interface{}) {
	var jsonData []byte
	var err error
	if data != nil {
		jsonData, err = json.Marshal(data)
		if err != nil {
			s.log.WithError(err).Error("Failed to marshal JSON data for response")
			// Fallback to a plain error response
			w.Header().Set("Content-Type", "text/plain; charset=utf-8")
			w.WriteHeader(http.StatusInternalServerError)
			w.Write([]byte("Internal Server Error: failed to create JSON response"))
			return
		}
	}
	w.Header().Set("Content-Type", "application/json")
	if jsonData != nil {
		w.Header().Set("Content-Length", strconv.Itoa(len(jsonData)))
	} else {
		w.Header().Set("Content-Length", "0")
	}
	w.WriteHeader(statusCode)
	if jsonData != nil {
		dataLen, err := w.Write(jsonData)
		if err != nil {
			s.log.WithError(err).Error("Failed to write JSON response body")
		}
		s.log.WithField("dataLen", dataLen).WithField("data", string(jsonData)).Info("Sent JSON response body")
	}
	// Force the buffered response out to the connection
	if f, ok := w.(http.Flusher); ok {
		f.Flush()
	}
	s.log.Info("after flush")
}
```
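Both log lines are printed, so the server did write the response. A successful Write and Flush only means the bytes were handed to the kernel's socket send buffer, though. The packets must be running into trouble somewhere on their way out of the host.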
After some digging, I found the following key clues:
```console
# ss -ant
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port   Process
LISTEN     0      0      127.0.0.1:8080      0.0.0.0:*
LISTEN     0      0      0.0.0.0:80          0.0.0.0:*
LISTEN     0      0      0.0.0.0:22          0.0.0.0:*
LISTEN     0      0      0.0.0.0:443         0.0.0.0:*
ESTAB      0      0      10.200.1.2:22       10.200.1.3:56048
ESTAB      0      0      10.200.1.2:22       10.200.1.3:44818
ESTAB      0      0      10.200.1.2:22       10.200.1.3:43606
TIME-WAIT  0      0      127.0.0.1:45148     127.0.0.1:8080
ESTAB      0      190    127.0.0.1:8080      192.168.49.1:45134
ESTAB      0      0      127.0.0.1:45134     127.0.0.1:8080
TIME-WAIT  0      0      127.0.0.1:34876     127.0.0.1:8080
LISTEN     0      0      *:22                *:*
LISTEN     0      0      *:7737              *:*
```
Note these two lines:
```console
ESTAB      0      190    127.0.0.1:8080      192.168.49.1:45134
ESTAB      0      0      127.0.0.1:45134     127.0.0.1:8080
```
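The client connected to the server's port 8080 from source port 45134. The server's socket, however, sees its peer as 192.168.49.1:45134 rather than 127.0.0.1:45134, so its replies are addressed to 192.168.49.1. That is the problem. 192.168.49.1 was wlan0's address, and the request itself changes wlan0's IP, so after the change the server can no longer deliver its reply. The Send-Q of 190 (bytes queued but not yet acknowledged by the peer) confirms that the response never got out. Naturally the client never receives it, which matches what we observed.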
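The diagnosis above boils down to one pattern in the ss -ant output: an established socket whose local address is loopback, whose peer is not, and which has bytes stuck in Send-Q. The minimal Go sketch below encodes that check. It assumes the default ss -ant column order (State, Recv-Q, Send-Q, Local, Peer). suspiciousSocket is a hypothetical helper, not part of any tool.

```go
package main

import (
	"fmt"
	"net/netip"
	"strconv"
	"strings"
)

// suspiciousSocket reports whether an `ss -ant` line describes an ESTAB socket
// whose local address is loopback but whose peer is not, and which still has
// unsent/unacknowledged bytes in Send-Q.
// Assumed column order: State Recv-Q Send-Q Local:Port Peer:Port [Process].
func suspiciousSocket(line string) (bool, error) {
	f := strings.Fields(line)
	if len(f) < 5 || f[0] != "ESTAB" {
		return false, nil // only established sockets are interesting here
	}
	sendQ, err := strconv.Atoi(f[2])
	if err != nil {
		return false, err
	}
	local, err := netip.ParseAddrPort(f[3])
	if err != nil {
		return false, err
	}
	peer, err := netip.ParseAddrPort(f[4])
	if err != nil {
		return false, err
	}
	return local.Addr().IsLoopback() && !peer.Addr().IsLoopback() && sendQ > 0, nil
}

func main() {
	lines := []string{
		"ESTAB 0 190 127.0.0.1:8080 192.168.49.1:45134",
		"ESTAB 0 0 127.0.0.1:45134 127.0.0.1:8080",
	}
	for _, l := range lines {
		bad, err := suspiciousSocket(l)
		fmt.Println(l, "->", bad, err)
	}
}
```

A loopback-local socket with a non-loopback peer is exactly what source NAT on lo produces. The Send-Q condition narrows the result to the connections that are actually stuck.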
So the question becomes this: a packet I sent from 127.0.0.1 ended up with its source address rewritten to 192.168.49.1, which means NAT is taking place.
Let's check the host's firewall rules:
```console
# iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 52411 packets, 4016K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain INPUT (policy ACCEPT 222 packets, 35837 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 27940 packets, 2335K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
27940 2335K MASQUERADE  0    --  *      *       0.0.0.0/0            0.0.0.0/0
```
Note this part:
```console
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
27940 2335K MASQUERADE  0    --  *      *       0.0.0.0/0            0.0.0.0/0
```
In other words, the following command had been run:
```console
iptables -t nat -A POSTROUTING -j MASQUERADE
```
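This appends a MASQUERADE rule to the POSTROUTING chain of the nat table, which performs source address masquerading (a dynamic form of SNAT). The rule has no output-interface (-o) or address match, as the * and 0.0.0.0/0 columns show. It therefore applies to every packet passing through POSTROUTING, including loopback traffic on lo.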
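What MASQUERADE does is this: when a packet is about to leave through a network interface toward another network, it automatically rewrites the packet's source IP address to the IP address of that outgoing interface.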
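So this rule is what rewrote the source address of my loopback packet. Put more directly, MASQUERADE concluded that the address to use for this packet's outgoing interface was 192.168.49.1. Why would it do that? The rest of this article analyzes it in detail.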
Reproduction
```console
iptables -t nat -A POSTROUTING -j MASQUERADE
nc -l 127.0.0.1 8080 &
nc 127.0.0.1 8080
# in a second terminal:
ss -ant
ESTAB 0 0 127.0.0.1:8080 192.168.8.182:53190
ESTAB 0 0 127.0.0.1:53190 127.0.0.1:8080
```
Or:
```console
conntrack -L | grep 8080
conntrack v1.4.6 (conntrack-tools): 5 flow entries have been shown.
tcp      6 431936 ESTABLISHED src=127.0.0.1 dst=127.0.0.1 sport=53190 dport=8080 src=127.0.0.1 dst=192.168.8.182 sport=8080 dport=53190 [ASSURED] use=1
```
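Either method shows the phenomenon. The reproduction was done on a different machine, where the address chosen is 192.168.8.182. In ss, the server side of the connection sees that address as its peer instead of 127.0.0.1. In conntrack, the reply tuple (the second src/dst pair) is addressed to dst=192.168.8.182, which shows that the connection's source was rewritten.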
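The same observation can be made from Go without nc. The sketch below stands in for the nc commands: it opens a loopback TCP connection and compares the client's local address with the peer address reported by the server's accepted socket. observePeer is a hypothetical name. The expectation that the peer IP changes when the rule is present is an assumption based on the ss output above. Without the MASQUERADE rule, both sides should show 127.0.0.1.

```go
package main

import (
	"fmt"
	"net"
)

// observePeer opens a loopback TCP connection and returns the client's local
// address together with the peer address seen by the server's accepted socket.
// With a catch-all MASQUERADE rule installed, peer.IP is expected to be one of
// the host's global addresses instead of 127.0.0.1.
func observePeer() (local, peer *net.TCPAddr, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: let the kernel pick
	if err != nil {
		return nil, nil, err
	}
	defer ln.Close() // also unblocks Accept if Dial fails

	accepted := make(chan net.Conn, 1)
	acceptErr := make(chan error, 1)
	go func() {
		c, err := ln.Accept()
		if err != nil {
			acceptErr <- err
			return
		}
		accepted <- c
	}()

	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return nil, nil, err
	}
	defer client.Close()

	select {
	case server := <-accepted:
		defer server.Close()
		return client.LocalAddr().(*net.TCPAddr), server.RemoteAddr().(*net.TCPAddr), nil
	case err := <-acceptErr:
		return nil, nil, err
	}
}

func main() {
	local, peer, err := observePeer()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("client local address:", local)
	fmt.Println("server sees peer as: ", peer)
	if !peer.IP.Equal(local.IP) {
		fmt.Println("source address was rewritten (NAT on loopback?)")
	}
}
```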
How MASQUERADE chooses the source IP
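As mentioned above, MASQUERADE does the following: when a packet is about to leave through a network interface toward another network, it automatically rewrites the packet's source IP address to the IP address of that outgoing interface.
Below, we walk through the kernel source to see how MASQUERADE ends up choosing 192.168.49.1.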
Interface address scope
```console
ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 66:37:64:e0:78:c8 brd ff:ff:ff:ff:ff:ff
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 48:b0:2d:f9:04:9a brd ff:ff:ff:ff:ff:ff
6: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 9c:b8:b4:60:23:6e brd ff:ff:ff:ff:ff:ff
    inet 192.168.49.1/24 brd 192.168.49.255 scope global wlan0
       valid_lft forever preferred_lft forever
10: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 20:7b:d2:55:68:f7 brd ff:ff:ff:ff:ff:ff
    inet 10.200.1.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet 172.100.100.1/30 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::227b:d2ff:fe55:68f7/64 scope link
       valid_lft forever preferred_lft forever
```
In the ip a output, every address carries a scope field.
The scope defines the range within which that address is valid.
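The possible scopes are:
- global: the address is valid everywhere and can be used for routed communication beyond this host, for example with external networks such as the Internet. It does not have to be globally unique; private addresses such as 192.168.49.1 are also scope global.
- link: the address is valid only on the directly attached link and is not forwarded by routers. IPv6 link-local addresses (starting with fe80::) are the typical example. They are used for communication between devices on the same LAN.
- host: the address is valid only inside this host and is used for communication between local processes. The most common example is the loopback address 127.0.0.1.
- site (IPv6 only): the address is valid only within an organization's internal network (a site spanning several LANs) and is not routable on the public Internet.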
The kernel defines them as follows:
```c
// include/uapi/linux/rtnetlink.h
enum rt_scope_t {
	RT_SCOPE_UNIVERSE=0,
/* User defined values */
	RT_SCOPE_SITE=200,
	RT_SCOPE_LINK=253,
	RT_SCOPE_HOST=254,
	RT_SCOPE_NOWHERE=255
};
```
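The smaller the scope value, the wider its reach. When the kernel asks for an address of scope S, an address qualifies only if its own scope value is less than or equal to S.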
Analyzing the MASQUERADE call chain
```text
MASQUERADE rule
    ↓
masquerade_tg_reg[]   (xt_target registration)
    ↓
masquerade_tg()
    ↓
nf_nat_masquerade_ipv4()
    ↓
newsrc = inet_select_addr(out, nh, RT_SCOPE_UNIVERSE);   <-- the focus of this article
    ↓
nf_nat_setup_info(..., &newrange, NF_NAT_MANIP_SRC);
```
First, the target that MASQUERADE registers for the nat table, hooked at POST_ROUTING:
```c
// net/netfilter/xt_MASQUERADE.c
static struct xt_target masquerade_tg_reg[] __read_mostly = {
	{
#if IS_ENABLED(CONFIG_IPV6)
		.name       = "MASQUERADE",
		.family     = NFPROTO_IPV6,
		.target     = masquerade_tg6,
		.targetsize = sizeof(struct nf_nat_range),
		.table      = "nat",
		.hooks      = 1 << NF_INET_POST_ROUTING,
		.checkentry = masquerade_tg6_checkentry,
		.destroy    = masquerade_tg_destroy,
		.me         = THIS_MODULE,
	}, {
#endif
		.name       = "MASQUERADE",
		.family     = NFPROTO_IPV4,
		.target     = masquerade_tg,
		.targetsize = sizeof(struct nf_nat_ipv4_multi_range_compat),
		.table      = "nat",
		.hooks      = 1 << NF_INET_POST_ROUTING,
		.checkentry = masquerade_tg_check,
		.destroy    = masquerade_tg_destroy,
		.me         = THIS_MODULE,
	}
};

static unsigned int
masquerade_tg(struct sk_buff *skb, const struct xt_action_param *par)
{
	struct nf_nat_range2 range;
	const struct nf_nat_ipv4_multi_range_compat *mr;

	mr = par->targinfo;
	range.flags = mr->range[0].flags;
	range.min_proto = mr->range[0].min;
	range.max_proto = mr->range[0].max;

	return nf_nat_masquerade_ipv4(skb, xt_hooknum(par), &range,
				      xt_out(par));
}
```
masquerade_tg is the function ultimately executed for the MASQUERADE target.
nf_nat_masquerade_ipv4 is the core of masquerade_tg, and the IP selection logic lives inside it:
```c
// net/netfilter/nf_nat_masquerade.c
// skb: the packet; hooknum: must be POST_ROUTING;
// range: port range from the iptables rule; out: the output device
unsigned int
nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum,
		       const struct nf_nat_range2 *range,
		       const struct net_device *out)
{
	struct nf_conn *ct;
	struct nf_conn_nat *nat;
	enum ip_conntrack_info ctinfo;
	struct nf_nat_range2 newrange;
	const struct rtable *rt;
	__be32 newsrc, nh;

	WARN_ON(hooknum != NF_INET_POST_ROUTING);

	ct = nf_ct_get(skb, &ctinfo);

	WARN_ON(!(ct && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED ||
			 ctinfo == IP_CT_RELATED_REPLY)));

	/* Source address is 0.0.0.0 - locally generated packet that is
	 * probably not supposed to be masqueraded.
	 */
	if (ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip == 0)
		return NF_ACCEPT;

	// Compute the next hop so that inet_select_addr() can tell whether the
	// peer is on the same subnet and prefer an address on that subnet
	rt = skb_rtable(skb);
	nh = rt_nexthop(rt, ip_hdr(skb)->daddr);
	// The core step: dynamic source address selection
	newsrc = inet_select_addr(out, nh, RT_SCOPE_UNIVERSE);
	if (!newsrc) {
		pr_info("%s ate my IP address\n", out->name);
		return NF_DROP;
	}

	nat = nf_ct_nat_ext_add(ct);
	if (nat)
		nat->masq_index = out->ifindex;

	/* Transfer from original range. */
	memset(&newrange.min_addr, 0, sizeof(newrange.min_addr));
	memset(&newrange.max_addr, 0, sizeof(newrange.max_addr));
	newrange.flags       = range->flags | NF_NAT_RANGE_MAP_IPS;
	newrange.min_addr.ip = newsrc;
	newrange.max_addr.ip = newsrc;
	newrange.min_proto   = range->min_proto;
	newrange.max_proto   = range->max_proto;

	/* Hand modified range to generic setup. */
	return nf_nat_setup_info(ct, &newrange, NF_NAT_MANIP_SRC);
}
```
Next, let's look at the address selection code and apply it to our scenario:
```c
// net/ipv4/devinet.c
/**
 * inet_select_addr - pick a suitable local IPv4 source address for an outgoing packet
 * @dev  : the device the packet is expected to leave through (skb->dev)
 * @dst  : the packet's destination address (network byte order)
 * @scope: route scope; a smaller value means a wider reach, and only
 *         addresses with scope <= this value are chosen
 *
 * Return: the selected local address, or 0 if none is found
 */
__be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
{
	/* Scenario: dev = lo, dst = 127.0.0.1, scope = RT_SCOPE_UNIVERSE (0) */
	const struct in_ifaddr *ifa;
	__be32 addr = 0;
	unsigned char localnet_scope = RT_SCOPE_HOST;
	struct in_device *in_dev;
	struct net *net = dev_net(dev);
	int master_idx;

	rcu_read_lock();
	/* ---------- Phase 1: look for an address strictly on dev (here lo) ---------- */
	in_dev = __in_dev_get_rcu(dev);	/* lo's in_device; jump to the fallback if missing */
	if (!in_dev)
		goto no_in_dev;

	/* If the route_localnet sysctl is enabled, localnet_scope drops to
	 * RT_SCOPE_LINK (253). It is off by default, so it stays at
	 * RT_SCOPE_HOST (254); either way it is still > 0 here. */
	if (unlikely(IN_DEV_ROUTE_LOCALNET(in_dev)))
		localnet_scope = RT_SCOPE_LINK;

	/*
	 * Walk the primary IPv4 addresses on lo.
	 * There is only one: 127.0.0.1/8
	 */
	in_dev_for_each_ifa_rcu(ifa, in_dev) {
		/* Skip secondary/alias addresses (there are none here) */
		if (READ_ONCE(ifa->ifa_flags) & IFA_F_SECONDARY)
			continue;
		/* Key comparison: min(ifa->ifa_scope, localnet_scope) > scope
		 * 127.0.0.1 has ifa_scope = HOST (254)
		 * localnet_scope is also HOST (254)
		 * min(254, 254) = 254 > 0, so the condition holds and the address is skipped */
		if (min(ifa->ifa_scope, localnet_scope) > scope)
			continue;
		if (!dst || inet_ifa_match(dst, ifa)) {
			addr = ifa->ifa_local;
			break;
		}
		if (!addr)
			addr = ifa->ifa_local;
	}
	/* End of phase 1: addr is still 0 (127.0.0.1 on lo was filtered out by scope) */
	if (addr)
		goto out_unlock;
no_in_dev:
	/* ---------- Phase 2: try again on the VRF master device ---------- */
	/* No VRF in our scenario: master_idx is 0, so this block is skipped */
	master_idx = l3mdev_master_ifindex_rcu(dev);

	/* For VRFs, the VRF device takes the place of the loopback device,
	 * with addresses on it being preferred. Note in such cases the
	 * loopback device will be among the devices that fail the master_idx
	 * equality check in the loop below.
	 */
	if (master_idx &&
	    (dev = dev_get_by_index_rcu(net, master_idx)) &&
	    (in_dev = __in_dev_get_rcu(dev))) {
		addr = in_dev_select_addr(in_dev, scope);
		if (addr)
			goto out_unlock;
	}

	/* Not loopback addresses on loopback should be preferred
	   in this case. It is important that lo is the first interface
	   in dev_base list.
	 */
	/* ---------- Phase 3: fallback, scan every device in the net namespace ---------- */
	for_each_netdev_rcu(net, dev) {
		if (l3mdev_master_ifindex_rcu(dev) != master_idx)
			continue;

		in_dev = __in_dev_get_rcu(dev);
		if (!in_dev)
			continue;

		/* Look again on each device for an address that satisfies scope;
		 * lo comes first and its 127.0.0.1 (scope host) fails again */
		addr = in_dev_select_addr(in_dev, scope);
		/* When the scan reaches wlan0, 192.168.49.1/24 has scope UNIVERSE (0) <= 0: a hit */
		if (addr)
			goto out_unlock;
	}
out_unlock:
	rcu_read_unlock();
	return addr;
}
EXPORT_SYMBOL(inet_select_addr);
```
```c
// net/ipv4/devinet.c
static __be32 in_dev_select_addr(const struct in_device *in_dev,
				 int scope)
{
	const struct in_ifaddr *ifa;

	in_dev_for_each_ifa_rcu(ifa, in_dev) {
		if (READ_ONCE(ifa->ifa_flags) & IFA_F_SECONDARY)
			continue;
		if (ifa->ifa_scope != RT_SCOPE_LINK &&
		    ifa->ifa_scope <= scope)
			return ifa->ifa_local;
	}

	return 0;
}
```
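So in the scenario above, MASQUERADE rewrites the source address to 192.168.49.1. The only address on lo, 127.0.0.1, has scope host and is filtered out. In the fallback scan, wlan0 is then the first device with a scope-global address. wlan0 (ifindex 6) appears before eth0 (ifindex 10) in the device list, which is presumably why 10.200.1.2 was not chosen.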
Note: the kernel source quoted here is from version 6.10.0.