前言
来自于测试中无意发现到的一个接收窗口满的案例,特殊,或者可以说我以前都没在实际场景中见过。一开始都没整太明白,花了些精力才算是弄清楚了些,记录分享下。
问题说明
在研究拥塞控制的慢启动阶段时,通过 packetdrill 写了一个脚本,如下。
plain
# cat tcp_troubleshooting_1_001.pkt
`ethtool -K tun0 tso off
ethtool -K tun0 gso off`
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
+0.01 < P. 1:101(100) ack 1 win 2000
+0 > . 1:1(0) ack 101
+0.01 write(4,...,4000) = 4000
+0 `sleep 10`
#
TCP 三次握手中通告的 MSS 为 1000,也没有启用 WScale 因子,之后客户端发送了一个数据段,其中通告了自身的 Win 为 2000,服务器也正常响应了 ACK,之后服务器端写入了 4000 字节大小的数据包。
想象中的实验结果应该是,由于接收端 Win 2000 限制的缘故,服务器端只能发出 2 个 1000 字节的数据包,可惜想象很美好,结果却不一致,服务器发出了 4 个 1000 字节的数据包,无视接收窗口的限制,问题在哪?
plain
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:00:20.517879 tun0 In IP 192.0.2.1.45917 > 192.168.235.30.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
22:00:20.517912 tun0 Out IP 192.168.235.30.8080 > 192.0.2.1.45917: Flags [S.], seq 2020381752, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
22:00:20.528033 tun0 In IP 192.0.2.1.45917 > 192.168.235.30.8080: Flags [.], ack 1, win 10000, length 0
22:00:20.538114 tun0 In IP 192.0.2.1.45917 > 192.168.235.30.8080: Flags [P.], seq 1:101, ack 1, win 2000, length 100: HTTP
22:00:20.538139 tun0 Out IP 192.168.235.30.8080 > 192.0.2.1.45917: Flags [.], ack 101, win 64140, length 0
22:00:20.548217 tun0 Out IP 192.168.235.30.8080 > 192.0.2.1.45917: Flags [.], seq 1:1001, ack 101, win 64140, length 1000: HTTP
22:00:20.548221 tun0 Out IP 192.168.235.30.8080 > 192.0.2.1.45917: Flags [P.], seq 1001:2001, ack 101, win 64140, length 1000: HTTP
22:00:20.548226 tun0 Out IP 192.168.235.30.8080 > 192.0.2.1.45917: Flags [.], seq 2001:3001, ack 101, win 64140, length 1000: HTTP
22:00:20.548227 tun0 Out IP 192.168.235.30.8080 > 192.0.2.1.45917: Flags [P.], seq 3001:4001, ack 101, win 64140, length 1000: HTTP
#

问题分析
通常所说的发送窗口就是min(cwnd,rwnd),由于初始 CWND 为 10,所以初步的判断仍是 rwnd 的限制,但是为什么 Win 2000 没生效,确实让人费解。
第一步验证 rwnd ,考虑到 No.3 和 No4 均有 Win 值,如果 Win 2000 未生效的情况,是否是受限制于上一个 Win 10000。修改脚本如下,考虑测试简便,调整成了 3000 和 2000,仍然尝试写入 4000 字节大小的数据包。
plain
# cat tcp_troubleshooting_1_002.pkt
`ethtool -K tun0 tso off
ethtool -K tun0 gso off`
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 3000
+0 accept(3, ..., ...) = 4
+0.01 < P. 1:101(100) ack 1 win 2000
+0 > . 1:1(0) ack 101
+0.01 write(4,...,4000) = 4000
+0 `sleep 10`
#
可以看到服务器端发送的数据包只有 3 个,也就是受限于上一个 Win 3000,而不是最近的一个 Win 2000。
plain
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:30:05.357883 tun0 In IP 192.0.2.1.40269 > 192.168.166.85.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
22:30:05.357911 tun0 Out IP 192.168.166.85.8080 > 192.0.2.1.40269: Flags [S.], seq 2967841851, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
22:30:05.367982 tun0 In IP 192.0.2.1.40269 > 192.168.166.85.8080: Flags [.], ack 1, win 3000, length 0
22:30:05.378045 tun0 In IP 192.0.2.1.40269 > 192.168.166.85.8080: Flags [P.], seq 1:101, ack 1, win 2000, length 100: HTTP
22:30:05.378065 tun0 Out IP 192.168.166.85.8080 > 192.0.2.1.40269: Flags [.], ack 101, win 64140, length 0
22:30:05.388145 tun0 Out IP 192.168.166.85.8080 > 192.0.2.1.40269: Flags [.], seq 1:1001, ack 101, win 64140, length 1000: HTTP
22:30:05.388155 tun0 Out IP 192.168.166.85.8080 > 192.0.2.1.40269: Flags [P.], seq 1001:2001, ack 101, win 64140, length 1000: HTTP
22:30:05.388158 tun0 Out IP 192.168.166.85.8080 > 192.0.2.1.40269: Flags [.], seq 2001:3001, ack 101, win 64140, length 1000: HTTP
#
那么为什么服务器端在收到 No.4 也就是 Win 2000 的数据包后,没有更新 rwnd ?带着疑惑,翻了源码和很多资料,最后才找到答案,相关发送窗口更新的处理函数为 tcp_ack_update_window -> tcp_may_update_window ,如下。
plain
/* Check that window update is acceptable.
* The function assumes that snd_una<=ack<=snd_next.
*/
static inline bool tcp_may_update_window(const struct tcp_sock *tp,
const u32 ack, const u32 ack_seq,
const u32 nwin)
{
return after(ack, tp->snd_una) ||
after(ack_seq, tp->snd_wl1) ||
(ack_seq == tp->snd_wl1 && nwin > tp->snd_wnd);
}
主要有三个判断条件,after(ack, tp->snd_una) || after(ack_seq, tp->snd_wl1) || (ack_seq == tp->snd_wl1 && nwin > tp->snd_wnd) 。
- after(ack, tp->snd_una),由于 No.4 的 ACK 并没有确认新数据,所以判断不成立;
- after(ack_seq, tp->snd_wl1),由于 No.4 的 Seq Num 和之前 snd_wl1 一样,所以判断也不成立;
- (ack_seq == tp->snd_wl1 && nwin > tp->snd_wnd) ,由于 No.4 的 Seq Num 和之前 snd_wl1 一样,前一个成立,但是 nwin 为 2000 并不大于 snd_wnd 10000,后一个不成立,所以判断也不成立。
综合 return 为 0 ,也就是说并不会更新发送窗口为 2000,仍保持发送窗口为 10000,因此服务器最终是可以发出 4 个 1000 字节的数据包。
答案是找到了,但既然到这一步了,就继续通过实验验证,巩固下这个知识点。
实验扩展
通过三个实验验证以上三个判断条件,在各自条件成立的情况下,从而更新发送窗口。
- after(ack, tp->snd_una)
修改脚本,使得 ACK 确认新数据,如下,ACK win 2000 的 ack num 1001 确认了新数据。
plain
# cat tcp_troubleshooting_1_003.pkt
`ethtool -K tun0 tso off
ethtool -K tun0 gso off`
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
+0.01 < P. 1:101(100) ack 1 win 3000
+0 > . 1:1(0) ack 101
+0.01 write(4,...,1000) = 1000
+0.01 < . 101:101(0) ack 1001 win 2000
+0.01 write(4,...,4000) = 4000
#
Win 2000 限制生效,服务器端仅能发出 2 个 MSS 1000 大小的数据包。
plain
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
22:55:16.017895 ? In IP 192.0.2.1.36917 > 192.168.24.159.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
22:55:16.017930 ? Out IP 192.168.24.159.8080 > 192.0.2.1.36917: Flags [S.], seq 4123551004, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
22:55:16.028014 ? In IP 192.0.2.1.36917 > 192.168.24.159.8080: Flags [.], ack 1, win 10000, length 0
22:55:16.038118 ? In IP 192.0.2.1.36917 > 192.168.24.159.8080: Flags [P.], seq 1:101, ack 1, win 3000, length 100: HTTP
22:55:16.038154 ? Out IP 192.168.24.159.8080 > 192.0.2.1.36917: Flags [.], ack 101, win 64140, length 0
22:55:16.048290 ? Out IP 192.168.24.159.8080 > 192.0.2.1.36917: Flags [P.], seq 1:1001, ack 101, win 64140, length 1000: HTTP
22:55:16.058342 ? In IP 192.0.2.1.36917 > 192.168.24.159.8080: Flags [.], ack 1001, win 2000, length 0
22:55:16.068472 ? Out IP 192.168.24.159.8080 > 192.0.2.1.36917: Flags [.], seq 1001:2001, ack 101, win 64140, length 1000: HTTP
22:55:16.068479 ? Out IP 192.168.24.159.8080 > 192.0.2.1.36917: Flags [P.], seq 2001:3001, ack 101, win 64140, length 1000: HTTP
#
- after(ack_seq, tp->snd_wl1)
修改脚本,使得 ACK Seq num 101 更新,大于之前的 snd_wl1 1 。
plain
# cat tcp_troubleshooting_1_004.pkt
`ethtool -K tun0 tso off
ethtool -K tun0 gso off`
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
+0.01 < P. 1:101(100) ack 1 win 3000
+0 > . 1:1(0) ack 101
+0.01 < P. 101:201(100) ack 1 win 2000
+0 > . 1:1(0) ack 201
+0.01 write(4,...,4000) = 4000
#
Win 2000 限制生效,服务器端仅能发出 2 个 MSS 1000 大小的数据包。
plain
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
23:00:54.817885 tun0 In IP 192.0.2.1.49051 > 192.168.214.190.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
23:00:54.817913 tun0 Out IP 192.168.214.190.8080 > 192.0.2.1.49051: Flags [S.], seq 2399892350, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
23:00:54.827992 ? In IP 192.0.2.1.49051 > 192.168.214.190.8080: Flags [.], ack 1, win 10000, length 0
23:00:54.838065 ? In IP 192.0.2.1.49051 > 192.168.214.190.8080: Flags [P.], seq 1:101, ack 1, win 3000, length 100: HTTP
23:00:54.838084 ? Out IP 192.168.214.190.8080 > 192.0.2.1.49051: Flags [.], ack 101, win 64140, length 0
23:00:54.848151 ? In IP 192.0.2.1.49051 > 192.168.214.190.8080: Flags [P.], seq 101:201, ack 1, win 2000, length 100: HTTP
23:00:54.848170 ? Out IP 192.168.214.190.8080 > 192.0.2.1.49051: Flags [.], ack 201, win 64040, length 0
23:00:54.858248 ? Out IP 192.168.214.190.8080 > 192.0.2.1.49051: Flags [.], seq 1:1001, ack 201, win 64040, length 1000: HTTP
23:00:54.858252 ? Out IP 192.168.214.190.8080 > 192.0.2.1.49051: Flags [P.], seq 1001:2001, ack 201, win 64040, length 1000: HTTP
#
- (ack_seq == tp->snd_wl1 && nwin > tp->snd_wnd)
修改脚本,使得 nwin 3000 大于 snd_wnd 2000。
plain
# cat tcp_troubleshooting_1_005.pkt
`ethtool -K tun0 tso off
ethtool -K tun0 gso off`
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 2000
+0 accept(3, ..., ...) = 4
+0.01 < P. 1:101(100) ack 1 win 3000
+0 > . 1:1(0) ack 101
+0.01 write(4,...,4000) = 4000
#
Win 3000 限制生效,服务器端能发出 3 个 MSS 1000 大小的数据包。
plain
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
23:03:00.017903 ? In IP 192.0.2.1.34617 > 192.168.35.60.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
23:03:00.017940 ? Out IP 192.168.35.60.8080 > 192.0.2.1.34617: Flags [S.], seq 3161188315, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
23:03:00.028034 ? In IP 192.0.2.1.34617 > 192.168.35.60.8080: Flags [.], ack 1, win 2000, length 0
23:03:00.038103 ? In IP 192.0.2.1.34617 > 192.168.35.60.8080: Flags [P.], seq 1:101, ack 1, win 3000, length 100: HTTP
23:03:00.038126 ? Out IP 192.168.35.60.8080 > 192.0.2.1.34617: Flags [.], ack 101, win 64140, length 0
23:03:00.048222 ? Out IP 192.168.35.60.8080 > 192.0.2.1.34617: Flags [.], seq 1:1001, ack 101, win 64140, length 1000: HTTP
23:03:00.048232 ? Out IP 192.168.35.60.8080 > 192.0.2.1.34617: Flags [P.], seq 1001:2001, ack 101, win 64140, length 1000: HTTP
23:03:00.048235 ? Out IP 192.168.35.60.8080 > 192.0.2.1.34617: Flags [.], seq 2001:3001, ack 101, win 64140, length 1000: HTTP
#
问题总结
实际中的场景,你们是否有见过,感觉奇奇怪怪的知识又增加了。