k8s network outage caused by iptables rules being lost after a machine reboot

1 Symptoms

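DNS resolution works on the host. The NXDOMAIN for the cluster-internal name is expected here, because the host queries the public resolver 223.5.5.5 rather than CoreDNS: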

$ nslookup kubernetes.default.svc.cluster.local
Server:         223.5.5.5
Address:        223.5.5.5#53
** server can't find kubernetes.default.svc.cluster.local: NXDOMAIN

$ nslookup google.com
Server:         223.5.5.5
Address:        223.5.5.5#53
Non-authoritative answer:
Name:   google.com
Address: 142.250.73.110

Inside a k8s pod, however, DNS resolution fails:

$ nslookup kubernetes.default.svc.cluster.local
;; connection timed out; no servers could be reached

$ nslookup google.com
;; connection timed out; no servers could be reached

The CoreDNS pod logs show a large number of timeouts:

$ kubectl -n kube-system logs coredns-7cb5659999-nshr9 --tail=50
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.6
linux/arm64, go1.17.1, 13a9191
[ERROR] plugin/errors: 2 2445800166269057896.3880739144127939746. HINFO: read udp 10.244.0.72:58074->223.5.5.5:53: i/o timeout
[ERROR] plugin/errors: 2 2445800166269057896.3880739144127939746. HINFO: read udp 10.244.0.72:53367->172.16.240.2:53: i/o timeout
[ERROR] plugin/errors: 2 2445800166269057896.3880739144127939746. HINFO: read udp 10.244.0.72:56148->223.5.5.5:53: i/o timeout
[ERROR] plugin/errors: 2 2445800166269057896.3880739144127939746. HINFO: read udp 10.244.0.72:34444->223.5.5.5:53: i/o timeout
[ERROR] plugin/errors: 2 2445800166269057896.3880739144127939746. HINFO: read udp 10.244.0.72:56775->172.16.240.2:53: i/o timeout
[ERROR] plugin/errors: 2 2445800166269057896.3880739144127939746. HINFO: read udp 10.244.0.72:59791->223.5.5.5:53: i/o timeout
[ERROR] plugin/errors: 2 2445800166269057896.3880739144127939746. HINFO: read udp 10.244.0.72:48823->172.16.240.2:53: i/o timeout
[ERROR] plugin/errors: 2 2445800166269057896.3880739144127939746. HINFO: read udp 10.244.0.72:41791->223.6.6.6:53: i/o timeout
[ERROR] plugin/errors: 2 2445800166269057896.3880739144127939746. HINFO: read udp 10.244.0.72:33037->223.6.6.6:53: i/o timeout
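A quick way to read a log like this is to count the timeouts per upstream resolver. If every upstream times out, as it does above, the problem is more likely on the pod's outbound path than in any single resolver. The following is a minimal sketch; the function name `count_timeouts_by_upstream` is a hypothetical helper, and it assumes the log lines have the `read udp SRC->DST:53: i/o timeout` shape shown above.

```shell
# Hypothetical helper: read CoreDNS log lines on stdin and print
# "<upstream-ip> <timeout-count>" for each upstream that timed out.
count_timeouts_by_upstream() {
    # Keep only the destination IP that follows "->" on i/o timeout lines,
    # then count how often each IP occurs.
    sed -n 's/.*->\([0-9.]*\):[0-9]*: i\/o timeout.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (pod name taken from the log command above):
#   kubectl -n kube-system logs coredns-7cb5659999-nshr9 | count_timeouts_by_upstream
```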

2 Cause

Direct cause

After the machine rebooted, the iptables rules were lost, and the default policy of the FORWARD chain changed from ACCEPT to DROP, which blocked Pod network traffic.

Root cause

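iptables rules live only in kernel memory and are cleared on reboot. When kube-proxy restarts, it re-creates only its own Service-related rules (mainly in the nat table); it does not set the default policy of the FORWARD chain in the filter table. With nothing restoring ACCEPT, the FORWARD policy ended up as DROP and all forwarded Pod traffic was discarded. Note that the kernel's built-in default policy is ACCEPT, so a DROP after reboot is most likely set by another component at startup (Docker, for example, can set the FORWARD policy to DROP when it starts).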
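To confirm this diagnosis on a node, the current default policy can be read from `iptables -S FORWARD`, whose first line has the form `-P FORWARD <policy>`. The following is a minimal sketch; `chain_policy` is a hypothetical helper name introduced only for illustration.

```shell
# Hypothetical helper: read `iptables -S <chain>` output on stdin and
# print the chain's default policy (for example ACCEPT or DROP).
chain_policy() {
    # The policy line looks like "-P FORWARD DROP"; field 3 is the policy.
    awk '$1 == "-P" { print $3; exit }'
}

# Usage on the node (requires root):
#   sudo iptables -S FORWARD | chain_policy
```

If this prints DROP on a node whose Pods cannot reach DNS, it matches the failure described here.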

3 Fix

Persist the network rules with an initialization script:

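# Use tee so the file is written with root privileges (with "sudo cat > file" the redirection runs as the calling user)
sudo tee /usr/local/bin/k8s-network-init.sh > /dev/null <<'EOF'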
#!/bin/bash
set -e

echo "Initializing Kubernetes network at $(date)" >> /var/log/k8s-network.log

# 1. Enable IP forwarding
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv4.conf.all.forwarding=1

# 2. Set the default chain policies
iptables -P FORWARD ACCEPT
iptables -P INPUT ACCEPT
iptables -P OUTPUT ACCEPT

# 3. Flush rules that might conflict (optional)
# iptables -F FORWARD

# 4. Make sure traffic to and from the Pod network is forwarded
iptables -I FORWARD -s 10.244.0.0/16 -j ACCEPT 2>/dev/null || true
iptables -I FORWARD -d 10.244.0.0/16 -j ACCEPT 2>/dev/null || true

# 5. Make sure the NAT rule exists (lets Pods reach external networks)
iptables -t nat -I POSTROUTING -s 10.244.0.0/16 ! -d 10.244.0.0/16 -j MASQUERADE 2>/dev/null || true

# 6. Save the rules
if command -v netfilter-persistent &> /dev/null; then
    netfilter-persistent save >> /var/log/k8s-network.log 2>&1
fi

echo "Network initialization completed at $(date)" >> /var/log/k8s-network.log
EOF

sudo chmod +x /usr/local/bin/k8s-network-init.sh
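
One caveat with the script above: it inserts its rules with `-I` unconditionally and then saves the ruleset. If the saved rules are restored at the next boot (an assumption about how netfilter-persistent is set up on the node), each reboot could add another copy of the same rules. A minimal sketch of an idempotent variant follows; it uses `iptables -C` to check for the rule first, and `ensure_rule` is a hypothetical helper name, not part of the original script.

```shell
# Hypothetical helper: insert a rule only if an identical one is not already present.
# Usage: ensure_rule TABLE CHAIN RULE-SPEC...
ensure_rule() {
    er_table="$1"
    er_chain="$2"
    shift 2
    # `iptables -C` exits non-zero when the rule does not exist.
    if ! iptables -t "$er_table" -C "$er_chain" "$@" 2>/dev/null; then
        iptables -t "$er_table" -I "$er_chain" "$@"
    fi
}

# Example calls mirroring steps 4 and 5 of the script (require root):
#   ensure_rule filter FORWARD -s 10.244.0.0/16 -j ACCEPT
#   ensure_rule filter FORWARD -d 10.244.0.0/16 -j ACCEPT
#   ensure_rule nat POSTROUTING -s 10.244.0.0/16 ! -d 10.244.0.0/16 -j MASQUERADE
```

Because the check happens inside an `if`, the helper also behaves under `set -e`, which the script enables.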

Create a systemd service that runs the script at boot:

# Use tee so the unit file is written with root privileges
sudo tee /etc/systemd/system/k8s-network-init.service > /dev/null <<'EOF'
[Unit]
Description=Kubernetes Network Initialization
Before=kubelet.service
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/k8s-network-init.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable k8s-network-init.service
sudo systemctl start k8s-network-init.service
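
After enabling the unit, it may be worth checking that it is registered for boot and that its last run succeeded. The sketch below relies on `systemctl is-enabled` and `systemctl is-active`; `init_service_ok` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical check: succeed only if the init unit is enabled for boot
# and its last run succeeded. With Type=oneshot and RemainAfterExit=yes,
# a successful run leaves the unit in the "active" state.
init_service_ok() {
    [ "$(systemctl is-enabled k8s-network-init.service 2>/dev/null)" = "enabled" ] &&
        [ "$(systemctl is-active k8s-network-init.service 2>/dev/null)" = "active" ]
}

# Usage on the node:
#   init_service_ok && echo "k8s-network-init is enabled and active"
#   tail -n 5 /var/log/k8s-network.log   # log file written by the init script
```

Repeating the `nslookup` commands from section 1 inside a pod is the end-to-end check that the fix took effect.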