The link is here:
The original message follows:
Hello,
We see an intermittent issue in our on premise deployments with Kamailio behind HAProxy.
After several hours of idle time,
some mobile clients cannot receive inbound calls,
although they remain registered and can place outbound calls.
Topology:
Mobile Client → Internet → Customer Firewall → HAProxy → Kamailio
Transport is SIP over TLS.
HAProxy works in TCP mode and terminates nothing.
It forwards TLS to Kamailio.
Client behavior
• REGISTER with Expires 600 seconds
• Re-register every 7 minutes
• TLS connection from client to HAProxy stays up
• Outbound calls from client work
• During the failure window, client continues to send REGISTER and receives 200 OK
Failure scenario
• Several devices placed idle overnight
• In the morning, some cannot receive inbound calls
• Kamailio tries to send INVITE to the contact
• Kamailio opens a new TCP connection to HAProxy IP and an ephemeral port
• HAProxy responds with RST
Example log from Kamailio:
```
INFO: request_route: method [INVITE] from [sip:UE_2@domain.com] to [sip:UE_1@domain.com]
ERROR: tcpconn_1st_send(): connect 10.233.124.50:40398 failed (RST) Connection refused
ERROR: tcpconn_1st_send(): 10.233.124.50:40398: connect & send failed
WARNING: t_send_branch(): sending request on branch 0 failed
```
PCAP confirms:
• Kamailio sends SYN to 10.233.124.50:40398
• HAProxy replies RST, ACK
Important observations
• usrloc contact host and port match the peer address seen in tls.list (src_ip + src_port).
• At the time of failure, the client is still able to send REGISTER and get 200 OK.
• If the same client initiates a call, the call is established successfully.
It looks like Kamailio sometimes fails to match the stored contact to an existing TLS connection and attempts to open a new TCP connection to the Contact host:port.
In our case, Contact host resolves to HAProxy IP and port.
Questions
• Under which conditions does Kamailio decide to open a new TCP connection to the Contact instead of reusing an existing TLS connection?
• If connection ID lookup fails, is fallback to active connect the expected behavior?
Environment
• Kamailio versions: 5.8.5, 6.0.5 (reproduced on both)
• HAProxy: TCP mode, no TLS termination
• Clients uses TLS only, no UDP
Kamailio relevant configuration:
```
tcp_connection_lifetime=605
modparam("registrar", "max_expires", 600)
modparam("registrar", "use_path", 1)
modparam("usrloc", "handle_lost_tcp", 1)
modparam("usrloc", "close_expired_tcp", 1)
```
HAProxy configuration:
```
frontend client-kamailio-sip
    mode tcp
    option tcpka
    timeout client 600
    default_backend server-kamailio-sip
backend server-kamailio-sip from haproxytech
    mode tcp
    option tcpka
    timeout connect 30s
    timeout server 600s
    timeout tunnel 600s
```
Any guidance on correct architectural pattern or configuration for SIP TLS behind HAProxy would be appreciated.
Thank you.
Joey.
The poster provides quite good detail.
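There is no conclusion yet.
That said, the poster is clearly not very familiar with Kamailio.
My view is that the TCP connection between HAProxy and Kamailio has already been dropped. Which side closed it needs to be confirmed with a packet capture. Alternatively, tune the HAProxy configuration, with the focus on keeping it from tearing down its TCP connection to Kamailio.
In addition, Kamailio can be configured with the usrloc ka parameters, meaning the server itself sends SIP keepalives, or the nathelper keepalive can be used instead.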
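As a rough illustration of the nathelper option, the fragment below enables SIP OPTIONS pings toward registered contacts, including TCP/TLS ones. It is a sketch to merge into an existing kamailio.cfg, not a complete configuration. The flag number, the interval, the From URI, and the route name REGISTRAR are assumptions chosen for the example and do not come from the original thread. Whether pings alone keep the HAProxy leg open depends on which side is closing it.

```
# Sketch only: server-side SIP keepalives via nathelper.
# Assumes usrloc and registrar are already loaded and configured.
loadmodule "nathelper.so"

# Seconds between ping rounds (illustrative value).
modparam("nathelper", "natping_interval", 30)
# 0 = ping all registered contacts, not only those flagged as NATed.
modparam("nathelper", "ping_nated_only", 0)
# Contacts saved with this branch flag get a SIP request as the ping
# (flag number 7 is an assumption).
modparam("nathelper", "sipping_bflag", 7)
# From URI used in the ping request (placeholder address).
modparam("nathelper", "sipping_from", "sip:keepalive@example.invalid")
# Also ping contacts registered over TCP/TLS.
modparam("nathelper", "natping_tcp", 1)

# Hypothetical registrar route: mark the contact before saving it
# so that nathelper pings it with SIP OPTIONS.
route[REGISTRAR] {
    if (is_method("REGISTER")) {
        setbflag(7);
        if (!save("location")) {
            sl_reply_error();
        }
        exit;
    }
}
```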
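Normally, Kamailio reaches HAProxy over the existing TCP connection and sends the INVITE on it. Only when that connection is gone does Kamailio actively open a new TCP connection to HAProxy. This is a reverse connection, which generally fails in a NAT environment.
The Core documentation contains the following passage: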
set_forward_no_connect
The message will be forwarded only if there is already an existing connection to the destination. It applies only to connection oriented protocols like TCP and TLS (TODO: SCTP), for UDP it will be ignored. The behavior depends in which route block the function is called:
• normal request route: affects stateless forwards and tm. For tm it affects all the branches and the possible retransmissions (in fact there are no retransmission for TCP/TLS).
• onreply_route[0] (stateless): equivalent to set_reply_*() (it's better to use set_reply_* though)
• onreply_route[!=0] (tm): ignored
• branch_route: affects the current branch only (all messages sent on this branch, like possible retransmissions and CANCELs).
• onsend_route: like branch route
Example of usage:
```
route {
    ...
    if (lookup()) {
        //requests to local users. They are usually behind NAT so it does not make sense to try
        //to establish a new TCP connection
        set_forward_no_connect();
        t_relay();
    }
    ...
}
```
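The documented pattern can be extended slightly so that the caller gets an explicit reply when no usable connection exists, instead of Kamailio dialing out to the registered address. This is a minimal sketch, assuming the tm, sl, and registrar modules are loaded and the usrloc table is named location. The route name LOCATION is a hypothetical label.

```
# Sketch only: deliver requests to registered users without
# ever opening a new TCP/TLS connection toward them.
route[LOCATION] {
    if (!lookup("location")) {
        # No registered contact for the request URI.
        sl_send_reply("404", "Not Found");
        exit;
    }

    # Forward only over a connection that already exists.
    # If it is gone, sending fails locally instead of
    # triggering a connect to the contact's host:port.
    set_forward_no_connect();

    if (!t_relay()) {
        # Report the send failure to the caller.
        sl_reply_error();
    }
    exit;
}
```

With this in place, a lost connection shows up as an immediate error reply to the caller, which is easier to spot than an RST from HAProxy.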
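I may be wrong; the problem may instead lie in the connection between the endpoint and HAProxy.
A packet capture
netstat -ant
kamcmd ul.dump
Looked at together, these three will surely pinpoint the problem.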