freeswitch 1.10.10-dev录音早期媒体卡通道的bug分析

最近编译了fs 1.10.10-dev也就是 master版本(2023年7月6日) 给几个客户升级了一下,发现非常不稳定(每天都有几个通道卡在early状态),最近才有空来分析原因。

之前跑的是1.10.8 release 版本,从来没出现过这个问题,我把 1.10.8的代码和1.10.10-dev的代码整体对比了一下,整体改变不大,其中 switch_core_media_set_codec 函数的修改引擎我的警惕,因为这个修改时直接修改了互斥锁

switch_core_session_lock_codec_write(session);

switch_core_session_lock_codec_read(session);

的锁定范围。我查看了一下这个提交说明

2023年6月12日 这个提交给合并到到了master

SHA-1: 645b610e7968594910dd2368c511fbc93c1373b7

  • Merge pull request #2109 from signalwire/switch_core_media_set_codec

Core\] Fix possible deadlock in switch_core_media_set_codec() 晚上 我联系客户 看看有没有卡在early状态的通道,刚好有一个,把所有线程的调用堆栈 输出来, 还真是卡在 switch_core_media_set_codec 。 #0 0x00007f9dc597a54d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f9dc5975eb6 in _L_lock_941 () from /lib64/libpthread.so.0 #2 0x00007f9dc5975daf in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007f9dc8687759 in fspr_thread_mutex_lock (mutex=) at locks/unix/thread_mutex.c:92 #4 0x00007f9dc850e895 in switch_mutex_lock (lock=) at src/switch_apr.c:301 #5 0x00007f9db9c9029c in sofia_receive_message (session=0x7f9cf5b4a7c8, msg=0x7f9d345ac4d0) at mod_sofia.c:1526 #6 0x00007f9dc853f80b in switch_core_session_perform_receive_message (session=session@entry=0x7f9cf5b4a7c8, message=message@entry=0x7f9d345ac4d0, file=file@entry=0x7f9dc86a961d "src/switch_core_io.c", func=func@entry=0x7f9dc86a9970 <__func__.19414> "switch_core_session_read_frame", line=line@entry=416) at src/switch_core_session.c:930 #7 0x00007f9dc854aa7e in switch_core_session_read_frame (session=session@entry=0x7f9cf5b4a7c8, frame=frame@entry=0x7f9d345ae568, flags=flags@entry=0, stream_id=stream_id@entry=0) at src/switch_core_io.c:416 #8 0x00007f9dc8608251 in switch_ivr_sleep (session=0x7f9cf5b4a7c8, ms=100, sync=, args=0x0) at src/switch_ivr.c:294 #9 0x00007f9daeb4dcc5 in ?? () from /ddt/fs/mod/mod_cti.so #10 0x00007f9dc8543b5b in switch_core_session_exec (session=0x7f9cf5b4a7c8, application_interface=application_interface@entry=0x19077a0, arg=arg@entry=0x7f9d345aeb40 "38981 2000") at src/switch_core_session.c:2964 #11 0x00007f9dc85441ef in switch_core_session_execute_application_get_flags (session=, app=0x7f9db140b89b "cti_wait_for_answer", arg=0x7f9d345aeb40 "38981 2000", flags=) at src/switch_core_session.c:2824 #12 0x00007f9daeb8eb7e in ?? () from /ddt/fs/mod/mod_cti.so #13 0x00007f9daeb92619 in ?? () from /ddt/fs/mod/mod_cti.so #14 0x00007f9dc853c9c5 in switch_core_session_thread_pool_worker (thread=0x7f9d45af47a0, obj=) at src/switch_core_session.c:1790 #15 0x00007f9dc868d760 in dummy_worker (opaque=0x7f9d45af47a0) at threadproc/unix/thread.c:151 #16 0x00007f9dc5973ea5 in start_thread () from /lib64/libpthread.so.0 #17 0x00007f9dc4fc7b0d in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x7f9d2f76b700 (LWP 30595)): #0 0x00007f9dc597a54d in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f9dc5975eb6 in _L_lock_941 () from /lib64/libpthread.so.0 #2 0x00007f9dc5975daf in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007f9dc8687759 in fspr_thread_mutex_lock (mutex=) at locks/unix/thread_mutex.c:92 #4 0x00007f9dc850e895 in switch_mutex_lock (lock=) at src/switch_apr.c:301 #5 0x00007f9dc852c6ec in switch_core_session_lock_codec_read (session=) at src/switch_core_codec.c:74 #6 0x00007f9dc8561e7c in switch_core_media_set_codec (session=session@entry=0x7f9cf5b4a7c8, force=force@entry=0, codec_flags=0) at src/switch_core_media.c:3608 #7 0x00007f9dc856763e in switch_core_media_activate_rtp (session=0x7f9cf5b4a7c8) at src/switch_core_media.c:8565 #8 0x00007f9db9cf73de in sofia_media_activate_rtp (tech_pvt=tech_pvt@entry=0x7f9cf5b53ff8) at sofia_media.c:58 #9 0x00007f9db9cf745e in sofia_media_tech_media (tech_pvt=tech_pvt@entry=0x7f9cf5b53ff8, r_sdp=, type=type@entry=SDP_TYPE_REQUEST) at sofia_media.c:189 #10 0x00007f9db9ccab5a in sofia_handle_sip_i_state (de=0x7f9d983bcea0, tags=0x7f9d882ae950, sip=0x0, sofia_private=, nh=0x7f9da1db69b0, profile=0x1ac35070, nua=0x7f9d98040be0, phrase=0x7f9d882aece6 "Ringing", status=183, session=0x7f9cf5b4a7c8) at sofia.c:7683 #11 our_sofia_event_callback (event=nua_i_state, status=, phrase=0x7f9d882aece6 "Ringing", nua=0x7f9d98040be0, profile=0x1ac35070, nh=0x7f9da1db69b0, sofia_private=0x7f9d1927a640, sip=0x0, de=de@entry=0x7f9d983bcea0, tags=0x7f9d882ae950) at sofia.c:1813 #12 0x00007f9db9ccea5b in sofia_process_dispatch_event (dep=0x7f9d2f76a2c0) at sofia.c:2253 #13 0x00007f9db9c8ffc9 in sofia_receive_message (session=0x7f9cf5b4a7c8, msg=0x7f9d2f76aa20) at mod_sofia.c:1347 #14 0x00007f9dc853f6d5 in switch_core_session_perform_receive_message (session=session@entry=0x7f9cf5b4a7c8, message=message@entry=0x7f9d2f76aa20, file=file@entry=0x7f9dc86c5e35 "src/switch_ivr.c", func=func@entry=0x7f9dc86c6ed0 <__func__.19070> "switch_ivr_parse_signal_data", line=line@entry=893) at src/switch_core_session.c:853 #15 0x00007f9dc8604f9c in switch_ivr_parse_signal_data (session=session@entry=0x7f9cf5b4a7c8, all=all@entry=SWITCH_TRUE, only_session_thread=only_session_thread@entry=SWITCH_FALSE) at src/switch_ivr.c:893 #16 0x00007f9dc8604fec in switch_ivr_parse_all_signal_data (session=session@entry=0x7f9cf5b4a7c8) at src/switch_ivr.c:906 #17 0x00007f9dc8605007 in switch_ivr_parse_all_messages (session=session@entry=0x7f9cf5b4a7c8) at src/switch_ivr.c:852 #18 0x00007f9dc8607b4a in switch_ivr_parse_all_events (session=session@entry=0x7f9cf5b4a7c8) at src/switch_ivr.c:925 #19 0x00007f9dc8548632 in switch_core_session_run (session=0x7f9cf5b4a7c8) at src/switch_core_state_machine.c:710 #20 0x00007f9dc85412ae in switch_core_session_thread (thread=, obj=0x7f9cf5b4a7c8) at src/switch_core_session.c:1726 #21 0x00007f9dc853c9c5 in switch_core_session_thread_pool_worker (thread=0x7f9d38410920, obj=) at src/switch_core_session.c:1790 #22 0x00007f9dc868d760 in dummy_worker (opaque=0x7f9d38410920) at threadproc/unix/thread.c:151 #23 0x00007f9dc5973ea5 in start_thread () from /lib64/libpthread.so.0 #24 0x00007f9dc4fc7b0d in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x7f9dc8db78c0 (LWP 11507)): 接下来就是分析代码找出死锁原因了 Thread 2 (Thread 0x7f9d2f76b700 (LWP 30595)): 和线程 Thread 1 (Thread 0x7f9dc8db78c0 (LWP 11507)): 调用堆栈都有 sofia_receive_message 这个函数 具体代码 switch_mutex_lock(tech_pvt->sofia_mutex); if (switch_core_session_in_thread(session)) { de->session = session; } sofia_process_dispatch_event(&de); switch_mutex_unlock(tech_pvt->sofia_mutex); Thread 2 (Thread 0x7f9d2f76b700 (LWP 30595)): 卡在 switch_mutex_lock(tech_pvt-\>sofia_mutex);了 Thread 1 (Thread 0x7f9d2f76b700 (LWP 30595)) 成功获取了 tech_pvt-\>sofia_mutex) 锁定进入了sofia_process_dispatch_event。最后卡在switch_core_media_set_codec函数里面的switch_core_session_lock_codec_write(session)步骤。 通过看 Thread 2 (Thread 0x7f9d2f76b700 (LWP 30595)): 的调用堆栈 函数 switch_core_session_read_frame 里面调用了switch_core_session_perform_receive_message, 对应代码 switch_mutex_lock(session->codec_read_mutex); ...省略很多代码 if (!switch_test_flag(session, SSF_WARN_TRANSCODE)) { switch_core_session_message_t msg = { 0 }; msg.message_id = SWITCH_MESSAGE_INDICATE_TRANSCODING_NECESSARY; switch_core_session_receive_message(session, &msg); switch_set_flag(session, SSF_WARN_TRANSCODE); } switch_mutex_lock(session-\>codec_read_mutex); 这个锁定和 switch_core_session_lock_codec_write(session) 对应的是一个锁定,至此 这个死锁的bug,我们就分析出来了。 Merge pull request #2109 from signalwire/switch_core_media_set_codec 这个提交为了解决一个bug,导致了一个更深的bug。 触发的原因是 cti模块有处理早期媒体,如果外呼用的是 ignore_early_media=true 应该是不会触发这个的bug的。 解决这个bug的方法是 调用 switch_core_session_receive_message 之前先 解锁 switch_mutex_unlock(session-\>codec_read_mutex)。代码如下。 msg.message_id = SWITCH_MESSAGE_INDICATE_TRANSCODING_NECESSARY; switch_mutex_unlock(session->codec_read_mutex); switch_core_session_receive_message(session, &msg); switch_mutex_lock(session->codec_read_mutex); switch_set_flag(session, SSF_WARN_TRANSCODE);

相关推荐
gxn_mmf1 天前
典籍知识问答重新生成和消息修改Bug修改
前端·bug
千里马学框架2 天前
重学安卓14/15自由窗口freeform企业实战bug-学员作业
android·framework·bug·systrace·安卓framework开发·安卓窗口系统·自由窗口
Bear on Toilet3 天前
Bug日记——实现“日期类”
开发语言·c++·bug
Direction_Wind3 天前
flinksql bug : Max aggregate function does not support type: CHAR
bug
mg6684 天前
Python Bug 修复案例分析:多线程数据竞争引发的bug 两种修复方法
bug
黎猫大侠4 天前
一次Android Fragment内存泄露的bug解决记录|Fragment not attach to an Activity
android·bug
七七小报5 天前
uniapp-商城-48-后台 分类数据添加修改弹窗bug
uni-app·bug
windwind20005 天前
发行基础:本地化BUG导致审核失败
游戏·青少年编程·编辑器·bug·创业创新·玩游戏
Htht1115 天前
【Qt】之【Bug】点击按钮(ui->pushButton)触发非本类设置的槽函数
qt·ui·bug
gxn_mmf6 天前
典籍知识问答模块AI问答功能feedbackBug修改+添加对话名称修改功能
前端·后端·bug