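Table of Contents
- [1. Preface](#1. Preface)
- [2. The Scene](#2. The Scene)
- [3. Problem Analysis](#3. Problem Analysis)
1. Preface
The author's knowledge is limited, so this article may contain errors. The author makes no commitment regarding any loss readers may suffer as a result.
2. The Scene
The following events were captured with trace-cmd:
- ALSA events: hwptr, hw_mask_param, applptr
- Scheduler events: sched_switch, sched_wakeup
- Interrupt events: irq_handler_entry, irq_handler_exit

The captured data is as follows: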
```c
latency-7395 [002] 22719.778930: hwptr: pcmC0D0c/sub0: POS: pos=16, old=26768, base=26752, period=64, buf=128
latency-7395 [002] 22719.778932: hwptr: pcmC0D0c/sub0: POS: pos=16, old=26768, base=26752, period=64, buf=128
latency-7395 [002] 22719.778934: hwptr: pcmC0D0c/sub0: POS: pos=16, old=26768, base=26752, period=64, buf=128
latency-7395 [002] 22719.778937: hwptr: pcmC0D0c/sub0: POS: pos=48, old=26768, base=26752, period=64, buf=128
latency-7395 [002] 22719.778938: applptr: pcmC0D0c/sub0: prev=26768, curr=26800, avail=0, period=64, buf=128
latency-7395 [002] 22719.778941: hwptr: pcmC0D0p/sub0: POS: pos=96, old=26816, base=26752, period=64, buf=128
latency-7395 [002] 22719.778942: applptr: pcmC0D0p/sub0: prev=26896, curr=26928, avail=48, period=64, buf=128
latency-7395 [002] 22719.778949: sched_switch: latency:7395 [0] S ==> kworker/2:1:5386 [120]
<idle>-0 [007] 22719.778971: irq_handler_entry: irq=182 name=snd_hda_intel:card0
<idle>-0 [007] 22719.779013: hwptr: pcmC0D0c/sub0: IRQ: pos=48, old=26800, base=26752, period=64, buf=128
<idle>-0 [007] 22719.779070: irq_handler_exit: irq=182 ret=handled
<idle>-0 [007] 22719.779073: irq_handler_entry: irq=182 name=snd_hda_intel:card0
<idle>-0 [007] 22719.779084: irq_handler_exit: irq=182 ret=unhandled
kworker/2:1-5386 [002] 22719.779162: sched_wakeup: latency:7395 [0] CPU:002
<idle>-0 [007] 22719.780474: irq_handler_entry: irq=182 name=snd_hda_intel:card0
<idle>-0 [007] 22719.780524: sched_wakeup: kworker/7:1:5655 [120] CPU:007
<idle>-0 [007] 22719.780567: hwptr: pcmC0D0p/sub0: IRQ: pos=48, old=26848, base=26752, period=64, buf=128
<idle>-0 [007] 22719.780567: xrun: pcmC0D0p/sub0: XRUN: old=26928, base=26880, period=64, buf=128
<idle>-0 [007] 22719.780699: irq_handler_exit: irq=182 ret=handled
<idle>-0 [007] 22719.780701: irq_handler_entry: irq=182 name=snd_hda_intel:card0
<idle>-0 [007] 22719.780729: irq_handler_exit: irq=182 ret=unhandled
<idle>-0 [007] 22719.780736: sched_switch: swapper/7:0 [120] R ==> kworker/7:1:5655 [120]
kworker/7:1-5655 [007] 22719.780744: sched_switch: kworker/7:1:5655 [120] W ==> swapper/7:0 [120]
kworker/2:1-5386 [002] 22719.790242: sched_switch: kworker/2:1:5386 [120] R ==> latency:7395 [0]
latency-7395 [002] 22719.790455: hw_mask_param: pcmC0D0p:0 000/025 ACCESS 0000000000000000ffffffffffffffff 00000000000000000000000000000009
latency-7395 [002] 22719.790455: hw_mask_param: pcmC0D0p:0 000/025 FORMAT 0000000000000000ffffffffffffffff 00000000000000000000000000000404
latency-7395 [002] 22719.790456: hw_mask_param: pcmC0D0p:0 000/025 SUBFORMAT 0000000000000000ffffffffffffffff 00000000000000000000000000000001
```
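With a long trace it is tedious to find by eye when a task left the CPU and when it came back. The following is a minimal Python sketch, not part of the original investigation, that pulls the off-CPU gaps of one task out of the sched_switch records. The function name `off_cpu_gaps_us`, the regular expression, and the assumption that timestamps always carry six fractional digits are illustrative choices based only on the line layout seen in this trace.

```python
import re

# Layout assumed from the trace in this article:
#   <comm>-<pid> [<cpu>] <sec>.<usec>: <event>: <payload>
LINE_RE = re.compile(
    r"^\s*(?P<task>.+?)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<sec>\d+)\.(?P<usec>\d{6}):\s+(?P<event>\w+):\s*(?P<payload>.*)$"
)


def off_cpu_gaps_us(lines, comm, pid):
    """Return (switch_out_us, switch_in_us, gap_us) tuples for comm:pid."""
    who = f"{comm}:{pid} "
    gaps = []
    out_ts = None
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or m["event"] != "sched_switch":
            continue
        # Integer microseconds avoid floating-point rounding on subtraction.
        ts_us = int(m["sec"]) * 1_000_000 + int(m["usec"])
        # Payload layout: "<prev> [prio] <state> ==> <next> [prio]"
        prev, _, nxt = m["payload"].partition(" ==> ")
        if prev.startswith(who):
            out_ts = ts_us
        elif nxt.startswith(who) and out_ts is not None:
            gaps.append((out_ts, ts_us, ts_us - out_ts))
            out_ts = None
    return gaps


# Three records copied from the trace above.
SAMPLE = [
    "latency-7395 [002] 22719.778949: sched_switch: latency:7395 [0] S ==> kworker/2:1:5386 [120]",
    "kworker/2:1-5386 [002] 22719.779162: sched_wakeup: latency:7395 [0] CPU:002",
    "kworker/2:1-5386 [002] 22719.790242: sched_switch: kworker/2:1:5386 [120] R ==> latency:7395 [0]",
]

for out_us, in_us, gap_us in off_cpu_gaps_us(SAMPLE, "latency", 7395):
    print(f"off CPU for {gap_us / 1000:.3f} ms")
```

Fed the whole report instead of the three sample lines, the same function would list every interval during which latency-7395 was not running.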
3. Problem Analysis
A playback XRUN shows up in the log:
```c
<idle>-0 [007] 22719.780567: xrun: pcmC0D0p/sub0: XRUN: old=26928, base=26880, period=64, buf=128
```
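What does this mean? Audio playback must be fed data strictly in step with the configured sample rate and period size. If data arrives in insufficient quantity or too late, an underrun occurs, so playback has real-time requirements. This scenario uses a 48000 Hz sample rate and a period of 64 samples, so each period lasts 64/48000 s ≈ 1.333 ms.

The log shows several problems, but the most prominent one is this: process latency-7395 was switched out at 22719.778949 and did not run again until 22719.790242, a gap of 22719.790242 - 22719.778949 = 0.011293 s = 11.293 ms. In other words, for about 11.293 ms no data was written to the playback buffer, which far exceeds one period (1.333 ms). Moreover, judging by hwptr and applptr, the data left in the buffer when latency-7395 was switched out could not possibly cover 11.293 ms of consumption, hence the playback underrun.

So who held the CPU and kept latency-7395 from being scheduled? At 22719.778949, CPU 2 switched latency-7395 out and moved on to kworker/2:1:5386, and latency-7395 then waited 11.293 ms before being switched back in. latency-7395 is a real-time process, built from alsa-lib/test/latency.c, with priority 0 (the bracketed value in the sched_switch record). How could kworker/2:1:5386, with priority 120, keep it off the CPU for that long? The answer is that in the kernel under test the scheduler feature RT_RUNTIME_SHARE is disabled by default. For details, see: https://lore.kernel.org/lkml/c596a06773658d976fb839e02843a459ed4c2edf.1479204252.git.bristot@redhat.com/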
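The timing argument above can be checked with a few lines of arithmetic. This is a sketch using only numbers taken from the trace. It assumes that the updated hardware pointer equals `base + pos` in a hwptr record, which is consistent with the `old=26848` reported by the next playback hwptr record and with `avail=48` in the applptr record. The helper `frames_to_ms` is a hypothetical name introduced for illustration.

```python
RATE_HZ = 48_000       # sample rate stated in the text
PERIOD_FRAMES = 64     # period= field in the trace
BUFFER_FRAMES = 128    # buf= field in the trace


def frames_to_ms(frames, rate_hz=RATE_HZ):
    """Convert a frame count to milliseconds of audio."""
    return frames * 1000 / rate_hz


# Playback pointers right before latency-7395 was switched out.
# Assumption: the updated hardware pointer equals base + pos.
hw_ptr = 26752 + 96      # base + pos, playback hwptr record at 22719.778941
appl_ptr = 26928         # curr, playback applptr record at 22719.778942
queued_frames = appl_ptr - hw_ptr
avail_frames = BUFFER_FRAMES - queued_frames   # should match avail= in the trace

# Off-CPU gap in microseconds, from the two sched_switch timestamps.
gap_us = 22719790242 - 22719778949
gap_ms = gap_us / 1000

print(f"period      : {frames_to_ms(PERIOD_FRAMES):.3f} ms")
print(f"full buffer : {frames_to_ms(BUFFER_FRAMES):.3f} ms")
print(f"queued data : {queued_frames} frames = {frames_to_ms(queued_frames):.3f} ms")
print(f"off-CPU gap : {gap_ms:.3f} ms")
print("underrun expected:", frames_to_ms(queued_frames) < gap_ms)
```

Under these assumptions, 80 frames (about 1.667 ms of audio) were queued when the process went off the CPU. Even a completely full 128-frame buffer would hold only about 2.667 ms, far less than the 11.293 ms gap.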
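The effect of that patch can be eliminated with the following command, which sets the real-time runtime budget to 1000000 µs: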
```bash
sysctl -w kernel.sched_rt_runtime_us=1000000
```
After that, the test was rerun and no more XRUNs occurred, which concludes the investigation.
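To see why that sysctl value matters, it helps to compare it with `kernel.sched_rt_period_us`. The sketch below is not from the original investigation. It reads both values from `/proc/sys/kernel` and reports the share of each period that real-time tasks may consume. The helper names `rt_budget` and `current_rt_budget` are hypothetical, and the rule that a negative runtime, or a runtime equal to the period, means no throttling is my reading of the kernel's RT throttling behaviour, so treat it as an assumption.

```python
from pathlib import Path

PROC_KERNEL = Path("/proc/sys/kernel")


def rt_budget(runtime_us, period_us):
    """Describe the RT bandwidth implied by the two sysctl values.

    Assumption: runtime < 0 or runtime >= period means RT tasks are
    not throttled at all.
    """
    if runtime_us < 0 or runtime_us >= period_us:
        return "unthrottled"
    share = runtime_us / period_us
    return f"throttled: RT tasks may use {share:.0%} of each period"


def current_rt_budget():
    """Read the live sysctl values; return None if they are unavailable."""
    try:
        runtime = int((PROC_KERNEL / "sched_rt_runtime_us").read_text())
        period = int((PROC_KERNEL / "sched_rt_period_us").read_text())
    except (OSError, ValueError):
        return None
    return rt_budget(runtime, period)


if __name__ == "__main__":
    print("defaults  :", rt_budget(950_000, 1_000_000))
    print("after fix :", rt_budget(1_000_000, 1_000_000))
    print("this host :", current_rt_budget())
```

With the usual defaults (runtime 950000 µs, period 1000000 µs), real-time tasks are limited to 95% of each period. Setting the runtime to 1000000 µs makes it equal to the default period, so under the assumption above the limit no longer applies.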