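SWT rebooted again? A full teardown of the Android system Watchdog mechanism: why the system reboots itself for no apparent reason, and how to catch the real culprit (with hands-on log analysis)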
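Contents
- [1. What SWT is](#1-what-swt-is)
- [2. Watchdog's monitoring mechanism](#2-watchdogs-monitoring-mechanism)
- [3. How Watchdog decides a thread is "hung"](#3-how-watchdog-decides-a-thread-is-hung)
- [4. The complete SWT reboot flow](#4-the-complete-swt-reboot-flow)
- [5. Common SWT trigger scenarios](#5-common-swt-trigger-scenarios)
- [6. Hands-on: SWT log analysis](#6-hands-on-swt-log-analysis)
- [7. Hands-on: reproducing and locating SWT problems](#7-hands-on-reproducing-and-locating-swt-problems)
- [8. Hands-on: fixing SWT problems](#8-hands-on-fixing-swt-problems)
- [9. Walking through the Watchdog source](#9-walking-through-the-watchdog-source)
- 10. Common pitfalls
- 11. Summary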
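1. What SWT is
SWT stands for Software Watchdog Timer. In practice it is the Watchdog thread inside the SystemServer process, whose only job is to watch whether the threads of the core system services have hung.
If your phone suddenly reboots on its own and the sysdump log you pull after boot contains "Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS ***", that was SWT.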
[Flowchart: the Watchdog thread checks every 30 seconds whether the core service threads answered the heartbeat within 60 seconds. Normal response: nothing happens, monitoring continues. Timeout with no response: SWT is triggered, SystemServer is killed, and the phone reboots.]
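2. Watchdog's monitoring mechanism
Watchdog is created during SystemServer startup and starts monitoring from there. It does not watch every thread, only a set of core Handlers registered with Watchdog: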
[Diagram: the Watchdog singleton monitors MainHandler (main thread: AMS / WMS / ...), FgThread (foreground thread), IoThread (IO thread), DisplayThread (display thread), AnimationThread (animation thread), BinderThread (Binder call thread), and other monitored threads.]
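Each monitored thread has a corresponding HandlerChecker, and services registered as a Monitor are checked from the foreground thread's HandlerChecker. What each monitored item means: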
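| Monitored item | Type | What a timeout means |
|---|---|---|
| MainHandler | Handler | Main thread deadlocked or hung (the most common case) |
| AMS | Monitor | ActivityManagerService is unresponsive |
| WMS | Monitor | WindowManagerService is unresponsive |
| InputManagerService | Monitor | The input system is unresponsive |
| NetworkManagementService | Monitor | The network management service is unresponsive |
| Binder thread pool | Monitor | Every Binder thread in SystemServer is stuck |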
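If any monitored thread stays blocked for more than 60 seconds while handling a message (typically because it is holding or waiting for a lock), Watchdog declares it hung and triggers an SWT reboot.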
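To make the "Monitor" rows of the table concrete, here is a minimal sketch in plain Java. `DemoMonitor`, `DemoService` and `MonitorCheckSketch` are hypothetical names for illustration, not the AOSP classes. The assumption is that a service's `monitor()` callback does nothing except take and release the service lock, so it returns at once when the lock is free and blocks for as long as another thread holds it.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a watchdog monitor callback. */
interface DemoMonitor {
    void monitor();
}

/** Hypothetical service guarded by a single lock (its own intrinsic monitor). */
final class DemoService implements DemoMonitor {
    @Override
    public void monitor() {
        // Only take and release the lock: this blocks while someone else holds it.
        synchronized (this) { }
    }
}

final class MonitorCheckSketch {
    /** Runs every monitor on a checker thread; returns false if they do not finish in time. */
    static boolean monitorsRespond(List<? extends DemoMonitor> monitors, long timeoutMs)
            throws Exception {
        ExecutorService checker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "checker");
            t.setDaemon(true); // a stuck checker must not keep the JVM alive
            return t;
        });
        try {
            Future<?> done = checker.submit(() -> {
                for (DemoMonitor m : monitors) {
                    m.monitor();
                }
            });
            done.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } finally {
            checker.shutdownNow();
        }
    }

    /** Holds the given lock on a daemon thread until `release` is counted down. */
    static void holdLock(Object lock, CountDownLatch held, CountDownLatch release) {
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown();
                try {
                    release.await();
                } catch (InterruptedException ignored) {
                    // fall through and release the lock
                }
            }
        }, "lock-holder");
        holder.setDaemon(true);
        holder.start();
    }
}
```

With the lock free, `monitorsRespond` returns true almost immediately; once `holdLock` has grabbed the service lock, the same call times out. That is the situation the "Blocked in monitor" log line reports.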
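3. How Watchdog decides a thread is "hung"
Watchdog's detection logic is not complicated: it periodically posts an empty message to every monitored Handler and checks whether that message gets processed within the timeout: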
[Flowchart: Watchdog.run() infinite loop → wait 30 seconds (CHECK_INTERVAL) → record the current time → post an empty message to every monitored Handler → wait 60 seconds (DEFAULT_TIMEOUT) → have all Handlers finished? Yes: go back to the start of the loop. No: record the stack of the hung thread → dump all thread stacks to /data/anr/ → send SIGQUIT so SystemServer kills itself → init notices SystemServer is gone → the system restarts as configured in init.rc.]
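Key parameters:
| Parameter | Default | Meaning |
|---|---|---|
| CHECK_INTERVAL | 30s | One check round every 30 seconds |
| DEFAULT_TIMEOUT | 60s | A message not processed within 60 seconds counts as timed out |
| From hang to reboot | up to 90s | 30s check interval + 60s timeout |
Note: SWT is not real-time detection. From the moment a thread actually hangs to the moment Watchdog triggers the reboot, up to 90 seconds can pass.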
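The check round and the 90-second arithmetic can be sketched in plain Java. This is a simplified model under stated assumptions, not the AOSP implementation: single-thread executors stand in for the monitored Handlers, and `WatchdogLoopSketch`, `checkRound` and `worstCaseSeconds` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class WatchdogLoopSketch {
    // Values quoted in the table above, in seconds.
    static final long CHECK_INTERVAL_S = 30;
    static final long DEFAULT_TIMEOUT_S = 60;

    /** Worst case: the thread hangs right after a round, so a full interval passes first. */
    static long worstCaseSeconds(long checkInterval, long timeout) {
        return checkInterval + timeout;
    }

    /**
     * One check round: post an empty probe to every monitored executor, then wait
     * up to timeoutMs in total. Returns the names whose probe did not run in time.
     */
    static List<String> checkRound(Map<String, ExecutorService> monitored, long timeoutMs)
            throws InterruptedException {
        Map<String, Future<?>> probes = new LinkedHashMap<>();
        for (Map.Entry<String, ExecutorService> e : monitored.entrySet()) {
            probes.put(e.getKey(), e.getValue().submit(() -> { })); // the "empty message"
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        List<String> blocked = new ArrayList<>();
        for (Map.Entry<String, Future<?>> e : probes.entrySet()) {
            long remaining = Math.max(deadline - System.nanoTime(), 0);
            try {
                e.getValue().get(remaining, TimeUnit.NANOSECONDS);
            } catch (TimeoutException ex) {
                blocked.add(e.getKey()); // probe still queued behind a stuck task
            } catch (ExecutionException ex) {
                // the probe ran, so the thread is alive; nothing to record
            }
        }
        return blocked;
    }
}
```

A probe queued behind a stuck task never runs, which is exactly why a hung Handler thread cannot answer. With the table's values, `worstCaseSeconds(CHECK_INTERVAL_S, DEFAULT_TIMEOUT_S)` gives the 90-second upper bound.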
4. The complete SWT reboot flow
[Flowchart: some thread holds a lock for more than 60s → Watchdog detects the HandlerChecker timeout → stacks are collected via ActivityManager.dumpStackTraces() → written to /data/anr/anr_xxx → Process.killProcess kills the SystemServer PID → SystemServer crashes → the init process handles SIGCHLD → how is init.rc configured? Default: restart Zygote / SystemServer, which amounts to a soft reboot. With critical set and 4 crashes in 4 minutes: enter recovery mode.]
The log at the moment of the crash looks like this:
01-01 12:00:00.000 1000 1234 5678 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.am.ActivityManagerService
01-01 12:00:00.000 1000 1234 5678 W Watchdog: foreground thread stack trace:
01-01 12:00:00.000 1000 1234 5678 W Watchdog: at com.android.server.am.ActivityManagerService.monitor(ActivityManagerService.java:xxxxx)
01-01 12:00:00.000 1000 1234 5678 W Watchdog: - waiting to lock <0x0a1b2c3d> (a com.android.server.am.ActivityManagerService)
01-01 12:00:00.000 1000 1234 5678 W Watchdog: held by thread 42
01-01 12:00:00.000 1000 1234 5678 W Watchdog: main thread stack trace:
01-01 12:00:00.000 1000 1234 5678 W Watchdog: at com.android.server.wm.WindowManagerService.relayoutWindow(...)
01-01 12:00:00.000 1000 1234 5678 W Watchdog: - waiting to lock <0x0a1b2c3d> (a com.android.server.am.ActivityManagerService)
01-01 12:00:00.000 1000 1234 5678 W Watchdog: held by thread 42
01-01 12:00:00.000 1000 1234 5678 I Process : Sending signal. PID: 1234 SIG: 3
01-01 12:00:01.000 1000 1234 5678 I Process : Sending signal. PID: 1234 SIG: 9
Keywords: WATCHDOG KILLING SYSTEM PROCESS + Blocked in monitor + waiting to lock. When all three show up, it is almost certainly a deadlock that ended in an SWT reboot.
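Spotting those keywords is easy to automate. The sketch below scans log lines for them and pulls out the blocked monitor and the thread IDs named in "held by thread N". `SwtLogScanner` is a hypothetical helper, and its regular expressions are assumptions derived only from the sample log above; the real format may differ between Android versions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SwtLogScanner {
    // Patterns modeled on the sample log lines shown above (an assumption).
    private static final Pattern BLOCKED =
            Pattern.compile("WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor (\\S+)");
    private static final Pattern HELD_BY = Pattern.compile("held by thread (\\d+)");

    final List<String> blockedMonitors = new ArrayList<>();
    final Set<Integer> holderThreads = new LinkedHashSet<>();
    int waitingToLock = 0;

    /** Scans the lines once and collects the three SWT markers. */
    static SwtLogScanner scan(List<String> lines) {
        SwtLogScanner result = new SwtLogScanner();
        for (String line : lines) {
            Matcher blocked = BLOCKED.matcher(line);
            if (blocked.find()) {
                result.blockedMonitors.add(blocked.group(1));
            }
            if (line.contains("waiting to lock")) {
                result.waitingToLock++;
            }
            Matcher holder = HELD_BY.matcher(line);
            if (holder.find()) {
                result.holderThreads.add(Integer.parseInt(holder.group(1)));
            }
        }
        return result;
    }

    /** True when the log carries both the kill banner and at least one lock wait. */
    boolean looksLikeSwtDeadlock() {
        return !blockedMonitors.isEmpty() && waitingToLock > 0;
    }
}
```

The thread IDs collected in `holderThreads` are the ones to look up next in the trace file: the thread that holds the lock is usually where the real culprit is.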
5. Common SWT trigger scenarios
Scenario 1: main-thread deadlock (the most common)
[Diagram: thread A holds Lock1 and waits for Lock2; thread B holds Lock2 and waits for Lock1. Deadlock: each thread waits for the other to release, and neither ever does. The message posted by the Watchdog Handler is stuck waiting on the same two locks → 60s timeout → SWT reboot.]
Code example:
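```java
// A typical deadlock: two threads take the same two locks in opposite order
Object lockA = new Object();
Object lockB = new Object();
// Thread A
new Thread(() -> {
    synchronized (lockA) {
        try {
            Thread.sleep(100); // give thread B time to take lockB
        } catch (InterruptedException e) {
            return;
        }
        synchronized (lockB) { // waits for lockB, which thread B holds
            doSomething();
        }
    }
}).start();
// Thread B
new Thread(() -> {
    synchronized (lockB) {
        try {
            Thread.sleep(100); // give thread A time to take lockA
        } catch (InterruptedException e) {
            return;
        }
        synchronized (lockA) { // waits for lockA, which thread A holds
            doSomethingElse();
        }
    }
}).start();
// The two threads wait on each other forever. If a monitored SystemServer thread
// needs either lock, Watchdog triggers SWT once the 60s timeout expires.
```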
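The same lock-order inversion can be reproduced and confirmed on a desktop JVM. The sketch below is only an illustration: `DeadlockDemo` is a hypothetical class, and `ThreadMXBean.findMonitorDeadlockedThreads()` is a standard `java.lang.management` call on desktop Java, not the mechanism Watchdog itself uses. A latch replaces the `sleep(100)` so that the deadlock happens deterministically.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class DeadlockDemo {
    /** Starts two daemon threads that take two locks in opposite order. */
    static void startDeadlock() throws InterruptedException {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        startWorker("demo-A", lockA, lockB, bothHoldFirst);
        startWorker("demo-B", lockB, lockA, bothHoldFirst);
        bothHoldFirst.await(); // from here on the deadlock is unavoidable
    }

    private static void startWorker(String name, Object first, Object second,
                                    CountDownLatch ready) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                ready.countDown();
                try {
                    ready.await(); // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) { } // never acquired: the other thread holds it
            }
        }, name);
        t.setDaemon(true); // let the JVM exit even though these threads never finish
        t.start();
    }

    /** Polls the JVM for monitor deadlocks; returns how many threads are involved. */
    static int countDeadlocked(long timeoutMs) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(20);
        }
        return 0;
    }
}
```

Calling `startDeadlock()` and then `countDeadlocked(2000)` reports the two worker threads. On a device the equivalent evidence is the pair of "waiting to lock ... held by thread" entries in the trace.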
Scenario 2: a blocked Binder call
#mermaid-svg-jTyHbwTV9xPa1aw6{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;fill:#333;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#mermaid-svg-jTyHbwTV9xPa1aw6 .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#mermaid-svg-jTyHbwTV9xPa1aw6 .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#mermaid-svg-jTyHbwTV9xPa1aw6 .error-icon{fill:#552222;}#mermaid-svg-jTyHbwTV9xPa1aw6 .error-text{fill:#552222;stroke:#552222;}#mermaid-svg-jTyHbwTV9xPa1aw6 .edge-thickness-normal{stroke-width:1px;}#mermaid-svg-jTyHbwTV9xPa1aw6 .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg-jTyHbwTV9xPa1aw6 .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg-jTyHbwTV9xPa1aw6 .edge-thickness-invisible{stroke-width:0;fill:none;}#mermaid-svg-jTyHbwTV9xPa1aw6 .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg-jTyHbwTV9xPa1aw6 .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg-jTyHbwTV9xPa1aw6 .marker{fill:#333333;stroke:#333333;}#mermaid-svg-jTyHbwTV9xPa1aw6 .marker.cross{stroke:#333333;}#mermaid-svg-jTyHbwTV9xPa1aw6 svg{font-family:"trebuchet ms",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg-jTyHbwTV9xPa1aw6 p{margin:0;}#mermaid-svg-jTyHbwTV9xPa1aw6 .label{font-family:"trebuchet ms",verdana,arial,sans-serif;color:#333;}#mermaid-svg-jTyHbwTV9xPa1aw6 .cluster-label text{fill:#333;}#mermaid-svg-jTyHbwTV9xPa1aw6 .cluster-label span{color:#333;}#mermaid-svg-jTyHbwTV9xPa1aw6 .cluster-label span p{background-color:transparent;}#mermaid-svg-jTyHbwTV9xPa1aw6 .label text,#mermaid-svg-jTyHbwTV9xPa1aw6 span{fill:#333;color:#333;}#mermaid-svg-jTyHbwTV9xPa1aw6 .node rect,#mermaid-svg-jTyHbwTV9xPa1aw6 .node circle,#mermaid-svg-jTyHbwTV9xPa1aw6 .node ellipse,#mermaid-svg-jTyHbwTV9xPa1aw6 .node polygon,#mermaid-svg-jTyHbwTV9xPa1aw6 .node 
[Flowchart] App process
→ Binder call into AMS (e.g. startActivity)
→ AMS main thread receives the Binder request
→ while handling it, AMS calls into the HAL
→ the HAL call hangs
→ the AMS main thread is blocked, and the Binder thread pool fills up
→ every Binder request is stuck in the queue
→ Watchdog detects that neither Binder nor the main thread responds → SWT
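The chain above can be imitated in plain Java to see why one hung downstream call is enough to stall unrelated requests. This is only a toy model under stated assumptions: a fixed-size thread pool stands in for the Binder thread pool, a latch that never opens stands in for the hung HAL call, and the class and method names are made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model: a fixed-size pool stands in for system_server's Binder threads. */
class BinderPoolStarvationDemo {

    /**
     * Occupies the pool with stuckCalls calls that never return, then reports
     * whether one more request finishes within probeMillis.
     */
    static boolean freshRequestCompletes(int poolSize, int stuckCalls, long probeMillis) {
        ExecutorService binderPool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch halReturns = new CountDownLatch(1); // stays closed while we probe
        try {
            for (int i = 0; i < stuckCalls; i++) {
                binderPool.submit(() -> {
                    halReturns.await(); // simulated HAL call that hangs
                    return null;
                });
            }
            Future<String> fresh = binderPool.submit(() -> "ok");
            try {
                fresh.get(probeMillis, TimeUnit.MILLISECONDS);
                return true;  // a free thread picked the request up
            } catch (TimeoutException e) {
                return false; // the request is still waiting in the queue
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            halReturns.countDown();   // release the stuck workers
            binderPool.shutdownNow();
        }
    }
}
```

With a pool of 2 and 2 stuck calls the extra request never runs; with only 1 stuck call it completes at once. The real Binder pool is larger, but the failure mode is the same once every thread is parked behind the same hung call.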
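Scenario 3: I/O blocks the main thread
Reading files or writing a database synchronously on the AMS / WMS main thread is dangerous: if the storage chip is failing or the file system stalls, the main thread is blocked right along with it.
java
// ❌ Never read files synchronously on the AMS / WMS main thread.
// If the eMMC/UFS misbehaves, this one line can block for a minute.
byte[] data = Files.readAllBytes(Paths.get("/data/system/some_file.xml"));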
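One way to keep such a read off the critical thread is to hand it to a dedicated I/O thread and consume the result asynchronously. The sketch below is plain Java, not framework code: `AsyncFileReader` and `roundTrip` are hypothetical names, and inside system_server one would more likely post to an existing background Handler than create an executor.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Reads files on a dedicated I/O thread so the caller never blocks on storage. */
class AsyncFileReader {
    private final ExecutorService ioExecutor;

    AsyncFileReader(ExecutorService ioExecutor) {
        this.ioExecutor = ioExecutor;
    }

    /** Starts the read on the I/O thread and returns immediately. */
    CompletableFuture<byte[]> read(Path path) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return Files.readAllBytes(path);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, ioExecutor);
    }

    /** Usage example: writes a temp file and reads it back through the async path. */
    static String roundTrip(String content) {
        ExecutorService io = Executors.newSingleThreadExecutor();
        try {
            Path tmp = Files.createTempFile("swt-demo", ".xml");
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            // join() is used only so the demo can return a value; a latency-critical
            // thread would attach a callback (thenAccept) instead of waiting.
            byte[] data = new AsyncFileReader(io).read(tmp).join();
            Files.delete(tmp);
            return new String(data, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            io.shutdown();
        }
    }
}
```

If the storage stalls, only the I/O thread is parked; the thread that Watchdog monitors keeps draining its message queue.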
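Scenario 4: memory pressure makes GC pauses too long
When the SystemServer heap is nearly exhausted, a full collection can stall a thread for as long as 30 seconds. If that pause hits a thread that is holding a lock, every thread waiting on the lock stalls as well, and all Watchdog sees is a main thread that does not respond.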
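6. Hands-on: SWT log analysis
After an SWT, collect these logs:
bash
# 1. Pull the dump files written at the time of the SWT
#    (reading /data/anr usually needs root or a userdebug build)
adb pull /data/anr/ .
# 2. Find the exact time of the SWT
adb logcat -b all -d | grep "WATCHDOG KILLING"
# 3. Look at the thread stacks it dumped
adb logcat -b all -d | grep -A 50 "Blocked in monitor"
# 4. List the full ANR traces (one is also generated on an SWT)
adb shell 'ls -la /data/anr/anr_*'
A real SWT log, analyzed:
Below is an actual SWT log (anonymized):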
W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (fg)
W Watchdog: foreground thread stack trace:
W Watchdog: at com.android.server.am.ActivityManagerService.monitor(ActivityManagerService.java:14562)
W Watchdog: at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:248)
W Watchdog: - waiting to lock <0x0b2f5c3a> (a com.android.server.am.ActivityManagerService)
W Watchdog: held by thread 16
W Watchdog:
W Watchdog: main thread stack trace:
W Watchdog: at com.android.server.wm.WindowManagerService.removeWindow(WindowManagerService.java:3421)
W Watchdog: at com.android.server.am.ActivityStack.removeActivityFromHistoryLocked(ActivityStack.java:4567)
W Watchdog: at com.android.server.am.ActivityManagerService.activityDestroyed(ActivityManagerService.java:5678)
W Watchdog: - waiting to lock <0x0b2f5c3a> (a com.android.server.am.ActivityManagerService)
W Watchdog: held by thread 16
W Watchdog:
W Watchdog: thread 16 stack trace:
W Watchdog: at com.android.server.am.ActivityManagerService.killAllBackgroundProcesses(ActivityManagerService.java:8901)
W Watchdog: at android.database.ContentObserver.dispatchChange(ContentObserver.java:235)
W Watchdog: - locked <0x0b2f5c3a> (a com.android.server.am.ActivityManagerService)
W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:102)
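How to read it:
- Start with "Blocked in monitor xxx": the AMS monitor on the foreground thread (fg) timed out.
- Look at the object address after "waiting to lock", 0x0b2f5c3a: the lock is the AMS object itself.
- Look at "held by thread 16": the lock is held by thread 16.
- Jump to the stack of thread 16: what is it doing inside killAllBackgroundProcesses while holding the lock?
- Root cause: the top frame is the innermost one, so thread 16 reached killAllBackgroundProcesses from a ContentObserver.dispatchChange callback and is holding the AMS lock there. Whatever it does inside may in turn be waiting on another synchronous call, so the lock is never released. The main thread and the fg thread both wait for that lock → 60s timeout → SWT.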
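The first three steps of that checklist are mechanical enough to script. The following is a minimal sketch that assumes the log has the same shape as the sample above ("Blocked in monitor ...", "waiting to lock <0x...>", "held by thread N"); real logs differ between Android releases and vendors, and `WatchdogLogSummary` is a hypothetical helper, not an existing tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Pulls the three triage facts out of a Watchdog log shaped like the sample above. */
class WatchdogLogSummary {
    private static final Pattern MONITOR = Pattern.compile("Blocked in monitor (\\S+)");
    private static final Pattern WAITING = Pattern.compile("waiting to lock <(0x[0-9a-fA-F]+)>");
    private static final Pattern HOLDER = Pattern.compile("held by thread (\\d+)");

    final String blockedMonitor; // class whose monitor() call timed out, or null
    final String lockAddress;    // identity of the contended lock, or null
    final String holderThread;   // id of the thread that owns the lock, or null

    private WatchdogLogSummary(String blockedMonitor, String lockAddress, String holderThread) {
        this.blockedMonitor = blockedMonitor;
        this.lockAddress = lockAddress;
        this.holderThread = holderThread;
    }

    /** Scans the whole log text and keeps the first match of each pattern. */
    static WatchdogLogSummary parse(String log) {
        return new WatchdogLogSummary(
                firstGroup(MONITOR, log), firstGroup(WAITING, log), firstGroup(HOLDER, log));
    }

    private static String firstGroup(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }
}
```

The holder thread id is the pointer to follow next: its stack, not the stack of the blocked thread, is where the root cause usually sits.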
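7. Hands-on: reproducing and locating an SWT
Step 1: get more detail out of Watchdog
bash
# Dump Watchdog state: which threads are monitored and when each one last answered its heartbeat.
# Whether a "watchdog" dumpsys service exists depends on the build; list the services with `adb shell dumpsys -l` first.
adb shell dumpsys watchdog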
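Step 2: trigger a stack dump by hand
bash
# If you suspect a hang, make system_server dump the stacks of all its threads.
# Signal 3 (SIGQUIT) asks the runtime to write a trace; sending it needs root.
adb shell kill -3 <system_server_pid>
# Or (this am subcommand may be vendor-specific; check the usage printed by `adb shell am` on your device)
adb shell am dumpstack
# The dump file is written to /data/anr/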
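Step 3: capture a timeline with systrace
bash
# SWT problems usually need a long capture (90s+).
# Recent SDK platform-tools no longer bundle systrace.py; Perfetto is its replacement there.
python systrace.py -t 120 -a system_server -b 32768 sched freq idle am wm gfx view binder_driver
# -t 120: capture 120 seconds, enough to cover one full Watchdog detection cycle
In the trace, look at the main thread of system_server. If it sits in the same function for 60 seconds straight, that function is the prime suspect for the SWT.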
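Step 4: after reproducing, analyze a bugreport
bash
# Ideally capture the bugreport right after the SWT
adb bugreport bugreport_swt.zip
# After unzipping, look at these files:
# 1. bugreport_xxx.txt → search for "WATCHDOG"; the system logs are in this file as well
# 2. FS/data/anr/anr_xxx → thread stacks
# 3. main_entry.txt → only holds the name of the main bugreport text file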
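Step 5: narrow down the reproduction
An SWT is usually not reproducible on demand, because it tends to be a deadlock that only occurs under specific timing. Add logging before and after the suspected operation and raise the call rate:
bash
# For example, if you suspect a particular Binder call, hammer it
adb shell "while true; do am start -n com.example.app/.SomeActivity; sleep 0.5; done"
# and see whether the SWT can be forced out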
8. Hands-on: fixing SWT problems
Fix 1: split the lock to reduce lock granularity
java
// ❌ One big lock held around everything
public class SomeService {
    private final Object mLock = new Object();

    public void methodA() {
        synchronized (mLock) {
            doSlowIo();        // I/O
            updateDatabase();  // database work
            notifyObservers(); // callbacks, which may call back into this class
        }
    }
}

// ✓ Split into smaller locks
public class SomeService {
    private final Object mDataLock = new Object();
    private final Object mCallbackLock = new Object();

    public void methodA() {
        doSlowIo(); // slow I/O runs before any lock is taken
        synchronized (mDataLock) {
            updateDatabase();
        }
        // Callbacks run outside the data lock, so a callback that needs
        // mDataLock cannot deadlock against this method.
        synchronized (mCallbackLock) {
            notifyObservers();
        }
    }
}
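A stricter variant of the same idea is to hold the lock only long enough to update state and copy the listener list, then invoke the callbacks with no lock held at all. The sketch below is self-contained plain Java; `ObservableState` and its members are hypothetical names chosen for the example, not framework classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Copies the listener list under the lock, then calls listeners with no lock held. */
class ObservableState {
    interface Listener {
        void onChanged(int newValue);
    }

    private final Object lock = new Object();
    private final List<Listener> listeners = new ArrayList<>();
    private int value;

    void addListener(Listener listener) {
        synchronized (lock) {
            listeners.add(listener);
        }
    }

    int getValue() {
        synchronized (lock) {
            return value;
        }
    }

    void setValue(int newValue) {
        List<Listener> snapshot;
        synchronized (lock) {
            value = newValue;                      // short critical section: state only
            snapshot = new ArrayList<>(listeners); // copy so iteration needs no lock
        }
        for (Listener listener : snapshot) {
            listener.onChanged(newValue);          // runs with the lock released
        }
    }

    /** Exposed for the demo: true if the calling thread currently owns the lock. */
    boolean lockHeldByCurrentThread() {
        return Thread.holdsLock(lock);
    }
}
```

Because no lock is held during `onChanged`, a listener that calls back into `getValue()` or blocks on something slow cannot keep other threads out of the critical section.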
Fix 2: put a timeout on Binder calls
java
// ❌ Synchronous Binder call with no timeout
IBinder service = ServiceManager.getService("xxx");
// If the remote side hangs, a call made through this binder never returns → SWT

// ✓ The same call with a timeout
// Option 1: Future + timeout
Future<Boolean> future = executor.submit(() -> binderService.doSomeWork());
try {
    Boolean result = future.get(30, TimeUnit.SECONDS); // 30-second timeout
} catch (TimeoutException e) {
    // Handle the timeout; never wait forever
    future.cancel(true);
    Log.w(TAG, "Binder call timeout", e);
} catch (Exception e) {
    Log.e(TAG, "Binder call failed", e);
}
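The Future-plus-timeout pattern can be tried outside Android with a simulated slow call. This is a hedged sketch: `BoundedCall.callWithTimeout` is a made-up helper, and a sleeping thread stands in for the hung remote call. One caveat applies to the real case: a thread blocked in a Binder transaction generally does not react to interruption, so the timeout protects the caller but the worker thread itself may stay stuck.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Runs a call on a worker thread and gives up after a deadline. */
class BoundedCall {
    static <T> T callWithTimeout(Callable<T> call, long timeoutMillis, T fallback) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = worker.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: only helps if the call is interruptible
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback;     // the call itself threw
        } finally {
            worker.shutdownNow();
        }
    }
}
```

A call such as `BoundedCall.callWithTimeout(() -> remote.doSomeWork(), 30_000, false)` (with `remote` being whatever proxy you hold) keeps the calling thread from waiting indefinitely.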
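Fix 3: give a long-running lock holder its own Watchdog timeout
If a thread really does have to hold a lock for a long time (hardware operations, for example), it can be registered with Watchdog under a longer timeout so that it is not misjudged as hung:
java
// Register the thread's Handler with Watchdog using a custom timeout.
// AOSP's Watchdog exposes addThread(Handler) and addThread(Handler, long timeoutMillis);
// the checker name is taken from the thread name. Verify the signature in your tree.
// Android 10+ also has pauseWatchingCurrentThread(reason) / resumeWatchingCurrentThread(reason)
// for a bounded slow section.
Watchdog.getInstance().addThread(handler, timeoutMillis);
Use this with care: it only fits the "I know this is slow but it has to be done" case, and must not be used to cover up a real deadlock.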
Fix 4: check the call chain for cycles
java
// A calls B, B calls C, and C calls back into A: an endless cycle
// A.method() → B.method() → C.method() → A.method() → ...
// ✓ Add a counter or a flag to block re-entry
// (a plain boolean only guards calls made on a single thread)
private boolean isInMethod = false;

public void method() {
    if (isInMethod) {
        Log.w(TAG, "Re-entrant call detected, skipping");
        return;
    }
    isInMethod = true;
    try {
        doWork();
    } finally {
        isInMethod = false;
    }
}
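When the guarded method can be entered from several threads, a shared boolean would make one thread's call look like re-entry to another. A per-thread flag avoids that. The following is a minimal sketch with a hypothetical `ReentrancyGuard` class; it detects re-entry on the same thread only and is not a substitute for fixing the cycle itself.

```java
/** Per-thread re-entrancy guard: a nested call on the same thread is skipped. */
class ReentrancyGuard {
    private final ThreadLocal<Boolean> inCall = ThreadLocal.withInitial(() -> Boolean.FALSE);

    /** Runs work unless this thread is already inside runOnce; returns whether it ran. */
    boolean runOnce(Runnable work) {
        if (inCall.get()) {
            return false; // re-entrant call detected, skip it
        }
        inCall.set(Boolean.TRUE);
        try {
            work.run();
            return true;
        } finally {
            inCall.set(Boolean.FALSE); // always reset, even if work throws
        }
    }
}
```

Returning a boolean lets the caller log or count skipped calls, which is often the quickest way to find out where the cycle closes.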
9. Walking through the Watchdog source
Where Watchdog is created (during SystemServer startup):
java
// frameworks/base/services/java/com/android/server/SystemServer.java
private void startBootstrapServices() {
    // ... earlier services
    // ★ Get the Watchdog singleton and start its thread
    final Watchdog watchdog = Watchdog.getInstance();
    watchdog.start();
}
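The core run method of Watchdog (simplified; it follows the structure used around Android 8 to 11, and details vary by release):
java
// frameworks/base/services/core/java/com/android/server/Watchdog.java
// DEFAULT_TIMEOUT = 60 * 1000, CHECK_INTERVAL = DEFAULT_TIMEOUT / 2
public void run() {
    boolean waitedHalf = false;
    while (true) {
        final List<HandlerChecker> blockedCheckers;
        synchronized (this) {
            // 1. Post a check message to every monitored Handler
            for (HandlerChecker hc : mHandlerCheckers) {
                hc.scheduleCheckLocked();
            }
            // 2. Sleep for one check interval (30s)
            try {
                wait(CHECK_INTERVAL);
            } catch (InterruptedException e) {
                Log.wtf(TAG, e);
            }
            // 3. See how far the slowest checker has got
            final int waitState = evaluateCheckerCompletionLocked();
            if (waitState == COMPLETED) {
                waitedHalf = false; // everyone answered, start a new round
                continue;
            } else if (waitState == WAITING) {
                continue;           // blocked for less than 30s, keep waiting
            } else if (waitState == WAITED_HALF) {
                if (!waitedHalf) {
                    // First miss (blocked >= 30s): dump stacks once, grant another 30s
                    ActivityManagerService.dumpStackTraces(...);
                    waitedHalf = true;
                }
                continue;
            }
            // 4. OVERDUE: blocked for the full 60s
            blockedCheckers = getBlockedCheckersLocked();
        }
        // ★ Second miss: this is a real hang
        Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: "
                + blockedCheckers.get(0).getName());
        // Dump the stacks of all threads
        ActivityManagerService.dumpStackTraces(...);
        // Kill SystemServer
        Process.killProcess(Process.myPid());
        System.exit(10);
    }
}
The logic is straightforward: a round of heartbeats goes out every 30s → after the first miss the thread gets another 30s → after the second miss the process is killed.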
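The 30s / 60s rule can be written down as a small pure function, which makes the thresholds easy to check by hand. This sketch is modelled on the four checker states AOSP uses (COMPLETED, WAITING, WAITED_HALF, OVERDUE), but the class and method names here are illustrative and the code is not copied from the framework.

```java
/** Plain-Java model of how a checker's state follows from how long it has been blocked. */
class CheckerStateDemo {
    enum State { COMPLETED, WAITING, WAITED_HALF, OVERDUE }

    /**
     * @param completed     whether the check message has already been handled
     * @param blockedMillis time elapsed since the check message was posted
     * @param maxWaitMillis full timeout (60_000 for the default Watchdog timeout)
     */
    static State completionState(boolean completed, long blockedMillis, long maxWaitMillis) {
        if (completed) {
            return State.COMPLETED;   // the thread answered, nothing to do
        }
        if (blockedMillis < maxWaitMillis / 2) {
            return State.WAITING;     // under 30s: still within normal latency
        }
        if (blockedMillis < maxWaitMillis) {
            return State.WAITED_HALF; // 30s to 60s: first miss, stacks get dumped
        }
        return State.OVERDUE;         // 60s or more: the process gets killed
    }
}
```

With the default timeout, a thread blocked for 29.999s is still WAITING, one blocked for exactly 30s is WAITED_HALF, and one blocked for 60s is OVERDUE.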
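10. Common pitfalls
Pitfall 1: confusing SWT with ANR
| | SWT | ANR |
|---|---|---|
| Target | Internal threads of SystemServer | Main thread of an app process |
| Timeout | 60s | 5s (input) / 10s (foreground broadcast) / 20s (foreground Service) |
| Consequence | The phone restarts | An ANR dialog is shown (or not) |
| Log keyword | WATCHDOG KILLING SYSTEM PROCESS | ANR in ... |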
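Pitfall 2: reading only the threads in the SWT dump and ignoring the surrounding context
The SWT dump shows the stacks at the moment of the kill, which is not necessarily the moment the hang actually began. Read it together with the logs from the preceding 60 seconds to find the trigger.
bash
# Show the logs starting 60 seconds before the SWT (replace the timestamp with your own)
adb logcat -b all -d -t "01-01 11:59:00.000" | grep -E "Watchdog|ActivityManager|WindowManager"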
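Pitfall 3: blaming a service when it is the HAL that died
A blocked AMS main thread looks like an AMS problem. Trace further back and you may find that AMS called SensorService, SensorService called into the HAL, and the HAL hung while talking to the hardware: the root cause is in the HAL.
bash
# Check the kernel log to confirm whether the hardware driver layer is at fault
adb shell dmesg | tail -200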
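Pitfall 4: the system restarts too fast and the dump is lost
Once the SWT fires, the system restarts in under a second. Often the dump in /data/anr/ has not even been fully written.
bash
# One workaround: lengthen the Watchdog timeout to leave more time for collecting logs.
# Which settings key Watchdog honors depends on the Android release; confirm it in
# your tree's Watchdog.java before relying on this command.
adb shell settings put global activity_manager_constants watchdog_timeout_millis=120000
# Or change DEFAULT_TIMEOUT in the Watchdog source on your board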
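Pitfall 5: adding a Watchdog heartbeat while the thread is not actually processing messages
java
// ❌ This kind of "heartbeat" does not fool Watchdog.
// Watchdog posts a message to the monitored thread and checks whether it gets handled;
// it does not look at whether you updated some flag.
Watchdog.getInstance().addThread(handler, timeout);
// If the message queue of the handler's thread is blocked, the handler never receives
// the check message and the heartbeat is still missed.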
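The point that liveness is probed through the message queue can be demonstrated in plain Java. In this sketch a single-thread executor stands in for a Looper thread and a no-op task stands in for the check message; `QueueProbe` is a hypothetical name, and the real Watchdog uses Handler messages rather than executors.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** A single-thread executor stands in for a Looper thread with a message queue. */
class QueueProbe {

    /** Posts a no-op "check message" and reports whether it was handled in time. */
    static boolean respondsWithin(ExecutorService looperThread, long timeoutMillis) {
        CountDownLatch handled = new CountDownLatch(1);
        looperThread.execute(handled::countDown); // queued behind whatever is already pending
        try {
            return handled.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

If the thread is parked inside an earlier task, the probe sits in the queue and the method returns false, no matter what flags that thread may have set before it got stuck.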
11. Summary
The SWT trigger path in one diagram:
[Flowchart] A thread holds a lock for > 60s
→ the message that Watchdog's HandlerChecker posted to that thread is not handled within 60s
→ the second check still times out (75-90s in total)
→ dump the stacks of all threads → /data/anr/anr_xxx
→ Process.killProcess kills SystemServer
→ init notices that SystemServer died and restarts it according to the rc configuration
→ ★ the user sees the phone restart
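SWT analysis cheat sheet:
| What you need | Where to look |
|---|---|
| When the SWT happened | adb logcat output, search for "WATCHDOG KILLING" |
| Which monitor got stuck | Search for "Blocked in monitor" |
| Who holds the lock | Search for "held by thread" |
| What the lock holder is doing | Jump to the stack of that thread |
| What happened before the hang | Scroll back 60s in the log |
| Full thread stacks | /data/anr/anr_xxx |
| Whether it is a hardware problem | adb shell dmesg |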
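Key source locations:
| File | Path |
|---|---|
| Watchdog.java | frameworks/base/services/core/java/com/android/server/Watchdog.java |
| SystemServer (creates Watchdog) | frameworks/base/services/java/com/android/server/SystemServer.java |
In one sentence: an SWT means some thread in SystemServer held a lock for more than 60s without letting go, Watchdog noticed, and it killed the whole SystemServer process, which restarts the phone. Triage path: find "Blocked in monitor" → find "held by thread" → see what that thread is doing → follow the lock chain back to the root cause.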