Memory corruption
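When the system hits random, otherwise inexplicable exceptions caused by abnormal pointer accesses or corrupted data, suspect a memory-usage bug such as UAF (Use-After-Free) or OOB (Out-of-Bounds) access.
In this chapter, "Memory corruption" refers specifically to memory corruption on the Linux kernel side. For memory being overwritten across subsystems, refer to Firewall.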
General approach
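When an OOB or UAF problem is suspected, enable CONFIG_KASAN and reproduce the issue.
When memory corruption is detected, the system triggers BUG_ON by default.
Check the panic log. For UAF and OOB in slub objects, the stack, buddy pages and global variables, the KASAN report points out the key information, and almost all such problems can be resolved from the log alone.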
Typical problem
```text
[    6.262525] BUG: KASAN: global-out-of-bounds in __of_match_node+0x70/0xb8
[    6.263391] Read of size 1 at addr ffffff9008d153a8 by task swapper/0/1
[    6.264231]
[    6.264439] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 6.1.94-rt33-gac7c113a9bab #2
[    6.265488] Hardware name: Horizon Robotics 征程 6E Evaluation Module Board (DT)
[    6.266362] Call trace:
[    6.266694]  dump_backtrace+0x0/0x538
[    6.267391]  show_stack+0x14/0x20
[    6.268048]  dump_stack+0xa4/0xc8
[    6.268703]  print_address_description+0x1e4/0x250
[    6.269539]  kasan_report+0x2cc/0x300
[    6.270240]  __asan_load1+0x44/0x50
[    6.270912]  __of_match_node+0x70/0xb8
[    6.271617]  of_match_node+0x38/0x60
[    6.272301]  of_match_device+0x3c/0x50
[    6.273012]  platform_match+0x64/0x118
[    6.273719]  __driver_attach+0x40/0x140
[    6.274435]  bus_for_each_dev+0xcc/0x140
[    6.275164]  driver_attach+0x30/0x40
[    6.275848]  bus_add_driver+0x220/0x388
[    6.276566]  driver_register+0x108/0x170
[    6.277295]  __platform_driver_register+0x7c/0x88
[    6.278122]  征程 6_wdt_driver_init+0x34/0x4c
[    6.278861]  do_one_initcall+0xe4/0x1b8
[    6.279581]  kernel_init_freeable+0x1ac/0x260
[    6.280363]  kernel_init+0x10/0x118
[    6.281036]  ret_from_fork+0x10/0x18
[    6.281713]
[    6.281911] The buggy address belongs to the variable:
[    6.282570]  0xffffff9008d153a8
[    6.282974]
[    6.283173] Memory state around the buggy address:
[    6.283794]  ffffff9008d15280: fa fa fa fa 07 fa fa fa fa fa fa fa 00 00 00 00
[    6.284715]  ffffff9008d15300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    6.285636] >ffffff9008d15380: 00 00 00 00 00 fa fa fa fa fa fa fa 00 00 00 00
[    6.286552]                                   ^
[    6.287137]  ffffff9008d15400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    6.288057]  ffffff9008d15480: 00 00 00 fa fa fa fa fa 00 00 fa fa fa fa fa fa
[    6.288974] ==================================================================
[    6.289890] Disabling lock debugging due to kernel taint
```

The error type is global-out-of-bounds: an out-of-bounds access to a global variable, here a 1-byte read past its end.

The call trace shows that the access happens while the driver core matches devices against the driver (platform_match, called from __driver_attach during driver registration).

For complex problems, also analyze the buggy-address information printed after the trace. Note that the memory-state dump shows shadow-memory values, not the actual data at those addresses. In this case the shadow byte for address ffffff9008d153a8 is 0xFA, which marks a global-variable redzone (used for out-of-bounds detection).

Looking at the logic, __of_match_node iterates over all entries of the of_match_table and only exits the loop when it reaches an entry whose name, type and compatible fields are all empty:

```c
const struct of_device_id *__of_match_node(const struct of_device_id *matches,
					   const struct device_node *node)
{
	const struct of_device_id *best_match = NULL;
	int score, best_score = 0;

	if (!matches)
		return NULL;

	/* Stops only at an entry whose name, type and compatible are all empty */
	for (; matches->name[0] || matches->type[0] || matches->compatible[0]; matches++) {
		score = __of_device_is_compatible(node, matches->compatible,
						  matches->type, matches->name);
		if (score > best_score) {
			best_match = matches;
			best_score = score;
		}
	}

	return best_match;
}
```

From the code, the out-of-bounds read happens because the of_match_table array does not end with a zero-filled sentinel entry: the table only contained a /* sentinel */ comment, so the loop read past the last real entry into the redzone. The fix adds an empty entry:

```diff
 #ifdef CONFIG_OF
 static const struct of_device_id 征程 6_wdt_of_match[] = {
-	{ .compatible = "snps,征程 6_wdt", }
-	/* sentinel */
+	{ .compatible = "snps,征程 6_wdt", },
+	{/* sentinel */} /* PRQA S 1041 */
 };
 MODULE_D
```