The Vanishing Code Segment: Tracing ILL_ILLOPC Back to Abnormally Zeroed Pages

Background

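During a round of random testing, a strange failure appeared: every program that called functions in one particular library crashed with an ILL_ILLOPC illegal-instruction fault, and the memory around the PC was all zeros. The library lives on a read-only system partition and the file on disk was not damaged, yet part of its content was missing in memory. Fortunately, this very low-probability failure happened to hit a scenario that saved a Ramdump.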

Error signature

Cmdline: /vendor/bin/hw/vendor.qti.hardware.display.composer-service
pid: 1967, tid: 1967, name: binder:1967_2  >>> /vendor/bin/hw/vendor.qti.hardware.display.composer-service <<<
uid: 1000
tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
signal 4 (SIGILL), code 1 (ILL_ILLOPC), fault addr 0x000000772cb25cf4
    x0  0000000000000000  x1  0000007fee454720  x2  0000005dc0bc044c  x3  0000007fee454670
    x4  0000007fee454664  x5  0000000000000001  x6  000000000000003f  x7  0000000000000000
    x8  0000000000000000  x9  0000000000000001  x10 0000000000000044  x11 0000000000000003
    x12 0000000000000028  x13 0000000000000000  x14 b40000748b0788d0  x15 0000000000000000
    x16 00000074864df850  x17 000000772cb25cf4  x18 000000772d22c000  x19 0000007fee454720
    x20 b40000760b010910  x21 0000005dc0bc044c  x22 0000007fee454720  x23 000000772d152f00
    x24 0000007fee454668  x25 00000000fffffc0c  x26 0000000000000000  x27 0000000000000000
    x28 0000007fee454720  x29 0000007fee4545e0
    lr  00000074864d3380  sp  0000007fee454590  pc  000000772cb25cf4  pst 0000000060001400

28 total frames
backtrace:
      #00 pc 0000000000023cf4  /apex/com.android.runtime/lib64/bionic/libm.so 
      ...
memory near pc (/apex/com.android.runtime/lib64/bionic/libm.so):
    000000772cb25cd0 0000000000000000 0000000000000000  ................
    000000772cb25ce0 0000000000000000 0000000000000000  ................
    000000772cb25cf0 0000000000000000 0000000000000000  ................
    000000772cb25d00 0000000000000000 0000000000000000  ................
    000000772cb25d10 0000000000000000 0000000000000000  ................
    000000772cb25d20 0000000000000000 0000000000000000  ................
    000000772cb25d30 0000000000000000 0000000000000000  ................
    000000772cb25d40 0000000000000000 0000000000000000  ................
    000000772cb25d50 0000000000000000 0000000000000000  ................
    000000772cb25d60 0000000000000000 0000000000000000  ................
    000000772cb25d70 0000000000000000 0000000000000000  ................
    000000772cb25d80 0000000000000000 0000000000000000  ................
    000000772cb25d90 0000000000000000 0000000000000000  ................
    000000772cb25da0 0000000000000000 0000000000000000  ................
    000000772cb25db0 0000000000000000 0000000000000000  ................
    000000772cb25dc0 0000000000000000 0000000000000000  ................

Ramdump analysis

crash-android> ps | grep logd
     1082       1   2  ffffff88109b8000  IN   0.2 11208532    72896  logd
     1108       1   2  ffffff88129a0000  IN   0.2 11208532    72896  logd.reader
     1109       1   0  ffffff88129a1640  IN   0.2 11208532    72896  logd.writer
     1110       1   0  ffffff881556ac80  IN   0.2 11208532    72896  logd.control
     1124       1   0  ffffff8812b4ac80  IN   0.2 11208532    72896  logd.klogd
     1125       1   0  ffffff8812b4d900  IN   0.2 11208532    72896  logd.auditd
crash-android> lp core -p 1082 --zram
Saved [1082.core].
core-parser -c 1082.core

core-parser> logcat -b crash
...
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG: pid: 8754, tid: 8760, name: surfaceflinger  >>> /system/bin/surfaceflinger <<<
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG: uid: 1000
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG: tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG: pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG: signal 4 (SIGILL), code 1 (ILL_ILLOPC), fault addr 0x0000007d9ff25dc0
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG:     x0  0000000000000001  x1  0000000000000001  x2  0000000000000000  x3  0000000000000000
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG:     x4  b400007c856301d0  x5  0000007da18e9580  x6  b400007c15623fd0  x7  b400007c15623fd0
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG:     x8  0000007da18e9580  x9  0000007da13f4304  x10 0000007da11c66c8  x11 0000007da18e9054
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG:     x12 00000000000002b7  x13 0000000000000000  x14 b400007cf56122fd  x15 0000000000000096
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG:     x16 0000007da1a464b0  x17 0000007d9ff25dc0  x18 0000007ae4488000  x19 0000000000000001
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG:     x20 0000000000000001  x21 3fd0c15240000000  x22 0000007ae4ff6ad0  x23 0000000000000000
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG:     x24 b400007c15623fd0  x25 0000000000000000  x26 0000000000000000  x27 0000000000000000
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG:     x28 0000000000000000  x29 0000007ae4ff6b80
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG:     lr  0000007da18e9abc  sp  0000007ae4ff6aa0  pc  0000007d9ff25dc0  pst 0000000060001400
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG: 32 total frames
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG: backtrace:
2026-02-09 02:28:17.422   1000  8765  8765 F DEBUG:       #00 pc 0000000000024dc0  /apex/com.android.runtime/lib64/bionic/libm.so (tan+0) (BuildId: a985a539ac1f4bfe3de003f47a1575ed)
...
2026-02-09 02:28:17.863   1000  8804  8804 F DEBUG: Cmdline: /vendor/bin/hw/vendor.qti.hardware.display.composer-service
2026-02-09 02:28:17.863   1000  8804  8804 F DEBUG: pid: 8769, tid: 8769, name: vendor.qti.hard  >>> /vendor/bin/hw/vendor.qti.hardware.display.composer-service <<<
2026-02-09 02:28:17.863   1000  8804  8804 F DEBUG: uid: 1000
2026-02-09 02:28:17.863   1000  8804  8804 F DEBUG: tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
2026-02-09 02:28:17.863   1000  8804  8804 F DEBUG: pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
2026-02-09 02:28:17.864   1000  8804  8804 F DEBUG: signal 4 (SIGILL), code 1 (ILL_ILLOPC), fault addr 0x000000784fa24cf4
2026-02-09 02:28:17.864   1000  8804  8804 F DEBUG:     x0  0000000000000000  x1  0000000000000000  x2  ffffffffffffffc0  x3  0000000000000010
2026-02-09 02:28:17.864   1000  8804  8804 F DEBUG:     x4  0000000000000000  x5  0000000000000040  x6  000000000000003f  x7  0000000000000000
2026-02-09 02:28:17.864   1000  8804  8804 F DEBUG:     x8  00000000fffffffb  x9  0000000000000ad0  x10 0000000000000001  x11 0000000000000a30
2026-02-09 02:28:17.864   1000  8804  8804 F DEBUG:     x12 0000000000000000  x13 0000000000000000  x14 0000000000000000  x15 7d0000003c8c0000
2026-02-09 02:28:17.864   1000  8804  8804 F DEBUG:     x16 00000075b39f9850  x17 000000784fa24cf4  x18 0000007854880000  x19 0000000000000001
2026-02-09 02:28:17.864   1000  8804  8804 F DEBUG:     x20 0000000000000000  x21 0000000000000000  x22 0000007fdd1647e0  x23 b40000778400a510
2026-02-09 02:28:17.864   1000  8804  8804 F DEBUG:     x24 b400007644008eb0  x25 00000000000000a4  x26 b400007724008ed0  x27 0000000000000000
2026-02-09 02:28:17.864   1000  8804  8804 F DEBUG:     x28 b40000778400af74  x29 0000007fdd164590
2026-02-09 02:28:17.864   1000  8804  8804 F DEBUG:     lr  00000075b39ecf20  sp  0000007fdd164560  pc  000000784fa24cf4  pst 0000000080001400
2026-02-09 02:28:17.864   1000  8804  8804 F DEBUG: 21 total frames
2026-02-09 02:28:17.864   1000  8804  8804 F DEBUG: backtrace:
2026-02-09 02:28:17.864   1000  8804  8804 F DEBUG:       #00 pc 0000000000023cf4  /apex/com.android.runtime/lib64/bionic/libm.so (scalbnf+0) (BuildId: a985a539ac1f4bfe3de003f47a1575ed)
...
2026-02-09 02:28:18.037   1046  8733  8735 F DEBUG: Cmdline: media.swcodec oid.media.swcodec/bin/mediaswcodec
2026-02-09 02:28:18.037   1046  8733  8735 F DEBUG: pid: 8733, tid: 8735, name: binder:8733_3  >>> media.swcodec <<<
2026-02-09 02:28:18.037   1046  8733  8735 F DEBUG: uid: 1046
2026-02-09 02:28:18.037   1046  8733  8735 F DEBUG: tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
2026-02-09 02:28:18.037   1046  8733  8735 F DEBUG: pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
2026-02-09 02:28:18.037   1046  8733  8735 F DEBUG: signal 4 (SIGILL), code 1 (ILL_ILLOPC), fault addr --------
2026-02-09 02:28:18.037   1046  8733  8735 F DEBUG:     x0  9117dd6f103851fa  x1  0000007e81ce0738  x2  0000000000000000  x3  0000000000000030
2026-02-09 02:28:18.037   1046  8733  8735 F DEBUG:     x4  0000000000000000  x5  b400007d70e0419c  x6  d20020000000000c  x7  0000000000000000
2026-02-09 02:28:18.037   1046  8733  8735 F DEBUG:     x8  000000007f7fffff  x9  0000007e81cc0000  x10 0000000000000004  x11 0200007c50e11eb0
2026-02-09 02:28:18.038   1046  8733  8735 F DEBUG:     x12 0000000000040004  x13 657461722d656d61  x14 0a1b350e0000102c  x15 0000000000000000
2026-02-09 02:28:18.038   1046  8733  8735 F DEBUG:     x16 0000007bcc8733d8  x17 0000007e7a923498  x18 0000007bd25b4000  x19 b400007be0e0a838
2026-02-09 02:28:18.038   1046  8733  8735 F DEBUG:     x20 214557310473bf25  x21 b400007be0e0ab68  x22 b400007c10e132e0  x23 0000007bcc8728d8
2026-02-09 02:28:18.038   1046  8733  8735 F DEBUG:     x24 9117dd6f103851fa  x25 0000000000000000  x26 0000000000000000  x27 0000007bd36f9bd0
2026-02-09 02:28:18.038   1046  8733  8735 F DEBUG:     x28 0000007bd36f9bc0  x29 0000007bd36f9e10
2026-02-09 02:28:18.038   1046  8733  8735 F DEBUG:     lr  0000007bcc858b8c  sp  0000007bd36f9ad0  pc  0000007e7a923498  pst 0000000080001400
2026-02-09 02:28:18.038   1046  8733  8735 F DEBUG: 18 total frames
2026-02-09 02:28:18.038   1046  8733  8735 F DEBUG: backtrace:
2026-02-09 02:28:18.038   1046  8733  8735 F DEBUG:       #00 pc 0000000000023498  /apex/com.android.runtime/lib64/bionic/libm.so (nextafterf+0) (BuildId: a985a539ac1f4bfe3de003f47a1575ed)
...
core-parser> 

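The last Android logs recovered from the Ramdump show several processes hitting ILL_ILLOPC while calling functions in libm.so. The faulting addresses are concentrated in two file pages, at OFFSET 0x23000 and OFFSET 0x24000.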

Offset / file path
#00 pc 0000000000023498 /apex/com.android.runtime/lib64/bionic/libm.so (nextafterf+0)
#00 pc 0000000000024dc0 /apex/com.android.runtime/lib64/bionic/libm.so (tan+0)
#00 pc 0000000000023cf4 /apex/com.android.runtime/lib64/bionic/libm.so (scalbnf+0)
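
The page offsets quoted above follow from simple masking of the frame #00 offsets. A minimal sketch, assuming 4 KiB pages (consistent with the per-page listings in the rest of this analysis); the helper name `page_of` is illustrative and not part of any tool used here:

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def page_of(pc_offset: int) -> int:
    """Return the file offset of the page that contains pc_offset."""
    return pc_offset & ~(PAGE_SIZE - 1)


# Frame #00 offsets copied from the three tombstones above.
crashes = {"nextafterf": 0x23498, "scalbnf": 0x23CF4, "tan": 0x24DC0}

pages = {name: page_of(offset) for name, offset in crashes.items()}
for name, page in pages.items():
    print(f"{name}: file page offset {page:#x}")
```

Two of the three crashes fall in the same file page, which is already a hint that the problem is tied to specific pages of the file rather than to specific functions.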

The last remaining process

crash-android> lp cmdline -a | grep surfaceflinger
crash-android> lp cmdline -a | grep composer
crash-android> lp cmdline -a | grep mediaswcodec
PID: 8905     media.swcodec oid.media.swcodec/bin/mediaswcodec 
crash-android>

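Searching memory for the three processes that keep crashing, one of them (mediaswcodec, PID 8905) turns out to have just been restarted and has not yet reached the crashing code, so its state is still intact.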

Process memory analysis

crash-android> set 8905
    PID: 8905
COMMAND: "binder:8905_2"
   TASK: ffffff88575e2c80  [THREAD_INFO: ffffff88575e2c80]
    CPU: 6
  STATE: TASK_INTERRUPTIBLE 
crash-android>
crash-android> vm -p
...
      VMA           START       END     FLAGS FILE
ffffff8921ff7100 7e86c80000 7e86c94000 800000000000071 /apex/com.android.runtime/lib64/bionic/libm.so
VIRTUAL     PHYSICAL
7e86c80000  9d9fe3000
7e86c81000  a07334000
7e86c82000  881549000
7e86c83000  abe5e9000
7e86c84000   f00e7000
7e86c85000  ad3ed2000
7e86c86000  9309c5000
7e86c87000  89532e000
7e86c88000  895337000
7e86c89000  9a5135000
7e86c8a000  a93e59000
7e86c8b000  ad1293000
7e86c8c000  ab88a8000
7e86c8d000   d3612000
7e86c8e000  a3d844000
7e86c8f000  a53bdf000
7e86c90000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 10000
7e86c91000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 11000
7e86c92000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 12000
7e86c93000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 13000
      VMA           START       END     FLAGS FILE
ffffff8921ff7500 7e86c94000 7e86cb8000 1000075 /apex/com.android.runtime/lib64/bionic/libm.so
VIRTUAL     PHYSICAL
7e86c94000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 14000
7e86c95000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 15000
7e86c96000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 16000
7e86c97000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 17000
7e86c98000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 18000
7e86c99000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 19000
7e86c9a000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 1a000
7e86c9b000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 1b000
7e86c9c000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 1c000
7e86c9d000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 1d000
7e86c9e000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 1e000
7e86c9f000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 1f000
7e86ca0000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 20000
7e86ca1000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 21000
7e86ca2000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 22000
7e86ca3000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 23000
7e86ca4000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 24000
7e86ca5000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 25000
7e86ca6000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 26000
7e86ca7000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 27000
7e86ca8000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 28000
7e86ca9000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 29000
7e86caa000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 2a000
7e86cab000  FILE: /apex/com.android.runtime/lib64/bionic/libm.so  OFFSET: 2b000
...

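Judging from the current state of the process, the libm.so code pages at offsets 0x23000 and 0x24000 are not yet mapped. When execution reaches one of these math functions it triggers a page fault, and the crash occurs only because the page brought in by that fault contains all zeros.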
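
The relation between a virtual address in this process and the file offset that backs it can be sketched as follows. This is a minimal illustration, not tool output: `vm_pgoff = 0x14` for the r-x VMA is inferred from the first `OFFSET: 14000` entry in the listing above, and 4 KiB pages are assumed.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, matching the vm -p listing


def file_offset(vaddr: int, vm_start: int, vm_pgoff: int) -> int:
    """File offset backing vaddr inside a file-mapped VMA."""
    return (vaddr - vm_start) + vm_pgoff * PAGE_SIZE


# r-x VMA of libm.so in PID 8905 (values from the vm -p output above).
TEXT_START = 0x7E86C94000
TEXT_PGOFF = 0x14  # inferred: the first page of this VMA maps OFFSET 14000

# Load base of the library, then the address where nextafterf (0x23498) lands.
load_base = TEXT_START - TEXT_PGOFF * PAGE_SIZE
nextafterf_vaddr = load_base + 0x23498

print(hex(load_base), hex(nextafterf_vaddr))
print(hex(file_offset(nextafterf_vaddr, TEXT_START, TEXT_PGOFF)))
```

The computed address lies in the page at 7e86ca3000, which the listing shows as `FILE: ... OFFSET: 23000`, that is, file-backed but not yet present in this process's page tables.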

Page content verification

crash-android> lp core -p 8905 --zram
Saved [8905.core].
core-parser -c 8905.core

// The original file's pages were not loaded into core-parser via sysroot, so what follows is the raw memory extracted from the ramdump
core-parser> map | grep libm.so
 64 0x7ebcfa1910  [7e86c80000, 7e86c94000)  r--   7e86c80000  /apex/com.android.runtime/lib64/bionic/libm.so [*]
core-parser> map -s 64
VADDR             SIZE              INFO              NAME
0000007e86ca603c  000000000000000c  0000000000000012  ceil
0000007e86ca630c  0000000000000018  0000000000000012  fetestexcept
0000007e86cb2ccc  0000000000000310  0000000000000012  expl
0000007e86ca5c4c  0000000000000098  0000000000000012  cexpl
0000007e86c9fc6c  000000000000008c  0000000000000012  cprojf
0000007e86ca60cc  000000000000000c  0000000000000012  floor
0000007e86c9e218  0000000000000280  0000000000000012  ccosh
0000007e86ca6098  000000000000000c  0000000000000012  fabs
0000007e86ca3278  000000000000007c  0000000000000012  nearbyintf
0000007e86cb4808  0000000000000238  0000000000000012  cosf
0000007e86cac808  0000000000000114  0000000000000012  roundl
0000007e86ca8600  000000000000026c  0000000000000012  fmodl
0000007e86c87dbc  0000000000000008  0000000000000011  __fe_dfl_env
0000007e86ca616c  000000000000000c  0000000000000012  lround
0000007e86ca60d8  000000000000000c  0000000000000012  floorf
0000007e86c9e754  000000000000004c  0000000000000012  ccosf
0000007e86ca7e98  00000000000002d8  0000000000000012  asinl
...

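core-parser can still parse the dynamic symbol information from the ramdump as it stands, which indicates that, apart from the code segment, the pages this process has already loaded are essentially correct.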

core-parser> env core --load | grep libm.so
  287   [7e86c80000, 7e86c94000)  r--  0000014000  0000014000  /apex/com.android.runtime/lib64/bionic/libm.so [*]
  288   [7e86c94000, 7e86cb8000)  r-x  0000024000  0000024000  /apex/com.android.runtime/lib64/bionic/libm.so [*]
  289   [7e86cb8000, 7e86cb9000)  r--  0000001000  0000001000  /apex/com.android.runtime/lib64/bionic/libm.so [*]
  291   [7e86cbc000, 7e86cbd000)  rw-  0000001000  0000001000  /apex/com.android.runtime/lib64/bionic/libm.so [*]
core-parser> rd 7e86c80000 -e 7e86c94000 -f 7e86c80000.bin
core-parser> rd 7e86c94000 -e 7e86cb8000 -f 7e86c94000.bin
core-parser> rd 7e86cb8000 -e 7e86cb9000 -f 7e86cb8000.bin
core-parser> rd 7e86cbc000 -e 7e86cbd000 -f 7e86cbc000.bin
readelf -l 7e86c80000.bin
readelf: Error: Reading 1856 bytes extends past end of file for section headers

Elf file type is DYN (Shared object file)
Entry point 0x0
There are 12 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000002a0 0x00000000000002a0  R      0x8
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000012834 0x0000000000012834  R      0x4000
  LOAD           0x0000000000014000 0x0000000000014000 0x0000000000014000
                 0x0000000000023ae8 0x0000000000023ae8  R E    0x4000
  LOAD           0x0000000000038000 0x0000000000038000 0x0000000000038000
                 0x00000000000002e0 0x0000000000001000  RW     0x4000
  LOAD           0x000000000003c000 0x000000000003c000 0x000000000003c000
                 0x0000000000000080 0x00000000000000a0  RW     0x4000
  DYNAMIC        0x0000000000038018 0x0000000000038018 0x0000000000038018
                 0x00000000000001c0 0x00000000000001c0  RW     0x8
readelf: Error: the dynamic segment offset + size exceeds the size of the file
  GNU_RELRO      0x0000000000038000 0x0000000000038000 0x0000000000038000
                 0x00000000000002e0 0x0000000000001000  R      0x1
  GNU_EH_FRAME   0x000000000000dbc0 0x000000000000dbc0 0x000000000000dbc0
                 0x0000000000000b04 0x0000000000000b04  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x0
  GNU_PROPERTY   0x0000000000000330 0x0000000000000330 0x0000000000000330
                 0x0000000000000020 0x0000000000000020  R      0x8
  NOTE           0x00000000000002e0 0x00000000000002e0 0x00000000000002e0
                 0x0000000000000050 0x0000000000000050  R      0x4
  NOTE           0x0000000000000330 0x0000000000000330 0x0000000000000330
                 0x0000000000000020 0x0000000000000020  R      0x8

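Comparing 7e86c80000.bin and 7e86cb8000.bin against the original file confirms that their contents are correct. This also explains why the program can load and link the shared library normally, and why the error only shows up once the affected code runs.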
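
A page-by-page comparison of this kind can be sketched as below. The function name `compare_pages` and the synthetic data are illustrative assumptions; the sketch only shows the classification logic (match, zeroed, differs), not the exact procedure used in this investigation.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def compare_pages(dump: bytes, original: bytes, file_offset: int = 0):
    """Classify each page of dump against original[file_offset:]."""
    report = []
    for pos in range(0, len(dump), PAGE_SIZE):
        got = dump[pos:pos + PAGE_SIZE]
        want = original[file_offset + pos:file_offset + pos + len(got)]
        if got == want:
            status = "match"
        elif not any(got):
            status = "zeroed"  # page is all zero bytes but the file is not
        else:
            status = "differs"
        report.append((file_offset + pos, status))
    return report


# Synthetic stand-in data: a 3-page "file" and a dump whose middle page is zeroed.
original = bytes(range(256)) * 48
dump = original[:PAGE_SIZE] + bytes(PAGE_SIZE) + original[2 * PAGE_SIZE:]

report = compare_pages(dump, original)
for offset, status in report:
    print(f"{offset:#07x}: {status}")
```

With real data, one would read a segment dump such as 7e86c94000.bin together with the library file from the build, and pass the segment's file offset (0x14000 for the r-x segment shown above) as `file_offset`.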

core-parser> disas nextafterf
LIB: /apex/com.android.runtime/lib64/bionic/libm.so
nextafterf: [7e86ca3498, 7e86ca3570]
  0x7e86ca3498: 00000000 | udf #0
  0x7e86ca349c: 00000000 | udf #0
  0x7e86ca34a0: 00000000 | udf #0
  0x7e86ca34a4: 00000000 | udf #0
  0x7e86ca34a8: 00000000 | udf #0
  0x7e86ca34ac: 00000000 | udf #0
  0x7e86ca34b0: 00000000 | udf #0
  0x7e86ca34b4: 00000000 | udf #0
  0x7e86ca34b8: 00000000 | udf #0
  0x7e86ca34bc: 00000000 | udf #0
  0x7e86ca34c0: 00000000 | udf #0
  0x7e86ca34c4: 00000000 | udf #0
  0x7e86ca34c8: 00000000 | udf #0
...

core-parser> disas scalbnf
LIB: /apex/com.android.runtime/lib64/bionic/libm.so
scalbnf: [7e86ca3cf4, 7e86ca3d80]
  0x7e86ca3cf4: 00000000 | udf #0
  0x7e86ca3cf8: 00000000 | udf #0
  0x7e86ca3cfc: 00000000 | udf #0
  0x7e86ca3d00: 00000000 | udf #0
  0x7e86ca3d04: 00000000 | udf #0
  0x7e86ca3d08: 00000000 | udf #0
  0x7e86ca3d0c: 00000000 | udf #0
  0x7e86ca3d10: 00000000 | udf #0
  0x7e86ca3d14: 00000000 | udf #0
  0x7e86ca3d18: 00000000 | udf #0
...
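
The all-zero words disassemble as `udf #0` because, in the A64 encoding, any 32-bit word whose upper 16 bits are zero is the permanently undefined instruction UDF, with the lower 16 bits as its immediate. Executing one raises an undefined-instruction exception, which Android reports as SIGILL / ILL_ILLOPC. A small sketch of that decoding rule (the helper name `decode_udf` is illustrative):

```python
def decode_udf(word: int):
    """Return the imm16 of an A64 UDF instruction, or None if word is not UDF."""
    if (word & 0xFFFF0000) == 0:
        return word & 0xFFFF
    return None


# A zero-filled code page decodes as a run of "udf #0".
for word in (0x00000000, 0x0000BEEF, 0xD503201F):  # the last one is NOP
    imm = decode_udf(word)
    text = f"udf #{imm}" if imm is not None else "not UDF"
    print(f"{word:08x} | {text}")
```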

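At this point the process has not yet loaded the file page at 0x23000 into its address space. The failure therefore happens when that code is executed: the access triggers a page fault, and the corresponding page of the original file is brought in. Because the process had not yet called into the math library at the moment of the dump, it was still alive and could be found in the Ramdump. What if no such process could be found? There are many other ways to approach it.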

Page cache

crash-android> struct vm_area_struct ffffff8921ff7100 -x
struct vm_area_struct {
  {
    {
      vm_start = 0x7e86c80000,
      vm_end = 0x7e86c94000
    },
    vm_freeptr = {
      v = 0x7e86c80000
    }
  },
  vm_mm = 0xffffff80278b9400,
  vm_page_prot = {
    pgprot = 0x60000000000fc3
  },
  {
    vm_flags = 0x800000000000071,
    __vm_flags = 0x800000000000071
  },
  vm_lock_seq = 0x1d73,
  anon_vma_chain = {
    next = 0xffffff8921ff7130,
    prev = 0xffffff8921ff7130
  },
  anon_vma = 0x0,
  vm_ops = 0xffffffebed8177f0 <generic_file_vm_ops>,
  vm_pgoff = 0x0,
  vm_file = 0xffffff89deb52c00,
  vm_private_data = 0x0,
...
crash-android>
crash-android> struct file 0xffffff89deb52c00
struct file {
  f_count = {
    counter = 4
  },
...
  f_mode = 688157,
  f_op = 0xffffffebed8e82c0 <erofs_file_fops>,
  f_mapping = 0xffffff8845d11d60,
  private_data = 0x0,
  f_inode = 0xffffff8845d11bd8,
  f_flags = 131072,
  f_iocb_flags = 0,
  f_cred = 0xffffff80424a9780,
  f_path = {
    mnt = 0xffffff880f8c6920,
    dentry = 0xffffff88366e3380
  },
...
crash-android> struct inode 0xffffff8845d11bd8
struct inode {
  i_mode = 33188,
  i_opflags = 13,
  i_uid = {
    val = 1000
  },
  i_gid = {
    val = 1000
  },
  i_flags = 0,
  i_acl = 0x0,
  i_default_acl = 0xffffffffffffffff,
  i_op = 0xffffffebed8e7f00 <erofs_generic_iops>,
  i_sb = 0xffffff801e149000,
  i_mapping = 0xffffff8845d11d60,
  i_security = 0xffffff8818ac01c0,
  i_ino = 316544,
  {
    i_nlink = 1,
    __i_nlink = 1
  },
...
crash-android> files -p 0xffffff8845d11bd8
     INODE        NRPAGES
ffffff8845d11bd8       61

      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
fffffffee567f8c0 9d9fe3000 ffffff8845d11d60        0 11 400000000000032c referenced,uptodate,lru,active,workingset
fffffffee61ccd00 a07334000 ffffff8845d11d60        1 10 400000000000032c referenced,uptodate,lru,active,workingset
fffffffee0055240 881549000 ffffff8845d11d60        2  6 400000000000012c referenced,uptodate,lru,active
fffffffee8f97a40 abe5e9000 ffffff8845d11d60        3 14 400000000000032c referenced,uptodate,lru,active,workingset
fffffffec1c039c0  f00e7000 ffffff8845d11d60        4  6 400000000000012c referenced,uptodate,lru,active
fffffffee94fb480 ad3ed2000 ffffff8845d11d60        5 14 400000000000032c referenced,uptodate,lru,active,workingset
fffffffee2c27140 9309c5000 ffffff8845d11d60        6  6 400000000000012c referenced,uptodate,lru,active
fffffffee054cb80 89532e000 ffffff8845d11d60        7  6 400000000000032c referenced,uptodate,lru,active,workingset
fffffffee054cdc0 895337000 ffffff8845d11d60        8  6 400000000000032c referenced,uptodate,lru,active,workingset
fffffffee4944d40 9a5135000 ffffff8845d11d60        9  9 400000000000032c referenced,uptodate,lru,active,workingset
fffffffee84f9640 a93e59000 ffffff8845d11d60        a  6 400000000000032c referenced,uptodate,lru,active,workingset
fffffffee944a4c0 ad1293000 ffffff8845d11d60        b  6 400000000000032c referenced,uptodate,lru,active,workingset
fffffffee8e22a00 ab88a8000 ffffff8845d11d60        c  6 400000000000032c referenced,uptodate,lru,active,workingset
fffffffec14d8480  d3612000 ffffff8845d11d60        d  6 400000000000032c referenced,uptodate,lru,active,workingset
fffffffee6f61100 a3d844000 ffffff8845d11d60        e  6 400000000000032c referenced,uptodate,lru,active,workingset
fffffffee74ef7c0 a53bdf000 ffffff8845d11d60        f 15 400000000000032c referenced,uptodate,lru,active,workingset
fffffffee7b608c0 a6d823000 ffffff8845d11d60       10  1 400000000000032c referenced,uptodate,lru,active,workingset
fffffffec1553b80  d54ee000 ffffff8845d11d60       11  1 400000000000032c referenced,uptodate,lru,active,workingset
fffffffec1c32d40  f0cb5000 ffffff8845d11d60       12  1 400000000000032c referenced,uptodate,lru,active,workingset
fffffffec19e5880  e7962000 ffffff8845d11d60       13  1 400000000000032c referenced,uptodate,lru,active,workingset
fffffffec1ac2c80  eb0b2000 ffffff8845d11d60       14  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee6f95780 a3e55e000 ffffff8845d11d60       15  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee8c5e500 ab1794000 ffffff8845d11d60       16  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee831c640 a8c719000 ffffff8845d11d60       17  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffec1ad0240  eb409000 ffffff8845d11d60       18  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffec0af8680  abe1a000 ffffff8845d11d60       19  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee7dc1b00 a7706c000 ffffff8845d11d60       1a  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee808d000 a82340000 ffffff8845d11d60       1b  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee90c2e00 ac30b8000 ffffff8845d11d60       1c  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee4987ec0 9a61fb000 ffffff8845d11d60       1d  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee8972740 aa5c9d000 ffffff8845d11d60       1e  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee6bb7900 a2ede4000 ffffff8845d11d60       1f  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee7916f80 a645be000 ffffff8845d11d60       20  1 400000000000212c referenced,uptodate,lru,active,arch_1
fffffffee9435b40 ad0d6d000 ffffff8845d11d60       21  1 400000000000212c referenced,uptodate,lru,active,arch_1
fffffffee7e03680 a780da000 ffffff8845d11d60       22  1 400000000000212c referenced,uptodate,lru,active,arch_1
fffffffee7f7ce80 a7df3a000 ffffff8845d11d60       23  1 400000000000212c referenced,uptodate,lru,active,arch_1
fffffffee60f7580 a03dd6000 ffffff8845d11d60       24  1 400000000000212c referenced,uptodate,lru,active,arch_1
fffffffee7e22b40 a788ad000 ffffff8845d11d60       25  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee634d780 a0d35e000 ffffff8845d11d60       26  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee2389080 90e242000 ffffff8845d11d60       27  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee7142e80 a450ba000 ffffff8845d11d60       28  1 400000000000212c referenced,uptodate,lru,active,arch_1
fffffffee41dd600 987758000 ffffff8845d11d60       29  1 400000000000212c referenced,uptodate,lru,active,arch_1
fffffffee6e0a1c0 a38287000 ffffff8845d11d60       2a  3 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee4fc3d80 9bf0f6000 ffffff8845d11d60       2b  3 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee7f03600 a7c0d8000 ffffff8845d11d60       2c  1 400000000000212c referenced,uptodate,lru,active,arch_1
fffffffee6e0fec0 a383fb000 ffffff8845d11d60       2d  2 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee759f3c0 a567cf000 ffffff8845d11d60       2e  2 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee40dbac0 9836eb000 ffffff8845d11d60       2f  1 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee6a9ed00 a2a7b4000 ffffff8845d11d60       30  2 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee55be9c0 9d6fa7000 ffffff8845d11d60       31  2 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee7e50440 a79411000 ffffff8845d11d60       32  2 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee6e8f140 a3a3c5000 ffffff8845d11d60       33  6 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffec00c99c0  83267000 ffffff8845d11d60       34  2 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffec00c9980  83266000 ffffff8845d11d60       35  2 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee664c200 a19308000 ffffff8845d11d60       36  2 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee8a8a440 aaa291000 ffffff8845d11d60       37  2 400000000000232c referenced,uptodate,lru,active,workingset,arch_1
fffffffee87c0440 a9f011000 ffffff8845d11d60       38  1 400000000000012c referenced,uptodate,lru,active
fffffffee6ac5380 a2b14e000 ffffff8845d11d60       39  1 400000000000012c referenced,uptodate,lru,active
fffffffee71a0140 a46805000 ffffff8845d11d60       3a  1 400000000000012c referenced,uptodate,lru,active
fffffffee63df800 a0f7e0000 ffffff8845d11d60       3b  1 400000000000012c referenced,uptodate,lru,active
fffffffee891a800 aa46a0000 ffffff8845d11d60       3c  1 400000000000012c referenced,uptodate,lru,active
crash-android> rd -p a03dd6000 -e a03dd6100
       a03dd6000:  0000000000000000 0000000000000000   ................
       a03dd6010:  0000000000000000 0000000000000000   ................
       a03dd6020:  0000000000000000 0000000000000000   ................
       a03dd6030:  0000000000000000 0000000000000000   ................
       a03dd6040:  0000000000000000 0000000000000000   ................
       a03dd6050:  0000000000000000 0000000000000000   ................
       a03dd6060:  0000000000000000 0000000000000000   ................
       a03dd6070:  0000000000000000 0000000000000000   ................
       a03dd6080:  0000000000000000 0000000000000000   ................
       a03dd6090:  0000000000000000 0000000000000000   ................
       a03dd60a0:  0000000000000000 0000000000000000   ................
       a03dd60b0:  0000000000000000 0000000000000000   ................
       a03dd60c0:  0000000000000000 0000000000000000   ................
       a03dd60d0:  0000000000000000 0000000000000000   ................
       a03dd60e0:  0000000000000000 0000000000000000   ................
       a03dd60f0:  0000000000000000 0000000000000000   ................
crash-android>
crash-android> rd -p a7df3a000 -e a7df3a100
       a7df3a000:  0000000000000000 0000000000000000   ................
       a7df3a010:  0000000000000000 0000000000000000   ................
       a7df3a020:  0000000000000000 0000000000000000   ................
       a7df3a030:  0000000000000000 0000000000000000   ................
       a7df3a040:  0000000000000000 0000000000000000   ................
       a7df3a050:  0000000000000000 0000000000000000   ................
       a7df3a060:  0000000000000000 0000000000000000   ................
       a7df3a070:  0000000000000000 0000000000000000   ................
       a7df3a080:  0000000000000000 0000000000000000   ................
       a7df3a090:  0000000000000000 0000000000000000   ................
       a7df3a0a0:  0000000000000000 0000000000000000   ................
       a7df3a0b0:  0000000000000000 0000000000000000   ................
       a7df3a0c0:  0000000000000000 0000000000000000   ................
       a7df3a0d0:  0000000000000000 0000000000000000   ................
       a7df3a0e0:  0000000000000000 0000000000000000   ................
       a7df3a0f0:  0000000000000000 0000000000000000   ................
crash-android> 

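So in the page cache recorded by the address_space of the file's inode, the pages at 0x23000 and 0x24000 are empty. Further checking shows that all 10 pages with OFFSET in [0x1D000, 0x27000) are zero, while the pages before and after them are correct. Only part of the code segment is affected, not all of it.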
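
Grouping zeroed page-cache pages into contiguous runs makes a pattern like this easy to spot. A minimal sketch, assuming a per-index "is this page all zero" map has already been built by reading each physical page (the map below is filled in from the result stated above, not recomputed from the dump; `zero_runs` is an illustrative name):

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def zero_runs(is_zero: dict) -> list:
    """Return inclusive (first, last) index runs of consecutive zeroed pages."""
    runs, start, prev = [], None, None
    for idx in sorted(i for i, zeroed in is_zero.items() if zeroed):
        if start is None:
            start = prev = idx
        elif idx == prev + 1:
            prev = idx
        else:
            runs.append((start, prev))
            start = prev = idx
    if start is not None:
        runs.append((start, prev))
    return runs


# Page indices 0x00..0x3c are cached; 0x1d..0x26 were found to be all zero.
is_zero = {idx: 0x1D <= idx <= 0x26 for idx in range(0x3D)}

runs = zero_runs(is_zero)
for first, last in runs:
    count = last - first + 1
    print(f"[{first * PAGE_SIZE:#x}, {(last + 1) * PAGE_SIZE:#x}) -> {count} pages")
```

A single contiguous run, rather than scattered pages, is what points toward one I/O operation as the cause.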

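Conclusion so far: the page cache content was zeroed, so during this period any process on the device that loads libm.so and executes code in these pages hits an illegal-instruction fault.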

Memory traces

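Most engineers can probably get this far, but then get stuck. Ten consecutive zeroed pages is a distinctive signature; one kernel function capable of producing it is zero_fill_bio.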

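That function takes a bio, and the bio holds a bi_io_vec pointer to an array of bio_vec entries; those entries reference the 10 pages we are looking for. The memory pattern to search for is therefore the addresses of the struct page objects for those 10 pages, and that search does turn up a trace in memory.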
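
The search itself amounts to scanning memory for an 8-byte little-endian pointer value. A minimal sketch of that idea, assuming the relevant memory has been exported as raw bytes; the function `find_qword` and the synthetic buffer are illustrative and do not represent a command of the tools used here:

```python
import struct


def find_qword(buf: bytes, value: int, base: int = 0, align: int = 8):
    """Return addresses in buf (mapped at base) holding value as an aligned LE u64."""
    needle = struct.pack("<Q", value)
    hits = []
    pos = buf.find(needle)
    while pos != -1:
        if pos % align == 0:  # pointers are stored naturally aligned
            hits.append(base + pos)
        pos = buf.find(needle, pos + 1)
    return hits


# struct page address of page-cache index 0x23, from the files -p listing.
PAGE_0X23 = 0xFFFFFFFEE7F7CE80

# Synthetic stand-in for a slice of kernel memory with one matching pointer.
BASE = 0xFFFFFF8026F16000
buf = bytes(0x80) + struct.pack("<QQ", PAGE_0X23, 0x1000) + bytes(0x70)

hits = find_qword(buf, PAGE_0X23, base=BASE)
print([hex(h) for h in hits])
```

In practice the same pattern would be searched across the whole ramdump, and each hit inspected to see whether its neighbors look like the other pages of the run.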

crash-android> rd ffffff8026f16000 -e ffffff8026f16100
ffffff8026f16000:  fffffffee808d000 0000000000001000   ................  0x1b
ffffff8026f16010:  fffffffee90c2e00 0000000000001000   ................  0x1c
ffffff8026f16020:  fffffffee4987ec0 0000000000001000   .~..............  0x1d
ffffff8026f16030:  fffffffee8972740 0000000000001000   @'..............  0x1e
ffffff8026f16040:  fffffffee6bb7900 0000000000001000   .y..............  0x1f
ffffff8026f16050:  fffffffee7916f80 0000000000001000   .o..............  0x20
ffffff8026f16060:  fffffffee9435b40 0000000000001000   @[C.............  0x21
ffffff8026f16070:  fffffffee7e03680 0000000000001000   .6..............  0x22
ffffff8026f16080:  fffffffee7f7ce80 0000000000001000   ................  0x23
ffffff8026f16090:  fffffffee60f7580 0000000000001000   .u..............  0x24
ffffff8026f160a0:  fffffffee7e22b40 0000000000001000   @+..............  0x25
ffffff8026f160b0:  fffffffee634d780 0000000000001000   ..4.............  0x26
ffffff8026f160c0:  0000000000000000 0000000000000000   ................
ffffff8026f160d0:  0000000000000000 0000000000000000   ................
ffffff8026f160e0:  0000000000000000 0000000000000000   ................
ffffff8026f160f0:  0000000000000000 0000000000000000   ................

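The search turns up a bio_vec array of 12 entries covering page indices 0x1b to 0x26, which includes the 10 zeroed pages, so these are the pages involved in this problem. The next step is to find the bio itself. It must hold a pointer to this vec array, so searching memory for the address ffffff8026f16000 yields the following trace.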

crash-android> rd ffffff8026f16f78 -e ffffff8026f17178
ffffff8026f16f78:  0000000000000000 0000000000000000   ................
ffffff8026f16f88:  0000000000000000 0000000000000000   ................
ffffff8026f16f98:  0000000000000000 0000000000000000   ................
ffffff8026f16fa8:  0000000000000000 0000000000000000   ................
ffffff8026f16fb8:  0000000000000000 0000000000000000   ................
ffffff8026f16fc8:  0000000000000000 0000000000000000   ................
ffffff8026f16fd8:  0000000000000000 0000000000000000   ................
ffffff8026f16fe8:  0000000000000000 0000000000000000   ................
ffffff8026f16ff8:  0000000000000000 321a84c0c34b3bb3   .........;K....2
ffffff8026f17008:  0000000000000000 0000000000000000   ................
ffffff8026f17018:  0000000100000000 0000000000004c40   ........@L......
ffffff8026f17028:  000000020000a000 ffffffff00000000   ................
ffffff8026f17038:  0000000000000000 0000000000000000   ................
ffffff8026f17048:  0000000000000000 0000000000000000   ................
ffffff8026f17058:  0000000000000000 0000000000000000   ................
ffffff8026f17068:  00000100000c0000 0000000000000001   ................
ffffff8026f17078:  ffffff8026f16000 0000000000000000   .`.&............
ffffff8026f17088:  0000000000000000 0000000000000000   ................
ffffff8026f17098:  0000000000000000 ffffff8827f16500   .........e.'....
ffffff8026f170a8:  0000000000988000 ffffffebecca92a4   ................
ffffff8026f170b8:  0000000000000000 0000400400000000   .............@..
ffffff8026f170c8:  0000000000000000 ffffff801e149000   ................
ffffff8026f170d8:  0000000000000000 0000000000000000   ................
ffffff8026f170e8:  0000000000000000 0000000000000000   ................
ffffff8026f170f8:  0000000000000000 0000000000000000   ................
ffffff8026f17108:  0000000000000000 0000000000000000   ................
ffffff8026f17118:  0000000000000000 0000000000000000   ................
ffffff8026f17128:  0000000000000000 0000000000000000   ................
ffffff8026f17138:  0000000000000000 0000000000000000   ................
ffffff8026f17148:  0000000000000000 0000000000000000   ................
ffffff8026f17158:  0000000000000000 0000000000000000   ................
ffffff8026f17168:  0000000000000000 0000000000000000   ................
crash-android> struct bio ffffff8026f17000 -x
struct bio {
  bi_next = 0x321a84c0c34b3bb3,
  bi_bdev = 0x0,
  bi_opf = 0x0,
  bi_flags = 0x0,
  bi_ioprio = 0x0,
  bi_write_hint = WRITE_LIFE_NOT_SET,
  bi_status = 0x0,
  __bi_remaining = {
    counter = 0x1
  },
  bi_iter = {
    bi_sector = 0x4c40,
    bi_size = 0xa000,
    bi_idx = 0x2,
    bi_bvec_done = 0x0
  },
  {
    bi_cookie = 0xffffffff,
    __bi_nr_segments = 0xffffffff
  },
  bi_end_io = 0x0,
  bi_private = 0x0,
  bi_blkg = 0x0,
  bi_issue = {
    value = 0x0
  },
  bi_iocost_cost = 0x0,
  bi_crypt_context = 0x0,
  bi_skip_dm_default_key = 0x0,
  bi_vcnt = 0xc,
  bi_max_vecs = 0x100,
  __bi_cnt = {
    counter = 0x1
  },
  bi_io_vec = 0xffffff8026f16000,
  bi_pool = 0x0,
  android_oem_data1 = 0x0,
  __kabi_reserved1 = 0x0,
  __kabi_reserved2 = 0x0,
  bi_inline_vecs = 0xffffff8026f170a0
}

Because the file lives on an erofs file system, this leftover data should be what an erofs_fileio_rq left behind.

crash-android> struct kiocb ffffff8026f170a0
struct kiocb {
  ki_filp = 0xffffff8827f16500,
  ki_pos = 9994240,
  ki_complete = 0xffffffebecca92a4 <erofs_fileio_ki_complete>,
  private = 0x0,
  ki_flags = 0,
  ki_ioprio = 16388,
  {
    ki_waitq = 0x0,
    dio_complete = 0x0
  }
}
crash-android> lp file 0xffffff8827f16500
/system/apex/com.android.runtime.apex
crash-android> struct erofs_fileio_rq ffffff8026f16000 -x
struct erofs_fileio_rq {
  bvecs = {{
      bv_page = 0xfffffffee808d000,
      bv_len = 0x1000,
      bv_offset = 0x0
    }, {
      bv_page = 0xfffffffee90c2e00,
      bv_len = 0x1000,
      bv_offset = 0x0
    }, {
      bv_page = 0xfffffffee4987ec0,
      bv_len = 0x1000,
      bv_offset = 0x0
    }, {
      bv_page = 0xfffffffee8972740,
      bv_len = 0x1000,
      bv_offset = 0x0
    }, {
      bv_page = 0xfffffffee6bb7900,
      bv_len = 0x1000,
      bv_offset = 0x0
    }, {
      bv_page = 0xfffffffee7916f80,
      bv_len = 0x1000,
      bv_offset = 0x0
    }, {
      bv_page = 0xfffffffee9435b40,
      bv_len = 0x1000,
      bv_offset = 0x0
    }, {
      bv_page = 0xfffffffee7e03680,
      bv_len = 0x1000,
      bv_offset = 0x0
    }, {
      bv_page = 0xfffffffee7f7ce80,
      bv_len = 0x1000,
      bv_offset = 0x0
    }, {
      bv_page = 0xfffffffee60f7580,
      bv_len = 0x1000,
      bv_offset = 0x0
    }, {
      bv_page = 0xfffffffee7e22b40,
      bv_len = 0x1000,
      bv_offset = 0x0
    }, {
      bv_page = 0xfffffffee634d780,
      bv_len = 0x1000,
      bv_offset = 0x0
    }, {
...
  bio = {
    bi_next = 0x321a84c0c34b3bb3,
    bi_bdev = 0x0,
    bi_opf = 0x0,
    bi_flags = 0x0,
    bi_ioprio = 0x0,
    bi_write_hint = WRITE_LIFE_NOT_SET,
    bi_status = 0x0,
    __bi_remaining = {
      counter = 0x1
    },
    bi_iter = {
      bi_sector = 0x4c40,
      bi_size = 0xa000,
      bi_idx = 0x2,
      bi_bvec_done = 0x0
    },
    {
      bi_cookie = 0xffffffff,
      __bi_nr_segments = 0xffffffff
    },
    bi_end_io = 0x0,
    bi_private = 0x0,
    bi_blkg = 0x0,
    bi_issue = {
      value = 0x0
    },
    bi_iocost_cost = 0x0,
    bi_crypt_context = 0x0,
    bi_skip_dm_default_key = 0x0,
    bi_vcnt = 0xc,
    bi_max_vecs = 0x100,
    __bi_cnt = {
      counter = 0x1
    },
    bi_io_vec = 0xffffff8026f16000,
    bi_pool = 0x0,
    android_oem_data1 = 0x0,
    __kabi_reserved1 = 0x0,
    __kabi_reserved2 = 0x0,
    bi_inline_vecs = 0xffffff8026f170a0
  },
    iocb = {
    ki_filp = 0xffffff8827f16500,
    ki_pos = 0x988000,
    ki_complete = 0xffffffebecca92a4 <erofs_fileio_ki_complete>,
    private = 0x0,
    ki_flags = 0x0,
    ki_ioprio = 0x4004,
    {
      ki_waitq = 0x0,
      dio_complete = 0x0
    }
  },
  sb = 0xffffff801e149000
}

Summary

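  1. At some point a process accessed libm.so and missed in the inode's address_space cache, so the page data was read again from the erofs file system (erofs_fileio_read_folio or erofs_fileio_readahead). Something probably went wrong here: 10 consecutive pages (0x1d~0x26) were zeroed instead of being filled with the original data, the I/O still completed, and the pages were marked uptodate and added to the inode's address_space.
  2. composer-service was the first to hit ILL_ILLOPC. From then on, because those pages were still in the inode's address_space cache, any process calling this part of libm.so took a page fault, entered the file-backed fault handler (filemap_fault), hit the cache, and got the polluted pages (uptodate, so software treats them as clean and valid). That is why processes kept failing with ILL_ILLOPC.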

File system analysis

Let us walk through the code path from read_folio to erofs_fileio_ki_complete and see how execution reaches zero_fill_bio.

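The code shows that when vfs_iocb_iter_read enters filemap_get_pages, it can be interrupted by a signal and return the number of bytes read so far, with no error code. erofs_fileio_ki_complete then sees a size that differs from the expected size and calls bio_advance and zero_fill_bio, which zero the pages that were not read.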

folio_end_read then marks those pages uptodate, which pollutes the page cache.

Reproduction test

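// Headers required by the test program.
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>

// NOTE: MD5 is not a standard class. This listing assumes an external helper
// that provides update(const uint8_t*, size_t) and digest().

// Assumed implementation (the original listing does not show it): ask the
// kernel to drop the page cache, like `echo 3 > /proc/sys/vm/drop_caches`.
static void drop_caches() {
    std::ofstream f("/proc/sys/vm/drop_caches");
    f << "3" << std::endl;
}

// Returns the MD5 digest of the file, or "" if it cannot be opened.
std::string md5file(const std::string& path) {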
    std::ifstream f(path, std::ios::binary);
    if (!f) return "";

    MD5 md5;
    char buf[4096];
    while (f.read(buf, sizeof(buf)) || f.gcount())
        md5.update(reinterpret_cast<const uint8_t*>(buf), f.gcount());

    return md5.digest();
}

int erofs_bug_zero_fill_bio(char *filename) {
    struct stat sb;
    if (stat(filename, &sb) == -1)
        return 0;
    drop_caches();
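    // Copy the size by value: a detached thread must not reference this stack frame.
    const long long size = sb.st_size;
    std::thread([size]() {
        // Wait roughly 1 ms per 256 KiB of file (capped at 1 ms) so that the
        // kill lands while the read below is still in flight.
        std::this_thread::sleep_for(std::chrono::microseconds(
            size >= (256 * 1024) ? 1000 : (int)(0.0038146F * size)));
        syscall(SYS_kill, getpid(), 9);  // 9 == SIGKILL
    }).detach();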
    std::string md5 = md5file(filename);
    std::cout << md5 << std::endl;
    return 0;
}

int main(int argc, char* argv[]) {
    if (argc < 2) return 1;
    drop_caches();
    std::string md5 = md5file(argv[1]);

    while (1) {
        pid_t pid = fork();
        int status;
        if (pid == 0) {
            erofs_bug_zero_fill_bio(argv[1]);
            exit(0);
        } else if (pid > 0) {
            waitpid(pid, &status, 0);
            std::string current_md5 = md5file(argv[1]);
            if (md5 != current_md5) {
                std::cout << "erofs bug zero_fill_bio!!\n"
                          << argv[1] << " md5sum miss match!!\n"
                          << md5 << " != " << current_md5 << std::endl;
                break;
            }
        }
    }
    return 0;
}
Running the test on the device:
# ./data/erofs-detect /apex/com.android.runtime/lib64/bionic/libm.so
erofs bug zero_fill_bio!!
/apex/com.android.runtime/lib64/bionic/libm.so md5sum miss match!!
65f089be0c9b8cb2d4d7b9bfff44c50e != f53a942a9508d1a5ba5d3ba703ba71df
Checking with md5sum before and after dropping caches:
# md5sum /apex/com.android.runtime/lib64/bionic/libm.so
f53a942a9508d1a5ba5d3ba703ba71df  /apex/com.android.runtime/lib64/bionic/libm.so

# echo 3 > /proc/sys/vm/drop_caches
# md5sum /apex/com.android.runtime/lib64/bionic/libm.so
65f089be0c9b8cb2d4d7b9bfff44c50e  /apex/com.android.runtime/lib64/bionic/libm.so