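Background
Recently a wave of app crashes appeared: several domestic apps, such as a certain bank app and a certain "123" app, repeatedly crashed the moment they were opened. It did not get much attention at first, until testers found that a large number of apps could not be opened at all and that the crash reports carried no useful stack information. The tombstone below contains a single native frame.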
Timestamp: 2024-11-08 15:13:55.430873312+0800
Process uptime: 504s
Cmdline: com.xx.xxx
pid: 16580, tid: 16580, name: xx.xxx >>> com.xx.xxx <<<
uid: 10297
tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x000000e4174ed9c0
x0 000000720baa3000 x1 00000071f13990a2 x2 0000007349c21b00 x3 0000007fc2877668
x4 0000000000000400 x5 00000000a802a802 x6 000000000000001a x7 6434372d30303061
x8 000000000000a9b0 x9 000000e4174ed9c0 x10 0000000000000000 x11 000000e4174ed9d0
x12 0000000000000018 x13 000000000000000f x14 0000000000000001 x15 000000000000001b
x16 00000074d4381ef0 x17 00000074d4314fe0 x18 00000074d8ba4000 x19 00000071f13990a2
x20 0000000000000008 x21 00000000ffffffff x22 0000000000000000 x23 00000071f13c7350
x24 0000000000000078 x25 00000071f13c7000 x26 00000071f13c7000 x27 00000071f13c1000
x28 00000071f13c7420 x29 0000007fc2877b50
lr 00000071f1314b80 sp 0000007fc2877b10 pc 00000071f130b4a4 pst 0000000060001000
1 total frames
backtrace:
#00 pc 00000000000394a4 /data/app/...==/com.xx.xxx-..==/lib/arm64/libDexHelper.so
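Capturing a Core
Since the problem reproduces every time, there are many ways to analyze and debug it. A good first step is to capture a native Core for analysis. An earlier article described how to capture an application's Core file by configuring the native core filter.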
adb wait-for-device
adb root
adb shell mkdir /data/data/com.xx.xxx/cores
adb shell chmod 777 /data/data/com.xx.xxx/cores
adb shell restorecon -R /data/data/com.xx.xxx/cores
adb shell 'echo /data/data/com.xx.xxx/cores/core.%e.%p > /proc/sys/kernel/core_pattern'
adb shell '/system/bin/ulimit -P `pidof com.xx.xxx` -c unlimited'
adb shell 'echo 0x27 > /proc/`pidof com.xx.xxx`/coredump_filter'
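The value 0x27 written to coredump_filter is a bit mask. As a minimal sketch, the snippet below decodes it using the bit meanings listed in the Linux core(5) manual page. The helper name decode_coredump_filter is made up for illustration and is not part of any tool used in this article.

```python
# Bit meanings of /proc/<pid>/coredump_filter, as listed in core(5).
COREDUMP_FILTER_BITS = {
    0: "anonymous private mappings",
    1: "anonymous shared mappings",
    2: "file-backed private mappings",
    3: "file-backed shared mappings",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
    7: "private DAX pages",
    8: "shared DAX pages",
}


def decode_coredump_filter(value: int) -> list[str]:
    """Return the description of every bit that is set in the mask."""
    return [name for bit, name in COREDUMP_FILTER_BITS.items() if value & (1 << bit)]


if __name__ == "__main__":
    for description in decode_coredump_filter(0x27):
        print(description)
```

Compared with the usual kernel default of 0x33, this mask additionally dumps file-backed private mappings, which is where the code pages of loaded libraries live.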
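Because the app crashes very quickly, the core filter parameters must be set before the crash, which means the process has to be frozen first. We can send signal 19 (SIGSTOP) to the application immediately after it starts.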
# kill -19 `pidof com.xx.xxx`
With the process stopped, run the core filter script, then resume the application with signal 18 (SIGCONT).
# kill -18 `pidof com.xx.xxx`
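Once captured, the native Core file appears under /data/data/com.xx.xxx/cores/.
Core Analysis
The faulting thread is running Java-related code and we want to understand its context, so it is best to unwind the stack directly with core-parser. For a third-party app like this we have no symbol files, which makes it hard to get a complete backtrace out of GDB or LLDB.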
core-parser> bt
"main" sysTid=16580 Native
| group="main" daemon=0 prio=5 target=0x0
| tid=1 sCount=0 flags=0 obj=0x73254b08 self=0xb400007359b2ebe0
| stack=0x7fc207f000-0x7fc2081000 stackSize=0x7ff000 handle=0x74d9a25d28
| mutexes=0xb400007359b2f378 held=
x0 0x000000720baa3000 x1 0x00000071f13990a2 x2 0x0000007349c21b00 x3 0x0000007fc2877668
x4 0x0000000000000400 x5 0x00000000a802a802 x6 0x000000000000001a x7 0x6434372d30303061
x8 0x000000000000a9b0 x9 0x000000e4174ed9c0 x10 0x0000000000000000 x11 0x000000e4174ed9d0
x12 0x0000000000000018 x13 0x000000000000000f x14 0x0000000000000001 x15 0x000000000000001b
x16 0x00000074d4381ef0 x17 0x00000074d4314fe0 x18 0x00000074d8ba4000 x19 0x00000071f13990a2
x20 0x0000000000000008 x21 0x00000000ffffffff x22 0x0000000000000000 x23 0x00000071f13c7350
x24 0x0000000000000078 x25 0x00000071f13c7000 x26 0x00000071f13c7000 x27 0x00000071f13c1000
x28 0x00000071f13c7420 fp 0x0000007fc2877b50
lr 0x00000071f1314b80 sp 0x0000007fc2877b10 pc 0x00000071f130b4a4 pst 0x0000000060001000
Native: #0 00000071f130b4a4
Native: #1 00000071f1314b7c
Native: #2 00000071f1314b7c
Native: #3 00000071f1304544
Native: #4 00000071f1303448 JNI_OnLoad+0x1a48
Native: #5 000000721850e1fc art::JavaVMExt::LoadNativeLibrary(_JNIEnv*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, _jobject*, _jclass*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*)+0x58c
Native: #6 000000720b1222b4 JVM_NativeLoad+0x164
Native: #7 00000000718447c8
JavaKt: #00 0000000000000000 java.lang.Runtime.nativeLoad
JavaKt: #01 000000721790a00a java.lang.Runtime.loadLibrary0
JavaKt: #02 0000007217909f90 java.lang.Runtime.loadLibrary0
JavaKt: #03 0000007217915400 java.lang.System.loadLibrary
JavaKt: #04 000000719f46557c com.secneo.apkwrapper.H.load
JavaKt: #05 000000719f46429a com.secneo.apkwrapper.AP.instantiateClassLoader
JavaKt: #06 000000721713cc8e android.app.LoadedApk.createOrUpdateClassLoaderLocked
JavaKt: #07 000000721713bb12 android.app.LoadedApk.getResources
JavaKt: #08 00000072170d3548 android.app.ContextImpl.createAppContext
JavaKt: #09 00000072170a408e android.app.ActivityThread.handleBindApplication
JavaKt: #10 000000721709bb96 android.app.ActivityThread$H.handleMessage
JavaKt: #11 0000007215f0847e android.os.Handler.dispatchMessage
JavaKt: #12 0000007215f2fb14 android.os.Looper.loopOnce
JavaKt: #13 0000007215f302b0 android.os.Looper.loop
JavaKt: #14 00000072170a89e8 android.app.ActivityThread.main
JavaKt: #15 0000000000000000 java.lang.reflect.Method.invoke
JavaKt: #16 0000007214f54bde com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run
JavaKt: #17 0000007214f59a0e com.android.internal.os.ZygoteInit.main
core-parser> f 4
JavaKt: #04 000000719f46557c com.secneo.apkwrapper.H.load()
{
Location: /data/app/...==/com.xx.xxx-...==/oat/arm64/base.vdex
art::ArtMethod: 0x74c3a82410
dex_pc_ptr: 0x719f46557c
quick_frame: 0x7fc2879ec0
frame_pc: 0x7218718188
method_header: 0x721870ee3c
DEX CODE:
0x719f465572: 010a | move-result v1
0x719f465574: 0138 0008 | if-eqz v1, 0x719f465584 //+8
0x719f465578: 0162 3c3b | sget-object v1, Lcom/secneo/apkwrapper/H;.X86_LIBRARY:Ljava/lang/String; // field@15419
0x719f46557c: 1071 83d5 0001 | invoke-static {v1}, void java.lang.System.loadLibrary(java.lang.String) // method@33749
{
v0 = 0x700c4b58 v1 = 0x775a4000 v2 = 0x3d3d4138 v3 = 0x6d6f632f
v4 = 0x646c726f v5 = 0x636f622e
}
OAT CODE:
0x7218718164: 934cfd10 | asr x16, x8, #12
0x7218718168: b8705ba4 | ldr w4, [x29, w16, uxtw #2]
0x721871816c: d3482d10 | ubfx x16, x8, #8, #4
0x7218718170: b8705ba3 | ldr w3, [x29, w16, uxtw #2]
0x7218718174: d3441d10 | ubfx x16, x8, #4, #4
0x7218718178: b8705ba2 | ldr w2, [x29, w16, uxtw #2]
0x721871817c: 92400d10 | and x16, x8, #0xf
0x7218718180: b8705ba1 | ldr w1, [x29, w16, uxtw #2]
0x7218718184: f9400c1e | ldr x30, [x0, #0x18]
0x7218718188: d63f03c0 | blr x30
0x721871818c: 78406ed7 | ldrh w23, [x22, #6]!
0x7218718190: 92401ef0 | and x16, x23, #0xff
{
x22 = 0x000000719f46557c x23 = 0x0000000000001071 x24 = 0x000000721870f600 lr = 0x000000721871818c
}
}
core-parser> p 0x775a4000
Size: 0x20
Object Name: java.lang.String
[0x10] virutal char[] values = "DexHelper"
[0x0c] private int hash = -1433472379
[0x08] private final int count = 9
// extends java.lang.Object
[0x04] private transient int shadow$_monitor_ = 0
[0x00] private transient java.lang.Class shadow$_klass_ = 0x7009f1a0
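The Java stack shows that the problem occurs in code belonging to the 梆梆加固 (Bangcle) app-hardening software, in the com.secneo.apkwrapper classes, while it loads the native library "DexHelper".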
core-parser> file 00000071f130b4a4
[71f12d2000, 71f13ba000) 0000000000000000 /data/app/...==/com.xx.xxx-..==/lib/arm64/libDexHelper.so
Direct Cause
core-parser> f 1 -n
Native: #1 00000071f1314b7c
{
library: /data/app/...==/com.xx.xxx-..==/lib/arm64/libDexHelper.so
symbol:
frame_fp: 0x7fc2877b50
frame_pc: 0x71f1314b7c
ASM CODE:
0x71f1314b5c: 7100791f | cmp w8, #0x1e
0x71f1314b60: 5400106b | b.lt 0x71f1314d6c
0x71f1314b64: b0000420 | adrp x0, 0x71f1399000
0x71f1314b68: 91032800 | add x0, x0, #0xca
0x71f1314b6c: 2a1f03e1 | mov w1, wzr
0x71f1314b70: 940011ec | bl 0x71f1319320
0x71f1314b74: b0000421 | adrp x1, 0x71f1399000
0x71f1314b78: 91028821 | add x1, x1, #0xa2
0x71f1314b7c: 97ffda3c | bl 0x71f130b46c
}
core-parser> f 0 -n
Native: #0 00000071f130b4a4
{
library: /data/app/...==/com.xx.xxx-..==/lib/arm64/libDexHelper.so
symbol:
frame_fp: 0x7fc2877b50
frame_pc: 0x71f130b4a4
ASM CODE:
0x71f130b484: b4000b20 | cbz x0, 0x71f130b5e8
0x71f130b488: 79407008 | ldrh w8, [x0, #0x38]
0x71f130b48c: aa0103f3 | mov x19, x1
0x71f130b490: b40001e8 | cbz x8, 0x71f130b4cc
0x71f130b494: f9401009 | ldr x9, [x0, #0x20]
0x71f130b498: aa1f03ea | mov x10, xzr
0x71f130b49c: 8b000129 | add x9, x9, x0
0x71f130b4a0: 9100412b | add x11, x9, #0x10
0x71f130b4a4: b85f016c | ldur w12, [x11, #-0x10]
}
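The faulting instruction at 0x71f130b4a4 is ldur w12, [x11, #-0x10]. With x11 = 0x000000e4174ed9d0 it reads from x11 - 0x10 = 0x000000e4174ed9c0, which is exactly the fault addr reported in the tombstone and is not a mapped address, hence the segmentation fault.
Reverse Engineering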
core-parser> sysroot libDexHelper.so
Mmap segment [71f12d2000, 71f13ba000) libDexHelper.so [0]
Mmap segment [71f13de000, 71f13df000) libDexHelper.so [f2000]
core-parser> f 0 -n
Native: #0 00000071f130b4a4
{
library: libDexHelper.so
symbol:
frame_fp: 0x7fc2877b50
frame_pc: 0x71f130b4a4
ASM CODE:
}
core-parser> rd 0x71f130b4a4 --origin
71f130b4a4: 7100059fb85f016c l._....q
core-parser> rd 0x71f130b4a4 --mmap
71f130b4a4: 82900f8321090149 I..!....
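libDexHelper.so cannot simply be pulled from the device and disassembled, because it only releases its real code at runtime (the rd output above shows that the bytes in memory differ from the bytes in the file at the same address). Instead, we need to reassemble its segments from memory into a new libDexHelper_fake.so.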
core-parser> file | grep DexH
[71f12d2000, 71f13ba000) 0000000000000000 /data/app/...==/com.xx.xxx-..==/lib/arm64/libDexHelper.so
[71f13bb000, 71f13c5000) 00000000000e8000 /data/app/...==/com.xx.xxx-..==/lib/arm64/libDexHelper.so
[71f13de000, 71f13df000) 00000000000f2000 /data/app/...==/com.xx.xxx-..==/lib/arm64/libDexHelper.so
core-parser> rd 71f12d2000 -e 71f13ba000 -f 71f12d2000.bin
core-parser> rd 71f13bb000 -e 71f13c5000 -f 71f13bb000.bin
core-parser> rd 71f13de000 -e 71f13df000 -f 71f13de000.bin
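The article does not show how the three dumps are combined. One plausible way, sketched below, is to write each dump at the file offset reported by the file listing (0x0, 0xe8000 and 0xf2000). The function name stitch and the choice to zero-fill any gaps are assumptions made for illustration, not the author's actual procedure.

```python
from pathlib import Path

# (file offset, dump file) pairs taken from the `file | grep DexH` listing
# and the `rd ... -f` commands above.
SEGMENTS = [
    (0x00000, "71f12d2000.bin"),
    (0xE8000, "71f13bb000.bin"),
    (0xF2000, "71f13de000.bin"),
]


def stitch(segments, out_path, src_dir="."):
    """Write each dump at its file offset; any gap is left zero-filled.

    Returns the size of the resulting file in bytes.
    """
    src = Path(src_dir)
    with open(out_path, "wb") as out:
        for offset, name in sorted(segments):
            out.seek(offset)
            out.write((src / name).read_bytes())
    return Path(out_path).stat().st_size
```

Calling stitch(SEGMENTS, "libDexHelper_fake.so") in the directory that holds the three .bin files produces a single image that a disassembler can open, although section headers may still need repair.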
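First we work out what the code at Native #0 and Native #1 is actually doing. Tools such as IDA Pro or Ghidra help with the reverse engineering; only the key fragments are covered here.
In the pseudocode for Native #1, the code evidently calls FUN_00147320 to obtain the load base address of libperfetto_hprof.so, and then computes the offset address of the symbol _ZN14perfetto_hprof17g_signal_pipe_fdsE.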
(gdb) ptype /o Elf64_Ehdr
type = struct elf64_hdr {
/* 0 | 16 */ unsigned char e_ident[16];
/* 16 | 2 */ Elf64_Half e_type;
/* 18 | 2 */ Elf64_Half e_machine;
/* 20 | 4 */ Elf64_Word e_version;
/* 24 | 8 */ Elf64_Addr e_entry;
/* 32 | 8 */ Elf64_Off e_phoff;
/* 40 | 8 */ Elf64_Off e_shoff;
/* 48 | 4 */ Elf64_Word e_flags;
/* 52 | 2 */ Elf64_Half e_ehsize;
/* 54 | 2 */ Elf64_Half e_phentsize;
/* 56 | 2 */ Elf64_Half e_phnum;
/* 58 | 2 */ Elf64_Half e_shentsize;
/* 60 | 2 */ Elf64_Half e_shnum;
/* 62 | 2 */ Elf64_Half e_shstrndx;
/* total size (bytes): 64 */
}
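While walking the ELF Program Headers to compute the symbol address, the code faults when it reads a program header entry. In the disassembly below, offset 0x38 (56) is e_phnum and offset 0x20 (32) is e_phoff in the layout above. The code looks perfectly ordinary, so why does it fail?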
core-parser> rd 0x71f130b46c -e 0x71f130b4a4 -i
0x71f130b46c: f81b0ff9 | str x25, [sp, #-0x50]!
0x71f130b470: a9015ff8 | stp x24, x23, [sp, #0x10]
0x71f130b474: a90257f6 | stp x22, x21, [sp, #0x20]
0x71f130b478: a9034ff4 | stp x20, x19, [sp, #0x30]
0x71f130b47c: a9047bfd | stp x29, x30, [sp, #0x40]
0x71f130b480: 910103fd | add x29, sp, #0x40
0x71f130b484: b4000b20 | cbz x0, 0x71f130b5e8
0x71f130b488: 79407008 | ldrh w8, [x0, #0x38]
0x71f130b48c: aa0103f3 | mov x19, x1
0x71f130b490: b40001e8 | cbz x8, 0x71f130b4cc
0x71f130b494: f9401009 | ldr x9, [x0, #0x20]
0x71f130b498: aa1f03ea | mov x10, xzr
0x71f130b49c: 8b000129 | add x9, x9, x0
0x71f130b4a0: 9100412b | add x11, x9, #0x10
0x71f130b4a4: b85f016c | ldur w12, [x11, #-0x10]
0x71f130b4a8: 7100059f | cmp w12, #1
We know that x0 is the address the code takes to be the load address of libperfetto_hprof.so, and x1 is the address of the string _ZN14perfetto_hprof17g_signal_pipe_fdsE.
x0 0x000000720baa3000 x1 0x00000071f13990a2
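To make the disassembly easier to follow, here is a rough Python sketch of the same walk: read e_phoff and e_phnum from the ELF header, then visit each program header and pick out the PT_LOAD entries (the cmp w12, #1 in the listing). This is a reconstruction of the idea, not the packer's code. The function name load_segments is hypothetical, and the magic-number check is an added safeguard that the disassembly does not appear to perform (it only tests x0 for null).

```python
import struct

PT_LOAD = 1
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes


def load_segments(image: bytes):
    """Yield (p_offset, p_vaddr, p_memsz) for each PT_LOAD program header."""
    if image[:4] != b"\x7fELF":
        # Without this check, e_phoff and e_phnum are read from arbitrary data.
        raise ValueError("not an ELF header; the base address is probably wrong")
    ehdr = EHDR.unpack_from(image, 0)
    e_phoff, e_phentsize, e_phnum = ehdr[5], ehdr[9], ehdr[10]
    for i in range(e_phnum):
        phdr = PHDR.unpack_from(image, e_phoff + i * e_phentsize)
        p_type, p_offset, p_vaddr, p_memsz = phdr[0], phdr[2], phdr[3], phdr[6]
        if p_type == PT_LOAD:
            yield p_offset, p_vaddr, p_memsz
```

If the base address does not point at a real ELF header, the fields at offsets 0x20 and 0x38 are whatever data happens to be there, which is how a wild pointer such as the one in x9 can arise.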
The Anomaly
core-parser> map | grep libperfetto_hprof.so
297 0x74d877e260 [720ba27000, 720ba47000) r-- /apex/com.android.art/lib64/libperfetto_hprof.so [*]
At this point x0 = 0x000000720baa3000, which does not match the real load address of libperfetto_hprof.so (0x720ba27000).
core-parser> file | grep libperfetto_hprof.so
[720ba27000, 720ba47000) 0000000000000000 /apex/com.android.art/lib64/libperfetto_hprof.so
[720ba47000, 720baa3000) 0000000000020000 /apex/com.android.art/lib64/libperfetto_hprof.so
[720baa3000, 720baa7000) 000000000007c000 /apex/com.android.art/lib64/libperfetto_hprof.so
[720baa7000, 720baa8000) 0000000000080000 /apex/com.android.art/lib64/libperfetto_hprof.so
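The base address computed by the application is clearly not the first segment but the data segment at file offset 0x7c000, which holds GOT-related data. Treating it as the load address is what causes the fault.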
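The arithmetic behind this conclusion can be checked with a few lines of Python. All constants are copied from the register dump and the file listing. Reading the garbage e_phoff as a pointer back into the library is an inference from the numbers, not something the article states.

```python
# Values copied from the register dump and the `file` listing above.
X0 = 0x720BAA3000  # base address handed to the program-header walk
X9 = 0xE4174ED9C0  # x0 + "e_phoff"; also the reported fault address

SEGMENTS = [  # (start, end, file offset) of libperfetto_hprof.so
    (0x720BA27000, 0x720BA47000, 0x00000),
    (0x720BA47000, 0x720BAA3000, 0x20000),
    (0x720BAA3000, 0x720BAA7000, 0x7C000),
    (0x720BAA7000, 0x720BAA8000, 0x80000),
]


def containing_segment(addr):
    """Return the (start, end, offset) tuple whose range holds addr, or None."""
    for seg in SEGMENTS:
        if seg[0] <= addr < seg[1]:
            return seg
    return None


real_base = SEGMENTS[0][0]
bogus_e_phoff = X9 - X0  # the 8 bytes that were read from [x0 + 0x20]

print(f"x0 - real base = {X0 - real_base:#x}")
print(f"x0 lies in the segment at file offset {containing_segment(X0)[2]:#x}")
print(f"bogus e_phoff  = {bogus_e_phoff:#x}")
print(f"bogus e_phoff points into the library: {containing_segment(bogus_e_phoff) is not None}")
```

The value read as e_phoff is itself an address inside the library's own mapping, which fits the picture of x0 pointing at relocated pointer data rather than at an ELF header.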
Why does this happen?
00000072'0b9e9000-00000072'0b9e9fff rw- 0 1000 [anon:.bss]
00000072'0ba44000-00000072'0ba46fff --- 0 3000 [page size compat]
00000072'0baa0000-00000072'0baa2fff --- 0 3000 [page size compat]
00000072'0baa3000-00000072'0baa6fff r-- 7c000 4000 /apex/com.android.art/lib64/libperfetto_hprof.so (BuildId: c72bca3d774db3e436079bde6f091204)
00000072'0baa7000-00000072'0baa7fff rw- 80000 1000 /apex/com.android.art/lib64/libperfetto_hprof.so (BuildId: c72bca3d774db3e436079bde6f091204)
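The tombstone records the contents of /proc/self/maps, and there the part of the VMA that was split off is not printed.
[720ba27000, 720ba47000) is the range the native Core obtains by walking the vma structs.
With pgsize_migration_enabled turned on, the VMA is split, yet here the main part is missing and only this entry remains:
00000072'0ba44000-00000072'0ba46fff --- 0 3000 [page size compat]
The missing main part should be:
00000072'0ba27000-00000072'0ba43fff r-- 0 1d000 /apex/com.android.art/lib64/libperfetto_hprof.so
So the main problem is that the content read from the /proc/self/maps node is incomplete.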
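A simple heuristic for spotting this symptom in a tombstone memory map is sketched below: for every .so path, look at its lowest-addressed entry and flag the library if that entry does not start at file offset 0. The function names are hypothetical, the parser only targets the tombstone line format shown above, and a non-zero first offset can have other legitimate causes, so treat a hit as a hint rather than proof.

```python
import re

# start-end perms offset size [name], with ' as a digit separator
LINE = re.compile(
    r"^([0-9a-f']+)-([0-9a-f']+)\s+(\S{3})\s+([0-9a-f]+)\s+([0-9a-f]+)\s*(.*)$"
)


def first_mappings(maps_text):
    """Map each .so path to the (start, offset) of its lowest-addressed entry."""
    first = {}
    for raw in maps_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        name = m.group(6).split(" (BuildId")[0]
        if not (name.startswith("/") and name.endswith(".so")):
            continue  # skip [anon:...], [page size compat], non-library files
        start = int(m.group(1).replace("'", ""), 16)
        offset = int(m.group(4), 16)
        if name not in first or start < first[name][0]:
            first[name] = (start, offset)
    return first


def suspicious_libraries(maps_text):
    """Return library paths whose first visible mapping is not at file offset 0."""
    return sorted(p for p, (_, off) in first_mappings(maps_text).items() if off != 0)
```

Fed the five tombstone lines above, suspicious_libraries reports /apex/com.android.art/lib64/libperfetto_hprof.so; once the missing r-- entry at offset 0 is added back, the list is empty.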
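The Culprit
The situation was urgent and the core cause was already fairly clear, so there was no need to trigger a Ramdump and confirm the vma state from kernel space. Reviewing the kernel changes from the last few days that touch vm_struct, task_mmu, page_size_compat and related code was enough.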
https://android.googlesource.com/kernel/common/+log/refs/heads/android15-6.6-2024-07
It contains the following two changes:
ANDROID: 16K: Fixup padding vm_flags bits on VMA splits
https://android-review.googlesource.com/c/kernel/common/+/3279994
ANDROID: 16K: Introduce pgsize_migration_inline.h
https://android-review.googlesource.com/c/kernel/common/+/3280812/2
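Root Cause
We then asked engineers at other vendors to roll back this GKI version, and testing plus other reports confirmed that later versions behave correctly. In other words, 7r40 is missing the relevant patches, which Google's engineers confirmed after checking. 7r40 lacks both of the changes below, and having either one is enough to fix the problem, so only versions 7r40 through 7r42 carry this risk.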
ANDROID: 16K: [s]maps: Fold fixup entries into the parent entry
https://android-review.googlesource.com/c/kernel/common/+/3198870
ANDROID: 16K: Fix vm_flags conflicts from mseal
https://android-review.googlesource.com/c/kernel/common/+/3345340
Quick Check
# ./data/core-parser -p `pidof system_server`
core-parser> map
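If the warning appears in the map output, it means the VMA containing that address was found in the Core file, that is, the VMA-loss problem has occurred. If an app then looks up a symbol offset the way libDexHelper.so does, and the symbol happens to live in the library flagged by the warning, it will hit the same crash.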