Setting up a crash dump analysis environment on Ubuntu 20.04
- 1 Install dependencies
  - 1.1 linux-crashdump
  - 1.2 kexec-tools
  - 1.3 Install the crash utility
  - 1.4 Install the gdb debugger
  - 1.5 Install Ubuntu kernel debug symbols
    - 1.5.1 Import the GPG key
    - 1.5.2 Add the repository configuration
    - 1.5.3 Update the package index
    - 1.5.4 Download and install the kernel debug symbols
    - 1.5.5 Verify that the kernel debug symbols are installed
  - 1.6 Configure the crash kernel memory reservation
- 2 Verify by triggering a system crash
  - 2.1 Start the kdump service
  - 2.2 Check the current status of the kdump service
  - 2.3 Trigger a crash dump manually
    - 2.3.1 Switch to the root user
    - 2.3.2 Trigger the crash dump
- 3 Analyzing the kernel dump file
  - 3.1 Location of the crash dump files
  - 3.2 Analyze the crash dump file with the crash utility
- 4 Problems encountered and solutions
  - 4.1 'makeinfo' is missing on your system
    - 4.1.1 Symptom
    - 4.1.2 Solution
  - 4.2 Failure caused by a gdb version mismatch between the crash utility and the host
    - 4.2.1 Symptom
    - 4.2.2 Solution
  - 4.3 /dev/mem: Operation not permitted
    - 4.3.1 Symptom
    - 4.3.2 Solution
Host environment and kernel version:
bash
test@test:~/software/crash$ uname -r
5.15.0-74-generic
test@test:~/software/crash$
test@test:~/software/crash$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.6 LTS"
test@test:~/software/crash$
1 Install dependencies
These steps follow the article "Ubuntu Kernel crash dump".
1.1 linux-crashdump
bash
sudo apt install linux-crashdump
1.2 kexec-tools
bash
sudo apt-get install kexec-tools
1.3 Install the crash utility
bash
sudo apt install crash
1.4 Install the gdb debugger
bash
sudo apt-get install gdb
1.5 Install Ubuntu kernel debug symbols
bash
sudo apt-get install linux-image-$(uname -r)-dbgsym
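If the command above cannot find the debug symbol package, enable the debug symbol repository as described below. Reference article: "Installing Ubuntu kernel debug symbols".
1.5.1 Import the GPG key
Make sure the GPG signing key for the debug symbol archive is imported. For Ubuntu 16.04 and later: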
bash
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C8CAB6595FDFF622
For older releases:
bash
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys ECDCAD72428D7C01
1.5.2 Add the repository configuration
bash
codename=$(lsb_release -c | awk '{print $2}')
sudo tee /etc/apt/sources.list.d/ddebs.list << EOF
deb http://ddebs.ubuntu.com/ ${codename} main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-security main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-proposed main restricted universe multiverse
EOF
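Before writing anything under /etc/apt/sources.list.d/, it can help to preview the entries. The sketch below prints the same four lines as the heredoc above for a given codename. The function name `ddebs_entries` is made up for illustration, and it assumes the same pockets (release, security, updates, proposed) as above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ddebs entries for one release codename.
ddebs_entries() {
    local codename="$1" pocket
    # The empty string stands for the plain release pocket.
    for pocket in "" -security -updates -proposed; do
        printf 'deb http://ddebs.ubuntu.com/ %s%s main restricted universe multiverse\n' \
            "$codename" "$pocket"
    done
}

# Preview for focal (Ubuntu 20.04); nothing is written to disk.
ddebs_entries focal
```

Once the preview looks right, the output could be piped into `sudo tee /etc/apt/sources.list.d/ddebs.list`.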
1.5.3 Update the package index
bash
sudo apt-get update
1.5.4 Download and install the kernel debug symbols
bash
sudo apt-get install linux-image-$(uname -r)-dbgsym
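1.5.5 Verify that the kernel debug symbols are installed
The file containing the debug information is named vmlinux-XXX, where XXX is the kernel release (for example vmlinux-5.15.0-74-generic, the file used in section 3.2). It is stored in the /usr/lib/debug/boot directory.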
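The check can be scripted. This is a minimal sketch, assuming the file is named vmlinux-<release> as in the crash invocation in section 3.2; the helper name `find_debug_vmlinux` and its optional directory argument are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the debug vmlinux path for a kernel release,
# or return non-zero if the file is absent.
# $1: kernel release (defaults to the running kernel)
# $2: directory to search (defaults to /usr/lib/debug/boot)
find_debug_vmlinux() {
    local release="${1:-$(uname -r)}"
    local dir="${2:-/usr/lib/debug/boot}"
    local path="$dir/vmlinux-$release"
    if [ -f "$path" ]; then
        printf '%s\n' "$path"
    else
        printf 'not found: %s\n' "$path" >&2
        return 1
    fi
}

# Check the running kernel and report the result either way.
if vmlinux=$(find_debug_vmlinux); then
    echo "debug symbols: $vmlinux"
else
    echo "debug symbols are not installed for $(uname -r)"
fi
```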
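1.6 Configure the crash kernel memory reservation
Edit /etc/default/grub.d/kdump-tools.cfg to reserve memory for the capture kernel that saves the kernel crash dump. The setting below takes effect after running sudo update-grub and rebooting.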
bash
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=384M-:512M"
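In the kernel's crashkernel=range:size syntax, 384M-:512M reserves 512 MB on machines with at least 384 MB of RAM. To confirm that the parameter reached the kernel command line, something like the following sketch could be used; the helper name `crashkernel_value` and the sample command line are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the crashkernel= value from a kernel command line.
# Prints nothing and returns 1 when the parameter is absent.
crashkernel_value() {
    local -a args
    local arg
    # Split the command line on whitespace without glob expansion.
    read -r -a args <<< "$1"
    for arg in "${args[@]}"; do
        case "$arg" in
            crashkernel=*)
                printf '%s\n' "${arg#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Demo on a sample command line (not taken from a real machine).
sample='BOOT_IMAGE=/boot/vmlinuz-5.15.0-74-generic ro quiet splash crashkernel=384M-:512M'
crashkernel_value "$sample"
```

After the reboot, `crashkernel_value "$(cat /proc/cmdline)"` would show what the running kernel actually received.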
2 Verify by triggering a system crash
2.1 Start the kdump service
bash
systemctl start kdump-tools-dump.service
systemctl enable kdump-tools-dump.service
2.2 Check the current status of the kdump service
bash
test@test:~/software/crash$ service kdump-tools-dump status
● kdump-tools-dump.service - Kernel crash dump capture service
Loaded: loaded (/lib/systemd/system/kdump-tools-dump.service; static; vendor preset: enabled)
Active: active (exited) since Tue 2023-11-14 10:19:58 CST; 56s ago
Main PID: 126662 (code=exited, status=0/SUCCESS)
Tasks: 0 (limit: 37610)
Memory: 0B
CGroup: /system.slice/kdump-tools-dump.service
11月 14 10:19:58 test systemd[1]: Starting Kernel crash dump capture service...
11月 14 10:19:58 test kdump-tools[126662]: Starting kdump-tools:
11月 14 10:19:58 test kdump-tools[126669]: * Cannot change symbolic links when kdump is loaded
11月 14 10:19:58 test systemd[1]: Finished Kernel crash dump capture service.
test@test:~/software/crash$
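Besides the service status, the kernel itself reports whether a capture kernel is loaded: /sys/kernel/kexec_crash_loaded contains 1 when it is. The sketch below wraps that check; the helper name `crash_kernel_loaded` and its optional file argument are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a crash (capture) kernel is loaded.
# $1: flag file to read (defaults to /sys/kernel/kexec_crash_loaded)
crash_kernel_loaded() {
    local flag_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

if crash_kernel_loaded; then
    echo "crash kernel is loaded"
else
    echo "crash kernel is NOT loaded; check crashkernel= and the kdump service"
fi
```

Running this before deliberately crashing the machine avoids a panic that leaves no dump behind.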
2.3 Trigger a crash dump manually
2.3.1 Switch to the root user
bash
sudo -s
2.3.2 Trigger the crash dump
bash
echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger
3 Analyzing the kernel dump file
3.1 Location of the crash dump files
bash
test@test:~/software/crash$ ls /var/crash/
202311132045 kexec_cmd linux-image-5.15.0-74-generic-202311132357.crash _opt_kingsoft_wps-office_office6_wpscloudsvr.1000.uploaded _usr_bin_crash.0.uploaded
202311132231 linux-image-5.15.0-74-generic-202311132045.crash _opt_kingsoft_wps-office_office6_wpscloudsvr.1000.crash _usr_bin_crash.0.crash
202311132357 linux-image-5.15.0-74-generic-202311132231.crash _opt_kingsoft_wps-office_office6_wpscloudsvr.1000.upload _usr_bin_crash.0.upload
test@test:~/software/crash$
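Each kernel dump lives in a timestamped directory, as dump.<timestamp> (see the path passed to crash in section 3.2). A small sketch for picking the most recent one follows; the helper name `latest_dump` is hypothetical, and it assumes the timestamps all have the same width so that string order equals time order.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest dump.<timestamp> file under a crash directory.
# $1: crash directory (defaults to /var/crash)
latest_dump() {
    local dir="${1:-/var/crash}" f newest=""
    for f in "$dir"/*/dump.*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$f" ] || continue
        # Timestamps such as 202311132357 compare correctly as plain strings.
        if [[ -z "$newest" || "${f##*/}" > "${newest##*/}" ]]; then
            newest="$f"
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

latest_dump || echo "no dump files found"
```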
3.2 Analyze the crash dump file with the crash utility
bash
test@test:~/software/crash$ sudo crash -d /usr/bin/gdb /usr/lib/debug/boot/vmlinux-5.15.0-74-generic /var/crash/202311132357/dump.202311132357
crash 8.0.3++
Copyright (C) 2002-2022 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
Copyright (C) 2015, 2021 VMware, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
KERNEL: /usr/lib/debug/boot/vmlinux-5.15.0-74-generic
DUMPFILE: /var/crash/202311132357/dump.202311132357 [PARTIAL DUMP]
CPUS: 16
DATE: Mon Nov 13 23:56:58 CST 2023
UPTIME: 00:11:00
LOAD AVERAGE: 0.71, 0.84, 0.58
TASKS: 1636
NODENAME: test
RELEASE: 5.15.0-74-generic
VERSION: #81~20.04.2-Ubuntu SMP Fri May 26 19:56:20 UTC 2023
MACHINE: x86_64 (2900 Mhz)
MEMORY: 31.8 GB
PANIC: "Kernel panic - not syncing: sysrq triggered crash"
PID: 10269
COMMAND: "bash"
TASK: ffff89a4d1334d40 [THREAD_INFO: ffff89a4d1334d40]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 10269 TASK: ffff89a4d1334d40 CPU: 0 COMMAND: "bash"
#0 [ffff94d441397c48] machine_kexec at ffffffff9688afe0
#1 [ffff94d441397ca8] __crash_kexec at ffffffff96998c02
#2 [ffff94d441397d78] panic at ffffffff974cf4b6
#3 [ffff94d441397df8] sysrq_handle_crash at ffffffff96feea5a
#4 [ffff94d441397e08] __handle_sysrq.cold at ffffffff97524bf2
#5 [ffff94d441397e50] write_sysrq_trigger at ffffffff96fef548
#6 [ffff94d441397e68] proc_reg_write at ffffffff96c23b57
#7 [ffff94d441397e88] vfs_write at ffffffff96b83a76
#8 [ffff94d441397ec0] ksys_write at ffffffff96b85de7
#9 [ffff94d441397f00] __x64_sys_write at ffffffff96b85e8a
#10 [ffff94d441397f10] do_syscall_64 at ffffffff97571d39
#11 [ffff94d441397f28] do_syscall_64 at ffffffff97571d49
#12 [ffff94d441397f50] entry_SYSCALL_64_after_hwframe at ffffffff97600099
RIP: 00007fc202171077 RSP: 00007ffe9fcff018 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fc202171077
RDX: 0000000000000002 RSI: 000055c60be1a9e0 RDI: 0000000000000001
RBP: 000055c60be1a9e0 R8: 000000000000000a R9: 0000000000000001
R10: 000055c60b144017 R11: 0000000000000246 R12: 0000000000000002
R13: 00007fc2022506a0 R14: 00007fc20224c4a0 R15: 00007fc20224b8a0
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash>
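The session above is interactive. For a repeatable first pass, crash can also read commands from a file given with its -i option. The sketch below only writes such a file and prints the resulting invocation without running it; the helper name and the choice of commands (sys, bt, log, ps) are illustrative, not part of the original workflow.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a crash command file for a first look at a dump.
# $1: path of the command file to create
write_crash_commands() {
    cat > "$1" <<'EOF'
sys
bt
log
ps
exit
EOF
}

cmd_file=$(mktemp)
write_crash_commands "$cmd_file"

# Paths taken from the session above; the command is printed, not executed.
vmlinux=/usr/lib/debug/boot/vmlinux-5.15.0-74-generic
dump=/var/crash/202311132357/dump.202311132357
echo "sudo crash -i $cmd_file $vmlinux $dump"
```

Here sys repeats the summary header, bt prints the backtrace of the panicking task, log dumps the kernel message buffer, and ps lists the tasks at the time of the crash.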
4 Problems encountered and solutions
4.1 'makeinfo' is missing on your system
4.1.1 Symptom
bash
/home/test/software/crash/gdb-10.2/missing: line 81: makeinfo: command not found
WARNING: 'makeinfo' is missing on your system.
You should only need it if you modified a '.texi' file, or
any other file indirectly affecting the aspect of the manual.
You might want to install the Texinfo package:
<http://www.gnu.org/software/texinfo/>
The spurious makeinfo call might also be the consequence of
using a buggy 'make' (AIX, DU, IRIX), in which case you might
want to install GNU make:
<http://www.gnu.org/software/make/>
make[5]: *** [Makefile:542: bfd.info] Error 127
make[4]: *** [Makefile:1643: info-recursive] Error 1
make[3]: *** [Makefile:2771: all-bfd] Error 2
make[3]: *** Waiting for unfinished jobs....
config.status: creating Makefile
config.status: creating import/Makefile
config.status: creating config.h
config.status: executing depfiles commands
config.status: executing default commands
make[2]: *** [Makefile:860: all] Error 2
crash build failed
make[1]: *** [Makefile:263: gdb_merge] Error 1
make: *** [Makefile:254: all] Error 2
4.1.2 Solution
bash
sudo apt-get update
sudo apt-get install texinfo
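To catch this kind of failure before a long build starts, the required tools can be checked up front. A minimal sketch, assuming a hypothetical helper `missing_commands`: makeinfo is the command provided by the texinfo package, while the rest of the tool list is an assumption and not a complete list of build dependencies for crash.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each named command that is not on PATH.
# Returns 0 when nothing is missing, 1 otherwise.
missing_commands() {
    local cmd rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            rc=1
        fi
    done
    return "$rc"
}

# makeinfo comes from texinfo; the other names are assumed for illustration.
if missing=$(missing_commands git make gcc makeinfo); then
    echo "all build tools found"
else
    # Unquoted on purpose so the names print on one line.
    echo "missing:" $missing
fi
```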
4.2 Failure caused by a gdb version mismatch between the crash utility and the host
4.2.1 Symptom
bash
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: kernel relocated [344MB]: patching 145829 gdb minimal_symbol values
please wait... (patching 145829 gdb minimal_symbol values) Segmentation fault
test@test:~/software/crash$ gdb --version
GNU gdb (Ubuntu 10.2-0ubuntu1~20.04~1) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
test@test:~/software/crash$
4.2.2 Solution
bash
sudo apt purge crash
git clone https://github.com/crash-utility/crash.git
cd crash
make -j8
sudo make install
crash --version
4.3 /dev/mem: Operation not permitted
4.3.1 Symptom
bash
sudo crash /usr/lib/debug/boot/vmlinux-5.15.0-74-generic
crash 7.2.8
Copyright (C) 2002-2020 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
crash: /dev/mem: Operation not permitted
4.3.2 Solution
As in section 1.6, edit /etc/default/grub.d/kdump-tools.cfg and add the following setting to support saving kernel crash dumps:
bash
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=384M-:512M"
Then change the permissions of /dev/mem:
bash
sudo chmod 777 /dev/mem