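Getting started
ance is the sysom component that inspects system state. It is closed source, written in Python, ships a Python virtual environment inside its install package, and runs under that environment's interpreter.
strace + ltrace answers most questions about which files ance actually reads and which commands it forks and executes. The methods below trace ance's Python execution itself: they expose the arguments passed at call time and make it easier to pin down the problem when ance crashes.
Lifting the venv restriction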
- The ance entry point

```shell
[root@localhost]# which ance
/usr/local/bin/ance
[root@localhost]# cat /usr/local/bin/ance
#!/usr/local/.pyenv/versions/3.9.14/bin/python3
# EASY-INSTALL-ENTRY-SCRIPT: 'ance==1.0.0','console_scripts','ance'
import re
import sys
```
- Check sys.path in the ance venv interpreter

```shell
[root@localhost]# /usr/local/.pyenv/versions/3.9.14/bin/python3
Python 3.9.14 (main, Dec 19 2022, 10:18:44)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/local/.pyenv/versions/3.9.14/lib/python39.zip', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/lib-dynload', '/root/.local/lib/python3.9/site-packages', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages']
>>>
```
- Replace the Python interpreter

```python
#!/bin/python3.9
import sys

sys.path = ['', '/usr/local/.pyenv/versions/3.9.14/lib/python39.zip', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/lib-dynload', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages']
```

  - Note: preferably use an interpreter of the same version as the ance venv.
  - The idea is to swap in another interpreter and point sys.path at the paths of the ance venv.
- Other tracing methods revealed how ance is launched, so the script becomes:

```python
#!/bin/python3.9
import sys

sys.path = ['', '/usr/local/.pyenv/versions/3.9.14/lib/python39.zip', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/lib-dynload', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages']
from ance import main
main.main()
```

- Name the file ance and make it executable; `./ance -h` and the other commands then behave the same as the original ance.
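- The original ance startup code

```python
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(load_entry_point('ance==1.0.0', 'console_scripts', 'ance')())
```

  - The entry point is configured in the package's setup.py. ance is closed source, so its exact content is unknown.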
- At this point ance runs under an external Python interpreter: the interpreter itself can be modified, and code can be added before and after ance runs.
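As one way to use that freedom, the sketch below wraps an entry function with logging before and after the call and enables `faulthandler`, which prints the Python-level stack if the process dies from a fatal signal. `run_with_hooks` is a hypothetical helper, not part of ance; the only assumption about ance is the `main.main()` call already shown in the launcher.

```python
import faulthandler
import sys
import time
import traceback


def run_with_hooks(entry, log=sys.stderr):
    """Call entry() with logging before and after it; return its exit code."""
    try:
        # Dump the Python stack on SIGSEGV and similar fatal signals.
        faulthandler.enable()
    except (AttributeError, ValueError, OSError, RuntimeError):
        pass  # stderr has no usable file descriptor; carry on without it
    print(f"argv = {sys.argv!r}", file=log)
    started = time.monotonic()
    code = 0
    try:
        entry()
    except SystemExit as exc:
        # CLI entry points usually finish through sys.exit(); keep the code.
        code = exc.code
    except Exception:
        traceback.print_exc(file=log)
        code = 1
    elapsed = time.monotonic() - started
    print(f"exit code = {code!r}, elapsed = {elapsed:.2f}s", file=log)
    return code
```

In the launcher this would replace the bare call: `sys.exit(run_with_hooks(main.main))` after `from ance import main`.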
gdb
gdb can display the values of Python objects inside CPython.
```
[root@localhost ance]# gdb /usr/bin/python3.9
(gdb) set args ./ance evaluate --etype=os --os1=/ --os2=./Anolis_OS-8.6.x86_64.sqlite --limit=0
(gdb) b __pyx_pw_4ance_7scanner_6kernel_13KernelScanner_5scan
(gdb) r
Starting program: /usr/bin/python3.9 ./ance evaluate --etype=os --os1=/ --os2=./Anolis_OS-8.6.x86_64.sqlite --limit=0
(gdb) bt
#0 __pyx_pw_4ance_7scanner_6kernel_13KernelScanner_5scan (
    __pyx_self=<cython_function_or_method at remote 0x7fffe3acf860>,
    __pyx_args=(<KernelScanner(result_dir='/tmp/ance/results', mount_dir='/mnt/ance', config=None) at remote 0x7fffe373e070>, ['ksyscall', 'kcmdline', 'kconfig', 'kparams', 'kolist', 'kabi', 'os_metadata', 'os_service', 'os_syscmd', 'os_env', 'rpm', 'service', 'config', 'header', 'so', 'man', 'inst_rpmlist'], '5.10.134-13.an8.x86_64', '/'),
    __pyx_kwds=0x0) at ance-0.1.1/ance/scanner/kernel.c:1941
#1 0x00007fffc06092d9 in __Pyx_PyObject_Call (kw=0x0,
    arg=(<KernelScanner(result_dir='/tmp/ance/results', mount_dir='/mnt/ance', config=None) at remote 0x7fffe373e070>, ['ksyscall', 'kcmdline', 'kconfig', 'kparams', 'kolist', 'kabi', 'os_metadata', 'os_service', 'os_syscmd', 'os_env', 'rpm', 'service', 'config', 'header', 'so', 'man', 'inst_rpmlist'], '5.10.134-13.an8.x86_64', '/'),
    func=<cython_function_or_method at remote 0x7fffe3acf860>) at ance-0.1.1/ance/collector/kernel.c:3420
#2 __pyx_pf_4ance_9collector_6kernel_15KernelCollector_4collect (__pyx_v_self=<optimized out>,
    __pyx_v_subtypes=['ksyscall', 'kcmdline', 'kconfig', 'kparams', 'kolist', 'kabi', 'os_metadata', 'os_service', 'os_syscmd', 'os_env', 'rpm', 'service', 'config', 'header', 'so', 'man', 'inst_rpmlist'], __pyx_v_root='/',
    __pyx_self=<optimized out>) at ance-0.1.1/ance/collector/kernel.c:2219
#3 0x00007fffc060bd2c in __pyx_pw_4ance_9collector_6kernel_15KernelCollector_5collect (__pyx_self=<optimized out>,
    __pyx_args=(<KernelCollector(config=None) at remote 0x7fffe372d070>, ['ksyscall', 'kcmdline', 'kconfig', 'kparams', 'kolist', 'kabi', 'os_metadata', 'os_service', 'os_syscmd', 'os_env', 'rpm', 'service', 'config', 'header', 'so', 'man', 'inst_rpmlist'], '/'), __pyx_kwds=0x0) at ance-0.1.1/ance/collector/kernel.c:2065
#4 0x00007fffc0a0b008 in __Pyx_PyObject_Call (func=<cython_function_or_method at remote 0x7fffe37fa1e0>,
    arg=arg@entry=(<KernelCollector(config=None) at remote 0x7fffe372d070>, ['ksyscall', 'kcmdline', 'kconfig', 'kparams', 'kolist', 'kabi', 'os_metadata', 'os_service', 'os_syscmd', 'os_env', 'rpm', 'service', 'config', 'header', 'so', 'man', 'inst_rpmlist'], '/'), kw=0x0) at ance-0.1.1/ance/collector/distro.c:9143
#5 0x00007fffc0a1842d in __pyx_pf_4ance_9collector_6distro_11OSCollector_12collect (__pyx_self=<optimized out>,
    __pyx_v_limit=0, __pyx_v_root='/', __pyx_v_rpmlist='installed', __pyx_v_repo_path=<optimized out>,
    __pyx_v_releasever=<optimized out>, __pyx_v_types=<optimized out>,
    __pyx_v_self=<OSCollector(config=None, max_workers=8, rpm_collector=<RPMCollector(config=None, file_scanner=<FileScanner(result_dir='/tmp/ance/results', mount_dir='/mnt/ance', config=None) at remote 0x7fffe37839d0>) at remote 0x7fffe3783850>, rpm_scanner=<RPMScanner(result_dir='/tmp/ance/results', mount_dir='/mnt/ance', config=None, file_scanner=<FileScanner(result_dir='/tmp/ance/results', mount_dir='/mnt/ance', config=None) at remote 0x7fffe3783b50>) at remote 0x7fffe3783a60>) at remote 0x7fffe37838e0>) at ance-0.1.1/ance/collector/distro.c:5551
#6 __pyx_pw_4ance_9collector_6distro_11OSCollector_13collect (__pyx_self=<optimized out>,
    __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at ance-0.1.1/ance/collector/distro.c:4748
#7 0x00007ffff76c1ac9 in _PyObject_MakeTpCall (tstate=0x555555606750,
    callable=<cython_function_or_method at remote 0x7fffe37fa790>, args=0x7fffe3783960, nargs=1,
    keywords=('types', 'rpmlist', 'repo_path', 'root', 'limit'))
    at /usr/src/debug/python39-3.9.16-2.0.2.an8.x86_64/Objects/call.c:194
#8 0x00007ffff76c4d23 in _PyObject_VectorcallTstate (tstate=0x555555606750,
    callable=<cython_function_or_method at remote 0x7fffe37fa790>, args=0x7fffe3783960, nargsf=1,
    kwnames=('types', 'rpmlist', 'repo_path', 'root', 'limit'))
--Type <RET> for more, q to quit, c to continue without paging--
```
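`__pyx_pw_4ance_7scanner_6kernel_13KernelScanner_5scan` is the name of a function in one of the binary .so libraries that ance is compiled into.
As shown above, when `__pyx_pw_4ance_7scanner_6kernel_13KernelScanner_5scan` is called, the arguments on the stack are of type PyObject* and gdb can display their values. gdb does this by copying the memory of a series of variables from the CPython process into the gdb process and analyzing it there. This is extremely slow: loading one complete call stack can take tens of seconds.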
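The symbol names in the backtrace follow a recognizable pattern, which helps when choosing breakpoints. The sketch below maps such a symbol back to a dotted Python name. The rules are inferred only from the symbols that appear in this article (a `__pyx_pw_`, `__pyx_pf_` or `__pyx_gb_` prefix, length-prefixed path components, then a counter fused with the final name); they are an assumption, not taken from Cython's documentation, and `demangle_pyx` is a hypothetical helper name.

```python
import re

# Prefixes seen in this article's gdb and bpftrace output.
_PREFIX = re.compile(r"^__pyx_(?:pw|pf|gb)_")


def demangle_pyx(symbol):
    """Best-effort guess of the dotted Python name behind a Cython symbol."""
    match = _PREFIX.match(symbol)
    if match is None:
        return None
    rest = symbol[match.end():]
    parts = []
    while rest:
        digits = re.match(r"\d+", rest)
        if digits is None:
            parts.append(rest)
            break
        length = int(digits.group())
        body = rest[digits.end():]
        # A path component is exactly `length` characters, then "_" or the end.
        if len(body) == length or (len(body) > length and body[length] == "_"):
            parts.append(body[:length])
            rest = body[length + 1:]
        else:
            # Otherwise the digits are a counter glued to the final name.
            parts.append(body)
            break
    return ".".join(parts)
```

For the breakpoint used above, `demangle_pyx("__pyx_pw_4ance_7scanner_6kernel_13KernelScanner_5scan")` gives `ance.scanner.kernel.KernelScanner.scan`. The heuristic can misread a name whose counter happens to equal the length of a prefix followed by an underscore.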
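gdb + Python extension
gdb is extremely slow when there are many breakpoints or when breakpoints carry complex conditions. It runs on a single core, and after several minutes Python may still not have started executing.
- Key points
  - /usr/share/gdb/python/gdb: gdb's Python extension
  - /etc/gdbinit: the script gdb executes at startup
  - /etc/gdbinit.d/: content loaded when gdb starts; Python scripts named *.py are accepted
  - Python-3.9.16/Tools/gdb/libpython.py: the gdb extension in the CPython source tree; once loaded it provides commands such as `py-bt` for viewing Python stack frames
With gdb's Python plugin, breakpoints can be traced and handled silently.
The approach below is extremely slow and did not prove practical; it is usable only when few functions are traced.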
- trace-ance.py traces the functions provided by the .so libraries under the ance directory

```python
import gdb
import os
import sys
import subprocess
from syslog import syslog, LOG_ERR

ance_lib = "/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/"
ance_sos = (
    "/root/ance/ance-vene-libs/collector/syscmd.cpython-39-x86_64-linux-gnu.so",
)


class Ance_BreakPoint(gdb.Breakpoint):
    # Stores call stacks
    stacks = list()

    def __init__(self, spec):
        super().__init__(spec)
        self.silent = False
        self.spec = spec

    def stop(self) -> bool:
        print(f"{self.spec}: {gdb.execute('info args', to_string=True)}\n", file=sys.stderr)
        # A falsy return value tells gdb to let the program continue
        return False


def get_all_so(directory):
    # Collect every .so file under the directory
    sos = list()
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".so"):
                sos.append(os.path.join(root, file))
    return sos


def subprocess_run(cmd):
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    if err:
        raise RuntimeError(str(err))
    return out


def get_ance_so_bp_func(so):
    funcs = list()
    for line in subprocess_run(f"readelf -s -W {so}").decode('utf-8').splitlines():
        if not line:
            continue
        # The symbol name is the last column of the readelf output
        symbol = line.split()[-1]
        if 'ance' not in symbol:
            continue
        # Skip unrelated names that merely contain "ance", such as "instance"
        if 'stance' in symbol:
            continue
        funcs.append(symbol)
    return funcs


def breakpoint_at_ance(ance_lib):
    if ance_sos:
        sos = ance_sos
    else:
        sos = get_all_so(ance_lib)
    bps = dict()
    # Set the breakpoint even if the symbol is not loaded yet, without being prompted
    gdb.execute("set breakpoint pending on")
    for so in sos:
        funcs = get_ance_so_bp_func(so)
        for func in funcs:
            if func in bps:
                continue
            bps[func] = Ance_BreakPoint(func)
    gdb.execute("set breakpoint pending off")
    return bps


gdb.execute("set args /root/ance/ance evaluate --etype=os --os1=/ --os2=/root/ance/Anolis_OS-8.6.x86_64.sqlite --limit=0")
bps = breakpoint_at_ance(ance_lib)
```
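- Link the script with `ln -s $(pwd)/trace-ance.py /etc/gdbinit.d/` so that gdb loads it at startup.
- Start gdb with `gdb /usr/bin/python3.9 2> ./a`, then type `r`.
  - This is workable because the script above traces only one .so library.
  - With every .so library traced, Python still has not started executing the script after several minutes.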
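The scripts in this article pick symbol names out of `readelf -s -W` output by column position. A minimal sketch of that parsing step, assuming the usual eight-column layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name); `ElfSymbol` and `parse_readelf_symbol` are hypothetical names, and the sample values in the usage note are made up.

```python
from typing import NamedTuple, Optional


class ElfSymbol(NamedTuple):
    value: int  # symbol value (address relative to the load base)
    size: int
    type: str
    bind: str
    name: str


def parse_readelf_symbol(line: str) -> Optional[ElfSymbol]:
    """Parse one data row of `readelf -s -W`; return None for other lines."""
    fields = line.split()
    # Data rows look like: Num: Value Size Type Bind Vis Ndx Name
    if len(fields) != 8 or not fields[0].rstrip(":").isdigit():
        return None
    try:
        return ElfSymbol(int(fields[1], 16), int(fields[2], 0),
                         fields[3], fields[4], fields[7])
    except ValueError:
        return None
```

A row such as `45: 00000000000092d0 512 FUNC LOCAL DEFAULT 12 __pyx_pw_...` becomes a tuple whose `type`, `bind` and `size` can then be filtered on, which keeps header and section lines out of the breakpoint list.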
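Preloading the binaries
`LD_PRELOAD` makes the dynamic linker load the specified shared libraries before any others, and later symbol lookups search those libraries first. It is often used for function hijacking. Here it is used to load all of ance's shared libraries up front, so that the addresses of the locations to debug are known in advance.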
```bash
# ld.so splits LD_PRELOAD on colons or spaces (not commas), so the entries are joined with ':'
export LD_PRELOAD=/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/main.cpython-39-x86_64-linux-gnu.so:\
/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/settings.cpython-39-x86_64-linux-gnu.so:\
/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/scanner/kabi.cpython-39-x86_64-linux-gnu.so:\
/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/scanner/iso.cpython-39-x86_64-linux-gnu.so:\
/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/scanner/header.cpython-39-x86_64-linux-gnu.so:\
/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/scanner/kolist.cpython-39-x86_64-linux-gnu.so:\
......
/root/ance/ance evaluate --etype=hardware --os1=/ --os2=./Anolis_OS-8.6.x86_64.sqlite
```
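ance is a Python library, and Python may load extension modules with additional steps or in a different way, so this did not test out quite right.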
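To see which libraries actually ended up in the process, the file paths listed in /proc/<pid>/maps can be compared with the list that was supposed to be loaded. A minimal sketch, with `missing_from_maps` as a hypothetical helper name:

```python
def missing_from_maps(maps_text, wanted):
    """Return the paths in `wanted` that have no mapping in a /proc/<pid>/maps dump."""
    mapped = set()
    for line in maps_text.splitlines():
        # Columns: address range, permissions, offset, device, inode, path
        fields = line.split(None, 5)
        if len(fields) == 6:
            mapped.add(fields[5].strip())
    return [path for path in wanted if path not in mapped]
```

Calling it with the content of `/proc/<pid>/maps` for the running ance process and the list of .so paths returns the libraries that were not loaded.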
Importing manually before Python runs ance

```python
#!/bin/python3.9
import sys
import os
import time
import signal

sys.path = ['', '/usr/local/.pyenv/versions/3.9.14/lib/python39.zip', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/lib-dynload', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages']

# Pre-load all of ance's shared libraries
from ance import main, settings
from ance.algo import base, compatibility, relatedness, similarity
from ance.collector import base as cole_base, distro, env, hardware, kcmdline, kernel, kparams, ksyscall, rpm, service, syscmd
from ance.compare import base as comp_base, distro as comp_distro, file as comp_file, kernel as comp_kernel, pcidev, requires, result
from ance.entity import const as ent_const, distro as ent_distro, file as ent_file, hardware as ent_hardware, iso, rpm as ent_rpm, service as ent_
from ance.scanner import abi, base as scan_base, config, file as scan_file, header, \
    iso as scan_iso, kabi, kconfig, kdriver, kernel, kolist, man , repo, rpm as scan_rpm, service as scan_service, so
from ance.utils import config as utils_config, extract as utils_extract, logger, md5, shell, yum

print(f'standby, pid = {os.getpid()}')

# Wait before running
# try:
#     while True:
#         time.sleep(1)
# except KeyboardInterrupt:
#     print('continue')
time.sleep(15)

main.main()
```

The `time.sleep(15)` in the middle leaves time for gdb to attach to this process.
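A fixed sleep is a guess at how long attaching takes. As a sketch of an alternative, the process can poll the `TracerPid` field of /proc/self/status, which Linux sets to the PID of the attached tracer (0 when there is none). `parse_tracer_pid` and `wait_for_debugger` are hypothetical helper names, and the timeout values are arbitrary.

```python
import time


def parse_tracer_pid(status_text):
    """Return the TracerPid value from /proc/<pid>/status text (0 = not traced)."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split()[1])
    return 0


def wait_for_debugger(timeout=60.0, poll=0.2, status_path="/proc/self/status"):
    """Block until a tracer such as gdb attaches; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with open(status_path) as f:
            if parse_tracer_pid(f.read()) != 0:
                return True
        time.sleep(poll)
    return False
```

In the launcher, `wait_for_debugger()` would take the place of `time.sleep(15)`; it returns as soon as `gdb -p` has attached, or gives up after the timeout.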
The script used for debugging inside gdb

```python
import gdb
import os
import sys
import subprocess
from syslog import syslog, LOG_ERR

"""
Set ance_sos to trace only specific libraries; leave it commented out to trace all of them.
The shared libraries must already be loaded before this script is used.
LD_PRELOAD seems to fail to load a few of the libraries.
Importing them in advance works.
"""
ance_lib = "/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/"
# ance_sos = (
#     "/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/compare/pcidev.cpython-39-x86_64-linux-gnu.so",
# )


class Ance_BreakPoint(gdb.Breakpoint):
    # Stores call stacks
    stacks = list()

    def __init__(self, func, addr):
        # Break on the raw address instead of the symbol name
        super().__init__(f"*{addr}")
        self.silent = False
        self.func = func

    def stop(self) -> bool:
        print(f"{self.func}: {gdb.execute('info args', to_string=True)}\n", file=sys.stderr)
        # A falsy return value tells gdb to let the program continue
        return False


def get_all_so(directory):
    # Collect every .so file under the directory
    sos = list()
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".so") and not file.startswith('_'):
                sos.append(os.path.join(root, file))
    return sos


def subprocess_run(cmd):
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    if err:
        raise RuntimeError(str(err))
    return out


def get_ance_so_bp_func(so):
    funcs = list()
    for line in subprocess_run(f"readelf -s -W {so}").decode('utf-8').splitlines():
        if not line:
            continue
        symbol = line.split()[-1]
        if 'ance' not in symbol:
            continue
        if 'stance' in symbol or 'filelist' in symbol or 'requires' in symbol or 'filepath' in symbol:
            continue
        funcs.append(symbol)
    return funcs


def get_maps_elf_addr(pid, file):
    addr_range = 0
    with open(f"/proc/{pid}/maps", "r") as f:
        for line in f.readlines():
            if not line:
                continue
            elems = line.split()
            if not len(elems) == 6:
                continue
            (addr, permission, mapfile) = (elems[0], elems[1], elems[5])
            if not mapfile == file:
                continue
            # Only the mapping with execute permission is the code segment
            if 'x' not in permission:
                continue
            addr_range = addr
            break
    if addr_range == 0:
        raise ValueError(f"cannot find {file} in /proc/{pid}/maps")
    return addr_range


def get_maps_elf_symbols(pid, file):
    symbols = list()
    try:
        addr_range = get_maps_elf_addr(pid, file)
    except ValueError as e:
        print(str(e), file=sys.stderr)
        return symbols
    # Start address of the executable mapping. Adding symbol values to it
    # assumes this mapping begins at the library's load base (file offset 0).
    addr_base = addr_range[:addr_range.index('-')]
    result = subprocess.run(f"readelf -s -W /proc/{pid}/map_files/{addr_range}", shell=True, stdout=subprocess.PIPE)
    for line in result.stdout.decode('utf-8').splitlines():
        if not line:
            continue
        elems = line.split()
        if not len(elems) == 8:
            continue
        saddr, ssize, stype, sbind, sname = elems[1], elems[2], elems[3], elems[4], elems[7]
        # Keep only local functions with a non-zero size
        if not sbind == 'LOCAL' or not stype == 'FUNC' or ssize == '0':
            continue
        if not sname.startswith('__pyx'):
            continue
        symbols.append((sname, hex(int(addr_base, 16) + int(saddr, 16))))
    return symbols


def breakpoint_at_ance(sos):
    bps = dict()
    # Pending breakpoints are no longer used;
    # breakpoints are set directly on addresses in the mapped code segment
    # gdb.execute("set breakpoint pending on")
    pid = gdb.selected_inferior().pid
    for so in sos:
        for (func, addr) in get_maps_elf_symbols(pid, so):
            if func in bps:
                continue
            print(f"b {addr} {func} at {so.split('/')[-1]}", file=sys.stderr)
            bps[func] = Ance_BreakPoint(func, addr)
    # gdb.execute("set breakpoint pending off")
    return bps


if 'ance_sos' in globals():
    sos = ance_sos
else:
    sos = get_all_so(ance_lib)

# With LD_PRELOAD a few shared libraries seem not to load successfully.
# Setting LD_PRELOAD from inside gdb does not help either: the libraries are not loaded
# at that moment, and their mappings can only be found after they have been loaded.
# gdb.execute(f"set environment LD_PRELOAD {':'.join(sos)}")
# An external process loads the libraries first; attach mode is used instead of launch mode.
# gdb.execute("set args /root/ance/ance evaluate --etype=hardware --os1=/ --os2=./Anolis_OS-8.6.x86_64.sqlite")
# gdb.execute("set args /root/ance/ance")
bps = breakpoint_at_ance(sos)
```
Place the script in /usr/share/gdb/python/, attach to the process with `gdb -p`, and run `python import trace_ance` inside gdb.
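The script turns each symbol value from readelf into a runtime address by adding it to the start of the library's executable mapping. The sketch below shows a slightly more general form of that arithmetic, which also subtracts the mapping's file offset. It assumes that the code segment's link-time address equals its file offset, which is common for shared objects but not guaranteed; `runtime_address` is a hypothetical helper name and the addresses in the usage note are made up.

```python
def runtime_address(maps_line, symbol_value):
    """Turn an ELF symbol value into an address inside the running process."""
    fields = maps_line.split()
    start = int(fields[0].split("-")[0], 16)
    file_offset = int(fields[2], 16)
    # Assumption: the segment's link-time address equals its file offset,
    # so the library's load base is the mapping start minus that offset.
    load_base = start - file_offset
    return hex(load_base + symbol_value)
```

For a mapping line starting with `7fffc0600000-7fffc0640000 r-xp 00000000` and a symbol value of `0x92d0`, the result is `0x7fffc06092d0`, which can be passed to gdb as `b *0x7fffc06092d0`. When the file offset is 0 this is the same calculation the script performs.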
uprobe
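Tracing with bpftrace has very little impact on the actual execution of the Python interpreter. bpftrace must be installed first.
The example traces all of the ance-related functions provided under ance, more than three hundred of them. Without printing stacks the performance is decent; with stacks printed, tracing again cannot keep up with Python's execution speed. Building with `gcc -fno-omit-frame-pointer` might speed this up, but that was not tried.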
- In bpftrace's one-liner style, first use Python code to find every function worth tracing and generate a script

```python
#!/usr/bin/env python3
import os
import subprocess

ance_lib = "/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/"


def get_all_so(directory):
    # Collect every .so file under the directory
    sos = list()
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".so"):
                sos.append(os.path.join(root, file))
    return sos


def subprocess_run(cmd):
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    if err:
        raise RuntimeError(str(err))
    return out


def get_ance_so_bp_func(so):
    funcs = list()
    for line in subprocess_run(f'bpftrace -l "uprobe:{so}:*"').decode('utf-8').splitlines():
        if not line:
            continue
        # Lines look like uprobe:<path>:<function>
        symbol = line.split(':')[-1]
        if 'ance' not in symbol:
            continue
        funcs.append(symbol)
    return funcs


def breakpoint_at_ance(ance_lib):
    sos = get_all_so(ance_lib)
    bps = list()
    seen = set()
    for so in sos:
        funcs = get_ance_so_bp_func(so)
        for func in funcs:
            # Skip a (library, function) pair that was already recorded
            if (so, func) in seen:
                continue
            seen.add((so, func))
            bps.append((func, so))
    return bps


bps = breakpoint_at_ance(ance_lib)
print(f"bps len = {len(bps)}")
with open(os.path.join(os.getcwd(), 'ance-uprobe.bt'), 'w') as f:
    f.write("#!/bin/bpftrace\n")
    f.write("// Generated by ance-uprobe.py\n\n")
    for func, so in bps:
        f.write(f'uprobe:{so}:{func}\n' + '{' + '\n\tprintf("%s\\n\\t%s\\n", func, ustack());\n' + '}' + '\n\n')
os.chmod(os.path.join(os.getcwd(), 'ance-uprobe.bt'), 0o755)
```

- The generated bpftrace script looks like this

```
#!/bin/bpftrace
// Generated by ance-uprobe.py

uprobe:/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/main.cpython-39-x86_64-linux-gnu.so:__pyx_gb_4ance_4main_11_get_ostype_2generator
{
	printf("%s\n\t%s\n", func, ustack());
}

......
```
- Tracing

```
[localhost] # ./ance-uprobe.bt
Attaching 347 probes...
PyInit_ance
    PyInit_ance+0
    _imp_create_dynamic_impl+292
    _imp_create_dynamic+139
    cfunction_vectorcall_FASTCALL+170
    PyVectorcall_Call+451
    _PyObject_Call+64
    PyObject_Call.localalias.1993+53
    do_call_core+378
    0x7f2069dd517f
    _PyEval_EvalFrame.lto_priv.1267+50
    _PyEval_EvalCode+3423
    ......
```

Once the ance-related .so libraries have been unmapped on exit, the stacks can no longer be displayed.
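Since printing stacks is what makes tracing fall behind, a lighter variant is to only count how often each function is hit. The sketch below builds such a bpftrace program from the same `(func, so)` pairs collected by the generator script; `counting_program` is a hypothetical helper name, and `@calls` is an arbitrary map name. bpftrace prints its maps when it exits.

```python
def counting_program(probes):
    """Build bpftrace source that counts calls per probed function."""
    lines = ["#!/bin/bpftrace", ""]
    for func, so in probes:
        lines.append(f"uprobe:{so}:{func}")
        lines.append("{")
        # count() aggregates in the kernel, so nothing is printed per call
        lines.append("\t@calls[func] = count();")
        lines.append("}")
        lines.append("")
    return "\n".join(lines)
```

Writing the returned text to a file and running it the same way as ance-uprobe.bt gives a per-function call count at exit instead of a stack per call.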
usdt
Using the default USDT probes
- List the USDT probes Python supports by default

```shell
[root@localhost ance]# bpftrace -l "usdt:/usr/lib64/libpython3.9.so:*"
usdt:/usr/lib64/libpython3.9.so:python:audit
usdt:/usr/lib64/libpython3.9.so:python:call_func
usdt:/usr/lib64/libpython3.9.so:python:frame_new_notrack
usdt:/usr/lib64/libpython3.9.so:python:function__entry
usdt:/usr/lib64/libpython3.9.so:python:function__return
usdt:/usr/lib64/libpython3.9.so:python:gc__done
usdt:/usr/lib64/libpython3.9.so:python:gc__start
usdt:/usr/lib64/libpython3.9.so:python:import__find__load__done
usdt:/usr/lib64/libpython3.9.so:python:import__find__load__start
usdt:/usr/lib64/libpython3.9.so:python:line
usdt:/usr/lib64/libpython3.9.so:python:pyobject_callobject
```

  - Python must be built from source with DTrace support enabled (`./configure --with-dtrace`, which defines `WITH_DTRACE`).
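- What the USDT probe arguments are: see /root/rpmbuild/BUILD/Python-3.9.16/Include/pydtrace.d

```c
probe function__entry(const char *, const char *, int);
```

  - This file defines the USDT probe prototypes.
  - The prototypes defined here are generated into /root/rpmbuild/BUILD/Python-3.9.16/build/optimized/Include/pydtrace_probes.h.
  - For the actual arguments passed, search for where the macros defined in pydtrace_probes.h are used.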
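- Trace function execution

```
#!/usr/bin/bpftrace
/**
 * PyDTrace_FUNCTION_ENTRY(filename, funcname, lineno);
 *
 * *(uint8*)(arg0) != 60
 *   60 is ASCII '<'
 *   a filename starting with '<' belongs to a built-in and is ignored
 * *(uint8*)(arg0+48) == 115
 *   115 is ASCII 's'
 *   used to filter on the /usr/local/.pyenv/versions/3.9.14/lib/python3.9/s directory
 */
usdt:/usr/lib64/libpython3.9.so:python:function__entry
/ *(uint8*)(arg0) != 60 /
{
	printf("%s[%d]: %s\n", str(arg0), arg2, str(arg1));
}
```

  - function__entry sits at the point where Python creates a stack frame when executing a call.
  - Because ance is compiled to C, its code may not go through the default path that creates Python call frames, so ance's own calls are not visible here. Only the activity of other libraries that ance calls into shows up.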
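The numeric constants in the predicate are byte values at fixed positions of the filename string. A small sketch for deriving them for another path or position; `byte_check` is a hypothetical helper name and only does string formatting.

```python
def byte_check(path, index, op="=="):
    """Return the bpftrace expression testing one byte of arg0 against path[index]."""
    return f"*(uint8*)(arg0+{index}) {op} {ord(path[index])}"
```

`byte_check("/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages", 48)` yields `*(uint8*)(arg0+48) == 115`, matching the comment in the script, because the character at index 48 of that path is the `s` of site-packages.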
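Adding and modifying USDT probes in CPython
The example adds an argument to function__entry that reports the depth of the Python stack. Adding a new USDT probe works the same way.
- Modify /root/rpmbuild/BUILD/Python-3.9.16/Include/pydtrace.d

```c
probe function__entry(int, const char *, const char *, int);
```

  - An int has been added as the first argument.
  - This file defines the probes and their arguments. The Makefile generates pydtrace_probes.h from it, and that generated file is the one actually used.
- Modify /root/rpmbuild/BUILD/Python-3.9.16/Include/pydtrace.h

```c
static inline void PyDTrace_FUNCTION_ENTRY(int arg0, const char *arg1, const char *arg2, int arg3) {}
```

  - This is the empty placeholder definition used when DTrace is not enabled.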
- Modify the actual code at /root/rpmbuild/BUILD/Python-3.9.16/Python/ceval.c:5752

```c
static void
dtrace_function_entry(PyFrameObject *f)
{
    const char *filename;
    const char *funcname;
    int lineno;

    PyCodeObject *code = f->f_code;
    filename = PyUnicode_AsUTF8(code->co_filename);
    funcname = PyUnicode_AsUTF8(code->co_name);
    lineno = PyCode_Addr2Line(code, f->f_lasti);

    /* Follow f_back to count how many frames lie below the current one */
    int stack_deep = 0;
    PyFrameObject* fnode = f;
    if (fnode != NULL)
        while ((fnode = fnode->f_back) != NULL) {
            stack_deep++;
        }

    PyDTrace_FUNCTION_ENTRY(stack_deep, filename, funcname, lineno);
}
```

Rebuild and run.
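The depth that the C loop reports can be reproduced from Python itself, which is handy for checking the probe's new first argument against a known call chain. A minimal sketch using `sys._getframe`, a CPython-specific function; `stack_depth` is a hypothetical helper name.

```python
import sys


def stack_depth(frame=None):
    """Count the frames below `frame` by following f_back, as the C loop does."""
    if frame is None:
        frame = sys._getframe(1)  # the caller's frame
    depth = 0
    while frame.f_back is not None:
        frame = frame.f_back
        depth += 1
    return depth
```

Calling `stack_depth()` from a function one level deeper returns a value one higher, which is the same relationship the modified probe should show between a caller and its callee.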
- Other locations worth considering for USDT probes
  - /root/rpmbuild/BUILD/Python-3.9.16/Python/ceval.c:5193 `static inline PyObject * _Py_HOT_FUNCTION call_function(PyFrameObject *f, PyThreadState *tstate, PyObject ***pp_stack, Py_ssize_t oparg, PyObject *kwnames)`
    - Where Python function calls are made; ance goes through it as well.
  - /root/rpmbuild/BUILD/Python-3.9.16/Objects/frameobject.c `PyFrameObject* _Py_HOT_FUNCTION _PyFrame_New_NoTrack(PyThreadState *tstate, PyCodeObject *code, PyObject *globals, PyObject *locals)`
    - Another place where Python stack frames are created; ance calls it as well.
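Summary
- gdb
  - Interprets Python objects directly when debugging CPython, which is very intuitive
  - Can trace function calls in ance's .so libraries
- bpf uprobe
  - Barely affects ance's execution speed
  - Can trace function calls in ance's .so libraries
  - Stacks cannot be resolved once the ance process's .so mappings are gone
- usdt
  - Barely affects execution speed
  - Probes can be defined inside the CPython interpreter, which makes it convenient to convert PyObject and similar types
  - To be strictly safe, avoid calling Python functions that change reference counts (this limits the use of existing print-style functions)
    - That is, even though Python objects can be interpreted inside CPython, it is better to write your own decoding code that neither alters nor invokes Python behavior
    - Otherwise Python can easily crash with a core dump
- For simple needs, strace + ltrace is enough (practical)
- Set gdb breakpoints on ance functions and inspect the arguments (practical)
- Combine several of these methods to analyze ance's behavior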