Reverse-engineering Alibaba's ance system-migration analysis component: tracing Python execution in a venv with Cython-generated .so libraries

Getting started

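ance is the sysom component that inspects system state. It is closed source and written in Python; its installation package bundles a Python virtual environment, and ance runs under that environment's interpreter.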

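strace and ltrace answer most questions about which files ance actually reads and which commands it forks and executes. The methods below instead trace ance's Python execution: they expose the arguments passed at call time and make it easier to locate the problem when ance crashes.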
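As a rough illustration of the strace side, the sketch below pulls opened files and executed commands out of strace output. It assumes a log captured with something like `strace -f -e trace=open,openat,execve -o ance.strace ance ...`; the sample lines and the helper name `summarize_strace` are made up for the example.

```python
import re

# Syscall name plus its first quoted argument, with an optional leading
# PID column (as written by `strace -f`).
SYSCALL_RE = re.compile(r'^(?:\d+\s+)?(open|openat|execve)\((?:[^"]*?)"([^"]*)"')


def summarize_strace(lines):
    """Return (opened_files, executed_commands) found in strace output lines."""
    opened, executed = [], []
    for line in lines:
        m = SYSCALL_RE.match(line)
        if not m:
            continue
        name, path = m.groups()
        (executed if name == "execve" else opened).append(path)
    return opened, executed


# Hypothetical log lines, only to show the expected input shape.
sample = [
    '1201 openat(AT_FDCWD, "/etc/os-release", O_RDONLY|O_CLOEXEC) = 3',
    '1202 execve("/usr/bin/rpm", ["rpm", "-qa"], 0x7ffd0 /* 20 vars */) = 0',
    '1201 read(3, "NAME=...", 4096) = 120',
]
print(summarize_strace(sample))
```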

Lifting the venv restriction

  • The ance entry point

    [root@localhost]# which ance
    /usr/local/bin/ance
    [root@localhost]# cat /usr/local/bin/ance
    #!/usr/local/.pyenv/versions/3.9.14/bin/python3
    # EASY-INSTALL-ENTRY-SCRIPT: 'ance==1.0.0','console_scripts','ance'
    import re
    import sys
  • Inspect sys.path of the ance venv interpreter

    [root@localhost]# /usr/local/.pyenv/versions/3.9.14/bin/python3
    Python 3.9.14 (main, Dec 19 2022, 10:18:44)
    [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.path
    ['', '/usr/local/.pyenv/versions/3.9.14/lib/python39.zip', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/lib-dynload', '/root/.local/lib/python3.9/site-packages', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages']
    >>>

  • Replace the Python interpreter

    #!/bin/python3.9
    import sys
    sys.path = ['', '/usr/local/.pyenv/versions/3.9.14/lib/python39.zip', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/lib-dynload', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages']

    • Note: it is best to use an interpreter of the same version as the ance venv
    • That is, swap the interpreter for another one and set sys.path to the paths inside the ance venv
  • Other tracing methods revealed how ance is started, so the script becomes:

    #!/bin/python3.9
    import sys

    sys.path = ['', '/usr/local/.pyenv/versions/3.9.14/lib/python39.zip', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/lib-dynload', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages']

    from ance import main

    main.main()

    • Name the file ance and make it executable; ./ance -h and the other commands then behave the same as the original ance

    • The original ance entry point

      if __name__ == '__main__':
          sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
          sys.exit(load_entry_point('ance==1.0.0', 'console_scripts', 'ance')())

      • The entry point is configured in the package's setup.py; ance is closed source, so the exact contents are unknown
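Rather than copying the path list by hand from an interactive session, the venv interpreter can be asked for it. This is a minimal sketch; the helper names `interpreter_sys_path` and `without_user_site` are invented for the example, and dropping entries under `/.local/` only mirrors what the hand-edited list above does.

```python
import json
import subprocess
import sys


def interpreter_sys_path(python_bin):
    """Ask another interpreter for its sys.path and return it as a list."""
    out = subprocess.run(
        [python_bin, "-c", "import json, sys; print(json.dumps(sys.path))"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def without_user_site(paths):
    """Drop per-user site-packages entries (anything under ~/.local)."""
    return [p for p in paths if "/.local/" not in p]


if __name__ == "__main__":
    # For ance this would be the venv interpreter from the shebang line,
    # e.g. /usr/local/.pyenv/versions/3.9.14/bin/python3.
    print(without_user_site(interpreter_sys_path(sys.executable)))
```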

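At this point ance runs under an external Python interpreter. That interpreter can be modified, and extra code can be run before and after ance executes.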
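One way to run extra code before and after ance is to wrap the entry call. The sketch below is only an illustration: `run_with_hooks` is a hypothetical helper, and in the wrapper script it would be called as `run_with_hooks(main.main, sys.argv)` in place of `main.main()`.

```python
import sys
import time
import traceback


def run_with_hooks(entry, argv):
    """Call entry() with sys.argv replaced, logging the start, end, and crashes."""
    print(f"[wrapper] argv={argv}", file=sys.stderr)
    started = time.monotonic()
    old_argv, sys.argv = sys.argv, list(argv)
    try:
        return entry()
    except Exception:
        # Print the Python-level traceback before the error propagates.
        traceback.print_exc()
        raise
    finally:
        sys.argv = old_argv
        elapsed = time.monotonic() - started
        print(f"[wrapper] finished in {elapsed:.2f}s", file=sys.stderr)
```

SystemExit is not a subclass of Exception, so a normal `sys.exit()` inside the entry point passes through without a traceback.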

gdb

gdb can display the values of Python objects inside CPython.

[root@localhost ance]# gdb /usr/bin/python3.9
(gdb) set args ./ance evaluate  --etype=os --os1=/ --os2=./Anolis_OS-8.6.x86_64.sqlite --limit=0
(gdb) b __pyx_pw_4ance_7scanner_6kernel_13KernelScanner_5scan
(gdb) r
Starting program: /usr/bin/python3.9 ./ance evaluate  --etype=os --os1=/ --os2=./Anolis_OS-8.6.x86_64.sqlite --limit=0
(gdb) bt

#0  __pyx_pw_4ance_7scanner_6kernel_13KernelScanner_5scan (
    __pyx_self=<cython_function_or_method at remote 0x7fffe3acf860>, 
    __pyx_args=(<KernelScanner(result_dir='/tmp/ance/results', mount_dir='/mnt/ance', config=None) at remote 0x7fffe373e070>, ['ksyscall', 'kcmdline', 'kconfig', 'kparams', 'kolist', 'kabi', 'os_metadata', 'os_service', 'os_syscmd', 'os_env', 'rpm', 'service', 'config', 'header', 'so', 'man', 'inst_rpmlist'], '5.10.134-13.an8.x86_64', '/'), 
    __pyx_kwds=0x0) at ance-0.1.1/ance/scanner/kernel.c:1941
#1  0x00007fffc06092d9 in __Pyx_PyObject_Call (kw=0x0, 
    arg=(<KernelScanner(result_dir='/tmp/ance/results', mount_dir='/mnt/ance', config=None) at remote 0x7fffe373e070>, ['ksyscall', 'kcmdline', 'kconfig', 'kparams', 'kolist', 'kabi', 'os_metadata', 'os_service', 'os_syscmd', 'os_env', 'rpm', 'service', 'config', 'header', 'so', 'man', 'inst_rpmlist'], '5.10.134-13.an8.x86_64', '/'), 
    func=<cython_function_or_method at remote 0x7fffe3acf860>) at ance-0.1.1/ance/collector/kernel.c:3420
#2  __pyx_pf_4ance_9collector_6kernel_15KernelCollector_4collect (__pyx_v_self=<optimized out>, 
    __pyx_v_subtypes=['ksyscall', 'kcmdline', 'kconfig', 'kparams', 'kolist', 'kabi', 'os_metadata', 'os_service', 'os_syscmd', 'os_env', 'rpm', 'service', 'config', 'header', 'so', 'man', 'inst_rpmlist'], __pyx_v_root='/', 
    __pyx_self=<optimized out>) at ance-0.1.1/ance/collector/kernel.c:2219
#3  0x00007fffc060bd2c in __pyx_pw_4ance_9collector_6kernel_15KernelCollector_5collect (__pyx_self=<optimized out>, 
    __pyx_args=(<KernelCollector(config=None) at remote 0x7fffe372d070>, ['ksyscall', 'kcmdline', 'kconfig', 'kparams', 'kolist', 'kabi', 'os_metadata', 'os_service', 'os_syscmd', 'os_env', 'rpm', 'service', 'config', 'header', 'so', 'man', 'inst_rpmlist'], '/'), __pyx_kwds=0x0) at ance-0.1.1/ance/collector/kernel.c:2065
#4  0x00007fffc0a0b008 in __Pyx_PyObject_Call (func=<cython_function_or_method at remote 0x7fffe37fa1e0>, 
    arg=arg@entry=(<KernelCollector(config=None) at remote 0x7fffe372d070>, ['ksyscall', 'kcmdline', 'kconfig', 'kparams', 'kolist', 'kabi', 'os_metadata', 'os_service', 'os_syscmd', 'os_env', 'rpm', 'service', 'config', 'header', 'so', 'man', 'inst_rpmlist'], '/'), kw=0x0) at ance-0.1.1/ance/collector/distro.c:9143
#5  0x00007fffc0a1842d in __pyx_pf_4ance_9collector_6distro_11OSCollector_12collect (__pyx_self=<optimized out>, 
    __pyx_v_limit=0, __pyx_v_root='/', __pyx_v_rpmlist='installed', __pyx_v_repo_path=<optimized out>, 
    __pyx_v_releasever=<optimized out>, __pyx_v_types=<optimized out>, 
    __pyx_v_self=<OSCollector(config=None, max_workers=8, rpm_collector=<RPMCollector(config=None, file_scanner=<FileScanner(result_dir='/tmp/ance/results', mount_dir='/mnt/ance', config=None) at remote 0x7fffe37839d0>) at remote 0x7fffe3783850>, rpm_scanner=<RPMScanner(result_dir='/tmp/ance/results', mount_dir='/mnt/ance', config=None, file_scanner=<FileScanner(result_dir='/tmp/ance/results', mount_dir='/mnt/ance', config=None) at remote 0x7fffe3783b50>) at remote 0x7fffe3783a60>) at remote 0x7fffe37838e0>) at ance-0.1.1/ance/collector/distro.c:5551
#6  __pyx_pw_4ance_9collector_6distro_11OSCollector_13collect (__pyx_self=<optimized out>, 
    __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at ance-0.1.1/ance/collector/distro.c:4748
#7  0x00007ffff76c1ac9 in _PyObject_MakeTpCall (tstate=0x555555606750, 
    callable=<cython_function_or_method at remote 0x7fffe37fa790>, args=0x7fffe3783960, nargs=1, 
    keywords=('types', 'rpmlist', 'repo_path', 'root', 'limit'))
    at /usr/src/debug/python39-3.9.16-2.0.2.an8.x86_64/Objects/call.c:194
#8  0x00007ffff76c4d23 in _PyObject_VectorcallTstate (tstate=0x555555606750, 
    callable=<cython_function_or_method at remote 0x7fffe37fa790>, args=0x7fffe3783960, nargsf=1, 
    kwnames=('types', 'rpmlist', 'repo_path', 'root', 'limit'))
--Type <RET> for more, q to quit, c to continue without paging--

__pyx_pw_4ance_7scanner_6kernel_13KernelScanner_5scan is the name of a function in one of ance's compiled .so libraries.

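As shown above, when __pyx_pw_4ance_7scanner_6kernel_13KernelScanner_5scan is called, the arguments on the stack are of type PyObject*, and gdb can display their values. gdb does this by copying the memory of a series of variables from the CPython process into the gdb process and analyzing it there. This is very slow: loading one complete call stack can take tens of seconds.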
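The symbol names in the backtrace are readable enough to map back to Python names. The sketch below is a heuristic inferred only from the symbols shown in this article (length-prefixed components followed by a numbered method name), not from Cython documentation, so treat `demangle_pyx` as an assumption-laden helper.

```python
import re

# Prefixes seen in this article: pw (Python wrapper), pf (implementation),
# gb (generator body).
PREFIX = re.compile(r"^__pyx_(pw|pf|gb)_")


def demangle_pyx(symbol):
    """Best-effort conversion of a Cython symbol into a dotted Python name."""
    m = PREFIX.match(symbol)
    if not m:
        return None
    rest = symbol[m.end():]
    parts = []
    while rest:
        d = re.match(r"\d+", rest)
        if not d:
            parts.append(rest)
            break
        n = int(d.group())
        body = rest[d.end():]
        # A length-prefixed component is followed by '_' or ends the symbol.
        if 0 < n <= len(body) and (len(body) == n or body[n] == "_"):
            parts.append(body[:n])
            rest = body[n + 1:]
        else:
            # Trailing "<counter><name>": the number is not a length here.
            parts.append(body)
            break
    return ".".join(parts)


print(demangle_pyx("__pyx_pw_4ance_7scanner_6kernel_13KernelScanner_5scan"))
```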

gdb + Python extensions

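gdb is extremely slow when there are many breakpoints or when breakpoints carry complex conditions. It runs on a single core, and after several minutes Python may still not have started executing.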

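  • Key points
    • /usr/share/gdb/python/gdb: gdb's Python extension package
    • /etc/gdbinit: the script executed when gdb starts
    • /etc/gdbinit.d/: content loaded when gdb starts; Python scripts are allowed, named *.py
    • Python-3.9.16/Tools/gdb/libpython.py: the gdb extension shipped in the CPython source tree; once loaded it provides commands such as py-bt for viewing Python stack frames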

With gdb's Python plugin, breakpoints can be traced and handled silently.

The approach below was too slow to be practical here, but it is usable when only a few functions are traced.

  • trace-ance.py: traces the functions provided by the .so libraries under the ance directory

    import gdb
    
    import os
    import sys
    import subprocess
    from syslog import syslog, LOG_ERR
    
    ance_lib = "/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/"
    
    ance_sos = (
        "/root/ance/ance-vene-libs/collector/syscmd.cpython-39-x86_64-linux-gnu.so",
    )
    
    class Ance_BreakPoint(gdb.Breakpoint):
        # Stored call stacks
        stacks = list()
    
        def __init__(self, spec):
            super().__init__(spec)
            self.silent = False
            self.spec = spec
        
        def stop(self) -> bool:
            print(f"{self.spec}: {gdb.execute('info args', to_string=True)}\n", file=sys.stderr)
            # Returning False tells gdb to continue instead of stopping here
            return False
    
    
    def get_all_so(dir):
        # Collect every .so file under the directory
        sos = list()
        for root, dirs, files in os.walk(dir):
            for file in files:
                if file.endswith(".so"):
                    sos.append(os.path.join(root, file))
    
        return sos
    
    
    def subprocess_run(cmd):
        p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = p.communicate()
        if err:
            raise RuntimeError(str(err))
        return out
    
    
    def get_ance_so_bp_func(so):
        funcs = list()
        for line in subprocess_run(f"readelf -s -W {so}").decode('utf-8').splitlines():
            if not line:
                continue
            symbol = line.split()[-1]
            # print(symbol)
            if 'ance' not in symbol:
                continue
            # Skip names that match only through words such as 'instance'
            if 'stance' in symbol:
                continue
            funcs.append(symbol)
        
        return funcs
    
    
    def breakpoint_at_ance(ance_lib):
        if ance_sos:
            sos = ance_sos
        else:
            sos = get_all_so(ance_lib)
    
        bps = dict()
    
        # Set breakpoints even when the symbol is not loaded yet, without being prompted
        gdb.execute("set breakpoint pending on")
    
        for so in sos:
            funcs = get_ance_so_bp_func(so)
            for func in funcs:
                if func in bps:
                    continue
                # print(f"b {func} at {so.split('/')[-1]}")
                bps[func] = Ance_BreakPoint(func)
    
        gdb.execute("set breakpoint pending off")
    
        return bps
    
    gdb.execute("set args /root/ance/ance evaluate  --etype=os --os1=/ --os2=/root/ance/Anolis_OS-8.6.x86_64.sqlite --limit=0")
    
    bps = breakpoint_at_ance(ance_lib)
  • ln -s `pwd`/trace-ance.py /etc/gdbinit.d/ makes gdb load the script at startup

  • Start gdb with: gdb /usr/bin/python3.9 2> ./ar

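    • This is fast enough only because the script above traces a single .so library
    • When all .so libraries are traced, Python still has not started executing the script after several minutes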

Preloading the binaries

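LD_PRELOAD makes ld.so load the listed shared objects before any other dynamic library, and later symbol lookups search these preloaded objects first. It is commonly used for function interposition; here it is used to load all of ance's shared objects up front so that the addresses of the locations to debug are fixed in advance.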

export LD_PRELOAD=/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/main.cpython-39-x86_64-linux-gnu.so:\
/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/settings.cpython-39-x86_64-linux-gnu.so:\
/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/scanner/kabi.cpython-39-x86_64-linux-gnu.so:\
/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/scanner/iso.cpython-39-x86_64-linux-gnu.so:\
/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/scanner/header.cpython-39-x86_64-linux-gnu.so:\
/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/scanner/kolist.cpython-39-x86_64-linux-gnu.so:\
......

/root/ance/ance evaluate  --etype=hardware --os1=/ --os2=./Anolis_OS-8.6.x86_64.sqlite

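ance is a Python package, and Python's own loading of extension modules may involve additional steps or behave differently, so in testing this did not work reliably.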
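The LD_PRELOAD value can also be generated instead of typed by hand. Note that ld.so splits the list on colons or spaces. The sketch below is a hypothetical helper (`build_ld_preload` is not part of ance) that walks a directory and joins every .so with colons.

```python
import os


def build_ld_preload(lib_dir):
    """Join every .so under lib_dir with ':' as expected by ld.so."""
    found = []
    for root, _dirs, files in os.walk(lib_dir):
        for name in files:
            if name.endswith(".so"):
                found.append(os.path.join(root, name))
    # Sort so that the resulting value is stable between runs.
    return ":".join(sorted(found))


if __name__ == "__main__":
    ance_lib = "/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/"
    print(f"export LD_PRELOAD={build_ld_preload(ance_lib)}")
```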

Importing manually before Python runs ance

#!/bin/python3.9
import sys
import os
import time
import signal

sys.path = ['', '/usr/local/.pyenv/versions/3.9.14/lib/python39.zip', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/lib-dynload', '/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages']

# Preload all of ance's shared objects
from ance import main, settings
from ance.algo import base, compatibility, relatedness, similarity
from ance.collector import base as cole_base, distro, env, hardware, kcmdline, kernel, kparams, ksyscall, rpm, service, syscmd
from ance.compare import base as comp_base, distro as comp_distro, file as comp_file, kernel as comp_kernel, pcidev, requires, result
from ance.entity import const as ent_const, distro as ent_distro, file as ent_file, hardware as ent_hardware, iso, rpm as ent_rpm, service as ent_
from ance.scanner import abi, base as scan_base, config, file as scan_file, header, \
    iso as scan_iso, kabi, kconfig, kdriver, kernel, kolist, man , repo, rpm as scan_rpm, service as scan_service, so
from ance.utils import config as utils_config, extract as utils_extract, logger, md5, shell, yum

print(f'standby, pid = {os.getpid()}')

# Wait before running
# try:
#     while True:
#         time.sleep(1)
# except KeyboardInterrupt:
#     print('continue')
time.sleep(15)
main.main()

The time.sleep(15) in the middle leaves time for gdb to attach to the process.
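A fixed sleep can be replaced by waiting until a debugger is actually attached. On Linux, /proc/self/status contains a TracerPid field that is nonzero while a tracer such as gdb is attached. The sketch below polls it; `tracer_pid` and `wait_for_tracer` are names chosen for this example.

```python
import time


def tracer_pid(status_text):
    """Extract the TracerPid value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split()[1])
    return 0


def wait_for_tracer(timeout=60.0, poll=0.2):
    """Block until a debugger attaches to this process or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with open("/proc/self/status") as f:
            if tracer_pid(f.read()) != 0:
                return True
        time.sleep(poll)
    return False
```

In the wrapper this would stand in for `time.sleep(15)`: print the pid, call `wait_for_tracer()`, then call `main.main()`. The process resumes inside the polling loop once gdb issues `continue`.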

Script used for debugging in gdb

import gdb

import os
import sys
import subprocess
from syslog import syslog, LOG_ERR

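"""
Set ance_sos to trace only the listed libraries; comment it out to trace all of them.

The required shared objects must be loaded before this script is used:
    LD_PRELOAD: some libraries seem to fail to load
    importing them in advance: works
"""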


ance_lib = "/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/"

# ance_sos = (
#     "/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/compare/pcidev.cpython-39-x86_64-linux-gnu.so",
# )

class Ance_BreakPoint(gdb.Breakpoint):
    # Stored call stacks
    stacks = list()

    def __init__(self, func, addr):
        super().__init__(f"*{addr}")
        self.silent = False
        self.func = func
    
    def stop(self) -> bool:
        print(f"{self.func}: {gdb.execute('info args', to_string=True)}\n", file=sys.stderr)
        # Returning False tells gdb to continue instead of stopping here
        return False


def get_all_so(dir):
    # Collect every .so file under the directory
    sos = list()
    for root, dirs, files in os.walk(dir):
        for file in files:
            if file.endswith(".so") and not file.startswith('_'):
                sos.append(os.path.join(root, file))

    return sos


def subprocess_run(cmd):
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    if err:
        raise RuntimeError(str(err))
    return out


def get_ance_so_bp_func(so):
    funcs = list()
    for line in subprocess_run(f"readelf -s -W {so}").decode('utf-8').splitlines():
        if not line:
            continue
        symbol = line.split()[-1]
        # print(symbol)
        if 'ance' not in symbol:
            continue
        if 'stance' in symbol or 'filelist' in symbol or 'requires' in symbol or 'filepath' in symbol:
            continue
        funcs.append(symbol)
    
    return funcs


def get_maps_elf_addr(pid, file):
    addr_range = 0
    with open(f"/proc/{pid}/maps", "r") as f:
        for line in f.readlines():
            if not line:
                continue
            
            elems = line.split()
            if not len(elems) == 6:
                continue

            (addr, permission, mapfile) = (elems[0], elems[1], elems[5])
            if not mapfile == file:
                continue

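            # Only the mapping with execute permission holds the code.
            # Using its start address as the load base assumes that this
            # mapping begins at file offset 0.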
            if not 'x' in permission:
                continue

            addr_range = addr
            break
    
    if addr_range == 0:
        raise ValueError(f"find {file} in /proc/{pid}/maps false")
    return addr_range


def get_maps_elf_symbols(pid, file):
    symbols = list()

    try:
        addr_range = get_maps_elf_addr(pid, file)
    except ValueError as e:
        print(str(e), file=sys.stderr)
        return symbols

    addr_base = addr_range[:addr_range.index('-')]

    result = subprocess.run(f"readelf -s -W /proc/{pid}/map_files/{addr_range}", shell=True, stdout=subprocess.PIPE)
    for line in result.stdout.decode('utf-8').splitlines():
        if not line:
            continue
        elems = line.split()
        if not len(elems) == 8:
            continue

        saddr, ssize, stype, sbind, sname = elems[1], elems[2], elems[3], elems[4], elems[7]
        if not sbind == 'LOCAL' or not stype == 'FUNC' or ssize == '0':
            continue
        if not sname.startswith('__pyx'):
            continue

        symbols.append((sname, hex(int(addr_base, 16) + int(saddr, 16))))

    return symbols

def breakpoint_at_ance(sos):
    bps = dict()

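    # Pending breakpoints would allow setting a breakpoint before the symbol is loaded
    # They are no longer used: breakpoints are set directly on addresses in the mapped code segment
    # gdb.execute("set breakpoint pending on")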

    pid = gdb.selected_inferior().pid

    for so in sos:
        for (func, addr) in get_maps_elf_symbols(pid, so):
            if func in bps:
                continue
            print(f"b {addr} {func} at {so.split('/')[-1]}", file=sys.stderr)
            bps[func] = Ance_BreakPoint(func, addr)

    # gdb.execute("set breakpoint pending off")

    return bps

if 'ance_sos' in globals():
    sos = ance_sos
else:
    sos = get_all_so(ance_lib)


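# With LD_PRELOAD, some shared objects seem to fail to load
# Even if LD_PRELOAD is wanted, it cannot be set from inside gdb: at that point the libraries are not loaded yet, and the mappings can only be found after they are loaded
# gdb.execute(f"set environment LD_PRELOAD {':'.join(sos)}")

# Let an external process load the libraries first; use attach mode instead of launch mode
# gdb.execute("set args /root/ance/ance evaluate  --etype=hardware --os1=/ --os2=./Anolis_OS-8.6.x86_64.sqlite")
# gdb.execute("set args /root/ance/ance")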

bps = breakpoint_at_ance(sos)

Place the script in /usr/share/gdb/python/ as trace_ance.py, attach to the process with gdb -p, then run python import trace_ance inside gdb.
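The address calculation in get_maps_elf_symbols adds the symbol value to the start of the executable mapping, which is right only when that mapping starts at file offset 0. A more general variant subtracts the mapping's file offset first. The sketch below shows that arithmetic on maps text. It assumes the common layout where the text segment's virtual address equals its file offset, and the paths and addresses in the sample are invented.

```python
def load_base(maps_text, so_path):
    """Return the load base of so_path derived from its executable mapping."""
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[5] != so_path or "x" not in fields[1]:
            continue
        start = int(fields[0].split("-")[0], 16)
        offset = int(fields[2], 16)  # file offset of this mapping
        return start - offset
    raise ValueError(f"no executable mapping for {so_path}")


def runtime_address(base, symbol_value_hex):
    """Add a readelf symbol value (hex string) to the load base."""
    return hex(base + int(symbol_value_hex, 16))


# Hypothetical /proc/<pid>/maps excerpt with a separate read-only first segment.
sample_maps = """\
7f0000000000-7f0000001000 r--p 00000000 fd:00 42 /opt/demo/mod.so
7f0000001000-7f0000003000 r-xp 00001000 fd:00 42 /opt/demo/mod.so
"""
base = load_base(sample_maps, "/opt/demo/mod.so")
print(runtime_address(base, "0000000000001a30"))
```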

uprobe

bpftrace traces the Python interpreter's actual execution with very little performance impact. bpftrace must be installed first.

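The example traces all ance-related functions provided by the libraries under ance, a little over three hundred. Without printing stacks the performance is decent; with stack printing it cannot keep up with Python's execution speed either. Compiling with gcc -fno-omit-frame-pointer might speed this up, but that was not tried.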

  • Using bpftrace's one-liner style probes, first find every function worth tracing with Python code, then generate a script

    #!/usr/bin/env python3
    import os
    import subprocess
    
    dir = "/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/"
    
    
    def get_all_so(dir):
        # Collect every .so file under the directory
        sos = list()
        for root, dirs, files in os.walk(dir):
            for file in files:
                if file.endswith(".so"):
                    sos.append(os.path.join(root, file))
    
        return sos
    
    
    def subprocess_run(cmd):
        p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = p.communicate()
        if err:
            raise RuntimeError(str(err))
        return out
    
    
    def get_ance_so_bp_func(so):
        funcs = list()
        for line in subprocess_run(f'bpftrace -l "uprobe:{so}:*"').decode('utf-8').splitlines():
            if not line:
                continue
    
            symbol = line.split(':')[-1]
            if 'ance' not in symbol:
                continue
            funcs.append(symbol)
        
        return funcs
    
    
    def breakpoint_at_ance(ance_lib):
        sos = get_all_so(ance_lib)
        bps = list()
    
        for so in sos:
            funcs = get_ance_so_bp_func(so)
            for func in funcs:
                if (func, so) in bps:
                    continue
                # print(f"b {func} at {so.split('/')[-1]}")
                bps.append((func, so))
    
        return bps
    
    bps = breakpoint_at_ance(dir)
    print(f"bps len = {len(bps)}")
    
    with open(os.path.join(os.getcwd(), 'ance-uprobe.bt'), 'w') as f:
        f.write("#!/bin/bpftrace\n")
        f.write("// This file was generated by ance-uprobe.py\n\n")
        for func, so in bps:
            f.write(f'uprobe:{so}:{func}\n' + '{' + '\n\tprintf("%s\\n\\t%s\\n", func, ustack());\n' + '}' + '\n\n')
    
    os.chmod(os.path.join(os.getcwd(), 'ance-uprobe.bt'), 0o755)
  • The generated bpftrace script looks like this

    #!/bin/bpftrace
    // This file was generated by ance-uprobe.py

    uprobe:/usr/local/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ance/main.cpython-39-x86_64-linux-gnu.so:__pyx_gb_4ance_4main_11_get_ostype_2generator
    {
    	printf("%s\n\t%s\n", func, ustack());
    }
    
    ......
  • Tracing

    [localhost] # ./ance-uprobe.bt
    Attaching 347 probes...
    PyInit_ance
        PyInit_ance+0
        _imp_create_dynamic_impl+292
        _imp_create_dynamic+139
        cfunction_vectorcall_FASTCALL+170
        PyVectorcall_Call+451
        _PyObject_Call+64
        PyObject_Call.localalias.1993+53
        do_call_core+378
        0x7f2069dd517f
        _PyEval_EvalFrame.lto_priv.1267+50
        _PyEval_EvalCode+3423
    
    ......

Once the ance-related .so libraries are unmapped at exit, the stacks can no longer be displayed.
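Since printing a stack on every hit cannot keep up, a cheaper variant is to count calls per function and let bpftrace print the map when it exits. The generator below is a sketch of that idea, not the author's script; `render_count_script` and the sample probe are hypothetical, while `func` and `count()` are standard bpftrace builtins.

```python
def render_count_script(probes):
    """Render a bpftrace script that counts calls for each (func, so) probe."""
    lines = ["#!/usr/bin/env bpftrace", ""]
    for func, so in probes:
        lines.append(f"uprobe:{so}:{func}")
        lines.append("{")
        # func is the bpftrace builtin holding the probed function name.
        lines.append("\t@calls[func] = count();")
        lines.append("}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical probe list, in the same (func, so) shape as bps above.
script = render_count_script([("__pyx_pw_demo", "/opt/demo/mod.so")])
print(script)
```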

usdt

Using the default USDT probes

  • List the USDT probes Python supports by default

    [root@localhost ance]# bpftrace -l "usdt:/usr/lib64/libpython3.9.so:*"
    usdt:/usr/lib64/libpython3.9.so:python:audit
    usdt:/usr/lib64/libpython3.9.so:python:call_func
    usdt:/usr/lib64/libpython3.9.so:python:frame_new_notrack
    usdt:/usr/lib64/libpython3.9.so:python:function__entry
    usdt:/usr/lib64/libpython3.9.so:python:function__return
    usdt:/usr/lib64/libpython3.9.so:python:gc__done
    usdt:/usr/lib64/libpython3.9.so:python:gc__start
    usdt:/usr/lib64/libpython3.9.so:python:import__find__load__done
    usdt:/usr/lib64/libpython3.9.so:python:import__find__load__start
    usdt:/usr/lib64/libpython3.9.so:python:line
    usdt:/usr/lib64/libpython3.9.so:python:pyobject_callobject

    • Python must be built with DTrace support: configure with --with-dtrace, which defines WITH_DTRACE
  • What the USDT probe arguments are: /root/rpmbuild/BUILD/Python-3.9.16/Include/pydtrace.d

    probe function__entry(const char *, const char *, int);

    • This file defines the USDT probe prototypes
    • The prototypes defined here are generated into /root/rpmbuild/BUILD/Python-3.9.16/build/optimized/Include/pydtrace_probes.h
    • For the exact arguments passed, search for where the macros defined in pydtrace_probes.h are used
  • Trace function execution

    #!/usr/bin/bpftrace
    /**
    * PyDTrace_FUNCTION_ENTRY(filename, funcname, lineno);
    *
    * *(uint8*)(arg0) != 60: 60 is ASCII '<'
    *    a filename starting with < belongs to a built-in and is ignored
    * *(uint8*)(arg0+48) == 115
    *    115 is ASCII 's'
    *    used to filter the /usr/local/.pyenv/versions/3.9.14/lib/python3.9/s directory
    */
    
    usdt:/usr/lib64/libpython3.9.so:python:function__entry
    / *(uint8*)(arg0) != 60 /
    {
        printf("%s[%d]: %s\n", str(arg0), arg2, str(arg1))
    }
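    • function__entry sits where the interpreter creates a stack frame for a Python call
    • After conversion to C, ance's own code probably does not go through this default frame-creation path, so ance's calls are not visible here; only the calls inside other libraries that ance invokes show up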
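The magic numbers in the bpftrace predicate comment can be checked with a few lines of Python. This only reproduces the arithmetic behind the byte tests; `byte_filter` is a helper invented for the illustration.

```python
prefix = "/usr/local/.pyenv/versions/3.9.14/lib/python3.9/"


def byte_filter(offset, char):
    """Build a bpftrace predicate testing one byte of the filename argument."""
    return f"*(uint8*)(arg0+{offset}) == {ord(char)}"


# '<' is the first byte of pseudo-filenames such as <frozen importlib._bootstrap>.
print(ord("<"))
# The byte right after the prefix is the 's' of site-packages.
print(byte_filter(len(prefix), "s"))
```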

Adding or modifying USDT probes in CPython

The example adds a Python stack-depth argument to function__entry; adding a new USDT probe works the same way.

  • Modify /root/rpmbuild/BUILD/Python-3.9.16/Include/pydtrace.d

    probe function__entry(int, const char *, const char *, int);

    • An int parameter is added at the front
    • This file declares the probes and their parameters; the Makefile generates pydtrace_probes.h from it, and that generated header is what is actually used
  • Modify /root/rpmbuild/BUILD/Python-3.9.16/Include/pydtrace.h

    static inline void PyDTrace_FUNCTION_ENTRY(int arg0, const char *arg1, const char *arg2, int arg3) {}

    • This is the empty placeholder definition used when DTrace is not enabled
  • Modify the actual code at /root/rpmbuild/BUILD/Python-3.9.16/Python/ceval.c:5752

    static void
    dtrace_function_entry(PyFrameObject *f)
    {
        const char *filename;
        const char *funcname;
        int lineno;
    
        PyCodeObject *code = f->f_code;
        filename = PyUnicode_AsUTF8(code->co_filename);
        funcname = PyUnicode_AsUTF8(code->co_name);
        lineno = PyCode_Addr2Line(code, f->f_lasti);
    
        int stack_deep = 0;
        PyFrameObject* fnode = f;
        if (fnode != NULL)
            while ((fnode = fnode->f_back) != NULL)
            {
                stack_deep++;
            }
    
        PyDTrace_FUNCTION_ENTRY(stack_deep, filename, funcname, lineno);
    }

Recompile and rerun.
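For intuition, the depth computed by the C loop above is the number of f_back links from the current frame to the outermost one. The same walk can be written in pure Python with sys._getframe; this is only an illustration of the quantity, not part of the CPython patch.

```python
import sys


def stack_depth():
    """Count the frames below the caller by following f_back, like the C loop."""
    frame = sys._getframe(1)  # the caller's frame
    depth = 0
    while frame.f_back is not None:
        frame = frame.f_back
        depth += 1
    return depth


def inner():
    return stack_depth()


def outer():
    # inner() runs one frame deeper than outer() itself.
    return stack_depth(), inner()


print(outer())
```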

  • Other locations worth instrumenting with USDT probes
  • /root/rpmbuild/BUILD/Python-3.9.16/Python/ceval.c:5193

    static inline PyObject * _Py_HOT_FUNCTION call_function(PyFrameObject *f, PyThreadState *tstate, PyObject ***pp_stack, Py_ssize_t oparg, PyObject *kwnames)

    • Where Python function calls happen; ance goes through it as well
  • /root/rpmbuild/BUILD/Python-3.9.16/Objects/frameobject.c

    PyFrameObject* _Py_HOT_FUNCTION _PyFrame_New_NoTrack(PyThreadState *tstate, PyCodeObject *code, PyObject *globals, PyObject *locals)

    • Another place where Python stack frames are created; ance goes through it as well

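Summary

  • gdb

    • When debugging CPython it decodes Python objects directly, which is very intuitive
    • Can trace function calls in ance's .so libraries
  • bpf uprobe

    • Barely affects ance's execution speed
    • Can trace function calls in ance's .so libraries
    • Stacks cannot be resolved once the ance process's .so mappings are gone
  • usdt

    • Barely affects execution speed
    • Probes can be defined inside the CPython interpreter, where converting PyObject and similar types is convenient
    • To be strictly safe, avoid calling Python functions that increment or decrement reference counts (this rules out reusing the existing printing helpers)
      • In other words, even though Python objects can be decoded inside CPython, it is better to write your own decoding code that neither affects nor invokes Python behavior
      • Otherwise Python can easily crash and dump core
  • For simple needs, strace + ltrace is enough (practical)

  • Set gdb breakpoints on ance functions and inspect the arguments (practical)

  • Combine several of these methods to analyze ance's behavior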
