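Debugging with CMake and gdb: fixing LLVM ERROR: out of memory

Problem description

When deploying VideoPipe on a new device, the program built with CMake frequently aborted partway through a run with LLVM ERROR: out of memory: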

[Thread 0x7ffcd097f700 (LWP 9673) exited]
LLVM ERROR: out of memory

Thread 38 "trt_yolov8_samp" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff7afde700 (LWP 9352)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

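A Google search shows that many TensorRT- and DeepStream-based programs hit the same error, and the developers reporting it also say that memory was sufficient yet the process still died with out of memory. I eventually used gdb to trace the problem to an invalid pointer and fixed it.

Cause of the error

Output of bt in gdb: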

#0  0x00007ffff6476e87 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff64787f1 in __GI_abort () at abort.c:79
#2  0x00007fffd755ecbb in  () at /usr/local/tensorRT/lib/libnvinfer.so.8
#3  0x00007ffff6ad42ac in operator new(unsigned long) () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x000055555557ea7a in __gnu_cxx::new_allocator<char>::allocate(unsigned long, void const*) (this=0x7fff7afd2c20, __n=3980232092549127)
    at /usr/include/c++/7/ext/new_allocator.h:111
#5  0x000055555557cf8b in std::allocator_traits<std::allocator<char> >::allocate(std::allocator<char>&, unsigned long) (__a=..., __n=3980232092549127) at /usr/include/c++/7/bits/alloc_traits.h:436
#6  0x000055555557e9de in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long) (this=0x7fff7afd2c20, __capacity=@0x7fff7afd2a70: 3980232092549126, __old_capacity=0) at /usr/include/c++/7/bits/basic_string.tcc:153
#7  0x00007ffff78e096c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) (this=0x7fff7afd2c20, __beg=0x504047389 <error: Cannot access memory at address 0x504047389>, __end=0xe24050404738f <error: Cannot access memory at address 0xe24050404738f>) at /usr/include/c++/7/bits/basic_string.tcc:219
#8  0x00007ffff78ddd5e in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct_aux<char*>(char*, char*, std::__false_type) (this=0x7fff7afd2c20, __beg=0x504047389 <error: Cannot access memory at address 0x504047389>, __end=0xe24050404738f <error: Cannot access memory at address 0xe24050404738f>) at /usr/include/c++/7/bits/basic_string.h:236
#9  0x00007ffff78dbd41 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*) (this=0x7fff7afd2c20, __beg=0x504047389 <error: Cannot access memory at address 0x504047389>, __end=0xe24050404738f <error: Cannot access memory at address 0xe24050404738f>) at /usr/include/c++/7/bits/basic_string.h:255
#10 0x00007ffff78d9b22 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (this=0x7fff7afd2c20, __str=<error: Cannot access memory at address 0x504047389>) at /usr/include/c++/7/bits/basic_string.h:440
#11 0x00007ffff79fa6f6 in vp_nodes::vp_trt_yolov8_detector::run_infer_combinations(std::vector<std::shared_ptr<vp_objects::vp_frame_meta>, std::allocator<std::shared_ptr<vp_objects::vp_frame_meta> > > const&) (this=0x555555fa1530, frame_meta_with_batch=std::vector of length 1, capacity 1 = {...}) at /home/ubuntu/yolov8n-trt-region-test/nodes/infers/vp_trt_yolov8_detector.cpp:53
#12 0x00007ffff7a42504 in vp_nodes::vp_infer_node::handle_frame_meta(std::shared_ptr<vp_objects::vp_frame_meta>) (this=0x555555fa1530, meta=std::shared_ptr<vp_objects::vp_frame_meta> (use count 8, weak count 0) = {...})
    at /home/ubuntu/yolov8n-trt-region-test/nodes/vp_infer_node.cpp:66
#13 0x00007ffff7a471d4 in vp_nodes::vp_node::handle_run() (this=0x555555fa1530) at /home/ubuntu/yolov8n-trt-region-test/nodes/vp_node.cpp:45
#14 0x00007ffff7a4b215 in std::__invoke_impl<void, void (vp_nodes::vp_node::*)(), vp_nodes::vp_node*>(std::__invoke_memfun_deref, void (vp_nodes::vp_node::*&&)(), vp_nodes::vp_node*&&) (__f=@0x5555b1e383a0: &virtual vp_nodes::vp_node::handle_run(), __t=@0x5555b1e38398: 0x555555fa1530) at /usr/include/c++/7/bits/invoke.h:73
#15 0x00007ffff7a4a146 in std::__invoke<void (vp_nodes::vp_node::*)(), vp_nodes::vp_node*>(void (vp_nodes::vp_node::*&&)(), vp_nodes::vp_node*&&) (__fn=@0x5555b1e383a0: &virtual vp_nodes::vp_node::handle_run()) at /usr/include/c++/7/bits/invoke.h:95
#16 0x00007ffff7a4d30b in std::thread::_Invoker<std::tuple<void (vp_nodes::vp_node::*)(), vp_nodes::vp_node*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (this=0x5555b1e38398) at /usr/include/c++/7/thread:234
#17 0x00007ffff7a4d2c1 in std::thread::_Invoker<std::tuple<void (vp_nodes::vp_node::*)(), vp_nodes::vp_node*> >::operator()() (this=0x5555b1e38398) at /usr/include/c++/7/thread:243
#18 0x00007ffff7a4d2a0 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (vp_nodes::vp_node::*)(), vp_nodes::vp_node*> > >::_M_run() (this=0x5555b1e38390) at /usr/include/c++/7/thread:186
#19 0x00007ffff6afe6df in  () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#20 0x00007ffff08516db in start_thread (arg=0x7fff7afde700) at pthread_create.c:463
#21 0x00007ffff655961f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

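The stack shows that the inference pipeline is driven by multiple threads (vp_node::handle_run() running on std::thread). The labels container may be modified or destroyed in one thread while another thread is still using it. The root cause is a dangling pointer that leads to an invalid memory access, which in turn triggers an enormous allocation request (__n=3980232092549127 in frame #4).

The program is trying to access an invalid memory address: the string is being constructed from a wild pointer or from memory that has already been freed (note the "Cannot access memory" errors on __beg, __end and __str in frames #7 to #10).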
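As a sanity check on this reading, the numbers in the backtrace are self-consistent. The libstdc++ copy constructor in frame #10 passes the source string's data pointer and data pointer plus length to _M_construct, so the length it sees is __end - __beg, and _M_create then requests one extra byte for the terminating '\0'. The sketch below simply redoes that arithmetic; the constants are copied from the backtrace, while the helper names are made up for illustration.

```cpp
#include <cstdint>

// Values copied from frames #4 to #9 of the backtrace above.
constexpr std::uint64_t kBeg       = 0x504047389ULL;       // __beg
constexpr std::uint64_t kEnd       = 0xe24050404738fULL;   // __end
constexpr std::uint64_t kCapacity  = 3980232092549126ULL;  // __capacity in frame #6
constexpr std::uint64_t kRequested = 3980232092549127ULL;  // __n in frames #4 and #5

// Hypothetical helper: the length the copy constructor derives from the two pointers.
constexpr std::uint64_t garbage_length(std::uint64_t beg, std::uint64_t end) {
    return end - beg;
}

// Hypothetical helper: bytes requested for that length plus the terminating '\0'.
constexpr std::uint64_t requested_bytes(std::uint64_t length) {
    return length + 1;
}

// Hypothetical helper: whole TiB contained in a byte count (1 TiB = 2^40 bytes).
constexpr std::uint64_t whole_tib(std::uint64_t bytes) {
    return bytes >> 40;
}

// The garbage pointers reproduce exactly the sizes gdb printed.
static_assert(garbage_length(kBeg, kEnd) == kCapacity, "matches __capacity");
static_assert(requested_bytes(kCapacity) == kRequested, "matches __n");
static_assert(whole_tib(kRequested) == 3620, "roughly 3620 TiB");
```

A request of roughly 3620 TiB cannot be satisfied on an ordinary machine, which would explain why the allocation fails no matter how much memory is actually free.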

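Debugging with CMake and gdb

Reference: [CMake] CMake from Beginner to Practice series (11): gdb debugging support in CMake

Enabling gdb debugging in CMake

Add the following to CMakeLists.txt: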

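# Flags applied to every build type
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC -fdiagnostics-color=always -pthread")
# Flags applied only when CMAKE_BUILD_TYPE is Debug: no optimization, all warnings, gdb debug info
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -O0 -Wall -ggdb")

# Strip -w (suppress all warnings) and -g from the common flags
string(REPLACE "-w" "" CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}")
string(REPLACE "-g" "" CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}")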

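1. Start gdb on the compiled executable (gdb followed by the path to the program).

2. b main (short for break main) sets a breakpoint on the main function.

3. r (or run) starts the program.

4. c continues execution; gdb stops automatically at the point where the error is raised.

5. bt prints the call stack (info stack is an alias for bt), and bt full additionally prints the local variables of every frame. Use them to analyze the cause of the error.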

In my case the main problem was in the yolo_detector code: when a target is detected but its label cannot be read, an invalid pointer results:

auto label = labels.size() == 0 ? "" : labels[objbox.class_id];
auto target = std::make_shared<vp_objects::vp_frame_target>(x, y, width, height,
                                                            objbox.class_id, objbox.conf, frame_meta->frame_index, frame_meta->channel_index, label);

Possible problems with the code above:

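Dangling pointer risk:

    When labels.size() > 0, label is copy-constructed from the string labels[objbox.class_id] (the std::string copy constructor in frame #10)

    If the labels container is modified (for example, elements are erased or moved), the element being copied is no longer valid

Index out of range:

    objbox.class_id may lie outside the valid index range of labels

    When class_id >= labels.size(), operator[] performs no bounds check and the access is undefined behavior

Lifetime issues:

    The labels container may be destroyed or modified around the time this line executes

    In a multithreaded environment, other threads may modify the labels container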
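To tell these cases apart while debugging, one option (a debugging aid, not part of the original fix) is to replace operator[] with std::vector::at, which checks the index and throws std::out_of_range instead of silently reading past the end. The sketch below assumes labels is a std::vector<std::string> and class_id is an int; the helper names checked_label and is_out_of_range are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: bounds-checked variant of labels[class_id].
// Assumes labels is a std::vector<std::string> and class_id is an int.
std::string checked_label(const std::vector<std::string>& labels, int class_id) {
    if (class_id < 0) {
        throw std::out_of_range("negative class_id");
    }
    // at() throws std::out_of_range when the index is >= labels.size().
    return labels.at(static_cast<std::size_t>(class_id));
}

// Hypothetical helper: reports whether the lookup would have been out of range.
bool is_out_of_range(const std::vector<std::string>& labels, int class_id) {
    try {
        (void)checked_label(labels, class_id);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With a lookup like this, an out-of-range class_id surfaces as an exception at the lookup itself rather than as a huge allocation several frames deeper, which makes the backtrace much easier to read.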

Modified code:

// Safe approach: build a new copy of the string
std::string safe_label = labels.empty() 
    ? "" 
    : (objbox.class_id < labels.size() 
        ? labels[objbox.class_id] 
        : "unknown");  // handle the out-of-range case

auto target = std::make_shared<vp_objects::vp_frame_target>(
    x, y, width, height, 
    objbox.class_id, objbox.conf, 
    frame_meta->frame_index, 
    frame_meta->channel_index, 
    safe_label  // use the safe copy
);
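
The bounds check above removes the out-of-range read, but on its own it does not cover the concurrent-modification scenario listed earlier. If labels really can be reloaded or cleared from another thread, one possible approach is to guard the container with a mutex and only ever hand out copies. The sketch below is an assumption about how that could look; the class label_table and its methods are hypothetical and are not part of VideoPipe.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper (not part of VideoPipe): all access to the label list
// goes through one mutex, and callers only ever receive copies.
class label_table {
public:
    // Replace the whole label list, e.g. after reloading a labels file.
    void assign(std::vector<std::string> labels) {
        std::lock_guard<std::mutex> lock(mutex_);
        labels_ = std::move(labels);
    }

    // Return a copy of the label for class_id.
    // Empty table -> "", index out of range -> "unknown" (same policy as above).
    std::string get(int class_id) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (labels_.empty()) {
            return "";
        }
        if (class_id < 0 || static_cast<std::size_t>(class_id) >= labels_.size()) {
            return "unknown";
        }
        return labels_[static_cast<std::size_t>(class_id)];
    }

private:
    mutable std::mutex mutex_;         // protects labels_
    std::vector<std::string> labels_;
};
```

Returning std::string by value means the caller never holds a reference into the container, so a later assign() on another thread cannot invalidate a label that has already been handed out.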