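Debugging with CMake and gdb: resolving LLVM ERROR: out of memory

Problem description

When deploying VideoPipe on a new device, the program built with CMake frequently aborted partway through a run with LLVM ERROR: out of memory: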

[Thread 0x7ffcd097f700 (LWP 9673) exited]
LLVM ERROR: out of memory

Thread 38 "trt_yolov8_samp" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff7afde700 (LWP 9352)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

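A Google search shows that many TensorRT- and DeepStream-based programs hit the same error, and their developers likewise report that plenty of memory is available when the out of memory message appears. In my case, debugging with gdb traced the problem to an invalid pointer, and fixing that resolved the error.

Cause of the error

Output of bt in gdb: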

#0  0x00007ffff6476e87 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff64787f1 in __GI_abort () at abort.c:79
#2  0x00007fffd755ecbb in  () at /usr/local/tensorRT/lib/libnvinfer.so.8
#3  0x00007ffff6ad42ac in operator new(unsigned long) () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x000055555557ea7a in __gnu_cxx::new_allocator<char>::allocate(unsigned long, void const*) (this=0x7fff7afd2c20, __n=3980232092549127)
    at /usr/include/c++/7/ext/new_allocator.h:111
#5  0x000055555557cf8b in std::allocator_traits<std::allocator<char> >::allocate(std::allocator<char>&, unsigned long) (__a=..., __n=3980232092549127) at /usr/include/c++/7/bits/alloc_traits.h:436
#6  0x000055555557e9de in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long) (this=0x7fff7afd2c20, __capacity=@0x7fff7afd2a70: 3980232092549126, __old_capacity=0) at /usr/include/c++/7/bits/basic_string.tcc:153
#7  0x00007ffff78e096c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) (this=0x7fff7afd2c20, __beg=0x504047389 <error: Cannot access memory at address 0x504047389>, __end=0xe24050404738f <error: Cannot access memory at address 0xe24050404738f>) at /usr/include/c++/7/bits/basic_string.tcc:219
#8  0x00007ffff78ddd5e in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct_aux<char*>(char*, char*, std::__false_type) (this=0x7fff7afd2c20, __beg=0x504047389 <error: Cannot access memory at address 0x504047389>, __end=0xe24050404738f <error: Cannot access memory at address 0xe24050404738f>) at /usr/include/c++/7/bits/basic_string.h:236
#9  0x00007ffff78dbd41 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*) (this=0x7fff7afd2c20, __beg=0x504047389 <error: Cannot access memory at address 0x504047389>, __end=0xe24050404738f <error: Cannot access memory at address 0xe24050404738f>) at /usr/include/c++/7/bits/basic_string.h:255
#10 0x00007ffff78d9b22 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (this=0x7fff7afd2c20, __str=<error: Cannot access memory at address 0x504047389>) at /usr/include/c++/7/bits/basic_string.h:440
#11 0x00007ffff79fa6f6 in vp_nodes::vp_trt_yolov8_detector::run_infer_combinations(std::vector<std::shared_ptr<vp_objects::vp_frame_meta>, std::allocator<std::shared_ptr<vp_objects::vp_frame_meta> > > const&) (this=0x555555fa1530, frame_meta_with_batch=std::vector of length 1, capacity 1 = {...}) at /home/ubuntu/yolov8n-trt-region-test/nodes/infers/vp_trt_yolov8_detector.cpp:53
#12 0x00007ffff7a42504 in vp_nodes::vp_infer_node::handle_frame_meta(std::shared_ptr<vp_objects::vp_frame_meta>) (this=0x555555fa1530, meta=std::shared_ptr<vp_objects::vp_frame_meta> (use count 8, weak count 0) = {...})
    at /home/ubuntu/yolov8n-trt-region-test/nodes/vp_infer_node.cpp:66
#13 0x00007ffff7a471d4 in vp_nodes::vp_node::handle_run() (this=0x555555fa1530) at /home/ubuntu/yolov8n-trt-region-test/nodes/vp_node.cpp:45
#14 0x00007ffff7a4b215 in std::__invoke_impl<void, void (vp_nodes::vp_node::*)(), vp_nodes::vp_node*>(std::__invoke_memfun_deref, void (vp_nodes::vp_node::*&&)(), vp_nodes::vp_node*&&) (__f=@0x5555b1e383a0: &virtual vp_nodes::vp_node::handle_run(), __t=@0x5555b1e38398: 0x555555fa1530) at /usr/include/c++/7/bits/invoke.h:73
#15 0x00007ffff7a4a146 in std::__invoke<void (vp_nodes::vp_node::*)(), vp_nodes::vp_node*>(void (vp_nodes::vp_node::*&&)(), vp_nodes::vp_node*&&) (__fn=@0x5555b1e383a0: &virtual vp_nodes::vp_node::handle_run()) at /usr/include/c++/7/bits/invoke.h:95
#16 0x00007ffff7a4d30b in std::thread::_Invoker<std::tuple<void (vp_nodes::vp_node::*)(), vp_nodes::vp_node*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (this=0x5555b1e38398) at /usr/include/c++/7/thread:234
#17 0x00007ffff7a4d2c1 in std::thread::_Invoker<std::tuple<void (vp_nodes::vp_node::*)(), vp_nodes::vp_node*> >::operator()() (this=0x5555b1e38398) at /usr/include/c++/7/thread:243
#18 0x00007ffff7a4d2a0 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (vp_nodes::vp_node::*)(), vp_nodes::vp_node*> > >::_M_run() (this=0x5555b1e38390) at /usr/include/c++/7/thread:186
#19 0x00007ffff6afe6df in  () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#20 0x00007ffff08516db in start_thread (arg=0x7fff7afde700) at pthread_create.c:463
#21 0x00007ffff655961f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

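The backtrace shows that the inference pipeline runs on worker threads: frames #13 to #18 are std::thread invoking vp_node::handle_run(). The labels container may be modified or destroyed in one thread while another thread is still using it. The root cause is an invalid (dangling or wild) pointer: the std::string copy constructor in frame #10 reads its source from an inaccessible address, so the string's pointer and length are garbage and the copy asks for an enormous allocation (__n=3980232092549127 in frame #4). A request of that size cannot be satisfied no matter how much memory is free, so operator new fails and the process aborts.

In short, the code accesses an invalid memory address: a string is constructed from a wild pointer or from memory that has already been freed.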
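To illustrate why the message appears even when memory is plentiful, here is a minimal sketch that asks std::string for the same number of bytes seen in frame #4. It assumes a 64-bit platform, and allocation_fails is a hypothetical helper name used only for this illustration. In a plain C++ program the failure surfaces as an exception rather than the abort seen in the backtrace.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if reserving `bytes` characters fails.
// std::bad_alloc means the allocator could not provide the memory;
// std::length_error means the request exceeds std::string::max_size().
inline bool allocation_fails(std::size_t bytes) {
    try {
        std::string s;
        s.reserve(bytes);  // forwards the request to operator new
    } catch (const std::bad_alloc&) {
        return true;
    } catch (const std::length_error&) {
        return true;
    }
    return false;
}
```

On a typical 64-bit Linux machine, allocation_fails(3980232092549127) is expected to return true, because roughly 4 PB exceeds the process address space, while a small request such as allocation_fails(64) returns false.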

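Debugging with CMake and gdb

Reference: [CMake] CMake from Beginner to Practice series (11): CMake support for gdb debugging

Enabling gdb debugging in CMake

Add the following to CMakeLists.txt: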

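# Flags applied to every build type.
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC -fdiagnostics-color=always -pthread")
# Debug-only flags: no optimization, all warnings, gdb-friendly debug info.
# They only take effect when configuring with -DCMAKE_BUILD_TYPE=Debug.
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -O0 -Wall -ggdb")

# Strip any inherited -w (suppress all warnings) and -g from the common flags.
# This is a plain substring replacement, so it also alters longer flags containing these strings.
string(REPLACE "-w" "" CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}")
string(REPLACE "-g" "" CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}")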

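1. Start gdb with the built executable as its argument.

2. b main or break main: set a breakpoint at the main function.

3. r or run: start the program.
4. c: continue execution; gdb stops automatically at the point where the error is raised.

5. bt: print the call stack (info stack is an alias for bt); bt full additionally prints the local variables of each frame. Use this output to analyze the cause of the error.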

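In my case, the main problem was in the yolo_detector detection code: when a target is detected but its label cannot be read, the code ends up working with an invalid pointer: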

auto label = labels.size() == 0 ? "" : labels[objbox.class_id];
auto target = std::make_shared<vp_objects::vp_frame_target>(x, y, width, height,
                                                            objbox.class_id, objbox.conf, frame_meta->frame_index, frame_meta->channel_index, label);

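Possible problems with the code above:

Invalid read while copying:

    When labels.size() > 0, label is copy-constructed from labels[objbox.class_id] (the conditional expression yields a std::string by value, not a reference)

    If that element is not a valid string at that moment (for example because an element was removed or moved), the copy constructor reads a garbage pointer and length, which matches frame #10 of the backtrace

Index out of bounds:

    objbox.class_id may fall outside the valid index range of labels

    When class_id >= labels.size(), the access is out of bounds and the behavior is undefined

Lifetime and threading:

    The labels container may be destroyed or modified while this line is executing

    In a multithreaded pipeline, other threads may modify the labels container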
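One detail worth checking is what type auto deduces for label in the original expression. The sketch below assumes labels is a std::vector<std::string> (the backtrace shows std::string being copied); copy_label_unchecked is a hypothetical name that simply mirrors the original line. The static_assert confirms that label is an independent std::string, so the danger lies in the read performed during the copy, not in label outliving the container.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper mirroring the original expression.
// Precondition: labels is empty or class_id < labels.size();
// there is deliberately no bounds check, as in the original code.
inline std::string copy_label_unchecked(const std::vector<std::string>& labels,
                                        std::size_t class_id) {
    auto label = labels.size() == 0 ? "" : labels[class_id];
    // Mixing a string literal with a std::string operand makes the
    // conditional expression produce a std::string value, so auto deduces
    // std::string and label owns its own copy of the characters.
    static_assert(std::is_same<decltype(label), std::string>::value,
                  "label is a std::string copy, not a reference");
    return label;
}
```

Because the result is a copy, clearing the vector after the call leaves the returned label intact. What remains unsafe is calling the helper with an index that violates the precondition.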

The revised code:

// Safe approach: bounds-check the index, then build a fresh string copy
std::string safe_label = labels.empty() 
    ? "" 
    : (objbox.class_id < labels.size() 
        ? labels[objbox.class_id] 
        : "unknown");  // handle the out-of-range case

auto target = std::make_shared<vp_objects::vp_frame_target>(
    x, y, width, height, 
    objbox.class_id, objbox.conf, 
    frame_meta->frame_index, 
    frame_meta->channel_index, 
    safe_label  // use the safe copy
);
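The same bounds check can be factored into a small function that is easy to test on its own. This is a sketch under assumptions: labels is taken to be a std::vector<std::string>, class_id is taken to be an int, and label_or_unknown is a hypothetical name that does not exist in VideoPipe. Unlike the inline version, it also rejects negative ids explicitly instead of relying on signed-to-unsigned conversion.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of VideoPipe). Returns a copy of the label for
// class_id, "" when no labels are loaded, or "unknown" when the id is out of range.
inline std::string label_or_unknown(const std::vector<std::string>& labels,
                                    int class_id) {
    if (labels.empty()) {
        return "";
    }
    // Reject negative ids first so the cast below is always well defined.
    if (class_id < 0 || static_cast<std::size_t>(class_id) >= labels.size()) {
        return "unknown";
    }
    return labels[static_cast<std::size_t>(class_id)];
}
```

This guards only against a bad index. If another thread can modify labels while detection is running, access to the container would additionally need synchronization, for example a mutex shared by the readers and the writer.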