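Problem description
When deploying VideoPipe on a new device, the program built with CMake frequently aborts partway through a run with the error LLVM ERROR: out of memory: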
bash
[Thread 0x7ffcd097f700 (LWP 9673) exited]
LLVM ERROR: out of memory
Thread 38 "trt_yolov8_samp" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff7afde700 (LWP 9352)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
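A Google search shows that many TensorRT and DeepStream programs hit the same error, and the developers reporting it also say that memory was sufficient yet the program still reported out of memory. I eventually traced the problem to an invalid pointer with gdb and fixed it.
Cause of the error
Output of bt in gdb: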
bash
#0 0x00007ffff6476e87 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff64787f1 in __GI_abort () at abort.c:79
#2 0x00007fffd755ecbb in () at /usr/local/tensorRT/lib/libnvinfer.so.8
#3 0x00007ffff6ad42ac in operator new(unsigned long) () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x000055555557ea7a in __gnu_cxx::new_allocator<char>::allocate(unsigned long, void const*) (this=0x7fff7afd2c20, __n=3980232092549127)
at /usr/include/c++/7/ext/new_allocator.h:111
#5 0x000055555557cf8b in std::allocator_traits<std::allocator<char> >::allocate(std::allocator<char>&, unsigned long) (__a=..., __n=3980232092549127) at /usr/include/c++/7/bits/alloc_traits.h:436
#6 0x000055555557e9de in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long) (this=0x7fff7afd2c20, __capacity=@0x7fff7afd2a70: 3980232092549126, __old_capacity=0) at /usr/include/c++/7/bits/basic_string.tcc:153
#7 0x00007ffff78e096c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) (this=0x7fff7afd2c20, __beg=0x504047389 <error: Cannot access memory at address 0x504047389>, __end=0xe24050404738f <error: Cannot access memory at address 0xe24050404738f>) at /usr/include/c++/7/bits/basic_string.tcc:219
#8 0x00007ffff78ddd5e in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct_aux<char*>(char*, char*, std::__false_type) (this=0x7fff7afd2c20, __beg=0x504047389 <error: Cannot access memory at address 0x504047389>, __end=0xe24050404738f <error: Cannot access memory at address 0xe24050404738f>) at /usr/include/c++/7/bits/basic_string.h:236
#9 0x00007ffff78dbd41 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*) (this=0x7fff7afd2c20, __beg=0x504047389 <error: Cannot access memory at address 0x504047389>, __end=0xe24050404738f <error: Cannot access memory at address 0xe24050404738f>) at /usr/include/c++/7/bits/basic_string.h:255
#10 0x00007ffff78d9b22 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (this=0x7fff7afd2c20, __str=<error: Cannot access memory at address 0x504047389>) at /usr/include/c++/7/bits/basic_string.h:440
#11 0x00007ffff79fa6f6 in vp_nodes::vp_trt_yolov8_detector::run_infer_combinations(std::vector<std::shared_ptr<vp_objects::vp_frame_meta>, std::allocator<std::shared_ptr<vp_objects::vp_frame_meta> > > const&) (this=0x555555fa1530, frame_meta_with_batch=std::vector of length 1, capacity 1 = {...}) at /home/ubuntu/yolov8n-trt-region-test/nodes/infers/vp_trt_yolov8_detector.cpp:53
#12 0x00007ffff7a42504 in vp_nodes::vp_infer_node::handle_frame_meta(std::shared_ptr<vp_objects::vp_frame_meta>) (this=0x555555fa1530, meta=std::shared_ptr<vp_objects::vp_frame_meta> (use count 8, weak count 0) = {...})
at /home/ubuntu/yolov8n-trt-region-test/nodes/vp_infer_node.cpp:66
#13 0x00007ffff7a471d4 in vp_nodes::vp_node::handle_run() (this=0x555555fa1530) at /home/ubuntu/yolov8n-trt-region-test/nodes/vp_node.cpp:45
#14 0x00007ffff7a4b215 in std::__invoke_impl<void, void (vp_nodes::vp_node::*)(), vp_nodes::vp_node*>(std::__invoke_memfun_deref, void (vp_nodes::vp_node::*&&)(), vp_nodes::vp_node*&&) (__f=@0x5555b1e383a0: &virtual vp_nodes::vp_node::handle_run(), __t=@0x5555b1e38398: 0x555555fa1530) at /usr/include/c++/7/bits/invoke.h:73
#15 0x00007ffff7a4a146 in std::__invoke<void (vp_nodes::vp_node::*)(), vp_nodes::vp_node*>(void (vp_nodes::vp_node::*&&)(), vp_nodes::vp_node*&&) (__fn=@0x5555b1e383a0: &virtual vp_nodes::vp_node::handle_run()) at /usr/include/c++/7/bits/invoke.h:95
#16 0x00007ffff7a4d30b in std::thread::_Invoker<std::tuple<void (vp_nodes::vp_node::*)(), vp_nodes::vp_node*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (this=0x5555b1e38398) at /usr/include/c++/7/thread:234
#17 0x00007ffff7a4d2c1 in std::thread::_Invoker<std::tuple<void (vp_nodes::vp_node::*)(), vp_nodes::vp_node*> >::operator()() (this=0x5555b1e38398) at /usr/include/c++/7/thread:243
#18 0x00007ffff7a4d2a0 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (vp_nodes::vp_node::*)(), vp_nodes::vp_node*> > >::_M_run() (this=0x5555b1e38390) at /usr/include/c++/7/thread:186
#19 0x00007ffff6afe6df in () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#20 0x00007ffff08516db in start_thread (arg=0x7fff7afde700) at pthread_create.c:463
#21 0x00007ffff655961f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The stack shows that multiple threads operate on the inference pipeline:
vp_node::handle_run()
std::thread
operations involving labels
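The labels container may be modified or destroyed in one thread while another thread is still using it. The root cause is a dangling pointer: the invalid memory access yields a garbage string length, which triggers an enormous allocation request (__n=3980232092549127 bytes in frame #4). operator new cannot satisfy that request and calls into libnvinfer.so.8 (frame #2), which aborts. This explains why out of memory is reported on a machine that still has plenty of free memory.
The code tries to access an invalid memory address: a wild pointer or already-freed memory is used while the string is being constructed.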
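To illustrate why a garbage size ends in an out-of-memory abort rather than a segfault, here is a minimal sketch. It assumes a 64-bit platform and that the frame in libnvinfer.so.8 is a new-handler installed by the library, which is my reading of frames #2 and #3 and is not confirmed by the TensorRT documentation. The names on_out_of_memory, huge_allocation_fails, g_handler_called and g_sink are hypothetical.

```cpp
#include <cstddef>
#include <new>

// Set to true when the custom new-handler runs.
static bool g_handler_called = false;

// Keeps the allocation observable so the compiler cannot drop it.
static void* volatile g_sink = nullptr;

// Hypothetical handler: records the call, then gives up by throwing.
// A handler that prints a message and calls std::abort() instead would
// end the process the way the backtrace shows.
static void on_out_of_memory() {
    g_handler_called = true;
    throw std::bad_alloc();
}

// Requests the byte count seen in frame #4 and reports whether it failed.
// Assumes a 64-bit std::size_t.
bool huge_allocation_fails() {
    const std::size_t n = 3980232092549127ULL;  // __n from the backtrace
    std::new_handler previous = std::set_new_handler(on_out_of_memory);
    bool failed = false;
    try {
        g_sink = ::operator new(n);  // far beyond what any machine can provide
        ::operator delete(g_sink);
    } catch (const std::bad_alloc&) {
        failed = true;
    }
    std::set_new_handler(previous);  // restore the previous handler
    return failed;
}
```

The allocation fails no matter how much RAM is free, because the requested size is itself garbage. The free-memory figures reported by the system are therefore not useful for diagnosing this error.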
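Debugging with CMake and gdb
Reference: [CMake] CMake from Beginner to Practice series (11): gdb debugging support in CMake
Enabling gdb debugging in CMake
Add the following to CMakeLists.txt: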
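cmake
# Flags shared by every build type
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC -fdiagnostics-color=always -pthread")
# Debug-only flags; they apply when configuring with -DCMAKE_BUILD_TYPE=Debug
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -O0 -Wall -ggdb")
# Remove -w (suppresses all warnings) and -g from the shared flags
string(REPLACE "-w" "" CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}")
string(REPLACE "-g" "" CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}")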
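1. Start gdb on the compiled executable.
2. b main (or break main): set a breakpoint at the main function.
3. r (or run): start the program.
4. c: continue execution; gdb stops automatically at the location where the error is raised.
5. bt: print the call stack. info stack is an alias for bt; bt full additionally prints the local variables of each frame, which helps when analyzing the cause of the error.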
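In my case the main problem was in the yolo_detector detection code: when a target is detected but its label cannot be read, the code ends up using an invalid pointer: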
cpp
auto label = labels.size() == 0 ? "" : labels[objbox.class_id];
auto target = std::make_shared<vp_objects::vp_frame_target>(x, y, width, height,
objbox.class_id, objbox.conf, frame_meta->frame_index, frame_meta->channel_index, label);
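Possible problems with the code above:
Dangling-pointer risk:
When labels.size() > 0, label is copy-constructed from the string at labels[objbox.class_id]
If the labels container is modified at that moment (for example, elements are removed or moved), the element being copied is no longer valid
Index out of range:
objbox.class_id may fall outside the valid index range of labels
When class_id >= labels.size(), the out-of-range access is undefined behavior; this matches frame #10 of the backtrace, where the source string of the copy constructor (__str) points at unreadable memory
Lifetime issues:
The labels container may be destroyed or modified while this line is executing
In a multithreaded environment, other threads may modify the labels container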
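One detail about the original line is worth checking. Assuming labels is a std::vector<std::string> (the backtrace shows a std::string copy constructor, but the container type is my assumption), the conditional expression yields a std::string value. Once constructed, label therefore owns its own copy, and the hazard lies in reading labels[objbox.class_id] itself. The sketch below checks this at compile time; label_is_independent_copy is a hypothetical name.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Returns true when the label copied out of the vector survives
// a later modification of the vector.
bool label_is_independent_copy() {
    std::vector<std::string> labels{"person", "car"};

    // Same shape as the original line, with a valid index.
    auto label = labels.size() == 0 ? "" : labels[1];

    // The conditional converts "" to std::string, so auto deduces a value type.
    static_assert(std::is_same<decltype(label), std::string>::value,
                  "label is a std::string value, not a reference or pointer");

    labels.clear();         // destroy the source element
    return label == "car";  // the copy is unaffected
}
```

So the fix has to guarantee that the index is valid and the container is stable at the moment of the read. Making a copy afterwards does not help if the read itself touches invalid memory.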
Modified code:
cpp
// Safe approach: build a new string copy
std::string safe_label = labels.empty()
    ? ""
    : (objbox.class_id < labels.size()
        ? labels[objbox.class_id]
        : "unknown"); // handle the out-of-range case
auto target = std::make_shared<vp_objects::vp_frame_target>(
    x, y, width, height,
    objbox.class_id, objbox.conf,
    frame_meta->frame_index,
    frame_meta->channel_index,
    safe_label // use the safe copy
);
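The bounds check handles the out-of-range index, but it does not cover the case where another thread modifies labels during the read. If that can happen in your pipeline, one option is to route every access through a mutex. The following is only a sketch: label_table is a hypothetical class that is not part of VideoPipe, and it assumes class ids are plain int values.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper: every access to the label list takes the same mutex.
class label_table {
public:
    // Replace the whole list, e.g. after reloading a label file.
    void set(std::vector<std::string> labels) {
        std::lock_guard<std::mutex> lock(mutex_);
        labels_ = std::move(labels);
    }

    // Return a copy of the label: "" when no labels are loaded,
    // "unknown" when class_id is out of range.
    std::string get(int class_id) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (labels_.empty()) {
            return "";
        }
        if (class_id < 0 ||
            static_cast<std::size_t>(class_id) >= labels_.size()) {
            return "unknown";
        }
        return labels_[static_cast<std::size_t>(class_id)];
    }

private:
    mutable std::mutex mutex_;         // guards labels_
    std::vector<std::string> labels_;  // index = class id
};
```

Returning the label by value keeps the lock scope short, and the caller never holds a reference into the container. If the labels are loaded once before the pipeline starts and never change afterwards, the lock is unnecessary and the bounds check alone is enough.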