Deploying YOLOv8 on RK3588 (2): Comparing OpenCV and RGA for Model Preprocessing

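Preface

[1. Results Comparison](#1. Results Comparison)

[1.1 Timing Comparison](#1.1 Timing Comparison)

[1.2 CPU and NPU Usage Comparison](#1.2 CPU and NPU Usage Comparison)

[2. Implementing YOLO Preprocessing with RGA](#2. Implementing YOLO Preprocessing with RGA)

[2.1 Approach](#2.1 Approach)

[2.2 Declaring the Processing Class](#2.2 Declaring the Processing Class)

[2.3 Implementing the Processing Class](#2.3 Implementing the Processing Class)

Summary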

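Preface

Rockchip platforms include an RGA (Raster Graphic Acceleration Unit), a hardware 2D graphics accelerator. Using RGA can reduce resource usage and speed up image processing. For this YOLOv8 deployment, the preprocessing was therefore implemented twice, once with RGA and once with OpenCV, and the two versions were compared on performance, speed, and resource usage.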

1. Results Comparison

1.1 Timing Comparison

Each implementation was run 100 times and the average time per iteration was computed.
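The averaging described above can be sketched as follows. This is a minimal illustration, not the code used for the measurements: the function name `average_ms` is hypothetical, and the use of `std::chrono::steady_clock` is an assumption, since the original timing code is not shown.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: run `fn` `iterations` times and return the mean
// wall-clock time per call in milliseconds.
double average_ms(const std::function<void()>& fn, int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by clock adjustments
    double total_ms = 0.0;
    for (int i = 0; i < iterations; ++i) {
        const auto start = clock::now();
        fn();  // e.g. preprocess + inference + postprocess for one frame
        const auto stop = clock::now();
        total_ms += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / iterations;
}
```

Calling `average_ms(run_one_frame, 100)`, where `run_one_frame` stands for whatever callable performs one full iteration, would yield a figure comparable to the "Average execution time" lines in the logs below.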

Pure OpenCV implementation:

OpenCV implementation: resize + pad
[Convert] Step1: Check input pointer => 0 us
[Convert] Step2: define intermediate Mat => 37 us
[Convert] Step3: cv::resize => 9564 us
[Convert] Step4: create pad_img => 1629 us
[Convert] Step5: compute position => 61 us
[Convert] Step6: copyTo => 340 us
[Convert] Step7: return => 54 us
INFO: image resize time 12.15 ms
INFO: total infer time 22.71 ms: model time is 22.45 ms and postprocess time is 0.26ms
Iteration 100 - time: 35.034000 ms
Total execution time: 3770.930000 ms
Average execution time: 37.709300 ms

Pure RGA implementation:

RGA implementation
[Convert] Step1: Check input pointer => 1 us
[Convert] Step2: Set format/bpp => 92 us
[Convert] Step3: Calculate buffer sizes => 13 us
[Convert] Step4: Define variables => 12 us
[Convert] Step5: Compute border => 13 us
[Convert] Step6: Alloc & memcpy src => 10048 us
[Convert] Step7: Alloc resized buffer => 477 us
[Convert] Step8: Alloc dst buffer => 269 us
[Convert] Step9: importbuffer_fd => 3494 us
[Convert] Step10: wrapbuffer_handle => 80 us
[Convert] Step11: imresize => 2714 us
[Convert] Step12: immakeBorder => 1154 us
[Convert] Step13: copy result => 428 us
[Convert] Step14: cleanup => 2607 us
INFO: image resize time 24.26 ms
INFO: total infer time 22.10 ms: model time is 21.84 ms and postprocess time is 0.26ms
Iteration 100 - time: 46.496000 ms
Total execution time: 4398.143000 ms
Average execution time: 43.981430 ms

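Summary:

(1) In the OpenCV version, the most expensive step is the resize, which takes 9~10 ms. The RGA resize (imresize) takes only 2~3 ms.

(2) However, if the image data is not already in a DMA buffer, RGA requires it to be copied into one, and this is costly. In the log above, allocating and copying the source buffer alone takes about 10 ms, and importbuffer_fd and cleanup add roughly 3.5 ms and 2.6 ms.

(3) As a result, the RGA version ends up about 6~7 ms slower per iteration than the OpenCV version (average 43.98 ms versus 37.71 ms).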
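Both logs contain a step that works out where the resized image sits inside the padded output ("compute position" in the OpenCV log, "Compute border" in the RGA log). A minimal sketch of that calculation follows, assuming aspect-ratio-preserving scaling and centered padding, which is the usual letterbox convention for YOLO inputs. The original code is not shown in this section, so the struct `LetterboxPlan` and the function `plan_letterbox` are hypothetical names, and the centering is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical description of a resize + pad (letterbox) operation.
struct LetterboxPlan {
    double scale;      // uniform scale applied to the source image
    int resized_w;     // width after resize
    int resized_h;     // height after resize
    int pad_left, pad_right, pad_top, pad_bottom;  // border widths in pixels
};

// Compute the resize size and border widths needed to fit a src_w x src_h
// image into a dst_w x dst_h model input without changing its aspect ratio.
LetterboxPlan plan_letterbox(int src_w, int src_h, int dst_w, int dst_h) {
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    LetterboxPlan p{};
    // The smaller ratio guarantees that both dimensions fit in the target.
    p.scale = std::min(static_cast<double>(dst_w) / src_w,
                       static_cast<double>(dst_h) / src_h);
    // Round to the nearest pixel, then clamp to [1, dst] to guard against rounding.
    p.resized_w = std::min(dst_w, std::max(1, static_cast<int>(std::lround(src_w * p.scale))));
    p.resized_h = std::min(dst_h, std::max(1, static_cast<int>(std::lround(src_h * p.scale))));
    // Split the leftover space evenly; any odd pixel goes to the right/bottom.
    const int pad_w = dst_w - p.resized_w;
    const int pad_h = dst_h - p.resized_h;
    p.pad_left = pad_w / 2;
    p.pad_right = pad_w - p.pad_left;
    p.pad_top = pad_h / 2;
    p.pad_bottom = pad_h - p.pad_top;
    return p;
}
```

For a 1920x1080 frame and a 640x640 model input (sizes chosen only for illustration), this gives a 640x360 resized image with 140 rows of padding above and below. In the OpenCV path, (pad_left, pad_top) would be the origin of the region the resized image is copied into with copyTo; in the RGA path, the four pad values would correspond to the border widths passed to immakeBorder.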

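1.2 CPU and NPU Usage Comparison

A single model was run in continuous inference.

Pure OpenCV implementation:

CPU usage: 120%~140%

NPU usage: 43%~48%

Pure RGA implementation:

CPU usage: 50%~60%

NPU usage: 35%~40%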

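Summary:

(1) OpenCV computes the interpolation on multiple CPU threads, which leads to high CPU usage.

(2) RGA performs the computation in hardware on DMA buffers, which reduces the dependence on the CPU.

(3) Compared with OpenCV, RGA cuts CPU usage by roughly 60% (from 120%~140% down to 50%~60%) and lowers NPU usage by about 8 percentage points (from 43%~48% down to 35%~40%).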

2. Implementing YOLO Preprocessing with RGA

Reference code: