YOLOv8-Pose NCNN Android Deployment

Preface

The current frame rate holds steady at around 30 FPS. The GitHub repository for this project: https://github.com/gaoxumustwin/ncnn-android-yolov8-pose

Introduction

While working on the YOLOv8-Pose NCNN Android deployment, I discovered on GitHub that https://github.com/eecn/ncnn-android-yolov8-pose had already implemented it, and that author also explains how the NCNN model was exported.

That author used YOLOv8's official NCNN export path; I exported an NCNN model the same way and inspected its network structure.

The resulting network structure is rather tangled, seemingly the product of a large amount of computation-graph tracing, but its Android deployment still runs at around 20 FPS, which is acceptable; Ultralytics also provides official Python code for NCNN inference:

https://github.com/ultralytics/ultralytics/tree/a007668e1fa8d5d586e6daa3924d65cfb139b8ac/examples/YOLOv8-NCNN-Python-Det-Pose-Cls-Seg-Obb

Earlier I had read an article 三木君 posted on Zhihu in 2023 about exporting only the model's Backbone + Neck, an approach I have run into many times when working with Rockchip devices. 三木君 gives these reasons:

  • Exporting only the model, without post-processing, makes it easy to benchmark models fairly. MMYOLO contains many different algorithms, most of them Backbone + Neck structures, so benchmarking this way better reflects the real speed differences between models.
  • Some models must be deployed on embedded devices. Exporting the post-processing effectively adds a massive amount of arithmetic, and since the application filters most detections by confidence anyway, the bulk of that computation is wasted (see the sketch after this list).
  • Some inference backends support post-processing operators poorly: the model may fail to convert to the target backend, or the converted model may not fully exploit BPU/TPU/NPU/GPU acceleration.
  • Many deployed models rewrite the post-processing in C/C++ for speed; the post-processing code can be multi-threaded, and asynchronous computation is even more efficient.
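
To make the second point concrete, below is a minimal Python sketch of the deployment-side pattern (my own illustration, not code from any repository mentioned here): threshold the raw class scores first, and run the comparatively expensive DFL box decode only for the few grid cells that survive. It assumes the per-cell layout used later in this post: class scores first, then 4*reg_max DFL channels.

python
import numpy as np

def decode_level(feat, stride, conf_thres=0.25, nc=1, reg_max=16):
    # feat: (H, W, nc + 4*reg_max + ...) raw output of one FPN level
    cls = 1.0 / (1.0 + np.exp(-feat[..., :nc]))        # sigmoid class scores
    keep = np.argwhere(cls.max(-1) > conf_thres)       # confidence filter FIRST
    boxes, scores = [], []
    for y, x in keep:
        # DFL decode only for surviving cells: softmax over the reg_max bins,
        # then the expectation gives (left, top, right, bottom) distances
        d = feat[y, x, nc:nc + 4 * reg_max].reshape(4, reg_max)
        d = np.exp(d - d.max(-1, keepdims=True))
        dist = (d / d.sum(-1, keepdims=True)) @ np.arange(reg_max)
        cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # anchor center in pixels
        boxes.append([cx - dist[0] * stride, cy - dist[1] * stride,
                      cx + dist[2] * stride, cy + dist[3] * stride])
        scores.append(float(cls[y, x].max()))
    return np.array(boxes), np.array(scores)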

On CSDN I also found someone who had written NCNN code following this idea and shared its GitHub address.

At that point, all that remains is to take that Backbone + Neck only YOLOv8-Pose NCNN code and lightly rework it along the lines of the earlier YOLOv8 Android deployment.

Environment

The ultralytics version I used:

sh
pip install ultralytics==8.2.98

After installing ultralytics, the following steps require editing its source code, so you need to know where ultralytics is installed. You can find out like this:

python
>>> import ultralytics
>>> ultralytics.__version__
'8.2.98'
>>> ultralytics.__path__
['pathto\\ultralytics']

Modifications

Network Modification for Model Export

Modify the Pose class in pathto\ultralytics\nn\modules\head.py by adding the following code at the top of its forward function:

python
if self.export or torch.onnx.is_in_onnx_export():
    results = self.forward_pose_export(x)
    return tuple(results)

Also add the following new method to the Pose class:

python
def forward_pose_export(self, x):
    """Export-friendly forward: return the raw per-level feature maps in NHWC layout."""
    results = []
    for i in range(self.nl):  # one output per detection scale
        dfl = self.cv2[i](x[i]).permute(0, 2, 3, 1).contiguous()  # box branch: 4*reg_max DFL channels
        cls = self.cv3[i](x[i]).permute(0, 2, 3, 1).contiguous()  # class branch: nc channels
        kpt = self.cv4[i](x[i]).permute(0, 2, 3, 1).contiguous()  # keypoint branch: 17*3 channels
        results.append(torch.cat((cls, dfl, kpt), -1))  # (bs, H, W, nc + 4*reg_max + nk)
    return results

The full modified class looks like this:

python
class Pose(Detect):
    """YOLOv8 Pose head for keypoints models."""

    def __init__(self, nc=80, kpt_shape=(17, 3), ch=()):
        """Initialize YOLO network with default parameters and Convolutional Layers."""
        super().__init__(nc, ch)
        self.kpt_shape = kpt_shape  # number of keypoints, number of dims (2 for x,y or 3 for x,y,visible)
        self.nk = kpt_shape[0] * kpt_shape[1]  # number of keypoints total

        c4 = max(ch[0] // 4, self.nk)
        self.cv4 = nn.ModuleList(nn.Sequential(Conv(x, c4, 3), Conv(c4, c4, 3), nn.Conv2d(c4, self.nk, 1)) for x in ch)

    def forward(self, x):
        """Perform forward pass through YOLO model and return predictions."""
        if self.export or torch.onnx.is_in_onnx_export():
            results = self.forward_pose_export(x)
            return tuple(results)

        bs = x[0].shape[0]  # batch size
        kpt = torch.cat([self.cv4[i](x[i]).view(bs, self.nk, -1) for i in range(self.nl)], -1)  # (bs, 17*3, h*w)
        x = Detect.forward(self, x)
        if self.training:
            return x, kpt
        pred_kpt = self.kpts_decode(bs, kpt)
        return torch.cat([x, pred_kpt], 1) if self.export else (torch.cat([x[0], pred_kpt], 1), (x[1], kpt))

    def forward_pose_export(self, x):
        results = []
        for i in range(self.nl):
            dfl = self.cv2[i](x[i]).permute(0, 2, 3, 1).contiguous()
            cls = self.cv3[i](x[i]).permute(0, 2, 3, 1).contiguous()
            kpt = self.cv4[i](x[i]).permute(0, 2, 3, 1).contiguous()
            results.append(torch.cat((cls, dfl, kpt), -1))
        return results
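
With the head modified, a quick sanity check (my own addition, assuming the edits above are in place) is to flip the head into export mode and print the per-level output shapes:

python
import torch
from ultralytics import YOLO

model = YOLO('yolov8n-pose.pt').model.eval()
model.model[-1].export = True  # the Pose head is the last module; this triggers forward_pose_export

with torch.no_grad():
    outs = model(torch.zeros(1, 3, 640, 640))

for o in outs:
    print(o.shape)
# expected for nc=1, reg_max=16 (64 DFL channels) and 17*3=51 keypoint channels:
# torch.Size([1, 80, 80, 116]), torch.Size([1, 40, 40, 116]), torch.Size([1, 20, 20, 116])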

This modification does not affect training; it only takes effect at export time.

Replacing the Activation Function and Retraining from Scratch

YOLOv8's default activation function is SiLU. To gain more speed I swapped it for the computationally cheaper ReLU; on top of that, once the model is converted to NCNN and optimized, the convolutions and ReLU get fused, which improves inference speed a bit further.

After swapping the activation function, the original PyTorch model has to be retrained before exporting to ONNX.

In pathto\ultralytics\nn\modules\conv.py, change default_act = nn.SiLU() (around line 39) to default_act = nn.ReLU().
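
For reference, the relevant lines in conv.py look roughly like this after the edit (a sketch against the 8.2.x source):

python
class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.ReLU()  # default activation; the upstream default is nn.SiLU()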

Once that is done, train with the following script:

python
from ultralytics import YOLO

model = YOLO('yolov8n-pose.yaml').load('yolov8n-pose.pt')  

results = model.train(data='coco-pose.yaml', epochs=100, imgsz=640, workers=4, batch=64, project='Pose_runs', name='pose_n_relu')

**Note:**

1. .load('yolov8n-pose.pt') loads the pretrained weights. Even though the pretrained model uses a different activation function, in my tests starting from the pretrained weights gave a much lower loss in the early epochs.

2. The data above is the COCO2017 dataset; training ran on a 4090 GPU, and 10 epochs took close to an hour.

3. coco-pose.yaml and yolov8n-pose.yaml are the default ultralytics configuration files.

Renaming the Exported ONNX Outputs

If you need to change the output names, edit the export_onnx function in pathto\ultralytics\engine\exporter.py.
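
As a hedged sketch (the exact surrounding code differs between ultralytics versions), the change amounts to editing the output_names list that export_onnx passes to torch.onnx.export; with the modified head returning three per-level tensors, you can name them explicitly, e.g.:

python
# inside Exporter.export_onnx, before the torch.onnx.export(...) call
output_names = ["out0", "out1", "out2"]  # hypothetical names, one per FPN level

torch.onnx.export(
    self.model,                 # the model being exported
    self.im,                    # the example input tensor
    f,                          # destination .onnx path
    opset_version=opset_version,
    do_constant_folding=True,
    input_names=["images"],
    output_names=output_names,
)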

ONNX Export

Write an export.py script to export the ONNX model:

python
from ultralytics import YOLO

# load a pretrained model
model = YOLO(r'pathto\best.pt')  # weights from the training run (raw string so '\b' is not parsed as an escape)

# export onnx
model.export(format='onnx', opset=11, simplify=True, dynamic=False, imgsz=640)
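
To confirm the export worked (my own addition; requires pip install onnxruntime), you can load the file and print the output shapes:

python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(r'pathto\best.onnx', providers=['CPUExecutionProvider'])
inp = sess.get_inputs()[0].name
for meta, out in zip(sess.get_outputs(),
                     sess.run(None, {inp: np.zeros((1, 3, 640, 640), dtype=np.float32)})):
    print(meta.name, out.shape)  # expect three (1, H, W, 116) outputs for a 640 input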

NCNN Conversion and Optimization

Below are the commands to convert the ONNX model to NCNN and to optimize the NCNN model (the trailing 1 tells ncnnoptimize to store the weights as fp16; 0 would keep fp32):

sh
./onnx2ncnn yolov8n-pose.onnx yolov8n-pose-sim.param yolov8n-pose-sim.bin
./ncnnoptimize yolov8n-pose-sim.param  yolov8n-pose-sim.bin yolov8n-pose-opt-fp16.param yolov8n-pose-opt-fp16.bin 1

When running ncnnoptimize on the model converted from the ReLU network, you can see ReLU being fused into the preceding convolutions:

sh
(base) gx@RUKN0DC:/mnt/e/ubuntu_20.04/ncnn/build/install/bin$ ./ncnnoptimize yolov8pose-relu.param yolov8pose-relu.bin yolov8pose-relu-opt.param yolov8pose-relu-opt.bin 1
fuse_convolution_activation /model.0/conv/Conv /model.0/act/Relu
fuse_convolution_activation /model.1/conv/Conv /model.1/act/Relu
fuse_convolution_activation /model.2/cv1/conv/Conv /model.2/cv1/act/Relu
fuse_convolution_activation /model.2/m.0/cv1/conv/Conv /model.2/m.0/cv1/act/Relu
fuse_convolution_activation /model.2/m.0/cv2/conv/Conv /model.2/m.0/cv2/act/Relu
fuse_convolution_activation /model.2/cv2/conv/Conv /model.2/cv2/act/Relu
fuse_convolution_activation /model.3/conv/Conv /model.3/act/Relu
fuse_convolution_activation /model.4/cv1/conv/Conv /model.4/cv1/act/Relu
fuse_convolution_activation /model.4/m.0/cv1/conv/Conv /model.4/m.0/cv1/act/Relu
fuse_convolution_activation /model.4/m.0/cv2/conv/Conv /model.4/m.0/cv2/act/Relu
fuse_convolution_activation /model.4/m.1/cv1/conv/Conv /model.4/m.1/cv1/act/Relu
fuse_convolution_activation /model.4/m.1/cv2/conv/Conv /model.4/m.1/cv2/act/Relu
fuse_convolution_activation /model.4/cv2/conv/Conv /model.4/cv2/act/Relu
fuse_convolution_activation /model.5/conv/Conv /model.5/act/Relu
fuse_convolution_activation /model.6/cv1/conv/Conv /model.6/cv1/act/Relu
fuse_convolution_activation /model.6/m.0/cv1/conv/Conv /model.6/m.0/cv1/act/Relu
fuse_convolution_activation /model.6/m.0/cv2/conv/Conv /model.6/m.0/cv2/act/Relu
fuse_convolution_activation /model.6/m.1/cv1/conv/Conv /model.6/m.1/cv1/act/Relu
fuse_convolution_activation /model.6/m.1/cv2/conv/Conv /model.6/m.1/cv2/act/Relu
fuse_convolution_activation /model.6/cv2/conv/Conv /model.6/cv2/act/Relu
fuse_convolution_activation /model.7/conv/Conv /model.7/act/Relu
fuse_convolution_activation /model.8/cv1/conv/Conv /model.8/cv1/act/Relu
fuse_convolution_activation /model.8/m.0/cv1/conv/Conv /model.8/m.0/cv1/act/Relu
fuse_convolution_activation /model.8/m.0/cv2/conv/Conv /model.8/m.0/cv2/act/Relu
fuse_convolution_activation /model.8/cv2/conv/Conv /model.8/cv2/act/Relu
fuse_convolution_activation /model.9/cv1/conv/Conv /model.9/cv1/act/Relu
fuse_convolution_activation /model.9/cv2/conv/Conv /model.9/cv2/act/Relu
fuse_convolution_activation /model.12/cv1/conv/Conv /model.12/cv1/act/Relu
fuse_convolution_activation /model.12/m.0/cv1/conv/Conv /model.12/m.0/cv1/act/Relu
fuse_convolution_activation /model.12/m.0/cv2/conv/Conv /model.12/m.0/cv2/act/Relu
fuse_convolution_activation /model.12/cv2/conv/Conv /model.12/cv2/act/Relu
fuse_convolution_activation /model.15/cv1/conv/Conv /model.15/cv1/act/Relu
fuse_convolution_activation /model.15/m.0/cv1/conv/Conv /model.15/m.0/cv1/act/Relu
fuse_convolution_activation /model.15/m.0/cv2/conv/Conv /model.15/m.0/cv2/act/Relu
fuse_convolution_activation /model.15/cv2/conv/Conv /model.15/cv2/act/Relu
fuse_convolution_activation /model.16/conv/Conv /model.16/act/Relu
fuse_convolution_activation /model.18/cv1/conv/Conv /model.18/cv1/act/Relu
fuse_convolution_activation /model.18/m.0/cv1/conv/Conv /model.18/m.0/cv1/act/Relu
fuse_convolution_activation /model.18/m.0/cv2/conv/Conv /model.18/m.0/cv2/act/Relu
fuse_convolution_activation /model.18/cv2/conv/Conv /model.18/cv2/act/Relu
fuse_convolution_activation /model.19/conv/Conv /model.19/act/Relu
fuse_convolution_activation /model.21/cv1/conv/Conv /model.21/cv1/act/Relu
fuse_convolution_activation /model.21/m.0/cv1/conv/Conv /model.21/m.0/cv1/act/Relu
fuse_convolution_activation /model.21/m.0/cv2/conv/Conv /model.21/m.0/cv2/act/Relu
fuse_convolution_activation /model.21/cv2/conv/Conv /model.21/cv2/act/Relu
fuse_convolution_activation /model.22/cv2.0/cv2.0.0/conv/Conv /model.22/cv2.0/cv2.0.0/act/Relu
fuse_convolution_activation /model.22/cv2.0/cv2.0.1/conv/Conv /model.22/cv2.0/cv2.0.1/act/Relu
fuse_convolution_activation /model.22/cv3.0/cv3.0.0/conv/Conv /model.22/cv3.0/cv3.0.0/act/Relu
fuse_convolution_activation /model.22/cv3.0/cv3.0.1/conv/Conv /model.22/cv3.0/cv3.0.1/act/Relu
fuse_convolution_activation /model.22/cv4.0/cv4.0.0/conv/Conv /model.22/cv4.0/cv4.0.0/act/Relu
fuse_convolution_activation /model.22/cv4.0/cv4.0.1/conv/Conv /model.22/cv4.0/cv4.0.1/act/Relu
fuse_convolution_activation /model.22/cv2.1/cv2.1.0/conv/Conv /model.22/cv2.1/cv2.1.0/act/Relu
fuse_convolution_activation /model.22/cv2.1/cv2.1.1/conv/Conv /model.22/cv2.1/cv2.1.1/act/Relu
fuse_convolution_activation /model.22/cv3.1/cv3.1.0/conv/Conv /model.22/cv3.1/cv3.1.0/act/Relu
fuse_convolution_activation /model.22/cv3.1/cv3.1.1/conv/Conv /model.22/cv3.1/cv3.1.1/act/Relu
fuse_convolution_activation /model.22/cv4.1/cv4.1.0/conv/Conv /model.22/cv4.1/cv4.1.0/act/Relu
fuse_convolution_activation /model.22/cv4.1/cv4.1.1/conv/Conv /model.22/cv4.1/cv4.1.1/act/Relu
fuse_convolution_activation /model.22/cv2.2/cv2.2.0/conv/Conv /model.22/cv2.2/cv2.2.0/act/Relu
fuse_convolution_activation /model.22/cv2.2/cv2.2.1/conv/Conv /model.22/cv2.2/cv2.2.1/act/Relu
fuse_convolution_activation /model.22/cv3.2/cv3.2.0/conv/Conv /model.22/cv3.2/cv3.2.0/act/Relu
fuse_convolution_activation /model.22/cv3.2/cv3.2.1/conv/Conv /model.22/cv3.2/cv3.2.1/act/Relu
fuse_convolution_activation /model.22/cv4.2/cv4.2.0/conv/Conv /model.22/cv4.2/cv4.2.0/act/Relu
fuse_convolution_activation /model.22/cv4.2/cv4.2.1/conv/Conv /model.22/cv4.2/cv4.2.1/act/Relu
Input layer images without shape info, shape_inference skipped
Input layer images without shape info, estimate_memory_footprint skipped

Below is a simple structural comparison of the three models: the ONNX model, the converted NCNN model, and the optimized NCNN model.

And a simple comparison of the NCNN-optimized network structures under the two activation functions.

Android Code Modifications

The modifications are based on these two repositories:

https://github.com/eecn/ncnn-android-yolov8-pose

https://github.com/Rachel-liuqr/yolov8s-pose-ncnn

There are a few changes:

  1. Replaced the sigmoid with a fast_exp-based sigmoid
  2. Replaced cv::dnn::NMSBoxes with a pure C++ implementation (both ideas are sketched after this list)
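
The app itself is C++, but to keep the examples here in one language, below is a Python transcription of both ideas (a sketch, not the repositories' actual code): a Schraudolph-style fast_exp sigmoid of the kind used in the ncnn examples, and a plain greedy NMS standing in for cv::dnn::NMSBoxes.

python
import struct

def fast_exp(x):
    # Schraudolph's trick: build the IEEE-754 bit pattern of ~e^x directly,
    # i.e. i = 2^23 * (x * log2(e) + 126.94), then reinterpret the int as a float
    i = int(12102203.0 * x + 1064807160.0)
    i = min(max(i, 0), 0x7F7FFFFF)  # clamp to a valid finite-float range
    return struct.unpack('<f', struct.pack('<I', i))[0]

def fast_sigmoid(x):
    return 1.0 / (1.0 + fast_exp(-x))

def nms(boxes, scores, iou_thres=0.45):
    # plain greedy NMS; boxes are [x1, y1, x2, y2]; returns indices of kept boxes
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        remaining = []
        for j in order:
            xx1 = max(boxes[i][0], boxes[j][0]); yy1 = max(boxes[i][1], boxes[j][1])
            xx2 = min(boxes[i][2], boxes[j][2]); yy2 = min(boxes[i][3], boxes[j][3])
            inter = max(0.0, xx2 - xx1) * max(0.0, yy2 - yy1)
            area_i = (boxes[i][2] - boxes[i][0]) * (boxes[i][3] - boxes[i][1])
            area_j = (boxes[j][2] - boxes[j][0]) * (boxes[j][3] - boxes[j][1])
            if inter / (area_i + area_j - inter + 1e-9) <= iou_thres:
                remaining.append(j)
        order = remaining
    return keep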

If you are interested in the details, go take a look at the code.

My skills are limited, so there is surely still plenty of room to optimize this code!

References

https://github.com/eecn/ncnn-android-yolov8-pose

https://github.com/ultralytics/ultralytics/tree/a007668e1fa8d5d586e6daa3924d65cfb139b8ac/examples/YOLOv8-NCNN-Python-Det-Pose-Cls-Seg-Obb

https://blog.csdn.net/Rachel321/article/details/130381788

https://github.com/Rachel-liuqr/yolov8s-pose-ncnn

https://zhuanlan.zhihu.com/p/622596922
