A Worked Example of Adapting YOLOV8 and YOLOV9 for Ascend CANN

This article is shared from the Huawei Cloud community post "Adapting YOLOV8 and YOLOV9 for Ascend CANN", by jackwangcumt.

1 Overview

The Huawei Ascend CANN YOLOV8 C++ inference sample is a YOLOV8 adaptation of sampleYOLOV7 from the official Ascend CANN Samples. The YOLOV7 model outputs a tensor of shape [1,25200,85], whereas the YOLOV8 model outputs [1,84,8400], so the post-processing part of sampleYOLOV7 has to be modified to support YOLOV8/YOLOV9 models (the sketch below illustrates the indexing difference). For a project, our company purchased an Atlas 500 Pro intelligent edge server running Ubuntu 20.04 LTS Server, on which Ascend-cann-toolkit_7.0.RC1_linux-aarch64.run and related software were installed according to the official documentation. For details see my other post, "Setting up the Inference Environment on an Atlas 500 Pro Intelligent Edge Server"; they are not repeated here.
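To make the layout change concrete, the short sketch below contrasts how a class score for candidate box i is addressed in the two outputs. It is only an illustration under the shapes stated above; the helper functions are hypothetical and do not appear in the official sample.

#include <cstddef>

// YOLOV7 output [1, 25200, 85] is box-major: box i occupies 85 consecutive
// floats (x, y, w, h, objectness, then 80 class scores).
inline float Yolov7ClassScore(const float *buf, size_t boxIdx, size_t classIdx)
{
    return buf[boxIdx * 85 + 5 + classIdx];
}

// YOLOV8 output [1, 84, 8400] is channel-major: each of the 84 rows stores one
// value for all 8400 boxes, and there is no separate objectness row
// (rows 0..3 are x, y, w, h; rows 4..83 are the 80 class scores).
inline float Yolov8ClassScore(const float *buf, size_t boxIdx, size_t classIdx)
{
    return buf[(4 + classIdx) * 8400 + boxIdx];
}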

2 Preparing the YOLOV8 Model

Before adapting the YOLOV8 model, we first need the YOLOV8 model file. Taking the official YOLOV8n.pt model as an example, set up the YOLOV8 environment on Windows and run the following Python script (pth2onnx.py) to convert the .pt model into a .onnx model:

import argparse
from ultralytics import YOLO

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--pt', default="yolov8n", help='.pt file')
    args = parser.parse_args()
    model = YOLO(args.pt)
    onnx_model = model.export(format="onnx", dynamic=False, simplify=True, opset=11)

if __name__ == '__main__':
    main()

For detailed steps on setting up the YOLOV8 environment, see https://github.com/ultralytics/ultralytics. On success, the script generates yolov8n.onnx. Example output:

(base) I:\yolov8\Yolov8_for_PyTorch>python pth2onnx.py --pt=yolov8n.pt
Ultralytics YOLOv8.0.229 🚀 Python-3.11.5 torch-2.1.2 CPU (Intel Core(TM) i7-10700K 3.80GHz)
YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs

PyTorch: starting from 'yolov8n.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (6.2 MB)

ONNX: starting export with onnx 1.15.0 opset 11...
ONNX: simplifying with onnxsim 0.4.36...
ONNX: export success ✅ 1.0s, saved as 'yolov8n.onnx' (12.2 MB)

Export complete (3.2s)
Results saved to I:\yolov8\Yolov8_for_PyTorch
Predict:         yolo predict task=detect model=yolov8n.onnx imgsz=640
Validate:        yolo val task=detect model=yolov8n.onnx imgsz=640 data=coco.yaml
Visualize:       https://netron.app

The output shows that the input shape of the original yolov8n.pt model is (1, 3, 640, 640) in BCHW format, and the output shape is (1, 84, 8400). More details of the model can be inspected visually with the netron tool; once netron is installed, run the following command to open yolov8n.onnx and browse the network structure in a web page:

(base) I:\yolov8\Yolov8_for_PyTorch>netron yolov8n.onnx
Serving 'yolov8n.onnx' at http://localhost:8080

As shown by netron, the converted yolov8n.onnx model has an input node named images with input tensor shape [1,3,640,640]. After uploading yolov8n.onnx to the Atlas 500 Pro server, run the following command to convert the model:

atc --model=yolov8n.onnx --framework=5 \
    --output=yolov8n \
    --input_shape="images:1,3,640,640" \
    --soc_version=Ascend310P3 \
    --insert_op_conf=aipp.cfg

Where:

--soc_version=Ascend310P3: the value can be determined with the npu-smi info command. On my machine it reports 310P3, so --soc_version is the Ascend prefix followed by 310P3, i.e. Ascend310P3.

--input_shape="images:1,3,640,640" specifies NCHW: batch size 1, 3 channels, and a 640x640 image, matching the input node of the ONNX model.

--insert_op_conf=aipp.cfg: the aipp.cfg file comes from the official sampleYOLOV7 example. Because the original input image may not meet the size requirement, it is scaled to 640x640. Note that the var_reci_chn_* values equal 1/255, so AIPP also normalizes pixel values to the [0, 1] range; the sketch after the config restates what the conversion matrix does. The content of aipp.cfg is as follows:

 aipp_op{
    aipp_mode:static
    input_format : YUV420SP_U8
    src_image_size_w : 640
    src_image_size_h : 640

    csc_switch : true
    rbuv_swap_switch : false
    matrix_r0c0 : 256
    matrix_r0c1 : 0
    matrix_r0c2 : 359
    matrix_r1c0 : 256
    matrix_r1c1 : -88
    matrix_r1c2 : -183
    matrix_r2c0 : 256
    matrix_r2c1 : 454
    matrix_r2c2 : 0
    input_bias_0 : 0
    input_bias_1 : 128
    input_bias_2 : 128

    crop: true
    load_start_pos_h : 0
    load_start_pos_w : 0
    crop_size_w : 640
    crop_size_h : 640

    min_chn_0 : 0
    min_chn_1 : 0
    min_chn_2 : 0
    var_reci_chn_0: 0.0039215686274509803921568627451
    var_reci_chn_1: 0.0039215686274509803921568627451
    var_reci_chn_2: 0.0039215686274509803921568627451
}
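For reference, the csc matrix and bias values above are the usual BT.601 YUV-to-RGB coefficients scaled by 256 (359/256 ≈ 1.402, 454/256 ≈ 1.772, and so on). The following per-pixel sketch shows the arithmetic those values describe; it is only an illustration of the config, assuming the conventional behaviour of subtracting the biases and dividing by 256, not code from the sample.

#include <algorithm>
#include <cstdint>

// Hypothetical per-pixel restatement of the csc_* values in aipp.cfg:
// BT.601 YUV -> RGB, coefficients scaled by 256, chroma biases of 128.
void YuvToRgb(uint8_t y, uint8_t u, uint8_t v, uint8_t &r, uint8_t &g, uint8_t &b)
{
    int c = y;        // input_bias_0 : 0
    int d = u - 128;  // input_bias_1 : 128
    int e = v - 128;  // input_bias_2 : 128

    r = std::clamp((256 * c + 359 * e) / 256, 0, 255);           // matrix_r0c*
    g = std::clamp((256 * c - 88 * d - 183 * e) / 256, 0, 255);  // matrix_r1c*
    b = std::clamp((256 * c + 454 * d) / 256, 0, 255);           // matrix_r2c*
}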

After the command executes successfully, the offline model yolov8n.om is generated.

3 Adaptation Code

The YOLOV8 sample adapted from the official sampleYOLOV7 example has been open-sourced at https://gitee.com/cumt/ascend-yolov8-sample. The post-processing method GetResult in the core source file sampleYOLOV8.cpp is:

Result SampleYOLOV8::GetResult(std::vector<InferenceOutput> &inferOutputs,
                               string imagePath, size_t imageIndex, bool release)
{
    uint32_t outputDataBufId = 0;
    float *classBuff = static_cast<float *>(inferOutputs[outputDataBufId].data.get());
    // confidence threshold
    float confidenceThreshold = 0.35;

    // class number
    size_t classNum = 80;

    // number of box coordinates (x, y, width, height)
    size_t offset = 4;

    // total number of boxes; the yolov8 output is [1,84,8400]
    size_t modelOutputBoxNum = 8400; 

    // read source image from file
    cv::Mat srcImage = cv::imread(imagePath);
    int srcWidth = srcImage.cols;
    int srcHeight = srcImage.rows;

    // filter boxes by confidence threshold
    vector<BoundBox> boxes;
    size_t yIndex = 1;
    size_t widthIndex = 2;
    size_t heightIndex = 3;

    // size_t all_num = 1 * 84 * 8400 ; // 705,600

    for (size_t i = 0; i < modelOutputBoxNum; ++i)
    {

        float maxValue = 0;
        size_t maxIndex = 0;
        for (size_t j = 0; j < classNum; ++j)
        {

            float value = classBuff[(offset + j) * modelOutputBoxNum + i];
            if (value > maxValue)
            {
                // index of class
                maxIndex = j;
                maxValue = value;
            }
        }

        if (maxValue > confidenceThreshold)
        {
            BoundBox box;
            box.x = classBuff[i] * srcWidth / modelWidth_;
            box.y = classBuff[yIndex * modelOutputBoxNum + i] * srcHeight / modelHeight_;
            box.width = classBuff[widthIndex * modelOutputBoxNum + i] * srcWidth / modelWidth_;
            box.height = classBuff[heightIndex * modelOutputBoxNum + i] * srcHeight / modelHeight_;
            box.score = maxValue;
            box.classIndex = maxIndex;
            box.index = i;
            if (maxIndex < classNum)
            {
                boxes.push_back(box);
            }
        }
    }

    ACLLITE_LOG_INFO("filter boxes by confidence threshold > %f success, boxes size is %ld", confidenceThreshold,boxes.size());

    // filter boxes by NMS
    vector<BoundBox> result;
    result.clear();
    float NMSThreshold = 0.45;
    int32_t maxLength = modelWidth_ > modelHeight_ ? modelWidth_ : modelHeight_;
    std::sort(boxes.begin(), boxes.end(), sortScore);
    BoundBox boxMax;
    BoundBox boxCompare;
    while (boxes.size() != 0)
    {
        size_t index = 1;
        result.push_back(boxes[0]);
        while (boxes.size() > index)
        {
            boxMax.score = boxes[0].score;
            boxMax.classIndex = boxes[0].classIndex;
            boxMax.index = boxes[0].index;

            // translate point by maxLength * boxes[0].classIndex to
            // avoid bumping into two boxes of different classes
            boxMax.x = boxes[0].x + maxLength * boxes[0].classIndex;
            boxMax.y = boxes[0].y + maxLength * boxes[0].classIndex;
            boxMax.width = boxes[0].width;
            boxMax.height = boxes[0].height;

            boxCompare.score = boxes[index].score;
            boxCompare.classIndex = boxes[index].classIndex;
            boxCompare.index = boxes[index].index;

            // translate point by maxLength * boxes[0].classIndex to
            // avoid bumping into two boxes of different classes
            boxCompare.x = boxes[index].x + boxes[index].classIndex * maxLength;
            boxCompare.y = boxes[index].y + boxes[index].classIndex * maxLength;
            boxCompare.width = boxes[index].width;
            boxCompare.height = boxes[index].height;

            // the overlapping part of the two boxes
            float xLeft = max(boxMax.x, boxCompare.x);
            float yTop = max(boxMax.y, boxCompare.y);
            float xRight = min(boxMax.x + boxMax.width, boxCompare.x + boxCompare.width);
            float yBottom = min(boxMax.y + boxMax.height, boxCompare.y + boxCompare.height);
            float width = max(0.0f, xRight - xLeft);
            float height = max(0.0f, yBottom - yTop);
            float area = width * height;
            float iou = area / (boxMax.width * boxMax.height + boxCompare.width * boxCompare.height - area);

            // filter boxes by NMS threshold
            if (iou > NMSThreshold)
            {
                boxes.erase(boxes.begin() + index);
                continue;
            }
            ++index;
        }
        boxes.erase(boxes.begin());
    }

    ACLLITE_LOG_INFO("filter boxes by NMS threshold > %f success, result size is %ld", NMSThreshold,result.size());
 
    // opencv draw label params
    const double fontScale = 0.5;
    const uint32_t lineSolid = 2;
    const uint32_t labelOffset = 11;
    const cv::Scalar fontColor(0, 0, 255); // BGR
    const vector<cv::Scalar> colors{
        cv::Scalar(255, 0, 0), cv::Scalar(0, 255, 0),
        cv::Scalar(0, 0, 255)};

    int half = 2;
    for (size_t i = 0; i < result.size(); ++i)
    {
        cv::Point leftUpPoint, rightBottomPoint;
        leftUpPoint.x = result[i].x - result[i].width / half;
        leftUpPoint.y = result[i].y - result[i].height / half;
        rightBottomPoint.x = result[i].x + result[i].width / half;
        rightBottomPoint.y = result[i].y + result[i].height / half;
        cv::rectangle(srcImage, leftUpPoint, rightBottomPoint, colors[i % colors.size()], lineSolid);
        string className = label[result[i].classIndex];
        string markString = to_string(result[i].score) + ":" + className;

        ACLLITE_LOG_INFO("object detect [%s] success", markString.c_str());

        cv::putText(srcImage, markString, cv::Point(leftUpPoint.x, leftUpPoint.y + labelOffset),
                    cv::FONT_HERSHEY_COMPLEX, fontScale, fontColor);
    }
    string savePath = "out_" + to_string(imageIndex) + ".jpg";
    cv::imwrite(savePath, srcImage);
    if (release)
    {
        free(classBuff);
        classBuff = nullptr;
    }
    return SUCCESS;
}

The output shape of YOLOV8 is (1, 84, 8400). The 8400 is the number of raw candidate boxes predicted by the model, represented in the code as size_t modelOutputBoxNum = 8400;. The 84 is the 4 bounding-box values (x, y, w, h) plus the 80 detection classes, i.e. 84 = 4 + 80. Because the inference result is laid out as one contiguous one-dimensional array in memory, the required values must be read from the array offsets that correspond to the actual meaning of the YOLOV8 output shape. Note that YOLOV8 does not predict a separate objectness/confidence value; the largest class probability is used as the confidence. The 8400 boxes are the concatenation of the output feature maps at the model's different scales, which normally needs no extra handling during inference. The mapping between the output tensor and the flat memory array works as follows:

That is, apart from the first row, each of the remaining 83 rows is appended in turn after the previous one, forming a one-dimensional array of size 84 x 8400. When traversing the array, for each of the 8400 predictions we first obtain the confidence: after skipping the offset = 4 coordinate rows, we take the maximum of the 80 class scores and its index, which become the confidence and the class ID; the first 4 rows hold the box values x, y, w, h. For a customised model, only size_t classNum = 80; needs to be changed; with reference to the ONNX output shape [1,84,8400], classNum is 84 - 4 = 80, so a custom model with output [1,26,8400], for example, uses size_t classNum = 22 (26 - 4). The sketch below restates this indexing.
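As a compact restatement of the indexing above (a hedged sketch, independent of the sample code; the names Detection and DecodeBox are made up for illustration), decoding one candidate box i from the flat output buffer looks like this:

#include <cstddef>

// Hypothetical helper restating the indexing used in GetResult above.
struct Detection
{
    float x, y, w, h;   // box centre and size, in model (640x640) coordinates
    float score;        // best class score, used directly as the confidence
    size_t classIndex;  // index of the best class
};

// buf      : flat float buffer of shape [1, 4 + classNum, boxNum]
// boxNum   : 8400 for the 640x640 YOLOV8/YOLOV9 models
// classNum : 80 for COCO; e.g. 22 for a custom model with output [1, 26, 8400]
Detection DecodeBox(const float *buf, size_t i, size_t boxNum, size_t classNum)
{
    Detection det{};
    det.x = buf[0 * boxNum + i];
    det.y = buf[1 * boxNum + i];
    det.w = buf[2 * boxNum + i];
    det.h = buf[3 * boxNum + i];

    // YOLOV8 has no objectness row: the confidence is the maximum class score.
    for (size_t j = 0; j < classNum; ++j)
    {
        float value = buf[(4 + j) * boxNum + i];
        if (value > det.score)
        {
            det.score = value;
            det.classIndex = j;
        }
    }
    return det;
}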

4 Building and Running

Download the open-source code, upload it to the server, unpack it, and build it with the following commands:

unzip ascend-yolov8-sample-master.zip -d ./kztech
cd ascend-yolov8-sample-master/src
# in the src directory
cmake .
make
# If the build succeeds, the executable main is generated in the ../out directory; run the sample from the src directory:
../out/main
# If the following error is reported:
# ../out/main: error while loading shared libraries: libswresample.so.3:
# cannot open shared object file: No such file or directory
# try setting the following environment variable and running again:
export LD_LIBRARY_PATH=/usr/local/Ascend/thirdpart/aarch64/lib:$LD_LIBRARY_PATH
# After a successful run, out_0.jpg is generated in the current directory

On success, the console prints the following:

root@atlas500ai:/home/kztech/ascend-yolov8-sample-master/src# ../out/main
[INFO]  Acl init ok
[INFO]  Open device 0 ok
[INFO]  Use default context currently
[INFO]  dvpp init resource ok
[INFO]  Load model ../model/yolov8n.om success
[INFO]  Create model description success
[INFO]  Create model(../model/yolov8n.om) output success
[INFO]  Init model ../model/yolov8n.om success
[INFO]  filter boxes by confidence threshold > 0.350000 success, boxes size is 10
[INFO]  filter boxes by NMS threshold > 0.450000 success, result size is 1
[INFO]  object detect [0.878906:dog] success
[INFO]  Inference elapsed time : 0.038817 s , fps is 25.761685
[INFO]  Unload model ../model/yolov8n.om success
[INFO]  destroy context ok
[INFO]  Reset device 0 ok
[INFO]  Finalize acl ok

5 Summary

For the YOLO family, adaptation work mostly consists of handling changes in the input and output formats. Starting from YOLOV7, YOLOV8 can be adapted, and the same applies to YOLOV9. At present YOLOV9 and YOLOV8 use the same output format, so only the yolov9xx.om model needs to be generated. The yolov9-c-converted.pt model (https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-c-converted.pt) is converted as follows: set up the YOLOV9 environment on Windows and run the following commands to convert the .pt model into a .onnx model:

# clone the base conda environment into a new environment named yolov9
conda create -n yolov9 --clone base
# activate the yolov9 virtual environment
conda activate yolov9
# clone the yolov9 code
git clone https://github.com/WongKinYiu/yolov9
# install the dependencies of the yolov9 project
(yolov9) I:\yolov9-main>pip install -r requirements.txt
# export the model to onnx
(yolov9) I:\yolov9-main>python export.py --weights yolov9-c-converted.pt --include onnx
Running the export command produces output like the following:
(yolov9) I:\yolov9-main>python export.py --weights yolov9-c-converted.pt --include onnx
export: data=I:\yolov9-main\data\coco.yaml, weights=['yolov9-c-converted.pt'], imgsz=[640, 640], batch_size=1, device=cpu, half=False, inplace=False, 
keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['onnx']
YOLO  2024-3-13 Python-3.11.5 torch-2.1.2 CPU

Fusing layers...
gelan-c summary: 387 layers, 25288768 parameters, 64944 gradients, 102.1 GFLOPs

PyTorch: starting from yolov9-c-converted.pt with output shape (1, 84, 8400) (49.1 MB)

ONNX: starting export with onnx 1.15.0...
ONNX: export success  2.9s, saved as yolov9-c-converted.onnx (96.8 MB)

Export complete (4.4s)
Results saved to I:\yolov9-main
Detect:          python detect.py --weights yolov9-c-converted.onnx
Validate:        python val.py --weights yolov9-c-converted.onnx
PyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov9-c-converted.onnx')
Visualize:       https://netron.app
Then, on the server, convert the ONNX model to an offline .om model with atc, just as for YOLOV8:
atc --model=yolov9-c-converted.onnx --framework=5 \
    --output=yolov9-c-converted \
    --input_shape="images:1,3,640,640" \
    --soc_version=Ascend310P3 \
    --insert_op_conf=aipp.cfg

Click "Follow" to be the first to learn about Huawei Cloud's latest technologies.
