yolo11s rknn fails to detect - bugfix, step by step

1. Background

Last Thursday at the end of the day, I found that an existing PyTorch model which worked well on the host machine no longer worked after the .pt -> .onnx -> .rknn conversion. By Friday evening I had narrowed the suspects down to two:

  1. Version mismatch
  2. Channel ordering and parameter passing

Over the weekend I gradually aligned the PyTorch versions across my environments, matching them to the version requirements of rknn-toolkit2 2.3. Suspecting that the training environment and the pt -> rknn conversion environment were inconsistent, I also merged the two into a single Docker image.

training env:

  torch             2.4.0+cpu
  torchaudio        2.4.0+cpu
  torchvision       0.19.0+cpu
  ultralytics       8.3.68 /ultralytics
  ultralytics-thop  2.0.14

rknn env:

  torch             2.4.0+cpu
  torchaudio        2.4.0+cpu
  torchvision       0.19.0+cpu

The final detect test on Sunday still found nothing: the target object was not detected. The rest of this post analyzes the problem step by step and works toward a fix.

2. Attempt 1: increase model capacity, yolov11n -> yolov11s

Late Sunday I kicked off training of yolov11s.pt on a small test dataset called moonpie. This time the whole pt -> onnx -> rknn pipeline lived in one single Docker image, and I raised epochs to 250 (batch=16, imgsz=640):

model = YOLO('yolo11.yaml').load('yolo11s.pt')
result = model.train(data=r'./moonpie.yaml', epochs=250, batch=16, imgsz=640, device='cpu')

Final training results:

results_dict: {'metrics/precision(B)': 0.8658830071855359, 'metrics/recall(B)': 0.770949720670391, 'metrics/mAP50(B)': 0.8821807607242769, 'metrics/mAP50-95(B)': 0.6566184427590052, 'fitness': 0.6791746745555324}

save_dir: PosixPath('/app/rk3588_build/yolo_sdk/ultralytics/runs/detect/train4')

speed: {'preprocess': 0.9470678144885648, 'inference': 83.09212807686097, 'loss': 0.00011536382859753024, 'postprocess': 1.3212234743179814}

task: 'detect'

Then something occurred to me: the test I started on Thursday returns the new object in the 81st slot. Could it be that the detect code never adjusted the class count when mapping class_id? If so, that would be an embarrassingly basic mistake.
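A quick way to check that suspicion (a sketch; the checkpoint path is the one used throughout this post). Note that the export log in 2.1 below shows output shape (1, 85, 8400), i.e. 4 box channels + 81 class channels, which is consistent with an added 81st class:

python
from ultralytics import YOLO

# The class map baked into the checkpoint should line up with whatever the
# detect script assumes about class ids.
m = YOLO("/app/rk3588_build/last_moonpie_yolov11s.pt")
print(len(m.names))  # class count the head was trained with
print(m.names)       # id -> name mapping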

2.1 Final test directly in the simulator environment

Step 1. pt2onnx

python
from ultralytics import YOLO

# Load a model
model = YOLO("/app/rk3588_build/last_moonpie_yolov11s.pt")  # load the custom-trained model
#model = YOLO(r"./best.pt")  # alternative checkpoint
# Export the model
model.export(format="onnx")

Ultralytics 8.3.68 🚀 Python-3.8.10 torch-2.4.0+cpu CPU (unknown)

YOLO11 summary (fused): 238 layers, 2,617,701 parameters, 0 gradients, 6.5 GFLOPs

PyTorch: starting from '/app/rk3588_build/last_moonpie_yolov11s.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 85, 8400) (5.3 MB)

ONNX: starting export with onnx 1.17.0 opset 19...

ONNX: slimming with onnxslim 0.1.48...

ONNX: export success ✅ 0.6s, saved as '/app/rk3588_build/last_moonpie_yolov11s.onnx' (10.2 MB)

Export complete (0.9s)

Results saved to /app/rk3588_build

Predict: yolo predict task=detect model=/app/rk3588_build/last_moonpie_yolov11s.onnx imgsz=640

Validate: yolo val task=detect model=/app/rk3588_build/last_moonpie_yolov11s.onnx imgsz=640 data=./moonpie.yaml

Visualize: https://netron.app

Step 2. test rknn detect:

>>>>>>>>>>>>>>>original model: /app/rk3588_build/last_moonpie_yolov11s.onnx

--> Running model

I GraphPreparing : 100%|███████████████████████████████████████| 238/238 [00:00<00:00, 19028.68it/s]

I SessionPreparing : 100%|██████████████████████████████████████| 238/238 [00:00<00:00, 5188.41it/s]

target pic has no object concerned.
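For reference, this test goes through the standard rknn-toolkit2 simulator flow, roughly as follows (a sketch; the paths and options follow this post's setup, and the preprocessed frame is stood in by a dummy array):

python
import numpy as np
from rknn.api import RKNN

rknn = RKNN()
rknn.config(target_platform='rk3588')
rknn.load_onnx(model='/app/rk3588_build/last_moonpie_yolov11s.onnx')
rknn.build(do_quantization=False)   # QUANTIZE_ON = False in this post
rknn.init_runtime()                 # no target -> rknn-toolkit2 simulator
img2 = np.zeros((1, 640, 640, 3), dtype=np.uint8)  # stand-in for the preprocessed frame
outputs = rknn.inference(inputs=[img2], data_format=['nhwc'])
rknn.release()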

This time I decided to focus my attention on the detect.py code itself.

3. The detect.py code:

python
    # Set inputs
    img = cv2.imread(IMG_PATH)
    # img, ratio, (dw, dh) = letterbox(img, new_shape=(IMG_SIZE, IMG_SIZE))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))

    print(f'>>>>>>>>>>>>>>>original model: {MODEL1}')

    # Inference
    print('--> Running model')
    img2 = np.expand_dims(img, 0)
    outputs = rknn.inference(inputs=[img2], data_format=['nhwc'])

The BGR2RGB conversion here looks suspicious, and so does data_format=['nhwc'].
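There is also a hidden third suspect: this script feeds raw uint8 pixels (0-255) with no /255 normalization, while the training pipeline consumes floats in [0, 1]. With rknn-toolkit2 that normalization has to be declared at conversion time so the NPU applies it inside the graph. A sketch of what the conversion script should contain (whether mine actually did is exactly the kind of thing to verify; the working ONNX-path code in 4.1 below instead does the division in Python):

python
from rknn.api import RKNN

rknn = RKNN()
# Declare (pixel - mean) / std so the RKNN graph normalizes the raw
# uint8 RGB input to [0, 1], matching training-time preprocessing.
rknn.config(
    mean_values=[[0, 0, 0]],
    std_values=[[255, 255, 255]],
    target_platform='rk3588',
)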

3.1 Revisiting the channel settings used at training time:

yolo11.yaml turns up a major suspect:

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  # ...

nc is still configured as 80! And the file only carries the YOLO11n configuration. Keep digging.
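For what it is worth, ultralytics normally overrides nc from the dataset yaml when train() is called, so nc: 80 here is not necessarily fatal. Still, it is easy to compare the two declarations (a sketch; file locations are assumed relative to the training workdir):

python
import yaml

# Compare what the architecture file declares vs. what the dataset declares.
with open('yolo11.yaml') as f:
    print('yolo11.yaml  nc =', yaml.safe_load(f).get('nc'))  # 80
with open('moonpie.yaml') as f:
    print('moonpie.yaml nc =', yaml.safe_load(f).get('nc'))  # what training actually used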

Reference: Detect - Ultralytics YOLO Docs

When YOLO is converted to ONNX, additional parameters come into play; the question below chases down the most likely one.

3.2 Question 1: does ONNX constrain the color channel order of the input?

This seems to depend on the color channel order used to read images during training. The common wisdom: Python's OpenCV reads images in BGR order by default, while Pillow reads them in RGB order.

yolov11- patches.py

python
# OpenCV Multilanguage-friendly functions ------------------------------------------------------------------------------
_imshow = cv2.imshow  # copy to avoid recursion errors


def imread(filename: str, flags: int = cv2.IMREAD_COLOR):
    """
    Read an image from a file.

    Args:
        filename (str): Path to the file to read.
        flags (int, optional): Flag that can take values of cv2.IMREAD_*. Defaults to cv2.IMREAD_COLOR.

    Returns:
        (np.ndarray): The read image.
    """
    return cv2.imdecode(np.fromfile(filename, np.uint8), flags)

Pillow, meanwhile, shows up in:

./ultralytics/data/loaders.py: # Load HEIC image using Pillow with pillow-heif

./ultralytics/data/loaders.py: check_requirements("pillow-heif")

./ultralytics/data/loaders.py: from pillow_heif import register_heif_opener

./ultralytics/cfg/datasets/ImageNet.yaml: 721: pillow

./ultralytics/cfg/datasets/ImageNet.yaml: n03938244: pillow

./ultralytics/cfg/datasets/lvis.yaml: 803: pillow

./pyproject.toml: "pillow>=7.1.2",

Within loaders.py there is:

python
                    register_heif_opener()  # Register HEIF opener with Pillow
                    with Image.open(path) as img:
                        im0 = cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)  # convert image to BGR nparray

And in the YOLOv8 example's cv2.imread-based preprocessing there is:

python
    def preprocess(self):
        """
        Preprocesses the input image before performing inference.

        Returns:
            image_data: Preprocessed image data ready for inference.
        """
        # Read the input image using OpenCV
        self.img = cv2.imread(self.input_image)

        # Get the height and width of the input image
        self.img_height, self.img_width = self.img.shape[:2]

        # Convert the image color space from BGR to RGB
        img = cv2.cvtColor(self.img, cv2.COLOR_BGR2RGB)

        # Resize the image to match the input shape
        img = cv2.resize(img, (self.input_width, self.input_height))

        # Normalize the image data by dividing it by 255.0
        image_data = np.array(img) / 255.0

        # Transpose the image to have the channel dimension as the first dimension
        image_data = np.transpose(image_data, (2, 0, 1))  # Channel first

        # Expand the dimensions of the image data to match the expected input shape
        image_data = np.expand_dims(image_data, axis=0).astype(np.float32)

        # Return the preprocessed image data
        return image_data

Conclusion: it is fairly safe to say that the color channels fed into ONNX are RGB, so the channel handling in detect.py is not the bug.

3.3 A more detailed explanation:

1. Color space conversion

   img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

   This converts the image from BGR (blue, green, red) to RGB (red, green, blue). At this point the image data is a 3-D array of shape (H, W, C), where H is the image height, W the width, and C the channel count (here C = 3, since it is an RGB image).

2. Resizing

   img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))

   This resizes the image to (IMG_SIZE, IMG_SIZE). The result is still a 3-D array of shape (IMG_SIZE, IMG_SIZE, 3), still in HWC layout.

3. Adding the batch dimension

   img2 = np.expand_dims(img, 0)

   np.expand_dims inserts a new axis at the given position. Inserting at axis 0 turns the (IMG_SIZE, IMG_SIZE, 3) array into a 4-D array of shape (1, IMG_SIZE, IMG_SIZE, 3). The new leading dimension is the batch size (N = 1), so the data is now in NHWC layout.

4. Declaring the format at inference time

   outputs = rknn.inference(inputs=[img2], data_format=['nhwc'])

   This calls the rknn inference function and explicitly declares that the input img2 is NHWC.

In summary, after this chain of operations, the img2 passed to rknn.inference is indeed in NHWC format.
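The whole chain can be verified mechanically with a dummy frame (a minimal sketch):

python
import cv2
import numpy as np

IMG_SIZE = 640
img = np.zeros((480, 853, 3), dtype=np.uint8)    # as returned by cv2.imread: HWC, BGR
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)       # still (480, 853, 3)
img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))      # (640, 640, 3), HWC
img2 = np.expand_dims(img, 0)                    # (1, 640, 640, 3)
assert img2.shape == (1, IMG_SIZE, IMG_SIZE, 3)  # NHWC, as declared to rknn.inference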

4. The working solution

I finally got it working by following this post:

https://blog.csdn.net/zhangqian_1/article/details/142722526

The underlying reason, roughly, is that for yolov11 both the ONNX model's output tensors and the inference outputs differ from what the yolo detect demo shipped with rknn-toolkit2 2.3 expects. I still need to work through exactly why; the modifications in that post are not perfect and come with no explanation. The changed items are as follows:
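The output-layout difference is easy to see by listing each export's output tensors (a sketch using onnxruntime; the stock export's single fused output is quoted from the export log in 2.1):

python
import onnxruntime as ort

# Stock ultralytics export: one fused tensor, (1, 4 + nc, 8400).
# Modified export (4.2.2 below): six raw head tensors.
sess = ort.InferenceSession('/app/rk3588_build/yolo11_selfgen.onnx')
for o in sess.get_outputs():
    print(o.name, o.shape)  # expect reg1/cls1/reg2/cls2/reg3/cls3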

4.1 The final detection code

python
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import argparse
import os
import sys
import os.path as osp
import cv2
import torch
import numpy as np
import onnxruntime as ort
from math import exp

ROOT = os.getcwd()
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))


#ONNX_MODEL = r'/app/rk3588_build/yolo_sdk/ultralytics/yolo11s.onnx'
ONNX_MODEL = r'/app/rk3588_build/yolo11_selfgen.onnx'
#ONNX_MODEL = 'yolov5s_relu.onnx'
#ONNX_MODEL= '/app/rk3588_build/last_moonpie.onnx'
#ONNX_MODEL= '/app/rk3588_build/last_moonpie_yolov11s.onnx'
#ONNX_MODEL= '/app/rk3588_build/best.onnx'
#PYTORCH_MODEL=r"/app/rk3588_build/yolo_sdk/ultralytics/best.pt" # driller model; doesn't work, version requirements too strict
RKNN_MODEL = r'/app/rk3588_build/rknn_models/sim_moonpie-640-640_rk3588.rknn'
#IMG_PATH = './frame_2266.png'
DATASET = './dataset.txt'
#IMG_PATH = './bus.jpg'
IMG_PATH = '/app/rk3588_build/cake26.jpg'
QUANTIZE_ON = False

CLASSES = ['moonpie', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
         'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
         'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
         'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
         'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
         'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
         'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
         'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
         'hair drier', 'toothbrush']

meshgrid = []

class_num = len(CLASSES)
headNum = 3
strides = [8, 16, 32]
mapSize = [[80, 80], [40, 40], [20, 20]]
nmsThresh = 0.45
objectThresh = 0.5

input_imgH = 640
input_imgW = 640


class DetectBox:
    def __init__(self, classId, score, xmin, ymin, xmax, ymax):
        self.classId = classId
        self.score = score
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax


def GenerateMeshgrid():
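    # Anchor centers: for each cell of every head's feature map (80x80, 40x40,
    # 20x20) append (x + 0.5, y + 0.5); postprocess() walks this flat list two
    # values at a time via gridIndex.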
    for index in range(headNum):
        for i in range(mapSize[index][0]):
            for j in range(mapSize[index][1]):
                meshgrid.append(j + 0.5)
                meshgrid.append(i + 0.5)


def IOU(xmin1, ymin1, xmax1, ymax1, xmin2, ymin2, xmax2, ymax2):
    xmin = max(xmin1, xmin2)
    ymin = max(ymin1, ymin2)
    xmax = min(xmax1, xmax2)
    ymax = min(ymax1, ymax2)

    innerWidth = xmax - xmin
    innerHeight = ymax - ymin

    innerWidth = innerWidth if innerWidth > 0 else 0
    innerHeight = innerHeight if innerHeight > 0 else 0

    innerArea = innerWidth * innerHeight

    area1 = (xmax1 - xmin1) * (ymax1 - ymin1)
    area2 = (xmax2 - xmin2) * (ymax2 - ymin2)

    total = area1 + area2 - innerArea

    return innerArea / total


def NMS(detectResult):
    predBoxs = []

    sort_detectboxs = sorted(detectResult, key=lambda x: x.score, reverse=True)

    for i in range(len(sort_detectboxs)):
        xmin1 = sort_detectboxs[i].xmin
        ymin1 = sort_detectboxs[i].ymin
        xmax1 = sort_detectboxs[i].xmax
        ymax1 = sort_detectboxs[i].ymax
        classId = sort_detectboxs[i].classId

        if sort_detectboxs[i].classId != -1:
            predBoxs.append(sort_detectboxs[i])
            for j in range(i + 1, len(sort_detectboxs), 1):
                if classId == sort_detectboxs[j].classId:
                    xmin2 = sort_detectboxs[j].xmin
                    ymin2 = sort_detectboxs[j].ymin
                    xmax2 = sort_detectboxs[j].xmax
                    ymax2 = sort_detectboxs[j].ymax
                    iou = IOU(xmin1, ymin1, xmax1, ymax1, xmin2, ymin2, xmax2, ymax2)
                    if iou > nmsThresh:
                        sort_detectboxs[j].classId = -1
    return predBoxs


def sigmoid(x):
    return 1 / (1 + exp(-x))


def postprocess(out, img_h, img_w):
    print('postprocess ... ')

    detectResult = []
    output = []
    for i in range(len(out)):
        print(out[i].shape)
        output.append(out[i].reshape((-1)))

    scale_h = img_h / input_imgH
    scale_w = img_w / input_imgW

    gridIndex = -2
    cls_index = 0
    cls_max = 0

    for index in range(headNum):
        reg = output[index * 2 + 0]
        cls = output[index * 2 + 1]
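        # Output order must match the modified Detect.forward export:
        # [reg1, cls1, reg2, cls2, reg3, cls3]; reg carries 64 = 4*16 DFL channels.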

        for h in range(mapSize[index][0]):
            for w in range(mapSize[index][1]):
                gridIndex += 2

                if 1 == class_num:
                    cls_max = sigmoid(cls[0 * mapSize[index][0] * mapSize[index][1] + h * mapSize[index][1] + w])
                    cls_index = 0
                else:
                    for cl in range(class_num):
                        cls_val = cls[cl * mapSize[index][0] * mapSize[index][1] + h * mapSize[index][1] + w]
                        if 0 == cl:
                            cls_max = cls_val
                            cls_index = cl
                        else:
                            if cls_val > cls_max:
                                cls_max = cls_val
                                cls_index = cl
                    cls_max = sigmoid(cls_max)

                if cls_max > objectThresh:
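                    # DFL decode: each of the 4 box sides is a 16-bin softmax
                    # distribution; its expectation (sum of bin * prob) gives
                    # the offset from the anchor center, in stride units.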
                    regdfl = []
                    for lc in range(4):
                        sfsum = 0
                        locval = 0
                        for df in range(16):
                            temp = exp(reg[((lc * 16) + df) * mapSize[index][0] * mapSize[index][1] + h * mapSize[index][1] + w])
                            reg[((lc * 16) + df) * mapSize[index][0] * mapSize[index][1] + h * mapSize[index][1] + w] = temp
                            sfsum += temp

                        for df in range(16):
                            sfval = reg[((lc * 16) + df) * mapSize[index][0] * mapSize[index][1] + h * mapSize[index][1] + w] / sfsum
                            locval += sfval * df
                        regdfl.append(locval)

                    x1 = (meshgrid[gridIndex + 0] - regdfl[0]) * strides[index]
                    y1 = (meshgrid[gridIndex + 1] - regdfl[1]) * strides[index]
                    x2 = (meshgrid[gridIndex + 0] + regdfl[2]) * strides[index]
                    y2 = (meshgrid[gridIndex + 1] + regdfl[3]) * strides[index]

                    xmin = x1 * scale_w
                    ymin = y1 * scale_h
                    xmax = x2 * scale_w
                    ymax = y2 * scale_h

                    xmin = xmin if xmin > 0 else 0
                    ymin = ymin if ymin > 0 else 0
                    xmax = xmax if xmax < img_w else img_w
                    ymax = ymax if ymax < img_h else img_h

                    box = DetectBox(cls_index, cls_max, xmin, ymin, xmax, ymax)
                    detectResult.append(box)
    # NMS
    print('detectResult:', len(detectResult))
    predBox = NMS(detectResult)

    return predBox


def preprocess_image(img_src, resize_w, resize_h):
    image = cv2.resize(img_src, (resize_w, resize_h), interpolation=cv2.INTER_LINEAR)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = image.astype(np.float32)
    image /= 255.0

    return image


def detect(img_path):

    orig = cv2.imread(img_path)
    img_h, img_w = orig.shape[:2]
    image = preprocess_image(orig, input_imgW, input_imgH)

    image = image.transpose((2, 0, 1))
    image = np.expand_dims(image, axis=0)

    # image = np.ones((1, 3, 384, 640), dtype=np.float32)
    # print(image.shape)

    ort_session = ort.InferenceSession(ONNX_MODEL)
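    # 'data' must match input_names=['data'] in the modified torch.onnx.export.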
    pred_results = ort_session.run(None, {'data': image})

    out = []
    for i in range(len(pred_results)):
        out.append(pred_results[i])
    predbox = postprocess(out, img_h, img_w)

    print('obj num is :', len(predbox))

    for i in range(len(predbox)):
        xmin = int(predbox[i].xmin)
        ymin = int(predbox[i].ymin)
        xmax = int(predbox[i].xmax)
        ymax = int(predbox[i].ymax)
        classId = predbox[i].classId
        score = predbox[i].score

        cv2.rectangle(orig, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
        ptext = (xmin, ymin)
        title = CLASSES[classId] + "%.2f" % score
        cv2.putText(orig, title, ptext, cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2, cv2.LINE_AA)

    cv2.imwrite('./test_onnx_result.jpg', orig)


if __name__ == '__main__':
    print('This is main ....')
    GenerateMeshgrid()
    img_path = IMG_PATH
    detect(img_path)

4.2 The pt2onnx driver code

python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Get the parent directory of this script's location and build relative paths
import os
import sys
current_dir = os.path.dirname(os.path.abspath(__file__))
project_path = os.path.join(current_dir, '..')
sys.path.append(project_path)
sys.path.append(current_dir)
#based: https://docs.ultralytics.com/modes/export/#key-features-of-export-mode
from ultralytics import YOLO

# Load a model
model = YOLO("/app/rk3588_build/last_moonpie_yolov11s.pt")  # load the custom-trained model
#model = YOLO("./best_moonpie.pt")  # alternative checkpoint
results = model(task='detect', source='../../cake26.jpg', save=True)  # predict on an image
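Note that this script only runs a predict; the ONNX file itself is written by the export hook added to ./engine/model.py in 4.2.2 below, which, as far as I can tell, fires while the model runs.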

4.2.1 Related change 1: modify the ultralytics source ./nn/head.py, replacing the body of Detect.forward:

python
    def forward(self, x):
        # fengxh modified here, Feb 17, 2025
        y = []
        for i in range(self.nl):
            t1 = self.cv2[i](x[i])  # box regression branch (DFL logits)
            t2 = self.cv3[i](x[i])  # classification branch (logits, pre-sigmoid)
            y.append(t1)
            y.append(t2)
        return y
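With this change, Detect.forward returns the six raw head tensors instead of the decoded (1, 4 + nc, 8400) output; the sigmoid and DFL decoding then happen in postprocess() from 4.1. For a 640x640 input, each reg_i is (1, 64, S, S) and each cls_i is (1, nc, S, S), with S = 80, 40, 20 at strides 8, 16, 32, and 64 = 4 box sides x 16 DFL bins.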

4.2.2 Related change 2: modify the ONNX export path in ./engine/model.py; it redefines the ONNX output tensors:

python
        print("===================onnx====================")
        import torch
        dummy_input = torch.randn(1, 3, 640, 640)
        input_names = ['data']
        output_names = ['reg1', 'cls1', 'reg2', 'cls2', 'reg3', 'cls3']
        torch.onnx.export(self.model, dummy_input, '/app/rk3588_build/yolo11_selfgen.onnx',
                          verbose=False, input_names=input_names, output_names=output_names, opset_version=11)
        print("==================onnx self gened==========")

4.3 Detection results

With my dataset, the filling itself is not recognized:

Appendix A. Additional notes

For the conversion to succeed, the PyTorch version must not exceed 2.4; I used 2.4.0, strictly aligned with the rknn-toolkit2 2.3 requirements.
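A quick way to confirm an environment is aligned (expected values per the version tables in section 1):

python
import torch
import torchvision
import ultralytics

print(torch.__version__)        # expect 2.4.0+cpu
print(torchvision.__version__)  # expect 0.19.0+cpu
print(ultralytics.__version__)  # expect 8.3.68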
