YOLOv8n Input/Output Format Notes

1. Model Input Format

1.1 PyTorch (PT) Model Input

Basic input format:

Type: torch.Tensor
Shape: (batch_size, 3, height, width)
Data type: float32
Range: [0, 1] (normalized), RGB channel order

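Input preprocessing (a plain resize to the model input size; Ultralytics' own pipeline uses letterbox resizing instead, which keeps the aspect ratio and pads the borders):

# Standard preprocessing pipeline
import cv2
import numpy as np
import torch

# 1. Read the image
img = cv2.imread('image.jpg')  # shape: (H, W, 3), BGR order, uint8
if img is None:
    raise FileNotFoundError('image.jpg could not be read')

# 2. Resize to the model input size (stretches the image; aspect ratio is not preserved)
img_resized = cv2.resize(img, (160, 160))

# 3. Convert the color space
img_rgb = cv2.cvtColor(img_resized, cv2.COLOR_BGR2RGB)  # BGR -> RGB

# 4. Normalize to [0, 1] as float32
img_normalized = img_rgb.astype(np.float32) / 255.0

# 5. Transpose HWC -> CHW
img_chw = img_normalized.transpose(2, 0, 1)  # shape: (3, H, W)

# 6. Add the batch dimension
img_batch = np.expand_dims(img_chw, axis=0)  # shape: (1, 3, H, W)

# 7. Convert to a PyTorch tensor (contiguous copy)
input_tensor = torch.from_numpy(np.ascontiguousarray(img_batch))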
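The plain `cv2.resize` above stretches the frame. Ultralytics' predictor instead letterboxes it: the image is scaled by a single factor and the leftover area is padded (with gray, value 114). The sketch below only computes that scale and padding. `letterbox_params` is a hypothetical helper, and splitting the padding evenly between both sides is an approximation of Ultralytics' own rounding, not a copy of it.

```python
def letterbox_params(orig_w, orig_h, new_size=160):
    """Return (scale, (resized_w, resized_h), (left, top, right, bottom)) for a square letterbox."""
    # One scale factor for both axes keeps the aspect ratio
    r = min(new_size / orig_w, new_size / orig_h)
    resized_w, resized_h = round(orig_w * r), round(orig_h * r)
    # Split the leftover pixels between the two sides of each axis
    pad_w, pad_h = new_size - resized_w, new_size - resized_h
    left, top = pad_w // 2, pad_h // 2
    return r, (resized_w, resized_h), (left, top, pad_w - left, pad_h - top)


# A 640x480 frame letterboxed to 160x160: scale 0.25, 160x120 content, 20 px bars top and bottom
print(letterbox_params(640, 480))
```

With OpenCV, the values would be applied as `cv2.resize(img, (resized_w, resized_h))` followed by `cv2.copyMakeBorder(resized, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114))`. Boxes predicted on such an input need the padding subtracted before they are divided by the scale.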

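Using Ultralytics' built-in preprocessing:

from ultralytics import YOLO

model = YOLO('best.pt')

# Ultralytics handles preprocessing automatically
results = model('image.jpg')  # accepts file paths, PIL images, numpy arrays, etc.

Common prediction arguments:

imgsz: input image size, default 640
conf: confidence threshold, default 0.25
iou: NMS IoU threshold, default 0.7
augment: test-time augmentation, default False
agnostic_nms: class-agnostic NMS, default False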

1.2 ONNX Model Input

Basic input format:

Type: numpy.ndarray
Shape: (batch_size, 3, height, width)
Data type: float32
Range: [0, 1] (normalized), RGB channel order

Input preprocessing:

# Standard preprocessing pipeline (same as for the PT model)
import cv2
import numpy as np
import onnxruntime as rt

# 1. Read the image
img = cv2.imread('image.jpg')

# 2. Resize
img_resized = cv2.resize(img, (160, 160))

# 3. Convert the color space
img_rgb = cv2.cvtColor(img_resized, cv2.COLOR_BGR2RGB)

# 4. Normalize to [0, 1] as float32
img_normalized = img_rgb.astype(np.float32) / 255.0

# 5. Transpose HWC -> CHW
img_chw = img_normalized.transpose(2, 0, 1)

# 6. Add the batch dimension; pass ONNX Runtime a C-contiguous float32 array
input_data = np.ascontiguousarray(np.expand_dims(img_chw, axis=0), dtype=np.float32)

# 7. Run the ONNX model
sess = rt.InferenceSession('best.onnx', providers=['CPUExecutionProvider'])
input_name = sess.get_inputs()[0].name
outputs = sess.run(None, {input_name: input_data})
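Steps 3 to 6 can also be written with NumPy alone, which makes the memory layout explicit. Reversing the last axis is equivalent to the BGR to RGB conversion, and `np.ascontiguousarray` guarantees a C-contiguous float32 buffer. This is a minimal sketch that assumes the image has already been resized; `to_nchw` is a hypothetical helper name.

```python
import numpy as np


def to_nchw(img_bgr):
    """Convert a resized HxWx3 uint8 BGR image into a (1, 3, H, W) float32 RGB array in [0, 1]."""
    rgb = img_bgr[..., ::-1]                  # BGR -> RGB by reversing the channel axis
    chw = rgb.transpose(2, 0, 1) / 255.0      # HWC -> CHW, scaled to [0, 1] (float64 at this point)
    return np.ascontiguousarray(chw[np.newaxis], dtype=np.float32)


# Synthetic 2x3 "image" standing in for a cv2.imread + cv2.resize result
img = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)
x = to_nchw(img)
print(x.shape, x.dtype, x.flags['C_CONTIGUOUS'])
```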

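Using Ultralytics' built-in preprocessing:

from ultralytics import YOLO

model = YOLO('best.onnx')

# Ultralytics handles preprocessing automatically
results = model('image.jpg')  # used exactly like the PT model

Input format differences:

Feature	PyTorch model	ONNX model
Input type	`torch.Tensor`	`numpy.ndarray`
Preprocessing	Manual or automatic	Manual or automatic
Flexibility	Accepts many input types	NumPy arrays only (when called through ONNX Runtime directly)
Dynamic input size	Supported	Fixed at export time, unless exported with `dynamic=True`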

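2. Model Output Format

2.1 PyTorch (PT) Model Output

2.1.1 `forward` Method Output

Return value: tuple

First element (raw model output):

Type: torch.Tensor
Shape: (batch_size, 6, num_detections); in general (batch_size, 4 + num_classes, num_detections)
Notes:
- batch_size: batch size
- 6: values per candidate box (4 box values plus one score for each of this model's 2 classes)
- num_detections: number of candidate boxes before NMS (525 for a 160×160 input, 8400 for 640×640)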

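Output layout:

Index	Meaning	Range	Notes
0	cx	pixels	Box center x (input-image coordinates)
1	cy	pixels	Box center y
2	w	pixels	Box width
3	h	pixels	Box height
4	score of class 0 (flame)	[0, 1]	Sigmoid class score
5	score of class 1 (glass_tube)	[0, 1]	Sigmoid class score

YOLOv8 has no separate objectness channel. A candidate's confidence is its highest class score, and its class ID is the index of that score. Index 4 is therefore not a placeholder but the flame score; it presumably read 0 in testing because no flame was present.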
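The 525 in the shape is not specific to YOLOv8n. It is the total number of grid cells over YOLOv8's three detection strides (8, 16 and 32), so it depends only on the input size. A short check, assuming a square input whose side is a multiple of 32; both function names are hypothetical:

```python
def num_anchors(imgsz, strides=(8, 16, 32)):
    """Number of candidate boxes YOLOv8 outputs for a square imgsz x imgsz input."""
    return sum((imgsz // s) ** 2 for s in strides)


def num_channels(num_classes):
    """Rows in the raw output: 4 box values (cx, cy, w, h) plus one score per class."""
    return 4 + num_classes


print(num_anchors(160), num_channels(2))  # 20*20 + 10*10 + 5*5 = 525 anchors, 6 rows for 2 classes
print(num_anchors(640))                  # 80*80 + 40*40 + 20*20 = 8400 anchors
```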

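Second element (raw head outputs, not post-processed):

Type: dict (in the Ultralytics version used here; older releases return a list of per-scale feature maps instead)
Keys:
- 'boxes': box-regression branch output
- 'scores': classification branch output
- 'feats': feature maps

2.1.2 `predict` Method Output

Return value: List[Results]

Results object structure:

boxes: Boxes object
- xyxy: box corners, shape (N, 4)
- conf: confidences, shape (N,)
- cls: class IDs, shape (N,)
- id: track IDs (only when tracking is enabled)
- xywh: box centers and widths/heights
- xyxyn / xywhn: normalized coordinates
- data: the underlying (N, 6) array [x1, y1, x2, y2, conf, cls]
masks: Masks object (segmentation models)
probs: Probs object (classification models)
orig_img: original image
orig_shape: original image shape
path: image path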
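The per-box attributes of a `Boxes` object are all derived from one array, `boxes.data`, whose rows are laid out as [x1, y1, x2, y2, conf, cls] (with a track-ID column before conf when tracking is enabled). The NumPy sketch below mimics that slicing on a made-up array, so the layout can be seen without loading a model. The numbers are illustrative only.

```python
import numpy as np

# Hypothetical post-NMS result for one image: two detections, columns x1, y1, x2, y2, conf, cls
data = np.array([
    [10.0, 20.0, 50.0, 80.0, 0.91, 1.0],
    [30.0, 40.0, 60.0, 90.0, 0.55, 0.0],
])

xyxy = data[:, :4]             # like boxes.xyxy, shape (N, 4)
conf = data[:, -2]             # like boxes.conf, shape (N,)
cls = data[:, -1].astype(int)  # like boxes.cls (Ultralytics keeps it as float; cast here for indexing)

# xywh is derived from xyxy: box center, then width and height
xywh = np.concatenate([(xyxy[:, :2] + xyxy[:, 2:]) / 2, xyxy[:, 2:] - xyxy[:, :2]], axis=1)
print(xywh)
```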

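2.2 ONNX Model Output

2.2.1 Direct ONNX Runtime Output

Return value: List[numpy.ndarray]

Output format:

Shape: (batch_size, 6, num_detections)
Data type: float32
Layout: identical to the first element of the PT model's `forward` output (no NMS applied)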

Example:

outputs = sess.run(None, {input_name: input_data})
onnx_output = outputs[0]  # shape: (1, 6, 525)

# Output for the first image in the batch
batch_output = onnx_output[0]  # shape: (6, 525)

# First candidate box (column 0)
first_detection = batch_output[:, 0]  # shape: (6,)
print(f"cx: {first_detection[0]}")
print(f"cy: {first_detection[1]}")
print(f"w: {first_detection[2]}")
print(f"h: {first_detection[3]}")
print(f"flame score (class 0): {first_detection[4]}")
print(f"glass_tube score (class 1): {first_detection[5]}")
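Printing one column at a time is fine for inspection, but decoding all 525 columns is simpler with vectorized NumPy. The sketch below assumes the standard YOLOv8 detection-head layout: rows 0 to 3 are box center x, center y, width and height in input pixels, followed by one sigmoid score per class. It returns corner boxes, confidences and class IDs for candidates above a threshold, without NMS. `decode_yolov8` is a hypothetical name, and the synthetic array only stands in for `outputs[0][0]`.

```python
import numpy as np


def decode_yolov8(pred, conf_threshold=0.5):
    """Decode one image's raw output of shape (4 + num_classes, num_anchors).

    Returns xyxy boxes (K, 4), confidences (K,) and class IDs (K,) for the K
    candidates whose best class score exceeds conf_threshold. No NMS is applied.
    """
    cx, cy, w, h = pred[:4]                  # each has shape (num_anchors,)
    class_scores = pred[4:]                  # (num_classes, num_anchors)
    class_ids = class_scores.argmax(axis=0)  # best class per candidate
    confs = class_scores.max(axis=0)         # that class's score is the candidate's confidence
    keep = confs > conf_threshold
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return boxes[keep], confs[keep], class_ids[keep]


# Stand-in for outputs[0][0] from a two-class model at 160x160: shape (6, 525)
rng = np.random.default_rng(0)
pred = np.zeros((6, 525), dtype=np.float32)
pred[:4] = rng.uniform(10, 150, size=(4, 525))
pred[5, 7] = 0.94  # one confident class-1 (glass_tube) candidate in column 7
boxes, confs, class_ids = decode_yolov8(pred)
print(boxes.shape, confs, class_ids)
```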

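2.2.2 Loading the ONNX Model with Ultralytics

Return value: same as the PT model's `predict` method, List[Results]

Output format: identical to the PT model; Ultralytics handles post-processing (decoding, NMS, rescaling) automatically

2.3 Output Format Comparison

Feature	PyTorch model	ONNX model
Raw output	`forward` returns `(tensor, dict)`	`sess.run` returns `List[ndarray]`
`predict` output	`List[Results]`	`List[Results]` (when loaded through Ultralytics)
Raw output shape	`(batch, 6, num_detections)`	`(batch, 6, num_detections)`
Data type	`torch.Tensor`	`numpy.ndarray`
Post-processing	Built in	Manual, or through Ultralytics
Coordinate format	Pixels, xywh (box center)	Pixels, xywh (box center)
Class scores	Sigmoid applied	Sigmoid applied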

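3. Multi-Class vs Single-Class Model Output

3.1 Multi-Class Model Output

Output shape: (batch_size, 4 + num_classes, num_detections)

Output layout:

Index	Meaning	Range	Notes
0	cx	pixels	Box center x
1	cy	pixels	Box center y
2	w	pixels	Box width
3	h	pixels	Box height
4 to 3 + num_classes	class scores	[0, 1]	One sigmoid score per class

The raw output has no confidence or class-ID channel. The layout [x1, y1, x2, y2, confidence, class_id] only appears after NMS, for example in `Results.boxes.data`, which has shape (N, 6).

3.2 Single-Class Model Output

Output shape: (batch_size, 5, num_detections)

Output layout:

Index	Meaning	Range	Notes
0	cx	pixels	Box center x
1	cy	pixels	Box center y
2	w	pixels	Box width
3	h	pixels	Box height
4	score	[0, 1]	Score of the only class

The model in this note has two classes (flame, glass_tube), so its output has 6 channels and follows the multi-class layout in 3.1.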

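4. Coordinate System

4.1 Bounding Box Representation

Coordinate formats used by YOLOv8: the raw network output uses xywh (box center), while `Results.boxes.xyxy` returned by `predict` uses xyxy.

Format description:

xyxy: [x1, y1, x2, y2]
- x1, y1: top-left corner of the box
- x2, y2: bottom-right corner of the box

Other formats:

xywh: [x_center, y_center, width, height]
xyxyn / xywhn: normalized coordinates in the range [0, 1]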

4.2 Coordinate Conversion

From xyxy to xywh:

def xyxy2xywh(xyxy):
    x_center = (xyxy[0] + xyxy[2]) / 2
    y_center = (xyxy[1] + xyxy[3]) / 2
    width = xyxy[2] - xyxy[0]
    height = xyxy[3] - xyxy[1]
    return [x_center, y_center, width, height]
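Because the raw YOLOv8 output stores boxes as center-based xywh, the conversion usually needed in practice is the reverse one. Ultralytics ships a function named `xywh2xyxy` in `ultralytics.utils.ops`; the standalone NumPy sketch below (hypothetical name `xywh2xyxy_np`) converts a whole (N, 4) array at once without that dependency.

```python
import numpy as np


def xywh2xyxy_np(boxes):
    """Convert an (N, 4) array of [cx, cy, w, h] boxes to [x1, y1, x2, y2]."""
    boxes = np.asarray(boxes, dtype=np.float32)
    half_wh = boxes[:, 2:] / 2
    return np.concatenate([boxes[:, :2] - half_wh, boxes[:, :2] + half_wh], axis=1)


print(xywh2xyxy_np([[30, 50, 40, 60]]))  # center (30, 50), size 40 x 60
```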

Normalizing coordinates:

def normalize_coords(xyxy, img_width, img_height):
    x1 = xyxy[0] / img_width
    y1 = xyxy[1] / img_height
    x2 = xyxy[2] / img_width
    y2 = xyxy[3] / img_height
    return [x1, y1, x2, y2]

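5. Post-Processing Steps

5.1 Required Post-Processing

1. Filter out low-confidence candidates (and convert them to xyxy):

# Keep candidates whose best class score exceeds 0.5.
# Each kept detection becomes [x1, y1, x2, y2, conf, class_id].
conf_threshold = 0.5
num_detections = batch_output.shape[1]
filtered_detections = []
for i in range(num_detections):
    cx, cy, w, h = batch_output[:4, i]
    class_scores = batch_output[4:, i]
    class_id = int(class_scores.argmax())
    conf = float(class_scores[class_id])
    if conf > conf_threshold:
        filtered_detections.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, conf, class_id])

2. Apply non-maximum suppression (NMS):

def calculate_iou(box1, box2):
    """IoU of two boxes in xyxy format."""
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection
    return intersection / union if union > 0 else 0


# Class-agnostic NMS over [x1, y1, x2, y2, conf, class_id] detections
def nms(detections, iou_threshold=0.45):
    if not detections:
        return []

    # Sort by confidence (index 4), highest first
    detections = sorted(detections, key=lambda d: d[4], reverse=True)

    selected = []
    while detections:
        current = detections.pop(0)
        selected.append(current)

        # Drop remaining detections that overlap the current one too much
        detections = [d for d in detections if calculate_iou(current[:4], d[:4]) < iou_threshold]

    return selected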
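The list-based NMS above compares boxes one pair at a time. The same greedy algorithm can be vectorized with NumPy so that each kept box is compared with all remaining boxes in one step. This is a sketch that assumes boxes are already in xyxy format; like the version above, it is class-agnostic, and `nms_numpy` is a hypothetical name. OpenCV's `cv2.dnn.NMSBoxes` is another option, but it expects [x, y, w, h] boxes with a top-left origin.

```python
import numpy as np


def nms_numpy(boxes, scores, iou_threshold=0.45):
    """Greedy NMS. boxes: (N, 4) xyxy, scores: (N,). Returns kept indices, best first."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices sorted by score, highest first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box
        iw = np.clip(np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]), 0, None)
        ih = np.clip(np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]), 0, None)
        inter = iw * ih
        iou = inter / np.maximum(areas[i] + areas[rest] - inter, 1e-9)
        order = rest[iou < iou_threshold]  # discard boxes that overlap box i too much
    return keep


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
print(nms_numpy(boxes, scores))
```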

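3. Map coordinates back to the original image:

# Map a detection from model-input pixels back to original-image pixels.
# Only valid for the plain (stretching) cv2.resize used above.
def map_coords_to_original(det, orig_width, orig_height, input_width, input_height):
    # Scale factors
    scale_x = orig_width / input_width
    scale_y = orig_height / input_height

    # Scale the corner coordinates; det[4] and det[5] are passed through unchanged
    x1 = det[0] * scale_x
    y1 = det[1] * scale_y
    x2 = det[2] * scale_x
    y2 = det[3] * scale_y

    return [x1, y1, x2, y2, det[4], det[5]]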
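The scaling above is only correct for the plain stretch resize used in this note. If the input was letterboxed (scaled by a single factor `r` and padded by `pad_left` / `pad_top` pixels, as Ultralytics does), the padding has to be removed before dividing by the scale. A sketch under that assumption; `unletterbox_boxes` and its parameter names are hypothetical.

```python
import numpy as np


def unletterbox_boxes(boxes, r, pad_left, pad_top, orig_w, orig_h):
    """Map (N, 4) xyxy boxes from letterboxed input pixels back to original-image pixels."""
    boxes = np.array(boxes, dtype=np.float32)  # copy, so the caller's array is untouched
    boxes[:, [0, 2]] -= pad_left               # undo horizontal padding
    boxes[:, [1, 3]] -= pad_top                # undo vertical padding
    boxes /= r                                 # undo the uniform scale
    boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, orig_w)  # keep boxes inside the image
    boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, orig_h)
    return boxes


# 640x480 frame letterboxed to 160x160: r = 0.25, 20 px of padding at the top
print(unletterbox_boxes([[40, 30, 80, 70]], r=0.25, pad_left=0, pad_top=20, orig_w=640, orig_h=480))
```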

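5.2 PyTorch vs ONNX Post-Processing

Step	PyTorch model	ONNX model
Built-in post-processing	Handled automatically by `predict`	Must be implemented manually when using ONNX Runtime directly
Ultralytics support	Full	Through the `YOLO` class
Flexibility	Post-processing can be customized	Custom logic must be implemented manually
Speed	Usually slower, especially on CPU	Usually faster with ONNX Runtime, with further optimization options

6. Practical Examples

6.1 Full PyTorch Pipeline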

from ultralytics import YOLO
import cv2

# 1. Load the model
model = YOLO('best.pt')

# 2. Inference; stream=True yields results frame by frame instead of keeping the whole video in memory
results = model('test.mp4', imgsz=160, conf=0.5, stream=True)

# 3. Process the results
for result in results:
    frame = result.orig_img  # original BGR frame (numpy array)

    # Process each detection
    for box in result.boxes:
        # Coordinates (xyxy, original-image pixels)
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())

        # Confidence
        conf = float(box.conf[0])

        # Class
        cls = int(box.cls[0])
        class_name = model.names[cls]

        # Draw the box
        color = (0, 255, 0) if class_name == 'glass_tube' else (0, 0, 255)
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)

        # Draw the label
        label = f'{class_name}: {conf:.2f}'
        cv2.putText(frame, label, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

    # Show the result; press Esc to quit
    cv2.imshow('Result', frame)
    if cv2.waitKey(1) == 27:
        break

cv2.destroyAllWindows()

6.2 Full ONNX Pipeline

6.2.1 Using Ultralytics

from ultralytics import YOLO
import cv2

# 1. Load the ONNX model
model = YOLO('best.onnx')

# 2. Inference (same as for the PT model); stream=True processes the video frame by frame
results = model('test.mp4', imgsz=160, conf=0.5, stream=True)

# 3. Process the results (same logic as for the PT model)
for result in results:
    frame = result.orig_img
    for box in result.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
        conf = float(box.conf[0])
        cls = int(box.cls[0])
        class_name = model.names[cls]

        # Draw
        color = (0, 255, 0) if class_name == 'glass_tube' else (0, 0, 255)
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
        label = f'{class_name}: {conf:.2f}'
        cv2.putText(frame, label, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

    cv2.imshow('ONNX Result', frame)
    if cv2.waitKey(1) == 27:  # Esc quits
        break

cv2.destroyAllWindows()

6.2.2 Using ONNX Runtime Directly

import cv2
import numpy as np
import onnxruntime as rt

INPUT_SIZE = 160
CONF_THRESHOLD = 0.5
IOU_THRESHOLD = 0.45
CLASS_NAMES = {0: 'flame', 1: 'glass_tube'}


def calculate_iou(box1, box2):
    """IoU of two boxes in xyxy format."""
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection
    return intersection / union if union > 0 else 0


def nms(detections, iou_threshold=IOU_THRESHOLD):
    """Class-agnostic NMS over [x1, y1, x2, y2, conf, cls] detections."""
    detections = sorted(detections, key=lambda d: d[4], reverse=True)
    selected = []
    while detections:
        current = detections.pop(0)
        selected.append(current)
        detections = [d for d in detections if calculate_iou(current[:4], d[:4]) < iou_threshold]
    return selected


# 1. Load the ONNX model
sess = rt.InferenceSession('best.onnx', providers=['CPUExecutionProvider'])
input_name = sess.get_inputs()[0].name

# 2. Open the video
cap = cv2.VideoCapture('test.mp4')

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # 3. Preprocess the input
    orig_height, orig_width = frame.shape[:2]
    resized = cv2.resize(frame, (INPUT_SIZE, INPUT_SIZE))
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
    input_data = rgb.astype(np.float32) / 255.0
    input_data = np.ascontiguousarray(input_data.transpose(2, 0, 1)[None])  # (1, 3, H, W)

    # 4. Inference
    outputs = sess.run(None, {input_name: input_data})
    batch_output = outputs[0][0]  # (6, 525): cx, cy, w, h, flame score, glass_tube score

    # 5. Post-process: best class per candidate, confidence filter, xywh -> xyxy, NMS
    detections = []
    for cx, cy, w, h, *class_scores in batch_output.T:
        cls = int(np.argmax(class_scores))
        conf = float(class_scores[cls])
        if conf > CONF_THRESHOLD:
            detections.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, conf, cls])
    filtered_detections = nms(detections)

    # 6. Map coordinates back to the original frame and draw
    scale_x = orig_width / INPUT_SIZE
    scale_y = orig_height / INPUT_SIZE
    for x1, y1, x2, y2, conf, cls in filtered_detections:
        x1, x2 = int(x1 * scale_x), int(x2 * scale_x)
        y1, y2 = int(y1 * scale_y), int(y2 * scale_y)
        color = (0, 255, 0) if CLASS_NAMES[cls] == 'glass_tube' else (0, 0, 255)
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
        cv2.putText(frame, f'{CLASS_NAMES[cls]}: {conf:.2f}', (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

    # 7. Show the result; press Esc to quit
    cv2.imshow('ONNX Runtime Result', frame)
    if cv2.waitKey(1) == 27:
        break

cap.release()
cv2.destroyAllWindows()
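The NMS in this example compares all boxes regardless of class, so a box of one class can suppress an overlapping box of another. Ultralytics' NMS is per class by default (`agnostic_nms=False`), and its `non_max_suppression` achieves that by shifting each class's boxes by a large class-dependent offset before running a single NMS pass, so that boxes of different classes never overlap. The sketch below applies the same trick to the list-based approach. It assumes detections as [x1, y1, x2, y2, conf, cls] lists; `nms_per_class` is a hypothetical name, and the offset 4096 is an arbitrary value larger than any coordinate in this note.

```python
def iou(a, b):
    """IoU of two boxes in xyxy format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_per_class(detections, iou_threshold=0.45, offset=4096):
    """Class-aware greedy NMS over [x1, y1, x2, y2, conf, cls] detections."""
    ranked = sorted(detections, key=lambda d: d[4], reverse=True)  # highest confidence first
    kept, kept_boxes = [], []
    for det in ranked:
        shift = det[5] * offset  # moves each class into its own region, so classes never overlap
        box = [det[0] + shift, det[1] + shift, det[2] + shift, det[3] + shift]
        if all(iou(box, other) < iou_threshold for other in kept_boxes):
            kept.append(det)
            kept_boxes.append(box)
    return kept


dets = [
    [10, 10, 50, 50, 0.90, 1],  # glass_tube
    [12, 12, 52, 52, 0.80, 0],  # flame, overlapping the glass_tube box
    [11, 11, 51, 51, 0.70, 1],  # near-duplicate glass_tube box
]
print(nms_per_class(dets))
```

In this example the flame box survives even though it overlaps the glass_tube box, while the near-duplicate glass_tube box is suppressed.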

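7. Performance Optimization Tips

7.1 Input Optimization

Batching:

Batched inference can increase throughput
For ONNX, this requires a model exported with a batch size above 1 or with a dynamic batch dimension (`dynamic=True`)

Input size:

Choose an input size that balances accuracy and speed; it should be a multiple of 32
For YOLOv8n, 160 to 320 is suggested here for speed (the Ultralytics default is 640); smaller inputs make small objects harder to detect

Data type:

Consider FP16 or INT8 quantization
This usually reduces memory use and inference time; the actual gain depends on the hardware and runtime

7.2 Output Optimization

Early filtering:

Filter out low-confidence candidates before NMS
This reduces the amount of data later steps have to process

Parallel processing:

Run post-processing in a separate thread from frame capture and inference
Useful for video streams

Hardware acceleration:

Use a GPU or a dedicated hardware accelerator
Configure ONNX Runtime to use the best available execution provider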

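8. Common Problems and Solutions

8.1 Coordinates Outside the Image

Problem: box coordinates fall outside the image boundaries

Causes:

The model can predict boxes that extend past the input borders
The scale factors used to map boxes back to the original image are wrong

Solution:

# Clip coordinates to the valid range; img_shape is (height, width, ...), e.g. frame.shape
def clip_coords(xyxy, img_shape):
    x1 = max(0, min(xyxy[0], img_shape[1]))
    y1 = max(0, min(xyxy[1], img_shape[0]))
    x2 = max(0, min(xyxy[2], img_shape[1]))
    y2 = max(0, min(xyxy[3], img_shape[0]))
    return [x1, y1, x2, y2]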

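8.2 Incorrect Class Labels

Problem: detections carry the wrong class labels

Causes:

ONNX Runtime returns only numeric outputs; class names are not part of them (Ultralytics stores them in the ONNX metadata)
Incorrect class ID mapping

Solution:

import numpy as np

# Provide the class mapping manually (the order must match the training configuration)
class_names = {
    0: 'flame',
    1: 'glass_tube'
}

# Use the mapping: det is one raw output column [cx, cy, w, h, score_0, score_1],
# and the class ID is the index of the highest class score
class_id = int(np.argmax(det[4:]))
class_name = class_names.get(class_id, f'class_{class_id}')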
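For models exported by Ultralytics, the class names are normally stored in the ONNX file's metadata under the key `names`, as the string form of a Python dict, so they can be recovered instead of typed by hand. A sketch, assuming that key is present; `parse_names` is a hypothetical helper, and the session lines are left as comments because they need the model file.

```python
import ast


def parse_names(names_str):
    """Turn a metadata string like "{0: 'flame', 1: 'glass_tube'}" into {int: str}."""
    names = ast.literal_eval(names_str)  # safe: evaluates Python literals only
    return {int(k): str(v) for k, v in names.items()}


# With a real model (requires onnxruntime and best.onnx):
#   sess = rt.InferenceSession('best.onnx', providers=['CPUExecutionProvider'])
#   meta = sess.get_modelmeta().custom_metadata_map
#   class_names = parse_names(meta['names'])
class_names = parse_names("{0: 'flame', 1: 'glass_tube'}")
print(class_names[1])
```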

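8.3 Slow Inference

Problem: the ONNX model runs slower than expected

Causes:

No hardware acceleration
Inefficient post-processing
Input size too large

Solution:

import onnxruntime as rt

# Enable all graph optimizations
sess_options = rt.SessionOptions()
sess_options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_ENABLE_ALL

# Prefer the CUDA execution provider and fall back to CPU
# (CUDA requires the onnxruntime-gpu package and a compatible CUDA installation)
providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
sess = rt.InferenceSession('best.onnx', sess_options, providers=providers)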
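Requesting `CUDAExecutionProvider` only helps if the installed onnxruntime build actually includes it. `rt.get_available_providers()` lists what the build supports, so the list can be filtered before the session is created, and `sess.get_providers()` shows which providers the session ended up using. A small sketch; `pick_providers` is a hypothetical helper and the preference order is an assumption.

```python
def pick_providers(available, preferred=('CUDAExecutionProvider', 'CPUExecutionProvider')):
    """Keep the preferred providers that are available, in preference order; always end with CPU."""
    chosen = [p for p in preferred if p in available]
    if 'CPUExecutionProvider' not in chosen:
        chosen.append('CPUExecutionProvider')
    return chosen


# In practice: providers = pick_providers(rt.get_available_providers())
print(pick_providers(['CPUExecutionProvider']))
print(pick_providers(['CUDAExecutionProvider', 'CPUExecutionProvider']))
```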

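9. Summary

9.1 Input Format Summary

Item	PyTorch model	ONNX model
Input type	`torch.Tensor`	`numpy.ndarray`
Shape	`(N, 3, H, W)`	`(N, 3, H, W)`
Value range	`[0, 1]`	`[0, 1]`
Preprocessing	Manual or automatic	Manual or automatic
Flexibility	High	Medium

9.2 Output Format Summary

Item	PyTorch model	ONNX model
Raw output	`forward`: `(tensor, dict)`	`sess.run`: `List[ndarray]`
`predict` output	`List[Results]`	`List[Results]`
Raw output shape (2 classes, 160×160 input)	`(N, 6, 525)`	`(N, 6, 525)`
Post-processing	Built in	Manual (ONNX Runtime) or through Ultralytics
Coordinate format	Raw: `xywh`; `Results`: `xyxy`	Raw: `xywh`; `Results` via Ultralytics: `xyxy`
Class representation	ID + name	ID; names mapped manually or read from the metadata

9.3 Choosing a Format

Choose the PyTorch model when:

The model needs to be modified or extended
Working in development or research
Training or fine-tuning is required
Inference speed is not critical

Choose the ONNX model when:

Deploying to production
Targeting resource-constrained devices
Cross-platform compatibility is needed
Inference speed matters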

10. Appendix: Complete Output Examples

10.1 Raw Output of This Model (2 Classes: flame, glass_tube)

# Output shape: (1, 6, 525)
[
  [  # batch 0
    [17.00, 21.09, 21.04, ..., 157.31],  # row 0: cx (525 values)
    [26.61, 22.96, 15.75, ..., 156.76],  # row 1: cy
    [36.34, 42.17, 43.51, ..., 450.47],  # row 2: w
    [53.69, 46.13, 31.67, ..., 298.25],  # row 3: h
    [0.00, 0.00, 0.00, ..., 0.00],       # row 4: class 0 (flame) score
    [0.00, 0.00, 0.00, ..., 0.94],       # row 5: class 1 (glass_tube) score
  ]
]

10.2 Multi-Class Model Output Layout

The raw output has no confidence or class-ID rows; for any number of classes it is laid out as follows (values elided):

# Output shape: (1, 4 + num_classes, 525) for a 160×160 input
[
  [  # batch 0
    [...],  # row 0: cx (525 values)
    [...],  # row 1: cy
    [...],  # row 2: w
    [...],  # row 3: h
    [...],  # row 4: class 0 score
    ...
    [...],  # row 3 + num_classes: score of the last class
  ]
]

11. References

Last updated: 2026-01-30
Author: YOLOv8 Input/Output Format Detailed Guide
