Deploying a ResNet Inference Service with a Triton Ensemble

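Summary

This post describes how to combine the preprocessing, inference, and postprocessing of a ResNet-50 model into a single Triton service. The complete code is available in: triton_ensemble_model_zoo

Deploying an ONNX model with a Triton ensemble

1. Model preparation

Prepare an ONNX model. If you already have one, skip this step. Otherwise, follow the steps below to download the open-source pretrained ResNet-50 weights and convert them to ONNX.

(1) Download the pretrained ResNet-50 weights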

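```python
import os
import torch
import torchvision.models as models

def save_resnet50_weights(save_dir='weights', weight_name="resnet50_imagenet_v2.pth"):
    """
    Download the ImageNet-pretrained ResNet-50 weights and save them to a directory.

    Args:
        save_dir (str): Directory in which to save the weights
        weight_name (str): File name of the saved weights
    """
    # Create the directory if it does not exist
    os.makedirs(save_dir, exist_ok=True)

    # Build the save path
    save_path = os.path.join(save_dir, weight_name)

    # Skip the download if the file already exists
    if os.path.exists(save_path):
        print(f"File already exists: {save_path}")
        return

    print("Downloading ResNet-50 pretrained weights (IMAGENET1K_V2) ...")
    # Use the official torchvision weights (the V2 weights have higher accuracy).
    # The weights argument downloads and caches them automatically;
    # we additionally save a copy to the chosen directory.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

    # Extract the state_dict and save it
    torch.save(model.state_dict(), save_path)
    print(f"Weights saved to: {save_path}")


if __name__ == "__main__":
    save_resnet50_weights()
```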

(2) Convert the .pth weights to ONNX

```python
import torch
import torchvision.models as models
import os

def convert_to_onnx(weight_path, onnx_path, input_size=(3, 224, 224)):
    """
    Convert ResNet-50 PyTorch weights to an ONNX model.

    Args:
        weight_path (str): Path to the local .pth weight file
        onnx_path (str): Path of the ONNX file to write
        input_size (tuple): Input tensor shape (C, H, W), default (3, 224, 224)
    """
    # 1. Build the model structure (without loading pretrained weights)
    model = models.resnet50(weights=None)
    device = 'cuda' if torch.cuda.is_available() else 'cpu'

    # 2. Load the local weights
    if not os.path.exists(weight_path):
        raise FileNotFoundError(f"Weight file not found: {weight_path}")
    state_dict = torch.load(weight_path, map_location=device)
    model.load_state_dict(state_dict)

    # 3. Move the model to the device and switch to inference mode
    model = model.to(device)
    model.eval()

    # 4. Build an example input
    batch_size = 1
    c, h, w = input_size
    example_input = torch.randn(batch_size, c, h, w).to(device)

    # 5. Run one forward pass to check that the model works
    with torch.no_grad():
        output = model(example_input)
    print(f"Forward pass OK, output shape: {output.shape}")

    # 6. Export to ONNX
    print(f"Exporting ONNX to: {onnx_path}")
    torch.onnx.export(
        model,
        example_input,
        onnx_path,
        export_params=True,          # store the model parameters in the file
        opset_version=11,            # ONNX opset version; 11 and 12 are common choices
        do_constant_folding=True,    # constant-folding optimization
        input_names=['input'],       # input node name
        output_names=['output'],     # output node name
        dynamic_axes={
            'input': {0: 'batch_size'},
            'output': {0: 'batch_size'}
        },                            # dynamic batch dimension
        dynamo=False                  # Important: force the TorchScript-based exporter
                                      # (this argument only exists in recent PyTorch releases)
    )
    print("ONNX export succeeded!")

if __name__ == '__main__':
    weight_path = r'weights/resnet50_imagenet_v2.pth'
    onnx_path = r'weights/resnet50_imagenet_v2.onnx'
    input_size = (3, 224, 224)
    convert_to_onnx(weight_path, onnx_path, input_size)
```

2. Directory layout

Triton has strict requirements for the layout of the model repository. Store the files as follows:

```text
models/
├── resnet50_ensemble/
│   ├── 1/
│   └── config.pbtxt
├── resnet50_inference/
│   ├── 1/
│   │   └── resnet50_imagenet_v2.onnx
│   └── config.pbtxt
├── resnet50_postprocess/
│   ├── 1/
│   │   ├── imagenet_classes.txt
│   │   └── postprocess.py
│   └── config.pbtxt
└── resnet50_preprocess/
    ├── 1/
    │   └── preprocess.py
    └── config.pbtxt
```

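resnet50_preprocess (preprocessing)
Receives the raw image data and decodes, resizes, and normalizes it, producing a tensor in the format the model expects.
resnet50_inference (inference)
Runs the model.
resnet50_postprocess (postprocessing)
Receives the model's raw output (usually logits or probabilities; logits here), applies softmax and top-k selection, and uses imagenet_classes.txt to map numeric class IDs to human-readable labels (such as "cat" or "dog").
resnet50_ensemble (ensemble scheduling)
A special "virtual" model. It contains no code or weights. Its config.pbtxt defines the execution order and data flow of the three steps above (Preprocess -> Inference -> Postprocess) and exposes a single unified interface to clients.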
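
Because Triton refuses to load a model whose files are misplaced, it can help to check the layout before starting the server. The following is a minimal sketch, not part of the original project: the helper name `missing_entries` is hypothetical, and the expected paths are simply copied from the directory tree above.

```python
from pathlib import Path

# Expected files, copied from the directory tree above (paths relative to models/).
EXPECTED_FILES = [
    "resnet50_ensemble/config.pbtxt",
    "resnet50_inference/config.pbtxt",
    "resnet50_inference/1/resnet50_imagenet_v2.onnx",
    "resnet50_postprocess/config.pbtxt",
    "resnet50_postprocess/1/imagenet_classes.txt",
    "resnet50_postprocess/1/postprocess.py",
    "resnet50_preprocess/config.pbtxt",
    "resnet50_preprocess/1/preprocess.py",
]
# The ensemble has no files of its own, but the tree still gives it a version directory.
EXPECTED_DIRS = ["resnet50_ensemble/1"]


def missing_entries(repo_root):
    """Return the expected paths that are absent under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing
```

Calling `missing_entries("models")` from the directory that contains `models/` returns an empty list when every file in the tree is present.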

(1) resnet50_preprocess (preprocessing)

config.pbtxt

```text
name: "resnet50_preprocess"
backend: "python"
max_batch_size: 256
default_model_filename: "preprocess.py"

input [
    {
        name: "RAW_IMAGE"
        data_type: TYPE_STRING
        dims: [ 1 ]
    }
]
output [
    {
        name: "PREPROCESSED_IMAGE"
        data_type: TYPE_FP32
        dims: [ 3, 224, 224 ]
    }
]

instance_group [
  {
    count: 32
    kind: KIND_CPU
  }
]
```

preprocess.py

```python
import triton_python_backend_utils as pb_utils
import numpy as np
import cv2
import base64
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class TritonPythonModel:
    def initialize(self, args):
        # Preprocessing parameters. A short side of 256 matches torchvision's
        # IMAGENET1K_V1 preset; the IMAGENET1K_V2 preset resizes to 232 instead.
        self.resize_size = 256
        self.crop_size = 224
        # ImageNet mean and standard deviation (RGB order)
        self.mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        self.std  = np.array([0.229, 0.224, 0.225], dtype=np.float32)

    def img_preprocess(self, img) -> np.ndarray:
        # 1. Get the original image size
        h, w = img.shape[:2]

        # 2. Scale the short side to resize_size, keeping the aspect ratio
        if w < h:
            new_w = self.resize_size
            new_h = int(h * self.resize_size / w)
        else:
            new_h = self.resize_size
            new_w = int(w * self.resize_size / h)

        img_resized = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)  # (new_h, new_w, 3)
        img_resized = cv2.cvtColor(img_resized, cv2.COLOR_BGR2RGB)  # BGR to RGB

        # 3. Center-crop to 224x224
        start_x = (new_w - self.crop_size) // 2
        start_y = (new_h - self.crop_size) // 2
        img_cropped = img_resized[start_y:start_y + self.crop_size,
                                  start_x:start_x + self.crop_size]  # (224, 224, 3)

        # 4. Scale to [0, 1] as float32
        img_norm = img_cropped.astype(np.float32) / 255.0

        # 5. Standardize with the ImageNet mean and std
        img_norm = (img_norm - self.mean) / self.std

        # 6. Reorder the channels
        img_chw = np.transpose(img_norm, (2, 0, 1))   # (H,W,C) -> (C,H,W)

        return img_chw.astype(np.float32)

    def base64_to_image(self, base64_str):
        """Decode a base64 string into a BGR image; return None on failure."""
        try:
            raw_bytes = base64.b64decode(base64_str)
            nparr = np.frombuffer(raw_bytes, np.uint8)
            # cv2.imdecode returns None when the bytes are not a valid image
            return cv2.imdecode(nparr, cv2.IMREAD_COLOR)
        except Exception as error:
            logger.error(f"Error: {error}")
            logger.error(f"error line: {error.__traceback__.tb_lineno}")
            return None

    def execute(self, requests):
        responses = []
        for request in requests:
            # 1. Get the raw image data (base64) for the whole batch
            in_tensor = pb_utils.get_input_tensor_by_name(request, "RAW_IMAGE")
            if in_tensor is None:
                # Triton expects exactly one response per request, so return an error response
                logger.error("Input tensor 'RAW_IMAGE' not found!")
                responses.append(pb_utils.InferenceResponse(
                    output_tensors=[],
                    error=pb_utils.TritonError("Input tensor 'RAW_IMAGE' not found")))
                continue
            raw_batch = in_tensor.as_numpy()  # shape: (batch_size, 1)
            batch_size = raw_batch.shape[0]
            imgs_resized = []

            # 2. Process each image in the batch
            for i in range(batch_size):
                base64_str = raw_batch[i][0]  # base64 bytes of the i-th image
                # 3. Decode base64 into an image
                img = self.base64_to_image(base64_str)
                if img is None:
                    # On decode failure, substitute a black image so the batch size is preserved
                    logger.warning("Base64 converted to image failed!")
                    img = np.zeros((self.crop_size, self.crop_size, 3), dtype=np.uint8)
                else:
                    logger.info("Base64 converted to image Successfully!")
                # 4. Preprocess the image
                imgs_resized.append(self.img_preprocess(img))

            if len(imgs_resized) == 0:
                # Empty batch: return an empty tensor of the right shape
                batch_imgs = np.zeros((batch_size, 3, self.crop_size, self.crop_size), dtype=np.float32)
            else:
                batch_imgs = np.stack(imgs_resized, axis=0)  # (batch, 3, 224, 224)

            # 5. Build the output tensor
            out_tensor = pb_utils.Tensor("PREPROCESSED_IMAGE", batch_imgs)
            response = pb_utils.InferenceResponse(output_tensors=[out_tensor])
            responses.append(response)
        return responses
```
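
The resize and crop arithmetic in `img_preprocess` can be checked without OpenCV. The sketch below repeats only the integer math from steps 2 and 3; the function name `resize_and_crop_box` is an illustrative assumption and does not exist in the original code.

```python
def resize_and_crop_box(width, height, resize_size=256, crop_size=224):
    """Return ((new_w, new_h), (x0, y0, x1, y1)) for short-side resize + center crop."""
    if width < height:
        new_w = resize_size
        new_h = int(height * resize_size / width)
    else:
        new_h = resize_size
        new_w = int(width * resize_size / height)
    x0 = (new_w - crop_size) // 2
    y0 = (new_h - crop_size) // 2
    return (new_w, new_h), (x0, y0, x0 + crop_size, y0 + crop_size)


# A 640x480 landscape photo: the short side (height) becomes 256.
print(resize_and_crop_box(640, 480))   # ((341, 256), (58, 16, 282, 240))
# A 480x640 portrait photo: the short side (width) becomes 256.
print(resize_and_crop_box(480, 640))   # ((256, 341), (16, 58, 240, 282))
```

In both cases the crop box is 224 pixels on each side, so the output tensor always has shape (3, 224, 224) regardless of the input aspect ratio.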

(2) resnet50_inference (inference)

config.pbtxt

```text
name: "resnet50_inference"
backend: "onnxruntime"
default_model_filename: "resnet50_imagenet_v2.onnx"
max_batch_size: 256
dynamic_batching {
  max_queue_delay_microseconds: 100000
  preferred_batch_size: [ 16, 32, 64, 128, 256 ]
}

input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [3, 224, 224]  # fixed input size in CHW order; the batch dimension is implicit
  }
]

output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [1000]  # 1000 class logits per image; the batch dimension is implicit
  }
]

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]  # use GPU 0
  }
]
```

(3) resnet50_postprocess (postprocessing)

config.pbtxt

```text
name: "resnet50_postprocess"
backend: "python"
max_batch_size: 256
default_model_filename: "postprocess.py"

input [
    {
        name: "LOGITS"
        data_type: TYPE_FP32
        dims: [ 1000 ]
    }
]

output [
    {
        name: "OUTPUT_RESULTS"
        data_type: TYPE_STRING
        dims: [ 1 ]
    }
]

instance_group [
  {
    count: 32
    kind: KIND_CPU
  }
]
```

postprocess.py

```python
import triton_python_backend_utils as pb_utils
import numpy as np
import os
import json


class TritonPythonModel:
    def initialize(self, args):
        # Locate the version directory of this (postprocess) model
        model_repository = args["model_repository"]  # e.g. /models/resnet50_postprocess
        model_version = args["model_version"]        # e.g. 1
        label_file = os.path.join(model_repository, model_version, "imagenet_classes.txt")

        # Load the 1000 ImageNet labels, one per line
        with open(label_file, 'r', encoding='utf-8') as f:
            self.labels = [line.strip() for line in f.readlines()]

    def softmax(self, logits):
        # Subtract the row maximum for numerical stability
        exp_logits = np.exp(logits - np.max(logits, axis=1, keepdims=True))
        probs = exp_logits / np.sum(exp_logits, axis=1, keepdims=True)
        return probs

    def execute(self, requests):
        responses = []
        for request in requests:
            # 1. Get the logits produced by the inference model
            logits_tensor = pb_utils.get_input_tensor_by_name(request, "LOGITS")
            logits = logits_tensor.as_numpy()  # shape: (batch_size, 1000)

            # 2. Convert logits to probabilities with softmax
            probs = self.softmax(logits)

            # 3. Get the top-5 indices (highest probability first) and their labels
            top5_indices = np.argsort(probs, axis=1)[:, -5:][:, ::-1]
            batch_size = logits.shape[0]
            batch_results = []
            for i in range(batch_size):
                top5_labels = [self.labels[idx] for idx in top5_indices[i]]
                top5_probs = [float(probs[i][idx]) for idx in top5_indices[i]]

                classify_item = {
                    "top5_labels": top5_labels,
                    "top5_probs": top5_probs
                }
                result_json_str = json.dumps(classify_item)
                batch_results.append(result_json_str.encode('utf-8'))

            # 4. Emit the result: one JSON string (labels and probabilities) per image,
            #    reshaped to (batch_size, 1) to match dims: [ 1 ] in config.pbtxt
            output_array = np.array(batch_results, dtype=object).reshape(-1, 1)
            out_tensor = pb_utils.Tensor("OUTPUT_RESULTS", output_array)
            response = pb_utils.InferenceResponse(output_tensors=[out_tensor])
            responses.append(response)
        return responses
```
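
To see what the postprocessing step computes, here is a minimal pure-Python sketch of the same softmax and top-k logic on a toy input. The four class names and logits are made up for illustration, and `k=2` stands in for the top-5 used above.

```python
import json
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, labels, k):
    """Return the k most probable (label, probability) pairs, best first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(labels[i], probs[i]) for i in order]


# Toy example with 4 hypothetical classes instead of ImageNet's 1000.
labels = ["cat", "dog", "fox", "owl"]
logits = [2.0, 1.0, 0.1, 3.0]
result = top_k(softmax(logits), labels, k=2)

# Same JSON keys as postprocess.py, so a client can parse both the same way.
payload = json.dumps({
    "top5_labels": [name for name, _ in result],
    "top5_probs": [prob for _, prob in result],
})
print(payload)
```

The largest logit (3.0, "owl") ranks first and "cat" second. A client receives one such JSON string per image and can parse it with `json.loads`.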

(4) resnet50_ensemble (ensemble scheduling)

config.pbtxt

```text
name: "resnet50_ensemble"
platform: "ensemble"
max_batch_size: 256

input [
    {
        name: "RAW_IMAGE"
        data_type: TYPE_STRING
        dims: [ 1 ]
    }
]

output [
    {
        name: "OUTPUT_RESULTS"
        data_type: TYPE_STRING
        dims: [ 1 ]
    }
]

ensemble_scheduling {
    step [
        {
            model_name: "resnet50_preprocess"
            model_version: -1
            input_map {
                key: "RAW_IMAGE"
                value: "RAW_IMAGE"
            }
            output_map {
                key: "PREPROCESSED_IMAGE"
                value: "preprocessed_image"
            }
        },
        {
            model_name: "resnet50_inference"
            model_version: -1
            input_map {
                key: "input"
                value: "preprocessed_image"
            }
            output_map {
                key: "output"
                value: "logits"
            }
        },
        {
            model_name: "resnet50_postprocess"
            model_version: -1
            input_map {
                key: "LOGITS"
                value: "logits"
            }
            output_map {
                key: "OUTPUT_RESULTS"
                value: "OUTPUT_RESULTS"
            }
        }
    ]
}
```
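
In each step, the `key` of a map entry is the tensor name inside the step's own model, and the `value` is the ensemble-level name that connects steps. A mistyped `value` leaves a step without a producer for its input. The sketch below checks that wiring for the three steps above. It is an illustration only: `check_wiring` is a hypothetical helper, and it assumes the steps are listed in execution order, which Triton itself does not require.

```python
# Each entry mirrors one `step` of the ensemble config above:
#   inputs  = input_map values  (ensemble-level tensors the step consumes)
#   outputs = output_map values (ensemble-level tensors the step produces)
STEPS = [
    {"model": "resnet50_preprocess", "inputs": ["RAW_IMAGE"], "outputs": ["preprocessed_image"]},
    {"model": "resnet50_inference", "inputs": ["preprocessed_image"], "outputs": ["logits"]},
    {"model": "resnet50_postprocess", "inputs": ["logits"], "outputs": ["OUTPUT_RESULTS"]},
]


def check_wiring(steps, ensemble_inputs, ensemble_outputs):
    """Return a list of problems; an empty list means every tensor has a producer."""
    problems = []
    available = set(ensemble_inputs)
    for step in steps:
        for name in step["inputs"]:
            if name not in available:
                problems.append(f"{step['model']}: input '{name}' has no producer")
        available.update(step["outputs"])
    for name in ensemble_outputs:
        if name not in available:
            problems.append(f"ensemble output '{name}' is never produced")
    return problems


print(check_wiring(STEPS, ["RAW_IMAGE"], ["OUTPUT_RESULTS"]))  # []
```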

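3. Model deployment

Start the Triton server with the model repository mounted at /models inside the container. The image must have OpenCV installed, because preprocess.py imports cv2.

```bash
docker run -d \
  --gpus 1 \
  --name tritonserver \
  -p 127.0.0.1:8000:8000 \
  -v /path/to/models:/models \
  nvcr.io/nvidia/tritonserver:23.01-py3-v0.0.1 \
  tritonserver --model-repository=/models --strict-model-config=false --log-verbose=1
```

With `--gpus 1`, a single GPU is exposed to the container and appears there as device 0, which matches `gpus: [0]` in the inference config. To select a specific host GPU, pass `--gpus device=<index>` to docker.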
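
Once the server is up, a client can call the ensemble over Triton's HTTP/REST inference endpoint. The following is a minimal standard-library sketch under a few assumptions: the server listens on 127.0.0.1:8000 as mapped above, the function names `build_infer_request` and `infer` are hypothetical, and `cat.jpg` is a placeholder file name.

```python
import base64
import json
import urllib.request


def build_infer_request(image_bytes):
    """Build a JSON body carrying one base64-encoded image in RAW_IMAGE."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "inputs": [
            {
                "name": "RAW_IMAGE",
                "shape": [1, 1],        # [batch, 1], matching dims: [ 1 ]
                "datatype": "BYTES",    # TYPE_STRING in config.pbtxt
                "data": [encoded],
            }
        ],
        "outputs": [{"name": "OUTPUT_RESULTS"}],
    }


def infer(image_path, url="http://127.0.0.1:8000/v2/models/resnet50_ensemble/infer"):
    """POST one image to the ensemble and return the decoded JSON response."""
    with open(image_path, "rb") as f:
        body = json.dumps(build_infer_request(f.read())).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Example (needs a running server and a local image file):
# print(infer("cat.jpg")["outputs"][0]["data"])
```

The image is base64-encoded on the client because preprocess.py decodes `RAW_IMAGE` with `base64.b64decode` before handing the bytes to OpenCV.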

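Deploying a TensorRT model with a Triton ensemble

1. Convert ONNX to TensorRT

Run the command below from the directory that contains resnet50_imagenet_v2.onnx; that directory is mounted at /workspace. The TensorRT container uses the same release (23.01) as the Triton image, which matters because a TensorRT engine generally has to be loaded by the same TensorRT version that built it.

```bash
docker run --gpus 1 -v $(pwd):/workspace -it nvcr.io/nvidia/tensorrt:23.01-py3 \
    bash -c \
    "cd /workspace && \
trtexec \
--onnx=resnet50_imagenet_v2.onnx \
--minShapes=input:1x3x224x224 \
--optShapes=input:256x3x224x224 \
--maxShapes=input:512x3x224x224 \
--workspace=8192 \
--saveEngine=resnet50_imagenet_v2_fp16.plan \
--explicitBatch \
--fp16"
```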
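
The three shape flags define the engine's optimization profile: it accepts any batch size between the minimum and the maximum and is tuned for the optimum. The `max_batch_size` in config.pbtxt (256) therefore has to stay within the profile maximum (512 here). As a small sketch, the flags can be generated from the batch sizes so the ordering is checked in one place; `shape_flags` is a hypothetical helper, not part of trtexec.

```python
def shape_flags(input_name, chw, min_batch, opt_batch, max_batch):
    """Build the --minShapes/--optShapes/--maxShapes flags for one input."""
    if not min_batch <= opt_batch <= max_batch:
        raise ValueError("batch sizes must satisfy min <= opt <= max")
    dims = "x".join(str(d) for d in chw)
    return [
        f"--{kind}Shapes={input_name}:{batch}x{dims}"
        for kind, batch in (("min", min_batch), ("opt", opt_batch), ("max", max_batch))
    ]


flags = shape_flags("input", (3, 224, 224), min_batch=1, opt_batch=256, max_batch=512)
print(flags)
# ['--minShapes=input:1x3x224x224', '--optShapes=input:256x3x224x224', '--maxShapes=input:512x3x224x224']
```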

2. Directory layout

(1) In the existing model repository, replace the ONNX model under models/resnet50_inference/1 with the TensorRT engine, or create a version directory 2 under models/resnet50_inference and put the TensorRT engine there.

(2) Modify models/resnet50_inference/config.pbtxt:

```text
name: "resnet50_inference"
backend: "tensorrt"  # changed from onnxruntime to tensorrt
default_model_filename: "resnet50_imagenet_v2_fp16.plan"  # name of the TensorRT engine file
max_batch_size: 256
dynamic_batching {
  max_queue_delay_microseconds: 100000
  preferred_batch_size: [ 16, 32, 64, 128, 256 ]
}

input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [3, 224, 224]  # fixed input size in CHW order; the batch dimension is implicit
  }
]

output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [1000]  # 1000 class logits per image; the batch dimension is implicit
  }
]

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]  # use GPU 0
  }
]
```

3. Model deployment

Deployment is the same as for the ONNX model.