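ResNet-50 Triton ensemble
- Abstract
- [Deploying an ONNX model with a Triton ensemble](#Deploying an ONNX model with a Triton ensemble)
- [Deploying a TensorRT model with a Triton ensemble](#Deploying a TensorRT model with a Triton ensemble)
Abstract
This post describes in detail how to combine the preprocessing, inference, and postprocessing of a ResNet-50 model into a single Triton service. The complete code is available at: triton_ensemble_model_zoo
Deploying an ONNX model with a Triton ensemble
1. Model preparation
Prepare an ONNX model. If you already have one, skip this step; otherwise, follow the steps below to download the open-source pretrained ResNet-50 weights and convert them to ONNX.
(1) Download the pretrained ResNet-50 weights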
python
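import os
import torch
import torchvision.models as models


def save_resnet50_weights(save_dir='weights', weight_name="resnet50_imagenet_v2.pth"):
    """
    Download the ImageNet-pretrained ResNet-50 weights and save them to the given directory.

    Args:
        save_dir (str): directory in which to save the weights
        weight_name (str): file name of the saved weights
    """
    # Create the directory if it does not exist
    os.makedirs(save_dir, exist_ok=True)
    # Build the save path
    save_path = os.path.join(save_dir, weight_name)
    # Skip the download if the file already exists
    if os.path.exists(save_path):
        print(f"File already exists: {save_path}")
        return
    print("Downloading ResNet-50 pretrained weights (IMAGENET1K_V2) ...")
    # Use the official torchvision weights (the V2 version has higher accuracy).
    # The weights argument downloads and caches them automatically;
    # we additionally save a copy to the chosen directory.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    # Extract the state_dict and save it
    torch.save(model.state_dict(), save_path)
    print(f"Weights saved to: {save_path}")


if __name__ == "__main__":
    save_resnet50_weights()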
(2) Convert the .pth weights to ONNX
python
import torch
import torchvision.models as models
import os


def convert_to_onnx(weight_path, onnx_path, input_size=(3, 224, 224)):
    """
    Convert the ResNet-50 PyTorch weights to an ONNX model.

    Args:
        weight_path (str): path to the local .pth weight file
        onnx_path (str): path of the ONNX file to write
        input_size (tuple): input tensor shape (C, H, W), default (3, 224, 224)
    """
    # 1. Build the model architecture (without loading pretrained weights)
    model = models.resnet50(weights=None)
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    # 2. Load the local weights
    if not os.path.exists(weight_path):
        raise FileNotFoundError(f"Weight file not found: {weight_path}")
    state_dict = torch.load(weight_path, map_location=device)
    model.load_state_dict(state_dict)
    # 3. Move the model to the device and switch to inference mode
    model = model.to(device)
    model.eval()
    # 4. Build an example input
    batch_size = 1
    c, h, w = input_size
    example_input = torch.randn(batch_size, c, h, w).to(device)
    # 5. Run one forward pass to verify that the model works
    with torch.no_grad():
        output = model(example_input)
    print(f"Forward pass OK, output shape: {output.shape}")
    # 6. Export to ONNX
    print(f"Exporting ONNX to: {onnx_path}")
    torch.onnx.export(
        model,
        example_input,
        onnx_path,
        export_params=True,        # store the model parameters in the file
        opset_version=11,          # ONNX opset version; 11 and 12 are common choices
        do_constant_folding=True,  # constant-folding optimization
        input_names=['input'],     # input node name
        output_names=['output'],   # output node name
        dynamic_axes={
            'input': {0: 'batch_size'},
            'output': {0: 'batch_size'}
        },                         # dynamic batch dimension
        dynamo=False               # Important: force the TorchScript-based exporter
                                   # (remove this argument if your PyTorch version does not accept it)
    )
    print("ONNX export succeeded!")


if __name__ == '__main__':
    weight_path = r'weights/resnet50_imagenet_v2.pth'
    onnx_path = r'weights/resnet50_imagenet_v2.onnx'
    input_size = (3, 224, 224)
    convert_to_onnx(weight_path, onnx_path, input_size)
2. Directory layout
Triton has strict requirements for the model repository layout. The files must be arranged as follows:
powershell
models/
├── resnet50_ensemble/
│   ├── 1/
│   └── config.pbtxt
├── resnet50_inference/
│   ├── 1/
│   │   └── resnet50_imagenet_v2.onnx
│   └── config.pbtxt
├── resnet50_postprocess/
│   ├── 1/
│   │   ├── imagenet_classes.txt
│   │   └── postprocess.py
│   └── config.pbtxt
└── resnet50_preprocess/
    ├── 1/
    │   └── preprocess.py
    └── config.pbtxt
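- resnet50_preprocess (preprocessing)
Receives the raw image data, decodes it, resizes and normalizes it, and converts it into the tensor format the model expects.
- resnet50_inference (inference)
Runs the model inference.
- resnet50_postprocess (postprocessing)
Receives the raw model output (usually logits or probabilities), applies softmax and top-K selection, and uses imagenet_classes.txt to map numeric class IDs to human-readable labels (such as "cat" or "dog").
- resnet50_ensemble (ensemble scheduling)
This is a special "virtual" model. It contains no code or weights; instead, its config.pbtxt defines the execution order and data flow of the three steps above (Preprocess -> Inference -> Postprocess) and exposes a single unified interface to clients.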
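Because Triton refuses to load a model whose files are misplaced, it can help to check the layout before starting the server. The following is a minimal sketch using only the standard library; `REPO_LAYOUT` and `find_missing` are hypothetical helper names, and the expected files are copied from the directory tree above.

```python
from pathlib import Path

# Expected layout, copied from the directory tree above:
# model name -> files expected inside its version directory "1"
REPO_LAYOUT = {
    "resnet50_ensemble": [],
    "resnet50_inference": ["resnet50_imagenet_v2.onnx"],
    "resnet50_postprocess": ["imagenet_classes.txt", "postprocess.py"],
    "resnet50_preprocess": ["preprocess.py"],
}


def find_missing(repo_root, version="1"):
    """Return the relative paths that the layout expects but that do not exist."""
    root = Path(repo_root)
    missing = []
    for model, files in REPO_LAYOUT.items():
        model_dir = root / model
        version_dir = model_dir / version
        # The ensemble also needs an (empty) version directory
        if not version_dir.is_dir():
            missing.append(f"{model}/{version}/")
        if not (model_dir / "config.pbtxt").is_file():
            missing.append(f"{model}/config.pbtxt")
        for name in files:
            if not (version_dir / name).is_file():
                missing.append(f"{model}/{version}/{name}")
    return missing
```

Calling `find_missing("models")` returns an empty list when every expected file is in place, and otherwise lists what still has to be created.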
(1) resnet50_preprocess (preprocessing)
config.pbtxt
powershell
name: "resnet50_preprocess"
backend: "python"
max_batch_size: 256
default_model_filename: "preprocess.py"
input [
{
name: "RAW_IMAGE"
data_type: TYPE_STRING
dims: [ 1 ]
}
]
output [
{
name: "PREPROCESSED_IMAGE"
data_type: TYPE_FP32
dims: [ 3, 224, 224 ]
}
]
instance_group [
{
count: 32
kind: KIND_CPU
}
]
preprocess.py
python
import triton_python_backend_utils as pb_utils
import numpy as np
import cv2
import base64
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class TritonPythonModel:
    def initialize(self, args):
        # Preprocessing parameters. torchvision's IMAGENET1K_V1 preset resizes to 256;
        # its IMAGENET1K_V2 preset resizes to 232, so adjust resize_size to match your weights.
        self.resize_size = 256
        self.crop_size = 224
        # ImageNet mean and standard deviation (RGB order)
        self.mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        self.std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

    def img_preprocess(self, img) -> np.ndarray:
        # 1. Get the original image size
        h, w = img.shape[:2]
        # 2. Scale the shorter side to resize_size, keeping the aspect ratio
        if w < h:
            new_w = self.resize_size
            new_h = int(h * self.resize_size / w)
        else:
            new_h = self.resize_size
            new_w = int(w * self.resize_size / h)
        img_resized = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)  # (new_h, new_w, 3)
        img_resized = cv2.cvtColor(img_resized, cv2.COLOR_BGR2RGB)  # BGR to RGB
        # 3. Center-crop to 224x224
        start_x = (new_w - self.crop_size) // 2
        start_y = (new_h - self.crop_size) // 2
        img_cropped = img_resized[start_y:start_y + self.crop_size,
                                  start_x:start_x + self.crop_size]  # (224, 224, 3)
        # 4. Scale to [0, 1] and convert to float32
        img_norm = img_cropped.astype(np.float32) / 255.0
        # 5. Standardize with the ImageNet mean and std
        img_resized = (img_norm - self.mean) / self.std
        # 6. Reorder the axes
        img_resized = np.transpose(img_resized, (2, 0, 1))  # (H,W,C) -> (C,H,W)
        return img_resized.astype(np.float32)

    def base64_to_image(self, base64_str):
        # Returns None when the input cannot be decoded (cv2.imdecode also returns None on failure)
        try:
            raw_bytes = base64.b64decode(base64_str)
            nparr = np.frombuffer(raw_bytes, np.uint8)
            img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
            return img
        except Exception as error:
            logger.error(f"Error: {error}")
            logger.error(f"error line: {error.__traceback__.tb_lineno}")
            return None

    def execute(self, requests):
        responses = []
        for request in requests:
            # 1. Get the raw image data (base64) for the whole batch
            in_tensor = pb_utils.get_input_tensor_by_name(request, "RAW_IMAGE")
            if in_tensor is None:
                # Triton expects exactly one response per request, so return an error response
                logger.error("Input tensor 'RAW_IMAGE' not found!")
                responses.append(pb_utils.InferenceResponse(
                    output_tensors=[],
                    error=pb_utils.TritonError("Input tensor 'RAW_IMAGE' not found!")))
                continue
            raw_batch = in_tensor.as_numpy()
            batch_size = raw_batch.shape[0]
            imgs_resized = []
            # 2. Process each image in the batch
            for i in range(batch_size):
                base64_str = raw_batch[i][0]  # base64 bytes of the i-th image
                # 3. Decode the base64 string into an image
                img = self.base64_to_image(base64_str)
                if img is None:
                    # Decoding failed: substitute a black image rather than failing the request
                    logger.info("Base64 converted to image failed!")
                    img = np.zeros((self.crop_size, self.crop_size, 3), dtype=np.uint8)
                else:
                    logger.info("Base64 converted to image Successfully!")
                # 4. Preprocess the image
                img_resized = self.img_preprocess(img)
                imgs_resized.append(img_resized)
            if len(imgs_resized) == 0:
                # No images: create an all-zero batch
                batch_imgs = np.zeros((batch_size, 3, self.crop_size, self.crop_size), dtype=np.float32)
            else:
                batch_imgs = np.stack(imgs_resized, axis=0)  # (batch, 3, 224, 224)
            # 5. Build the output tensor
            out_tensor = pb_utils.Tensor("PREPROCESSED_IMAGE", batch_imgs)
            response = pb_utils.InferenceResponse(output_tensors=[out_tensor])
            responses.append(response)
        return responses
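The resize-and-crop arithmetic in `img_preprocess` is easy to get wrong by one axis, so here is a small pure-Python sketch of the same geometry that can be checked without OpenCV. The function name `resize_and_crop_geometry` is hypothetical; the formulas are copied from the preprocessing code above.

```python
def resize_and_crop_geometry(h, w, resize_size=256, crop_size=224):
    """Return ((new_h, new_w), (start_y, start_x)) for a short-side resize
    followed by a center crop, mirroring img_preprocess above."""
    if w < h:
        # Portrait: the width is the shorter side
        new_w = resize_size
        new_h = int(h * resize_size / w)
    else:
        # Landscape or square: the height is the shorter side
        new_h = resize_size
        new_w = int(w * resize_size / h)
    start_x = (new_w - crop_size) // 2
    start_y = (new_h - crop_size) // 2
    return (new_h, new_w), (start_y, start_x)


# A 480x640 (H x W) landscape image: the height is scaled to 256
print(resize_and_crop_geometry(480, 640))  # ((256, 341), (16, 58))
```

The crop window always fits inside the resized image as long as `crop_size` does not exceed `resize_size`, which is why the shorter side is the one pinned to 256.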
(2) resnet50_inference (inference)
config.pbtxt
powershell
name: "resnet50_inference"
backend: "onnxruntime"
default_model_filename: "resnet50_imagenet_v2.onnx"
max_batch_size: 256
dynamic_batching {
max_queue_delay_microseconds: 100000
preferred_batch_size: [ 16, 32, 64, 128, 256 ]
}
input [
{
name: "input"
data_type: TYPE_FP32
dims: [3, 224, 224] # fixed input size in CHW order; the batch dimension is implied by max_batch_size
}
]
output [
{
name: "output"
data_type: TYPE_FP32
dims: [1000] # per-sample output: 1000 class scores; the full tensor is [batch, 1000]
}
]
instance_group [
{
count: 1
kind: KIND_GPU
gpus: [0] # use GPU 0
}
]
(3) resnet50_postprocess (postprocessing)
config.pbtxt
powershell
name: "resnet50_postprocess"
backend: "python"
max_batch_size: 256
default_model_filename: "postprocess.py"
input [
{
name: "LOGITS"
data_type: TYPE_FP32
dims: [ 1000 ]
}
]
output [
{
name: "OUTPUT_RESULTS"
data_type: TYPE_STRING
dims: [ 1 ]
}
]
instance_group [
{
count: 32
kind: KIND_CPU
}
]
postprocess.py
python
import triton_python_backend_utils as pb_utils
import numpy as np
import os
import json


class TritonPythonModel:
    def initialize(self, args):
        # Locate the version directory of the current (postprocess) model
        model_repository = args["model_repository"]  # e.g. /models/resnet50_postprocess
        model_version = args["model_version"]  # e.g. 1
        label_file = os.path.join(model_repository, model_version, "imagenet_classes.txt")
        # Load the 1000 ImageNet labels
        with open(label_file, 'r', encoding='utf-8') as f:
            self.labels = [line.strip() for line in f.readlines()]

    def softmax(self, logits):
        # Subtract the per-row maximum before exponentiating, for numerical stability
        exp_logits = np.exp(logits - np.max(logits, axis=1, keepdims=True))
        probs = exp_logits / np.sum(exp_logits, axis=1, keepdims=True)
        return probs

    def execute(self, requests):
        responses = []
        for request in requests:
            # 1. Get the logits produced by the inference model
            logits_tensor = pb_utils.get_input_tensor_by_name(request, "LOGITS")
            logits = logits_tensor.as_numpy()  # shape: (batch_size, 1000)
            # 2. Compute probabilities with softmax
            probs = self.softmax(logits)
            # 3. Get the top-5 indices (highest probability first) and their labels
            top5_indices = np.argsort(probs, axis=1)[:, -5:][:, ::-1]
            batch_size = logits.shape[0]
            batch_results = []
            for i in range(batch_size):
                top5_labels = [self.labels[idx] for idx in top5_indices[i]]
                top5_probs = [float(probs[i][idx]) for idx in top5_indices[i]]
                classify_item = {
                    "top5_labels": top5_labels,
                    "top5_probs": top5_probs
                }
                result_json_str = json.dumps(classify_item)
                batch_results.append(result_json_str.encode('utf-8'))
            # 4. Emit the result: one JSON string (labels and probabilities) per image,
            #    shaped (batch_size, 1) to match dims: [ 1 ] in config.pbtxt
            output_array = np.array(batch_results, dtype=object).reshape(-1, 1)
            out_tensor = pb_utils.Tensor("OUTPUT_RESULTS", output_array)
            response = pb_utils.InferenceResponse(output_tensors=[out_tensor])
            responses.append(response)
        return responses
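To make the softmax and top-K step concrete, here is a dependency-free sketch of the same logic for a single sample. It is an illustration rather than the code Triton runs: `softmax` and `top_k` are hypothetical stand-ins for the NumPy versions above, and the three-class label list is made up for the example. The result keys match the JSON produced by postprocess.py.

```python
import math


def softmax(logits):
    """Numerically stable softmax for one sample: subtract the maximum first."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, labels, k=5):
    """Return the k most probable labels, highest probability first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return {
        "top5_labels": [labels[i] for i in order],
        "top5_probs": [probs[i] for i in order],
    }


# Logits this large would overflow math.exp without the max subtraction
probs = softmax([1000.0, 1001.0, 999.0])
result = top_k(probs, ["cat", "dog", "fox"], k=2)
print(result["top5_labels"])  # ['dog', 'cat']
```

Subtracting the maximum leaves the probabilities unchanged, because the common factor cancels in the ratio, while keeping every exponent at or below zero.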
(4) resnet50_ensemble (ensemble scheduling)
config.pbtxt
powershell
name: "resnet50_ensemble"
platform: "ensemble"
max_batch_size: 256
input [
{
name: "RAW_IMAGE"
data_type: TYPE_STRING
dims: [ 1 ]
}
]
output [
{
name: "OUTPUT_RESULTS"
data_type: TYPE_STRING
dims: [ 1 ]
}
]
ensemble_scheduling {
step [
{
model_name: "resnet50_preprocess"
model_version: -1
input_map {
key: "RAW_IMAGE"
value: "RAW_IMAGE"
}
output_map {
key: "PREPROCESSED_IMAGE"
value: "preprocessed_image"
}
},
{
model_name: "resnet50_inference"
model_version: -1
input_map {
key: "input"
value: "preprocessed_image"
}
output_map {
key: "output"
value: "logits"
}
},
{
model_name: "resnet50_postprocess"
model_version: -1
input_map {
key: "LOGITS"
value: "logits"
}
output_map {
key: "OUTPUT_RESULTS"
value: "OUTPUT_RESULTS"
}
}
]
}
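A common mistake in an ensemble config is a tensor name in one step's input_map that no earlier step produces. The sketch below checks that wiring in plain Python. `check_ensemble_wiring` and the `STEPS` list are hypothetical; the names are transcribed by hand from the config above, and the check assumes the steps are listed in execution order, which holds for this config.

```python
def check_ensemble_wiring(ensemble_inputs, ensemble_outputs, steps):
    """Return a list of problems; an empty list means every tensor is produced before it is used."""
    available = set(ensemble_inputs)
    problems = []
    for step in steps:
        # input_map: model input name -> ensemble tensor name
        for model_input, tensor in step["input_map"].items():
            if tensor not in available:
                problems.append(
                    f"{step['model_name']}.{model_input} reads unknown tensor '{tensor}'")
        # output_map: model output name -> ensemble tensor name
        available.update(step["output_map"].values())
    for name in ensemble_outputs:
        if name not in available:
            problems.append(f"ensemble output '{name}' is never produced")
    return problems


# Transcribed from resnet50_ensemble/config.pbtxt
STEPS = [
    {"model_name": "resnet50_preprocess",
     "input_map": {"RAW_IMAGE": "RAW_IMAGE"},
     "output_map": {"PREPROCESSED_IMAGE": "preprocessed_image"}},
    {"model_name": "resnet50_inference",
     "input_map": {"input": "preprocessed_image"},
     "output_map": {"output": "logits"}},
    {"model_name": "resnet50_postprocess",
     "input_map": {"LOGITS": "logits"},
     "output_map": {"OUTPUT_RESULTS": "OUTPUT_RESULTS"}},
]

print(check_ensemble_wiring(["RAW_IMAGE"], ["OUTPUT_RESULTS"], STEPS))  # []
```

In each map the key is the tensor name declared by the step's own model and the value is the name used inside the ensemble, so intermediate names such as `preprocessed_image` and `logits` only need to agree between steps.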
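3. Model deployment
Start the server with the model repository mounted at /models. Because --gpus 1 exposes a single GPU to the container as device 0, no CUDA_VISIBLE_DEVICES setting is needed, and gpus: [0] in config.pbtxt refers to that device. The image's Python environment must provide OpenCV (cv2) and NumPy for preprocess.py.
powershell
docker run -d \
--gpus 1 \
--name tritonserver \
-p 127.0.0.1:8000:8000 \
-v /path/to/models:/models \
nvcr.io/nvidia/tritonserver:23.01-py3-v0.0.1 \
tritonserver --model-repository=/models --strict-model-config=false --log-verbose=1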
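Once the server is running, the ensemble can be called over Triton's HTTP/REST inference endpoint. The following is a minimal client sketch using only the standard library. It assumes the server is reachable at 127.0.0.1:8000 as in the command above; `build_infer_request` and `infer` are hypothetical helper names. The image is sent as a base64 string because preprocess.py base64-decodes its RAW_IMAGE input.

```python
import base64
import json
import urllib.request


def build_infer_request(image_bytes):
    """Build the JSON body for one image: a BYTES tensor of shape [batch=1, 1]."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "inputs": [{
            "name": "RAW_IMAGE",
            "shape": [1, 1],
            "datatype": "BYTES",
            "data": [b64],
        }],
        "outputs": [{"name": "OUTPUT_RESULTS"}],
    }


def infer(image_path, url="http://127.0.0.1:8000/v2/models/resnet50_ensemble/infer"):
    """Send one image file to the ensemble and return the decoded top-5 result."""
    with open(image_path, "rb") as f:
        body = json.dumps(build_infer_request(f.read())).encode("utf-8")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())
    # OUTPUT_RESULTS holds one JSON string per image
    return json.loads(result["outputs"][0]["data"][0])
```

Calling `infer("cat.jpg")` (a hypothetical file) should return a dictionary with `top5_labels` and `top5_probs`, as produced by postprocess.py.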
Deploying a TensorRT model with a Triton ensemble
1. Convert ONNX to TensorRT
powershell
docker run --gpus 1 -v $(pwd):/workspace -it nvcr.io/nvidia/tensorrt:23.01-py3 \
bash -c \
"cd /workspace && \
trtexec \
--onnx=resnet50_imagenet_v2.onnx \
--minShapes=input:1x3x224x224 \
--optShapes=input:256x3x224x224 \
--maxShapes=input:512x3x224x224 \
--workspace=8192 \
--saveEngine=resnet50_imagenet_v2_fp16.plan \
--explicitBatch \
--fp16"
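The three shape flags define the optimization profile of the engine: the batch sizes it accepts (min to max) and the one it is tuned for (opt). A small sketch of how these flag strings are assembled is shown below. The helper names `shape_flag` and `trtexec_shape_flags` are hypothetical, and the tensor name `input` comes from the ONNX export earlier in this post.

```python
def shape_flag(flag, tensor, batch, chw=(3, 224, 224)):
    """Format one trtexec shape flag, e.g. --minShapes=input:1x3x224x224."""
    dims = "x".join(str(d) for d in (batch, *chw))
    return f"--{flag}={tensor}:{dims}"


def trtexec_shape_flags(min_batch, opt_batch, max_batch, tensor="input"):
    """Return the min/opt/max shape flags for a dynamic-batch profile."""
    if not (1 <= min_batch <= opt_batch <= max_batch):
        raise ValueError("expected 1 <= min_batch <= opt_batch <= max_batch")
    return [
        shape_flag("minShapes", tensor, min_batch),
        shape_flag("optShapes", tensor, opt_batch),
        shape_flag("maxShapes", tensor, max_batch),
    ]


print(trtexec_shape_flags(1, 256, 512))
# ['--minShapes=input:1x3x224x224', '--optShapes=input:256x3x224x224', '--maxShapes=input:512x3x224x224']
```

The maximum batch in the profile (512 here) should be at least the max_batch_size of 256 used in config.pbtxt, since Triton cannot send the engine a batch larger than the profile allows.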
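2. Directory layout
(1) In the existing directory layout, replace the ONNX model under models/resnet50_inference/1 with the TensorRT engine, or create a version directory 2 under models/resnet50_inference and place the TensorRT engine there;
(2) Update models/resnet50_inference/config.pbtxt: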
powershell
name: "resnet50_inference"
backend: "tensorrt" # changed from onnxruntime to tensorrt
default_model_filename: "resnet50_imagenet_v2_fp16.plan" # changed to the TensorRT engine file name
max_batch_size: 256
dynamic_batching {
max_queue_delay_microseconds: 100000
preferred_batch_size: [ 16, 32, 64, 128, 256 ]
}
input [
{
name: "input"
data_type: TYPE_FP32
dims: [3, 224, 224] # fixed input size in CHW order; the batch dimension is implied by max_batch_size
}
]
output [
{
name: "output"
data_type: TYPE_FP32
dims: [1000] # per-sample output: 1000 class scores; the full tensor is [batch, 1000]
}
]
instance_group [
{
count: 1
kind: KIND_GPU
gpus: [0] # use GPU 0
}
]
3. Model deployment
Deployment is the same as for the ONNX model.