deeplabv3 triton ensemble
- Summary
- [Deploying an ONNX model with a Triton ensemble](#deploying-an-onnx-model-with-a-triton-ensemble)
- [Deploying a TensorRT model with a Triton ensemble](#deploying-a-tensorrt-model-with-a-triton-ensemble)
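Summary
This post describes in detail how to deploy the preprocessing, inference, and postprocessing of a semantic segmentation model (DeepLabV3) in a single Triton service. Preprocessing and postprocessing are implemented with the Python backend, inference runs on the ONNX Runtime or TensorRT backend, and a Triton ensemble chains preprocessing, inference, and postprocessing into one pipeline. The complete project code is available in: triton_ensemble_model_zoo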
Deploying an ONNX model with a Triton ensemble
1. Converting PyTorch to ONNX
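```python
import torch
import torchvision
import onnx

# ========== 1. Load the pretrained model ==========
model = torchvision.models.segmentation.deeplabv3_resnet50(
    weights=None,    # do not download torchvision's built-in weights
    aux_loss=True,   # enable aux_classifier so the keys match the weight file
    num_classes=21,  # 21 classes including background (VOC categories of the COCO-trained weights); change as needed
)
device = "cuda" if torch.cuda.is_available() else "cpu"
weight_path = r"weights/deeplabv3_resnet50_coco.pth"
state_dict = torch.load(weight_path, map_location=device)
model.load_state_dict(state_dict)
model = model.to(device).eval()

# ========== 2. Prepare an example input ==========
# ONNX export needs an example input tensor to trace the computation graph.
# batch_size is 1 here, but it is declared dynamic below via dynamic_axes.
batch_size = 1
num_channels = 3
height, width = 512, 512  # input size used throughout this deployment
# The example input must be on the same device as the model
dummy_input = torch.randn(batch_size, num_channels, height, width, device=device)

# ========== 3. Define the dynamic axes ==========
# Declare which dimensions are dynamic (variable length) at export time.
# The batch dimension is usually dynamic; height/width can be too, but the model may have implicit size constraints.
dynamic_axes = {
    "input": {0: "batch_size"},   # dimension 0 of the input tensor is the batch
    "output": {0: "batch_size"},  # dimension 0 of the "out" tensor (exported as "output") is the batch
    # For dynamic image sizes as well, use these entries instead (preprocessing must then be adapted):
    # "input": {0: "batch_size", 2: "height", 3: "width"},
    # "output": {0: "batch_size", 2: "height", 3: "width"},
}

# ========== 4. Export the ONNX model ==========
onnx_file_path = "weights/deeplabv3_resnet50_coco.onnx"
torch.onnx.export(
    model,
    dummy_input,                # example input
    onnx_file_path,             # output path
    input_names=["input"],      # input name
    output_names=["output"],    # name of the first output ("out"); the aux output gets an auto-generated name
    dynamic_axes=dynamic_axes,  # dynamic axis configuration
    opset_version=11,           # ONNX opset version, >= 11 recommended
    do_constant_folding=True,   # constant-folding optimization
    dynamo=False,               # use the TorchScript-based exporter; drop this argument on older PyTorch versions that lack it
)
print(f"ONNX model saved to: {onnx_file_path}")

# ========== 5. Validate the exported ONNX model (optional) ==========
# Load the ONNX model and check its structure
onnx_model = onnx.load(onnx_file_path)
onnx.checker.check_model(onnx_model)
print("ONNX model check passed!")

# ========== 6. Quick inference test with ONNX Runtime (optional) ==========
try:
    import onnxruntime as ort
    import numpy as np

    # Create an ONNX Runtime session (GPU builds require providers to be set explicitly)
    ort_session = ort.InferenceSession(onnx_file_path, providers=["CPUExecutionProvider"])
    # Feed several batch sizes to check that the dynamic batch dimension works
    for test_batch in [1, 2, 4]:
        test_input = np.random.randn(test_batch, 3, height, width).astype(np.float32)
        outputs = ort_session.run(["output"], {"input": test_input})
        print(f"Batch size {test_batch} -> output shape: {outputs[0].shape}")
except ImportError:
    print("onnxruntime is not installed; skipping the dynamic-batch test. Install it with: pip install onnxruntime")
```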
2. Project code
Triton imposes a strict layout on the model repository. Organize the folders as follows:
```text
models/
├── deeplabv3_ensemble/
│   ├── 1/
│   └── config.pbtxt
├── deeplabv3_inference/
│   ├── 1/
│   │   └── deeplabv3_resnet50_coco.onnx
│   └── config.pbtxt
├── deeplabv3_postprocess/
│   ├── 1/
│   │   └── postprocess.py
│   └── config.pbtxt
└── deeplabv3_preprocess/
    ├── 1/
    │   └── preprocess.py
    └── config.pbtxt
```
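- deeplabv3_preprocess (preprocessing)
  Role: receives the raw image data (base64-encoded), decodes it, resizes it to the model input size (512x512), normalizes it, and converts it into the tensor format the model expects.
- deeplabv3_inference (inference)
  Holds the model file and its configuration.
- deeplabv3_postprocess (postprocessing)
  Role: receives the model's raw output (per-class logits), takes the argmax to obtain a class-index mask, resizes it back to the original image size, and returns it as a base64-encoded PNG inside a JSON string. Color mapping for visualization, if needed, also belongs in this stage.
- deeplabv3_ensemble (ensemble scheduling)
  A special "virtual" model. It contains no code or weights; its config.pbtxt only defines the execution order and data flow of the three steps above (Preprocess -> Inference -> Postprocess) and exposes them through a single API.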
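The folder layout above can also be created with a short script. This is a minimal sketch using only the standard library; the helper name `create_model_repository` is hypothetical, and it only creates the model and version folders. The `config.pbtxt` files, the ONNX model, and the Python scripts still have to be copied in afterwards.

```python
from pathlib import Path

# Model names from the repository layout above
MODEL_NAMES = [
    "deeplabv3_ensemble",
    "deeplabv3_inference",
    "deeplabv3_postprocess",
    "deeplabv3_preprocess",
]


def create_model_repository(root, version="1"):
    """Create <root>/<model>/<version>/ for every model (hypothetical helper)."""
    root = Path(root)
    created = []
    for name in MODEL_NAMES:
        # Triton expects a version folder for every model, even the ensemble,
        # whose version folder stays empty
        version_dir = root / name / version
        version_dir.mkdir(parents=True, exist_ok=True)
        created.append(version_dir)
    return created


if __name__ == "__main__":
    for path in create_model_repository("models"):
        print(path)
```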
(1) deeplabv3_preprocess (preprocessing)
config.pbtxt
```protobuf
name: "deeplabv3_preprocess"
backend: "python"
max_batch_size: 128
default_model_filename: "preprocess.py"
input [
  {
    name: "RAW_IMAGE"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "PREPROCESSED_IMAGE"
    data_type: TYPE_FP32
    dims: [ 3, 512, 512 ]
  },
  {
    name: "IMAGE_SHAPE"
    data_type: TYPE_INT64
    dims: [ 2 ]
  }
]
instance_group [
  {
    count: 32
    kind: KIND_CPU
  }
]
```
preprocess.py
```python
import base64
import logging

import cv2
import numpy as np
import triton_python_backend_utils as pb_utils

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class TritonPythonModel:
    def initialize(self, args):
        self.input_height = 512
        self.input_width = 512
        # ImageNet mean/std, RGB order
        self.mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        self.std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

    def img_preprocess(self, image_bgr) -> np.ndarray:
        """
        Preprocess an image with OpenCV and NumPy.
        Args:
            image_bgr: BGR image (H, W, 3), dtype=uint8
        Returns:
            input_tensor: (3, input_height, input_width) float32 numpy array, normalized
        """
        # 1. BGR -> RGB
        image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
        # 2. Resize to the fixed size (512, 512) with bilinear interpolation
        resized = cv2.resize(image_rgb, (self.input_width, self.input_height), interpolation=cv2.INTER_LINEAR)
        # 3. Scale uint8 [0, 255] -> float [0, 1]
        img_float = resized.astype(np.float32) / 255.0
        # 4. Standardize with ImageNet statistics
        img_norm = (img_float - self.mean) / self.std
        # 5. HWC -> CHW (the batch dimension is added later by np.stack)
        input_tensor = np.transpose(img_norm, (2, 0, 1))  # (3, H, W)
        return input_tensor

    def base64_to_image(self, base64_str):
        try:
            raw_bytes = base64.b64decode(base64_str)
            nparr = np.frombuffer(raw_bytes, np.uint8)
            # cv2.imdecode returns None if the bytes are not a decodable image
            return cv2.imdecode(nparr, cv2.IMREAD_COLOR)
        except Exception as error:
            logger.error(f"Error: {error}")
            logger.error(f"error line: {error.__traceback__.tb_lineno}")
            return None

    def execute(self, requests):
        responses = []
        for request in requests:
            # 1. Get the raw image data (base64) for the whole batch, shape (batch, 1)
            in_tensor = pb_utils.get_input_tensor_by_name(request, "RAW_IMAGE")
            raw_batch = in_tensor.as_numpy()
            batch_size = raw_batch.shape[0]
            imgs_resized = []
            origins_shape = []
            # 2. Process each image in the batch
            for i in range(batch_size):
                base64_str = raw_batch[i][0]  # bytes of the i-th image
                # 3. Decode base64 into an image
                img = self.base64_to_image(base64_str)
                if img is None:
                    # Decoding failed: substitute a black image instead of failing the whole batch
                    logger.info("Base64 converted to image failed!")
                    img = np.zeros((self.input_height, self.input_width, 3), dtype=np.uint8)
                    origins_shape.append([self.input_height, self.input_width])
                else:
                    logger.info("Base64 converted to image successfully!")
                    orig_h, orig_w = img.shape[:2]
                    origins_shape.append([orig_h, orig_w])
                # 4. Preprocess the image
                img_resized = self.img_preprocess(img)
                imgs_resized.append(img_resized)
            if len(imgs_resized) == 0:
                # Empty request: build zero-batch outputs
                batch_imgs = np.zeros((batch_size, 3, self.input_height, self.input_width), dtype=np.float32)
                batch_shapes = np.zeros((batch_size, 2), dtype=np.int64)
            else:
                batch_imgs = np.stack(imgs_resized, axis=0)  # (batch, 3, 512, 512)
                batch_shapes = np.array(origins_shape, dtype=np.int64)  # (batch, 2), rows are (height, width)
            # 5. Build the output tensors
            out_tensor = pb_utils.Tensor("PREPROCESSED_IMAGE", batch_imgs)
            out_shape = pb_utils.Tensor("IMAGE_SHAPE", batch_shapes)
            response = pb_utils.InferenceResponse(output_tensors=[out_tensor, out_shape])
            responses.append(response)
        return responses
```
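Steps 3 to 5 of `img_preprocess` are plain NumPy arithmetic, so they can be checked without Triton or OpenCV. The sketch below applies the same scaling, ImageNet standardization, and HWC -> CHW transpose to a tiny synthetic RGB image (the 2x2 size and pixel values are arbitrary test data, and the resize step is omitted). A white pixel in the red channel should become (1.0 - 0.485) / 0.229, about 2.249.

```python
import numpy as np

# Same ImageNet statistics as in preprocess.py (RGB order)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_hwc_to_chw(image_rgb):
    """Steps 3-5 of img_preprocess, without the OpenCV resize."""
    img_float = image_rgb.astype(np.float32) / 255.0  # uint8 [0, 255] -> [0, 1]
    img_norm = (img_float - MEAN) / STD                # per-channel standardization (broadcast over H, W)
    return np.transpose(img_norm, (2, 0, 1))           # (H, W, 3) -> (3, H, W)


# Synthetic 2x2 RGB image: all white except one black pixel at (1, 1)
image = np.full((2, 2, 3), 255, dtype=np.uint8)
image[1, 1] = 0
tensor = normalize_hwc_to_chw(image)
print(tensor.shape)     # (3, 2, 2)
print(tensor[0, 0, 0])  # red channel of a white pixel, about 2.249
```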
(2) deeplabv3_inference (inference)
config.pbtxt
```protobuf
name: "deeplabv3_inference"
backend: "onnxruntime"
default_model_filename: "deeplabv3_resnet50_coco.onnx"
max_batch_size: 128
dynamic_batching {
  max_queue_delay_microseconds: 100000  # wait at most 100 ms to form a batch
  preferred_batch_size: [ 4, 16, 32, 64, 128 ]
}
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 512, 512 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ -1, -1, -1 ]
  },
  {
    name: "621"  # unnamed aux-classifier output; the name was auto-assigned by the ONNX exporter
    data_type: TYPE_FP32
    dims: [ -1, -1, -1 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]  # use GPU 0
  }
]
```
(3) deeplabv3_postprocess (postprocessing)
config.pbtxt
```protobuf
name: "deeplabv3_postprocess"
backend: "python"
max_batch_size: 128
default_model_filename: "postprocess.py"
input [
  {
    name: "MASKS"
    data_type: TYPE_FP32
    dims: [ -1, -1, -1 ]
  },
  {
    name: "IMAGE_SHAPE"
    data_type: TYPE_INT64
    dims: [ 2 ]
  }
]
output [
  {
    name: "OUTPUT_RESULTS"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
instance_group [
  {
    count: 32
    kind: KIND_CPU
  }
]
```
postprocess.py
```python
import base64
import json

import cv2
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        pass

    def mask_to_base64(self, mask):
        """
        Args:
            mask: (H, W) uint8 class-index mask, values in [0, 20] for 21 classes
        Returns:
            base64-encoded PNG of the mask (str)
        """
        mask_image = mask[:, :, np.newaxis]
        success, buffer = cv2.imencode(".png", mask_image)
        if not success:
            raise ValueError("Failed to encode the mask as PNG")
        base64_mask = base64.b64encode(buffer).decode("utf-8")
        return base64_mask

    def postprocess_mask(self, output, original_size=None):
        """
        Postprocessing: take the argmax, optionally upsample back to the original size.
        Args:
            output: (num_classes, H, W) numpy array of logits
            original_size: (height, width) of the original image
        Returns:
            mask: (H, W) or (orig_H, orig_W) numpy array, dtype=uint8
        """
        # Class index per pixel
        mask = np.argmax(output, axis=0)  # (H, W)
        mask = mask.astype(np.uint8)      # (H, W)
        if original_size is not None:
            orig_h, orig_w = original_size
            # Nearest-neighbour keeps class indices intact; cv2 expects (width, height) as Python ints
            mask = cv2.resize(mask, (int(orig_w), int(orig_h)), interpolation=cv2.INTER_NEAREST)
        return mask

    def execute(self, requests):
        responses = []
        for request in requests:
            masks_tensor = pb_utils.get_input_tensor_by_name(request, "MASKS")
            masks = masks_tensor.as_numpy()          # (batch, num_classes, H, W)
            shape_tensor = pb_utils.get_input_tensor_by_name(request, "IMAGE_SHAPE")
            origins_shape = shape_tensor.as_numpy()  # (batch, 2), rows are (height, width)
            batch_size = masks.shape[0]
            batch_results = []
            for i in range(batch_size):
                mask = self.postprocess_mask(masks[i], origins_shape[i])
                mask_item = {
                    "mask": self.mask_to_base64(mask),
                    "mask_shape": list(mask.shape),
                    "min_class": int(mask.min()),
                    "max_class": int(mask.max()),
                    # "origin_shape": origins_shape[i].tolist(),
                }
                result_json_str = json.dumps(mask_item)
                batch_results.append(result_json_str.encode("utf-8"))
            # OUTPUT_RESULTS is declared with dims [ 1 ], so the full shape must be (batch, 1)
            output_array = np.array(batch_results, dtype=object).reshape(batch_size, 1)
            out_tensor = pb_utils.Tensor("OUTPUT_RESULTS", output_array)
            response = pb_utils.InferenceResponse(output_tensors=[out_tensor])
            responses.append(response)
        return responses
```
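Each element of OUTPUT_RESULTS is a UTF-8 JSON string built in `execute()` above, with the mask stored as a base64-encoded PNG. On the client side it could be unpacked roughly as follows. This is a sketch using only the standard library; `parse_mask_result` is a hypothetical helper name. Turning the PNG bytes into a NumPy array would additionally need OpenCV or Pillow (for example `cv2.imdecode`).

```python
import base64
import json

# First 8 bytes of every PNG file
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def parse_mask_result(raw):
    """Decode one OUTPUT_RESULTS element (bytes or str) into (png_bytes, metadata).

    Hypothetical client-side helper. Binary responses yield bytes,
    JSON responses yield str, so both are accepted.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    item = json.loads(raw)
    png_bytes = base64.b64decode(item["mask"])
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("mask field is not a base64-encoded PNG")
    # Everything except the mask itself: mask_shape, min_class, max_class
    metadata = {key: value for key, value in item.items() if key != "mask"}
    return png_bytes, metadata
```

The metadata fields make a cheap sanity check possible before decoding the PNG, for example comparing `mask_shape` with the size of the image that was sent.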
(4) deeplabv3_ensemble (ensemble scheduling)
config.pbtxt
```protobuf
name: "deeplabv3_ensemble"
platform: "ensemble"
max_batch_size: 128
input [
  {
    name: "RAW_IMAGE"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "OUTPUT_RESULTS"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "deeplabv3_preprocess"
      model_version: -1
      input_map {
        key: "RAW_IMAGE"
        value: "RAW_IMAGE"
      }
      output_map [
        {
          key: "PREPROCESSED_IMAGE"
          value: "preprocessed_image"
        },
        {
          key: "IMAGE_SHAPE"
          value: "IMAGE_SHAPE"
        }
      ]
    },
    {
      model_name: "deeplabv3_inference"
      model_version: -1
      input_map {
        key: "input"
        value: "preprocessed_image"
      }
      output_map {
        key: "output"
        value: "output"
      }
    },
    {
      model_name: "deeplabv3_postprocess"
      model_version: -1
      input_map [
        {
          key: "MASKS"
          value: "output"
        },
        {
          key: "IMAGE_SHAPE"
          value: "IMAGE_SHAPE"
        }
      ]
      output_map {
        key: "OUTPUT_RESULTS"
        value: "OUTPUT_RESULTS"
      }
    }
  ]
}
```
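The ensemble's data flow comes down to name matching: `key` is a tensor name of the step's model, `value` is a tensor name inside the ensemble, every value consumed by an `input_map` must be an ensemble input or be produced by some step's `output_map`, and every ensemble output must be produced by a step. The sketch below checks those rules on the configuration above, transcribed by hand into Python dicts. It does not parse `config.pbtxt`, the helper `check_ensemble_wiring` is hypothetical, and Triton performs its own, more thorough validation when it loads the ensemble.

```python
ENSEMBLE_INPUTS = {"RAW_IMAGE"}
ENSEMBLE_OUTPUTS = {"OUTPUT_RESULTS"}

# Transcribed from ensemble_scheduling: model tensor name -> ensemble tensor name
STEPS = [
    {"model": "deeplabv3_preprocess",
     "input_map": {"RAW_IMAGE": "RAW_IMAGE"},
     "output_map": {"PREPROCESSED_IMAGE": "preprocessed_image", "IMAGE_SHAPE": "IMAGE_SHAPE"}},
    {"model": "deeplabv3_inference",
     "input_map": {"input": "preprocessed_image"},
     "output_map": {"output": "output"}},
    {"model": "deeplabv3_postprocess",
     "input_map": {"MASKS": "output", "IMAGE_SHAPE": "IMAGE_SHAPE"},
     "output_map": {"OUTPUT_RESULTS": "OUTPUT_RESULTS"}},
]


def check_ensemble_wiring(steps, inputs, outputs):
    """Return a list of wiring problems (empty if the data flow is consistent)."""
    problems = []
    producers = {}
    # Each ensemble tensor may be written by exactly one step and must not shadow an input
    for step in steps:
        for tensor in step["output_map"].values():
            if tensor in producers or tensor in inputs:
                problems.append(f"{tensor} is produced more than once or overwrites an ensemble input")
            producers[tensor] = step["model"]
    available = set(inputs) | set(producers)
    # Every consumed tensor needs a source
    for step in steps:
        for tensor in step["input_map"].values():
            if tensor not in available:
                problems.append(f"{step['model']} consumes {tensor}, which nothing produces")
    # Every declared ensemble output needs a producer
    for tensor in outputs:
        if tensor not in producers:
            problems.append(f"ensemble output {tensor} is never produced")
    return problems


print(check_ensemble_wiring(STEPS, ENSEMBLE_INPUTS, ENSEMBLE_OUTPUTS))  # []
```

Note that the inference step's second output ("621") is simply not mapped; an unmapped model output is not an error, it is just not passed on.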
3. Deploying the model
```bash
docker run -d \
  --gpus 1 \
  --name tritonserver \
  -p 127.0.0.1:8000:8000 \
  -e CUDA_VISIBLE_DEVICES=0 \
  -v $(pwd)/deeplabv3/models:/models \
  nvcr.io/nvidia/tritonserver:23.01-py3-v0.0.1 \
  tritonserver --model-repository=/models --strict-model-config=false --log-verbose=1
```
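Note: the Python backend scripts depend on OpenCV (cv2), which the official tritonserver image does not ship, so it must be installed in the image (for example with pip install opencv-python-headless). The tag 23.01-py3-v0.0.1 above is not an official NGC tag; it presumably refers to such a custom image built on top of 23.01-py3.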
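Once the server is up, the ensemble can be called over HTTP with the KServe v2 inference protocol that Triton implements (POST /v2/models/deeplabv3_ensemble/infer on port 8000). The sketch below only builds the JSON request body with the standard library; the helper name `build_infer_request` and the placeholder image bytes are hypothetical. Because RAW_IMAGE is declared as TYPE_STRING with dims [ 1 ] and max_batch_size > 0, the request shape is [batch, 1] and the protocol datatype is BYTES. Each element is sent as base64 text, since preprocess.py base64-decodes every element.

```python
import base64
import json


def build_infer_request(image_bytes_list):
    """Build a KServe v2 JSON request body for deeplabv3_ensemble (hypothetical helper).

    image_bytes_list: raw encoded images, e.g. the bytes of .jpg files.
    """
    # preprocess.py calls base64.b64decode on each element, so send base64 text
    encoded = [base64.b64encode(b).decode("ascii") for b in image_bytes_list]
    return {
        "inputs": [
            {
                "name": "RAW_IMAGE",
                "shape": [len(encoded), 1],  # [batch, 1] because dims is [ 1 ]
                "datatype": "BYTES",         # TYPE_STRING is BYTES in the protocol
                "data": encoded,             # row-major, flattened
            }
        ],
        "outputs": [{"name": "OUTPUT_RESULTS"}],
    }


body = build_infer_request([b"placeholder-for-jpeg-bytes"])
print(json.dumps(body)[:80])
```

The body can then be posted with any HTTP client, for example `requests.post("http://127.0.0.1:8000/v2/models/deeplabv3_ensemble/infer", json=body)`; each element of the returned OUTPUT_RESULTS data is the JSON string produced by postprocess.py.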
Deploying a TensorRT model with a Triton ensemble
1. Converting ONNX to TensorRT
```bash
nerdctl run --gpus 1 -v $(pwd):/workspace -it nvcr.io/nvidia/tensorrt:23.01-py3 \
  bash -c \
  "cd /workspace && \
  trtexec \
  --onnx=deeplabv3_resnet50_coco.onnx \
  --minShapes=input:1x3x512x512 \
  --optShapes=input:64x3x512x512 \
  --maxShapes=input:128x3x512x512 \
  --workspace=8192 \
  --saveEngine=deeplabv3_resnet50_coco_fp16.plan \
  --explicitBatch \
  --fp16"
```
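The optimization profile has to cover every batch size Triton may send: with max_batch_size: 128 in config.pbtxt, dynamic batching can form batches of up to 128, so --maxShapes must allow batch 128 (otherwise Triton may refuse to load the engine or fail at inference), while --optShapes is the size TensorRT tunes for. The sketch below, built around a hypothetical helper `trtexec_shape_flags`, generates the three shape flags from one batch range so that they stay consistent with each other and with the config.

```python
def trtexec_shape_flags(tensor="input", chw=(3, 512, 512), min_batch=1, opt_batch=64, max_batch=128):
    """Return trtexec --minShapes/--optShapes/--maxShapes flags (hypothetical helper)."""
    # TensorRT requires min <= opt <= max for each dimension of the profile
    if not (1 <= min_batch <= opt_batch <= max_batch):
        raise ValueError("need 1 <= min_batch <= opt_batch <= max_batch")
    dims = "x".join(str(d) for d in chw)
    return [
        f"--minShapes={tensor}:{min_batch}x{dims}",
        f"--optShapes={tensor}:{opt_batch}x{dims}",
        f"--maxShapes={tensor}:{max_batch}x{dims}",
    ]


# max_batch should be at least max_batch_size from config.pbtxt (128 here)
print(" ".join(trtexec_shape_flags()))
# --minShapes=input:1x3x512x512 --optShapes=input:64x3x512x512 --maxShapes=input:128x3x512x512
```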
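2. Project code
The project code is the same as for the ONNX deployment; only deeplabv3_inference needs to change:
- Replace the ONNX model in models/deeplabv3_inference/1/ with the TensorRT engine, or create a new version folder 2 next to folder 1 and put the TensorRT engine in it (with the default version policy, Triton serves only the latest version, i.e. 2).
- Modify models/deeplabv3_inference/config.pbtxt: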
```protobuf
name: "deeplabv3_inference"
backend: "tensorrt"  # changed from onnxruntime to tensorrt
default_model_filename: "deeplabv3_resnet50_coco_fp16.plan"  # file name of the TensorRT engine
max_batch_size: 128
dynamic_batching {
  max_queue_delay_microseconds: 100000
  preferred_batch_size: [ 4, 16, 32, 64, 128 ]
}
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 512, 512 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ -1, -1, -1 ]
  },
  {
    name: "621"
    data_type: TYPE_FP32
    dims: [ -1, -1, -1 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]  # use GPU 0
  }
]
```
3. Deploying the model
Same as the ONNX deployment.