[Triton] Deploying a Semantic Segmentation (DeepLabv3) Inference Service with Triton Ensemble

deeplabv3 triton ensemble

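Abstract

This post describes in detail how to deploy the preprocessing, inference, and postprocessing of a semantic segmentation model (DeepLabv3) inside a single Triton service. Preprocessing and postprocessing are implemented with the python backend, inference uses the onnxruntime or tensorrt backend, and Triton's ensemble feature chains the three stages into one pipeline. The complete project code is available at: triton_ensemble_model_zoo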

Deploying an ONNX model with Triton ensemble

1. Converting torch to ONNX

import torch
import torchvision
import onnx

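# ========== 1. Load the pretrained model ==========
model = torchvision.models.segmentation.deeplabv3_resnet50(
    weights=None,          # do not download the built-in pretrained weights
    aux_loss=True,         # enable aux_classifier so the architecture matches the weight file
    num_classes=21         # 21 classes (including background) for these weights; change as needed
)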
device = 'cuda' if torch.cuda.is_available() else 'cpu'
weight_path = r'weights/deeplabv3_resnet50_coco.pth'
state_dict = torch.load(weight_path, map_location=device)
model.load_state_dict(state_dict)
model = model.to(device).eval()

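# ========== 2. Prepare a sample input ==========
# ONNX export needs a sample input tensor to trace the computation graph.
# batch_size is 1 here, but the batch axis is declared dynamic below via dynamic_axes.
batch_size = 1
num_channels = 3
height, width = 512, 512   # must match the 512x512 input size used in the Triton configs
# The sample input has to live on the same device as the model
dummy_input = torch.randn(batch_size, num_channels, height, width, device=device)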

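# ========== 3. Define dynamic axes ==========
# Declare at export time which dimensions are dynamic (variable length).
# The batch dimension is usually dynamic; height/width can be too, but the model may have implicit size constraints.
dynamic_axes = {
    "input": {0: "batch_size"},        # dim 0 of the input tensor is the batch
    "output": {0: "batch_size"},       # dim 0 of the "out" tensor in the output dict is the batch
    # For dynamic image sizes, REPLACE the two entries above with the lines below.
    # Do not add them as extra entries: a repeated dict key overrides the earlier one.
    # Preprocessing and the Triton configs would then need matching changes.
    # "input": {0: "batch_size", 2: "height", 3: "width"},
    # "output": {0: "batch_size", 2: "height", 3: "width"},
}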

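# ========== 4. Export the ONNX model ==========
onnx_file_path = "weights/deeplabv3_resnet50_coco.onnx"
torch.onnx.export(
    model,
    dummy_input,                         # sample input
    onnx_file_path,                      # output path
    input_names=["input"],               # input tensor name
    output_names=["output"],             # name of the first output; the aux output gets an auto-generated name
    dynamic_axes=dynamic_axes,           # dynamic axes configuration
    opset_version=11,                    # ONNX opset version, >= 11 recommended
    do_constant_folding=True,            # constant-folding optimization
    dynamo=False,                        # use the TorchScript-based exporter, not the dynamo one
)

print(f"ONNX model saved to: {onnx_file_path}")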

# ========== 5. Validate the exported ONNX model (optional) ==========
# Load the ONNX model and check its structure
onnx_model = onnx.load(onnx_file_path)
onnx.checker.check_model(onnx_model)
print("ONNX model check passed!")

# ========== 6. Quick inference test with ONNX Runtime (optional) ==========
try:
    import onnxruntime as ort
    import numpy as np

    # Create an ONNX Runtime inference session
    ort_session = ort.InferenceSession(onnx_file_path)

    # Feed inputs with different batch sizes to confirm the dynamic batch axis works
    for test_batch in [1, 2, 4]:
        test_input = np.random.randn(test_batch, 3, height, width).astype(np.float32)
        outputs = ort_session.run(["output"], {"input": test_input})
        print(f"Batch size {test_batch} -> output shape: {outputs[0].shape}")
except ImportError:
    print("onnxruntime is not installed, skipping the dynamic batch test. Install it with: pip install onnxruntime")
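
The commented-out height/width lines in the dynamic_axes dict above are easy to get wrong, because Python keeps only the last value for a repeated key in a dict literal. The sketch below shows that behavior and one way to build the mapping without it. The helper name build_dynamic_axes and the axis labels "height"/"width" are hypothetical, chosen only for illustration.

```python
def build_dynamic_axes(dynamic_spatial: bool = False) -> dict:
    """Build a dynamic_axes mapping for torch.onnx.export.

    The tensor names "input" and "output" match the input_names and
    output_names passed to the export call in this post.
    """
    axes = {0: "batch_size"}
    if dynamic_spatial:
        # Hypothetical labels; any string works as a symbolic dimension name.
        axes.update({2: "height", 3: "width"})
    return {"input": dict(axes), "output": dict(axes)}


# A dict literal silently keeps only the LAST value for a repeated key,
# so listing "input" twice drops the batch axis:
overwritten = {"input": {0: "batch_size"}, "input": {2: "height", 3: "width"}}
print(overwritten)

# Merging the axes into one entry per tensor keeps all three dynamic dims.
print(build_dynamic_axes(dynamic_spatial=True))
```

Note that the Triton configs in this post hardcode 512x512, so dynamic height/width would also require changing those dims.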

2. Project code

Triton requires the model repository to follow a strict directory layout. The folders are organized as follows:

models/
├── deeplabv3_ensemble/
│   ├── 1/
│   └── config.pbtxt
├── deeplabv3_inference/
│   ├── 1/
│   │   └── deeplabv3_resnet50_coco.onnx
│   └── config.pbtxt
├── deeplabv3_postprocess/
│   ├── 1/
│   │   └── postprocess.py
│   └── config.pbtxt
└── deeplabv3_preprocess/
    ├── 1/
    │   └── preprocess.py
    └── config.pbtxt
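
  • deeplabv3_preprocess (preprocessing)
    Purpose: receives the raw image data (a base64 string), decodes it, resizes it to the model input size (512x512), normalizes it, and converts it to the tensor layout the model expects.
  • deeplabv3_inference (inference)
    Holds the model file and its config.pbtxt.
  • deeplabv3_postprocess (postprocessing)
    Purpose: receives the raw model output (per-class logits), takes the argmax over the class axis, resizes the mask back to the original image size, and returns it as a base64-encoded PNG inside a JSON string.
  • deeplabv3_ensemble (ensemble scheduling)
    A special "virtual" model. It contains no code or weights; its config.pbtxt defines the execution order and data flow of the three steps above (Preprocess -> Inference -> Postprocess) and exposes a single API endpoint to clients.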

(1) deeplabv3_preprocess (preprocessing)

config.pbtxt
name: "deeplabv3_preprocess"
backend: "python"
max_batch_size: 128
default_model_filename: "preprocess.py"

input [
    {
        name: "RAW_IMAGE"
        data_type: TYPE_STRING
        dims: [ 1 ]
    }
]
output [
    {
        name: "PREPROCESSED_IMAGE"
        data_type: TYPE_FP32
        dims: [ 3, 512, 512 ]
    },
    {
        name: "IMAGE_SHAPE"
        data_type: TYPE_INT64
        dims: [ 2 ]
    }
]

instance_group [
  {
    count: 32
    kind: KIND_CPU
  }
]
preprocess.py
import triton_python_backend_utils as pb_utils
import numpy as np
import cv2
import base64
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class TritonPythonModel:
    def initialize(self, args):
        self.input_height = 512
        self.input_width = 512
        self.mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        self.std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

    def img_preprocess(self, image_bgr) -> np.ndarray:
        """
        Preprocess an image with OpenCV and NumPy.
        Args:
            image_bgr: image in BGR format (H, W, 3), dtype=uint8
        Returns:
            input_tensor: (3, H, W) float32 numpy array, normalized (no batch dimension)
        """
        # 1. BGR -> RGB
        image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
        
        # 2. Resize to the fixed size (512, 512) with bilinear interpolation
        resized = cv2.resize(image_rgb, (self.input_width, self.input_height), interpolation=cv2.INTER_LINEAR)
        
        # 3. Scale: uint8 [0,255] -> float [0,1]
        img_float = resized.astype(np.float32) / 255.0
        
        # 4. Standardize (ImageNet statistics)
        img_norm = (img_float - self.mean) / self.std
        
        # 5. HWC -> CHW; the batch dimension is added later by np.stack in execute()
        input_tensor = np.transpose(img_norm, (2, 0, 1))   # (3, H, W)
        
        return input_tensor

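    def base64_to_image(self, base64_str):
        """Decode a base64 string into a BGR image; returns None on failure."""
        try:
            raw_bytes = base64.b64decode(base64_str)
            nparr = np.frombuffer(raw_bytes, np.uint8)
            # cv2.imdecode itself returns None when the bytes are not a valid image
            img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
            return img

        except Exception as error:
            logger.error(f"Error: {error}")
            logger.error(f"error line: {error.__traceback__.tb_lineno}")
            return None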

    def execute(self, requests):
        responses = []
        for request in requests:
            # 1. Read the raw image data (base64); each request carries one batch
            in_tensor = pb_utils.get_input_tensor_by_name(request, "RAW_IMAGE")
            raw_batch = in_tensor.as_numpy()
            batch_size = raw_batch.shape[0]
            imgs_resized = []
            origins_shape = []
            
            # 2. Process every image in the batch
            for i in range(batch_size):
                base64_str = raw_batch[i][0]  # base64 bytes of the i-th image
                # 3. Decode base64 into an image
                img = self.base64_to_image(base64_str)
                if img is None:
                    # Decoding failed: substitute a black image so the batch stays aligned
                    logger.warning("Base64 to image conversion failed!")
                    img = np.zeros((self.input_height, self.input_width, 3), dtype=np.uint8)
                else:
                    logger.info("Base64 to image conversion succeeded!")
                orig_h, orig_w = img.shape[:2]
                origins_shape.append([orig_h, orig_w])
                # 4. Preprocess the image (also applied to the black fallback image)
                img_resized = self.img_preprocess(img)
                imgs_resized.append(img_resized)
            if len(imgs_resized) == 0:
                # Empty batch: emit zero-filled outputs
                batch_imgs = np.zeros((batch_size, 3, self.input_height, self.input_width), dtype=np.float32)
                batch_shapes = np.zeros((batch_size, 2), dtype=np.int64)
            else:
                batch_imgs = np.stack(imgs_resized, axis=0)  # (batch, 3, 512, 512)
                batch_shapes = np.array(origins_shape, dtype=np.int64)  # (batch_size, 2), as [height, width]

            # 5. Build the output tensors
            out_tensor = pb_utils.Tensor("PREPROCESSED_IMAGE", batch_imgs)
            out_shape = pb_utils.Tensor("IMAGE_SHAPE", batch_shapes)
            response = pb_utils.InferenceResponse(output_tensors=[out_tensor, out_shape])
            responses.append(response)
        return responses
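
To sanity-check the normalization arithmetic outside Triton, the following minimal sketch repeats the scale, standardize, and transpose steps of img_preprocess with NumPy only (no OpenCV, no resize). The function name normalize_rgb and the tiny 4x4 test images are assumptions made for illustration. It also shows that a black fallback image does not become an all-zero tensor after standardization.

```python
import numpy as np

# Same ImageNet statistics as in preprocess.py
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_rgb(image_rgb: np.ndarray) -> np.ndarray:
    """uint8 RGB (H, W, 3) -> float32 (3, H, W), mirroring img_preprocess."""
    scaled = image_rgb.astype(np.float32) / 255.0
    normalized = (scaled - MEAN) / STD       # broadcasts over the channel axis
    return np.transpose(normalized, (2, 0, 1))


black = normalize_rgb(np.zeros((4, 4, 3), dtype=np.uint8))
white = normalize_rgb(np.full((4, 4, 3), 255, dtype=np.uint8))

print(black.shape, black.dtype)
print("black pixel per channel:", black[:, 0, 0])   # equals -MEAN / STD
print("white pixel per channel:", white[:, 0, 0])   # equals (1 - MEAN) / STD
```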

(2) deeplabv3_inference (inference)

config.pbtxt
name: "deeplabv3_inference"
backend: "onnxruntime"
default_model_filename: "deeplabv3_resnet50_coco.onnx"
max_batch_size: 128

dynamic_batching {
  max_queue_delay_microseconds: 100000
  preferred_batch_size: [ 4, 16, 32, 64, 128 ]
}

input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [3, 512, 512]
  }
]

output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [-1, -1, -1]
  },
  {
    name: "621"
    data_type: TYPE_FP32
    dims: [-1, -1, -1]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]  # use GPU 0
  }
]
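
With max_batch_size: 128 and preferred batch sizes up to 128, it helps to estimate how large the output tensors get. The sketch below is back-of-the-envelope arithmetic under stated assumptions: both outputs ("output" and the auxiliary "621") are FP32 tensors of shape (batch, 21, 512, 512), which is what the export settings in this post suggest. It ignores activations and weights, so real GPU memory use will be higher.

```python
def output_bytes(batch: int, num_classes: int = 21, height: int = 512,
                 width: int = 512, bytes_per_elem: int = 4) -> int:
    """Size in bytes of one (batch, num_classes, height, width) FP32 tensor."""
    return batch * num_classes * height * width * bytes_per_elem


# Preferred batch sizes from the dynamic_batching block, plus batch 1
for batch in (1, 4, 16, 32, 64, 128):
    per_output = output_bytes(batch)
    both = 2 * per_output  # "output" plus the auxiliary output "621"
    print(f"batch={batch:>3}: {per_output / 2**20:8.1f} MiB per output, "
          f"{both / 2**20:8.1f} MiB for both")
```

If these numbers are too large for the target GPU, lowering max_batch_size or the larger preferred_batch_size values is one option to consider.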

(3) deeplabv3_postprocess (postprocessing)

config.pbtxt
name: "deeplabv3_postprocess"
backend: "python"
max_batch_size: 128
default_model_filename: "postprocess.py"

input [
    {
        name: "MASKS"
        data_type: TYPE_FP32
        dims: [-1, -1, -1]
    },
    {
        name: "IMAGE_SHAPE"
        data_type: TYPE_INT64
        dims: [ 2 ]
    }
]

output [
    {
        name: "OUTPUT_RESULTS"
        data_type: TYPE_STRING
        dims: [ 1 ]
    }
]


instance_group [
  {
    count: 32
    kind: KIND_CPU
  }
]
postprocess.py
import triton_python_backend_utils as pb_utils
import numpy as np
import base64
import json
import cv2


class TritonPythonModel:
    def initialize(self, args):
        pass

    def mask_to_base64(self, mask):
        """
        input:
            mask: [h, w], uint8 class indices in [0, 20] for 21 classes
        output:
            base64_mask: base64 string of the PNG-encoded mask
        """
        mask_image = mask[:, :, np.newaxis]
        success, buffer = cv2.imencode('.png', mask_image)
        if not success:
            raise ValueError("Image encoding failed")
        base64_mask = base64.b64encode(buffer).decode('utf-8')
        return base64_mask
        
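    def postprocess_mask(self, output, original_size=None):
        """
        Postprocess: take the argmax, optionally resize back to the original size.
        Args:
            output: (num_classes, H, W) numpy array
            original_size: (height, width) of the original image
        Returns:
            mask: (H, W) or (orig_H, orig_W) numpy array, dtype=uint8
        """
        # Class index per pixel
        mask = np.argmax(output, axis=0)  # (H, W)
        mask = mask.astype(np.uint8)   # (H, W)
        
        if original_size is not None:
            # Cast to plain int: some OpenCV versions reject np.int64 in dsize
            orig_h, orig_w = int(original_size[0]), int(original_size[1])
            # Nearest-neighbor keeps the values valid class indices
            mask = cv2.resize(mask, (orig_w, orig_h), interpolation=cv2.INTER_NEAREST)
        
        return mask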

    def execute(self, requests):
        responses = []
        for request in requests:
            masks_tensor = pb_utils.get_input_tensor_by_name(request, "MASKS")
            masks = masks_tensor.as_numpy()

            shape_tensor = pb_utils.get_input_tensor_by_name(request, "IMAGE_SHAPE")
            origins_shape = shape_tensor.as_numpy()

            batch_size = masks.shape[0]
            batch_results = []
            for i in range(batch_size):
                mask = self.postprocess_mask(masks[i], origins_shape[i])
    
                mask_item = {
                    "mask": self.mask_to_base64(mask),
                    "mask_shape": list(mask.shape),
                    "min_class": int(mask.min()),
                    "max_class": int(mask.max()),
                    # "origin shape:": list(origins_shape[i])
                }
                result_json_str = json.dumps(mask_item)
                batch_results.append(result_json_str.encode('utf-8'))

            # Shape (batch, 1) to match dims: [ 1 ] declared in config.pbtxt
            output_array = np.array(batch_results, dtype=object).reshape(-1, 1)
            out_tensor = pb_utils.Tensor("OUTPUT_RESULTS", output_array)
            response = pb_utils.InferenceResponse(output_tensors=[out_tensor])
            responses.append(response)
        return responses
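
On the client side, each element of OUTPUT_RESULTS is a JSON string whose "mask" field holds a base64-encoded PNG. The sketch below decodes one such element using only the standard library. The function name decode_result is hypothetical, and the sample element is synthetic (a bare PNG signature with no image data), built only to exercise the code path.

```python
import base64
import json

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_result(result_json):
    """Parse one OUTPUT_RESULTS element into (png_bytes, mask_shape)."""
    if isinstance(result_json, bytes):
        result_json = result_json.decode("utf-8")
    item = json.loads(result_json)
    png_bytes = base64.b64decode(item["mask"])
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("mask payload is not a PNG image")
    return png_bytes, tuple(item["mask_shape"])


# Synthetic element for illustration only; a real one comes from the server.
fake_element = json.dumps({
    "mask": base64.b64encode(PNG_SIGNATURE).decode("utf-8"),
    "mask_shape": [2, 2],
    "min_class": 0,
    "max_class": 0,
})
png_bytes, mask_shape = decode_result(fake_element)
print(len(png_bytes), mask_shape)
```

With a real response, the PNG bytes can be turned back into a class-index array with cv2.imdecode(np.frombuffer(png_bytes, np.uint8), cv2.IMREAD_UNCHANGED).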

(4) deeplabv3_ensemble (ensemble scheduling)

config.pbtxt
name: "deeplabv3_ensemble"
platform: "ensemble"
max_batch_size: 128

input [
    {
        name: "RAW_IMAGE"
        data_type: TYPE_STRING
        dims: [ 1 ]
    }
]

output [
    {
        name: "OUTPUT_RESULTS"
        data_type: TYPE_STRING
        dims: [ 1 ]
    }
]

ensemble_scheduling {
    step [
        {
            model_name: "deeplabv3_preprocess"
            model_version: -1
            input_map {
                key: "RAW_IMAGE"
                value: "RAW_IMAGE"
            }
            output_map [
                {
                    key: "PREPROCESSED_IMAGE"
                    value: "preprocessed_image"
                },
                {
                    key: "IMAGE_SHAPE"
                    value: "IMAGE_SHAPE"
                }
            ]
        },
        {
            model_name: "deeplabv3_inference"
            model_version: -1
            input_map {
                key: "input"
                value: "preprocessed_image"
            }
            output_map {
                key: "output"
                value: "output"
            }
        },
        {
            model_name: "deeplabv3_postprocess"
            model_version: -1
            input_map [
                {
                    key: "MASKS"
                    value: "output"
                },
                {
                    key: "IMAGE_SHAPE"
                    value: "IMAGE_SHAPE"
                }
            ]
            output_map {
                key: "OUTPUT_RESULTS"
                value: "OUTPUT_RESULTS"
            }
        }
    ]
}
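
A typo in an input_map or output_map value is a common reason an ensemble fails to load. The sketch below mirrors the three steps of the config above as plain Python dicts and checks that every tensor a step reads has a producer. It is an offline illustration, not Triton's own validation; the helper check_wiring is hypothetical, and it assumes the steps are listed in execution order, which holds for this config.

```python
# Mirror of the ensemble_scheduling steps: model-side name -> ensemble tensor name
STEPS = [
    {"model": "deeplabv3_preprocess",
     "input_map": {"RAW_IMAGE": "RAW_IMAGE"},
     "output_map": {"PREPROCESSED_IMAGE": "preprocessed_image",
                    "IMAGE_SHAPE": "IMAGE_SHAPE"}},
    {"model": "deeplabv3_inference",
     "input_map": {"input": "preprocessed_image"},
     "output_map": {"output": "output"}},
    {"model": "deeplabv3_postprocess",
     "input_map": {"MASKS": "output", "IMAGE_SHAPE": "IMAGE_SHAPE"},
     "output_map": {"OUTPUT_RESULTS": "OUTPUT_RESULTS"}},
]


def check_wiring(steps, ensemble_inputs, ensemble_outputs):
    """Return a list of problems; an empty list means every tensor has a producer."""
    available = set(ensemble_inputs)
    problems = []
    for step in steps:
        for model_input, tensor in step["input_map"].items():
            if tensor not in available:
                problems.append(
                    f"{step['model']}.{model_input} reads unknown tensor '{tensor}'")
        available.update(step["output_map"].values())
    for tensor in ensemble_outputs:
        if tensor not in available:
            problems.append(f"ensemble output '{tensor}' is never produced")
    return problems


print(check_wiring(STEPS, ["RAW_IMAGE"], ["OUTPUT_RESULTS"]))  # → []
```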

3. Deploying the model

docker run -d \
  --gpus 1 \
  --name tritonserver \
  -p 127.0.0.1:8000:8000 \
  -v $(pwd)/deeplabv3/models:/models \
  -e CUDA_VISIBLE_DEVICES=0 \
  nvcr.io/nvidia/tritonserver:23.01-py3-v0.0.1 \
  tritonserver --model-repository=/models --strict-model-config=false --log-verbose=1

Note: the tritonserver image needs the OpenCV dependency installed, because preprocess.py and postprocess.py import cv2.
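
Once the server is up, the ensemble can be called over Triton's HTTP/JSON inference endpoint. The following is a minimal client sketch using only the standard library. It assumes the port mapping from the docker command above (127.0.0.1:8000) and the tensor names from the ensemble config; the function names build_request and infer are hypothetical, and the placeholder bytes stand in for a real image file.

```python
import base64
import json
import urllib.request

# Assumed endpoint: port from the docker command, model name from the ensemble config
INFER_URL = "http://127.0.0.1:8000/v2/models/deeplabv3_ensemble/infer"


def build_request(image_bytes: bytes) -> dict:
    """Build the JSON body for one base64-encoded image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "inputs": [{
            "name": "RAW_IMAGE",
            "shape": [1, 1],       # [batch, 1], matching dims: [ 1 ]
            "datatype": "BYTES",   # TYPE_STRING in config.pbtxt
            "data": [b64],
        }],
        "outputs": [{"name": "OUTPUT_RESULTS"}],
    }


def infer(image_path: str, url: str = INFER_URL) -> dict:
    """Send one image file to the ensemble and return the parsed JSON response."""
    with open(image_path, "rb") as f:
        body = json.dumps(build_request(f.read())).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())


# Placeholder bytes, only to show the request layout without a running server
payload = build_request(b"not-a-real-image")
print(json.dumps(payload, indent=2))
```

With a running server, infer("some_image.jpg") would return a dict whose outputs entry carries the JSON strings produced by postprocess.py.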

Deploying a TensorRT model with Triton ensemble

1. Converting ONNX to TensorRT

nerdctl run --gpus 1 -v $(pwd):/workspace -it nvcr.io/nvidia/tensorrt:23.01-py3 \
    bash -c \
    "cd /workspace && \
trtexec \
--onnx=deeplabv3_resnet50_coco.onnx \
--minShapes=input:1x3x512x512 \
--optShapes=input:64x3x512x512 \
--maxShapes=input:128x3x512x512 \
--workspace=8192 \
--saveEngine=deeplabv3_resnet50_coco_fp16.plan \
--explicitBatch \
--fp16"

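2. Project code

The project code is the same as for the ONNX deployment; only the deeplabv3_inference part needs to change:

  • Replace the ONNX model in models/deeplabv3_inference/1/ with the TensorRT model, or create a folder named 2 next to folder 1 and put the TensorRT model in it.
  • Edit models/deeplabv3_inference/config.pbtxt: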
name: "deeplabv3_inference"
backend: "tensorrt"  # changed from onnxruntime to tensorrt
default_model_filename: "deeplabv3_resnet50_coco_fp16.plan"  # changed to the TensorRT engine file name
max_batch_size: 128

dynamic_batching {
  max_queue_delay_microseconds: 100000
  preferred_batch_size: [ 4, 16, 32, 64, 128 ]
}

input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [3, 512, 512]
  }
]

output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [-1, -1, -1]
  },
  {
    name: "621"
    data_type: TYPE_FP32
    dims: [-1, -1, -1]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]  # use GPU 0
  }
]

3. Deploying the model

Same as the ONNX deployment.