Ascend AI Developer Productivity Tools: Advanced CANN CLI Usage and Automation Scripting

From the command line to CI/CD: core CANN CLI commands, batch-processing techniques, and DevOps automation in practice

🧩 Introduction: Why the CLI Is the Ascend Developer's "Swiss Army Knife"

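Across the Ascend NPU development lifecycle, developers face a large number of repetitive tasks every day:

Converting PyTorch models to .om
Generating a separate build for each chip (Ascend 310/910)
Batch-calibrating 100 ONNX models
Verifying that accuracy still meets the target after quantization
Deploying models automatically from a CI pipeline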

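Graphical tools such as MindStudio suit beginners, but they are slow for repetitive work, hard to reproduce, and difficult to integrate into pipelines.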

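Huawei's CANN (Compute Architecture for Neural Networks) ships a command-line toolset (CLI) that covers the whole workflow: model conversion, performance analysis, device management, and log diagnostics. Combined with Shell or Python scripts, it can be assembled into an end-to-end automated pipeline; the author estimates a 5 to 10 times gain in development efficiency.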

This article walks through advanced CANN CLI usage and uses hands-on scripts to show how to fit it into a DevOps workflow.

🏗️ 1. The CANN CLI Toolset at a Glance

The CANN CLI is made up of several independent tools, each with its own job:
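[Diagram: the model developer works with atc (model conversion), msprof (performance analysis), npu-smi (device monitoring), acl.json (ACL configuration), and modelzoo (model repository). Outputs: the .om model, the performance report, and the device status.]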

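✅ Core tools:

atc: Ascend Tensor Compiler. Converts ONNX, TensorFlow, Caffe, and MindSpore models to OM; PyTorch models are exported to ONNX first

msprof: performance profiler for finding compute and communication bottlenecks

npu-smi: NPU device management, similar to nvidia-smi

xw-cli (community tool): one-command deployment of large models (see ops-nn)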

⚙️ 2. Core Tool in Detail: Advanced atc Usage

2.1 Basic Conversion Command

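```bash
# --framework: 0=Caffe, 3=TensorFlow, 5=ONNX
# (a comment after a trailing backslash would break the line continuation)
atc \
  --model=resnet50.onnx \
  --framework=5 \
  --output=resnet50 \
  --soc_version=Ascend910 \
  --input_shape="input:1,3,224,224"
```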
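When the same conversion has to be issued from a script, it helps to build the argument list in one place. The following is a minimal sketch in Python that uses only the flags from the command above; `build_atc_command` is a hypothetical helper name, not part of CANN.

```python
import shlex


def build_atc_command(model, output, soc_version, input_shape=None, framework=5):
    """Return the atc argument list for a single conversion.

    Only flags that appear in the basic conversion command are used.
    """
    cmd = [
        "atc",
        f"--model={model}",
        f"--framework={framework}",  # 5 = ONNX
        f"--output={output}",
        f"--soc_version={soc_version}",
    ]
    if input_shape:
        cmd.append(f"--input_shape={input_shape}")
    return cmd


cmd = build_atc_command(
    "resnet50.onnx", "resnet50", "Ascend910",
    input_shape="input:1,3,224,224",
)
# shlex.join quotes arguments where needed so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Returning a list rather than a string means the result can be passed straight to subprocess.run without shell quoting concerns.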

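2.2 Advanced Parameters in Practice

| Scenario | Parameter | Notes |
|---|---|---|
| Dynamic shape | `--dynamic_dims="1,3,224,224;1,3,256,256"` | Inference at several input sizes |
| Custom operator | `--plugin_path=./swish_plugin.so` | Registers a plugin |
| Mixed precision | `--precision_mode=allow_mix_precision` | Automatically leaves precision-sensitive layers in higher precision |
| INT4 quantization | `--quant_type=INT4` | Enables 4-bit quantization |
| Dump intermediate graph | `--dump_graph=1` | Produces a .json computation graph |

💡 Tip: atc --help lists every parameter. Flag names and accepted values differ between CANN releases, so check the ones above against your installed version before relying on them.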
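Typing the dynamic-shape string by hand is easy to get wrong once there are more than two sizes. A small sketch that builds it from shape tuples, in the semicolon-separated format shown in the table; `format_dynamic_dims` is a hypothetical helper, and the exact semantics atc expects should be confirmed against your CANN version.

```python
def format_dynamic_dims(shapes):
    """Join shape tuples into the 'd,d,d;d,d,d' string used by --dynamic_dims."""
    if not shapes:
        raise ValueError("at least one shape is required")
    rank = len(shapes[0])
    for shape in shapes:
        # Every candidate shape must have the same number of dimensions.
        if len(shape) != rank:
            raise ValueError(f"shape {shape} does not have rank {rank}")
    return ";".join(",".join(str(d) for d in shape) for shape in shapes)


shapes = [(1, 3, 224, 224), (1, 3, 256, 256)]
print(f'--dynamic_dims="{format_dynamic_dims(shapes)}"')
# prints: --dynamic_dims="1,3,224,224;1,3,256,256"
```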

🔍 3. Profiling in Depth with msprof

3.1 Starting a Profiling Run

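```bash
# Run inference and collect profiling data
# (collection switches take on/off values; confirm names with msprof --help)
msprof --output=./profile_resnet \
       --model-execution=on \
       --aicpu=on \
       --runtime-api=on \
       python infer.py
```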

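3.2 Key Analysis Dimensions

| Dimension | Command | Output |
|---|---|---|
| Timeline | `msprof analyze --timeline ./profile_resnet` | Timeline view |
| Communication | `msprof analyze --communication ./profile_resnet` | HCCL bottlenecks |
| Operator time | `msprof analyze --op_statistic ./profile_resnet` | Most time-consuming operators |
| Memory | `msprof analyze --memory ./profile_resnet` | Peak HBM usage |

✅ Practical tip: add --device_id=0 to restrict the analysis to a single card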
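Once operator statistics have been exported, ranking the slowest operators is a few lines of Python. The sketch below assumes a CSV with one row per operator invocation; the column names "Op Name" and "Total Time(us)" and the sample rows are made up for illustration and will need adjusting to whatever your msprof version actually writes.

```python
import csv
import io


def top_operators(csv_text, name_col="Op Name", time_col="Total Time(us)", n=3):
    """Sum time per operator and return the n slowest as (name, total) pairs."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[name_col]] = totals.get(row[name_col], 0.0) + float(row[time_col])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Made-up sample data, not real profiler output.
sample = """Op Name,Total Time(us)
Conv2D,1200.5
MatMul,800.0
Conv2D,300.5
Relu,50.0
"""

for name, total in top_operators(sample, n=2):
    print(f"{name}: {total:.1f} us")
# prints:
# Conv2D: 1501.0 us
# MatMul: 800.0 us
```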

🖥️ 4. Device Monitoring with npu-smi

4.1 Basic Queries

```bash
# Show the status of every NPU
npu-smi info

# Example output (simplified):
+-------------------+------------------+--------------------------------------------------+
| NPU ID            | Health           | Power(W) / Temp(℃)                               |
+===================+==================+==================================================+
| 0                 | OK               | 85W / 52℃                                        |
| 1                 | OK               | 82W / 50℃                                        |
+-------------------+------------------+--------------------------------------------------+
```

4.2 A More Advanced Monitoring Script

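```bash
# Report when NPU 0 utilization exceeds 90%
# (assumes "label : value" output lines; adjust the pattern to your npu-smi version)
while true; do
  npu-smi info -t usage -i 0 | awk -F':' '/Aicore Usage Rate/ {if ($2+0 > 90) print "NPU 0 busy"}'
  sleep 1
done
```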

🔑 Key options:

-t usage: utilization

-t power: power draw

-i 0: selects the NPU by ID
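The same threshold check is easier to test and extend in Python than in a shell pipeline. This sketch parses "label : value" text into a dict and compares one field against a threshold; the field label "Aicore Usage Rate(%)" and the sample text are assumptions about the npu-smi output format, so substitute the label your version prints.

```python
def parse_key_values(text):
    """Parse 'Key : Value' lines into a dict, skipping lines without a colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_busy(text, field="Aicore Usage Rate(%)", threshold=90.0):
    """Return True when the named utilization field exceeds the threshold."""
    value = parse_key_values(text).get(field)
    return value is not None and float(value) > threshold


# Assumed output shape, for illustration only.
sample = """NPU ID : 0
Aicore Usage Rate(%) : 95
"""
print(is_busy(sample))  # prints: True
```

In a real monitor, the text would come from subprocess.run(["npu-smi", "info", "-t", "usage", "-i", "0"], capture_output=True, text=True).stdout.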

🤖 5. Writing Automation Scripts: Batch Model Conversion

5.1 Scenario: Batch-Converting 100 ONNX Models

Directory layout:

```
models/
├── resnet50.onnx
├── mobilenetv3.onnx
└── ...
```

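5.2 Python Batch Script

```python
# batch_convert.py
import os
import subprocess
import sys
from pathlib import Path

MODEL_DIR = "./models"
OUTPUT_DIR = "./om_models"
# Read the chip model from the environment, falling back to Ascend910
SOC_VERSION = os.environ.get("ASCEND_SOC_VERSION", "Ascend910")

os.makedirs(OUTPUT_DIR, exist_ok=True)
failed = []  # Names of models whose conversion failed

for onnx_file in sorted(Path(MODEL_DIR).glob("*.onnx")):
    model_name = onnx_file.stem
    output_path = f"{OUTPUT_DIR}/{model_name}"

    # Same --flag=value form as the atc command in section 2.1
    cmd = [
        "atc",
        f"--model={onnx_file}",
        "--framework=5",
        f"--output={output_path}",
        f"--soc_version={SOC_VERSION}",
        "--precision_mode=allow_mix_precision",
    ]

    print(f"Converting {model_name}...")
    result = subprocess.run(cmd, capture_output=True, text=True)

    if result.returncode != 0:
        print(f"❌ Failed: {result.stderr}")
        failed.append(model_name)
    else:
        print(f"✅ Success: {output_path}.om")

# A non-zero exit code lets CI mark the job as failed
if failed:
    print(f"Failed models: {', '.join(failed)}")
    sys.exit(1)
```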

✅ Benefit: a failed model is recorded and does not stop the rest of the batch
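The batch script targets one chip. To produce a build per chip, as in the Ascend 310/910 case from the introduction, one option is to expand models and chips into a job list first and then run atc once per job. A minimal sketch; `build_jobs` and the per-chip output layout are illustrative assumptions, not something the original script defines.

```python
from itertools import product
from pathlib import Path


def build_jobs(model_paths, soc_versions, output_root="om_models"):
    """Return one (model, soc, output_prefix) job per model/chip pair."""
    jobs = []
    for model, soc in product(model_paths, soc_versions):
        # Keep each chip's builds in their own subdirectory.
        prefix = Path(output_root) / soc / Path(model).stem
        jobs.append((str(model), soc, prefix.as_posix()))
    return jobs


jobs = build_jobs(["models/resnet50.onnx"], ["Ascend310", "Ascend910"])
for model, soc, prefix in jobs:
    print(f"{model} -> {prefix}.om ({soc})")
# prints:
# models/resnet50.onnx -> om_models/Ascend310/resnet50.om (Ascend310)
# models/resnet50.onnx -> om_models/Ascend910/resnet50.om (Ascend910)
```

Separating job construction from execution also makes it straightforward to run the jobs in parallel later.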

🔄 6. CI/CD Integration: A GitLab CI Example

Hooking the CANN CLI into continuous integration:

```yaml
# .gitlab-ci.yml
stages:
  - build
  - test
  - deploy

build-om-models:
  stage: build
  script:
    # Run from the project root so om_models/ matches the artifact path below
    - python scripts/batch_convert.py
  artifacts:
    paths:
      - om_models/
    expire_in: 1 week

test-accuracy:
  stage: test
  script:
    - python validate_all.py --models om_models/
  dependencies:
    - build-om-models

deploy-to-edge:
  stage: deploy
  script:
    - scp -r om_models/ user@edge-device:/models/
  only:
    - main
```

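🔒 Security tip: keep device credentials (ideally an SSH key, as recommended in section 10.1) in CI/CD variables rather than in the repository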
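The test-accuracy job calls validate_all.py, which the article does not show. One common way to gate on accuracy is to compare the converted model's output with a reference output using cosine similarity. The sketch below illustrates that idea only; the 0.99 threshold, the function names, and the sample vectors are assumptions, and the real validate_all.py may work differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length output vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def passes_accuracy_gate(reference, candidate, threshold=0.99):
    """Return True when the candidate output is close enough to the reference."""
    return cosine_similarity(reference, candidate) >= threshold


reference = [0.10, 0.70, 0.20]   # e.g. output of the original ONNX model
candidate = [0.11, 0.68, 0.21]   # e.g. output of the converted .om model
print(passes_accuracy_gate(reference, candidate))  # prints: True
```

A script built on this check would exit with a non-zero status when any model fails the gate, which is what makes the CI stage fail.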

🛠️ 7. Advanced Technique: Smart Error Handling and Retries

7.1 Automatically Recovering from Common Errors

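```python
# smart_convert.py
import re
import subprocess
from pathlib import Path

def run_atc(model_path, precision="allow_mix_precision"):
    """Run atc once and return the CompletedProcess."""
    output = str(Path(model_path).with_suffix(""))
    cmd = ["atc", f"--model={model_path}", "--framework=5", f"--output={output}",
           "--soc_version=Ascend910", f"--precision_mode={precision}"]
    return subprocess.run(cmd, capture_output=True, text=True)

def extract_op_name(stderr):
    """Extract the operator name from the error text (message format assumed)."""
    match = re.search(r"Unsupported op\W+(\w+)", stderr)
    return match.group(1) if match else None

def generate_plugin(op_name):
    """Placeholder for the ops-nn plugin generator; returns False until wired up."""
    return False

def convert_with_retry(model_path, max_retries=3):
    precision = "allow_mix_precision"
    for attempt in range(max_retries):
        result = run_atc(model_path, precision)
        if result.returncode == 0:
            return True
        if "Unsupported op" in result.stderr:
            # Register the missing operator (via ops-nn), then retry
            if not generate_plugin(extract_op_name(result.stderr)):
                break
        elif "Quantization error" in result.stderr:
            # Fall back to FP16 on the next attempt
            precision = "force_fp16"
        else:
            break  # Unknown error: retrying will not help
    return False
```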

✅ AIGC integration: call ops-nn/aigc/generate_op.py to generate the plugin automatically

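📊 8. Comparison: Manual vs. Automated

| Task | Manual | Automated script |
|---|---|---|
| Converting 10 models | 2 hours | 3 minutes |
| Accuracy validation | Easy to miss cases | Full coverage plus a report |
| Multi-chip support | Error-prone | Parameterized configuration |
| CI integration | Not practical | One-click trigger |

💡 ROI (return on investment): about 1 hour spent writing the script can save 100+ hours later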

📈 9. DevOps Best Practices: A CANN CLI Workflow

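[Flowchart: Git Push → CI Trigger → pull the latest models → batch ATC conversion → MSProf profiling → performance OK? If no: alert and archive logs. If yes: accuracy validation → accuracy OK? If no: start a QAT fix. If yes: package the .om → deploy to edge devices.]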

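✅ Key components:

Config-file driven: config.yaml defines soc_version, precision, and similar settings

Standardized logs: a single JSON format that is easy to analyze in ELK

Version management: .om file names carry the Git commit ID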
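Two of these components, JSON logs and commit-tagged file names, can be sketched in a few lines. The record keys and the `name_commit` naming scheme below are illustrative choices, not a format the article or CANN prescribes.

```python
import json
from datetime import datetime, timezone


def versioned_name(model_name, commit_id):
    """Append a short commit ID to the model name, e.g. resnet50_1a2b3c4."""
    return f"{model_name}_{commit_id[:7]}"


def log_record(model_name, commit_id, soc_version, status):
    """Build one JSON log line with a fixed set of keys."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": versioned_name(model_name, commit_id),
        "soc_version": soc_version,
        "status": status,
    }
    # sort_keys keeps the key order stable across runs, which helps diffing.
    return json.dumps(record, sort_keys=True)


print(log_record("resnet50", "1a2b3c4d5e6f", "Ascend910", "success"))
```

In GitLab CI the commit ID is available to the job as the predefined variable CI_COMMIT_SHA, so it can be read from the environment instead of calling git.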

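🔐 10. Security and Permission Management

10.1 Least Privilege

The CI runner uses a dedicated account that is granted only read access to npu-smi
Production deployments use SSH keys instead of passwords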

10.2 Protecting Sensitive Information

```bash
# Use an environment variable
export ASCEND_SOC_VERSION=Ascend910
atc --soc_version="$ASCEND_SOC_VERSION" ...
```

❌ Don't: hard-code chip models or paths in scripts
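On the Python side, the same variable can be read and validated up front so that a missing or mistyped value fails immediately rather than partway through a batch. A minimal sketch; `get_soc_version` is a hypothetical helper, and the allowed list simply reuses the two chip names that appear in this article.

```python
import os


def get_soc_version(env=None, allowed=("Ascend310", "Ascend910")):
    """Read ASCEND_SOC_VERSION and fail fast if it is unset or unexpected."""
    env = os.environ if env is None else env
    value = env.get("ASCEND_SOC_VERSION")
    if not value:
        raise RuntimeError("ASCEND_SOC_VERSION is not set")
    if value not in allowed:
        raise ValueError(f"unexpected SoC version: {value}")
    return value


# Passing a dict instead of os.environ keeps the example self-contained.
print(get_soc_version({"ASCEND_SOC_VERSION": "Ascend910"}))  # prints: Ascend910
```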

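✅ 11. A Library of Common Script Templates

ops-nn/cann_cli_tools provides:

| Script | Purpose |
|---|---|
| `convert_all.sh` | Batch ONNX → OM conversion |
| `profile_model.py` | Runs msprof automatically and generates a report |
| `monitor_npu.sh` | Monitors NPU utilization in real time |
| `fix_quant_error.py` | Falls back automatically when quantization fails |
| `deploy_edge.sh` | One-command deployment to Atlas devices |

💡 Usage example: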

```bash
git clone https://atomgit.com/cann/ops-nn.git
cd ops-nn/cann_cli_tools
./convert_all.sh --input ./models --output ./om --soc Ascend310
```

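🌟 Conclusion

The CANN CLI is more than a set of command-line tools: it is the automation engine of Ascend AI development. By combining atc, msprof, npu-smi, and related tools with robust scripts, developers can:

Eliminate repetitive work
Keep the process consistent
Speed up model iteration
Integrate cleanly with DevOps

Whether you are an algorithm engineer, an MLOps engineer, or a systems integrator, mastering the CANN CLI will make you considerably more productive.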

📚 Get the CLI Script Library

CANN open-source organization: https://atomgit.com/cann

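ops-nn repository: https://atomgit.com/cann/ops-nn

In the ops-nn/cann_cli_tools directory you will find:

Batch conversion scripts
Profiling automation tools
Device monitoring templates
CI/CD integration examples

Make every model deployment efficient, reliable, and traceable!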