cann-recipes-spatial-intelligence: Training and Inference Recipes for Spatial Intelligence

Introduction

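Spatial intelligence is the ability of AI to understand three-dimensional space: object positions, spatial relationships, scene layout, and motion trajectories. Typical applications include autonomous driving (3D object detection, trajectory prediction), robot navigation (SLAM, obstacle avoidance), and AR/VR (scene reconstruction, gesture interaction). Training spatial intelligence models involves point cloud processing (PyTorch3D, Open3D), multi-view fusion (Transformer + 3D Position Embedding), and BEV (Bird's Eye View) encoding. cann-recipes-spatial-intelligence is the spatial intelligence recipe repository for Ascend CANN, and it provides complete training and inference scripts for common spatial intelligence models.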

Repository Positioning

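cann-recipes-spatial-intelligence belongs to the examples and learning resources group of repositories, alongside cann-recipes-infer, cann-recipes-train, and cann-recipes-embodied-intelligence. Upstream it depends on the PyTorch NPU plugin and ops-cv (the vision operator library); downstream it feeds autonomous driving and robotics deployment.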

Repository directory structure:

cann-recipes-spatial-intelligence/
+-- 3d_detection/          # 3D object detection
|   +-- pointpillars/      # PointPillars (point cloud -> pseudo-image -> 2D detection)
|   +-- centerpoint/       # CenterPoint (point cloud -> 3D detection)
|   +-- bevformer/         # BEVFormer (multi-view -> BEV -> Transformer)
+-- segmentation/          # Scene segmentation
|   +-- minkowski/         # Minkowski Engine (sparse 3D convolution)
|   +-- torchsparse/       # TorchSparse (sparse tensor acceleration)
+-- slam/                  # SLAM (simultaneous localization and mapping)
|   +-- orbslam3/          # ORB-SLAM3 (feature-based SLAM)
|   +-- direct_visual/     # Direct-method visual SLAM
+-- multi_modal/           # Multi-modal fusion
    +-- bevfusion/         # BEVFusion (camera + LiDAR fusion)
    +-- transfuser/        # TransFuser (multi-view fusion Transformer)

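Hands-on Code: Training PointPillars for 3D Detection

PointPillars is a 3D object detection algorithm widely used in autonomous driving. It converts the point cloud into a pseudo-image (the pillar representation) and then runs a 2D CNN detector on that image.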
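
The pillar step described above can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the function name `pillarize`, the detection range (0 to 69.12 m in x, -39.68 to 39.68 m in y), and the two hand-computed channels (point count and mean height) are assumptions made for illustration. A real PointPillars encoder learns per-pillar features with a small PointNet and typically samples pillars rather than keeping the densest ones. Only the 0.16 m pillar size and the 12000-pillar cap are taken from the training configuration in this article.

```python
import numpy as np


def pillarize(points, x_range=(0.0, 69.12), y_range=(-39.68, 39.68),
              pillar_size=(0.16, 0.16), max_pillars=12000):
    """Bin points of shape (N, >=3) into vertical pillars on the x-y plane.

    Returns a (2, ny, nx) pseudo-image: channel 0 is the point count per
    pillar, channel 1 is the mean z of the points in that pillar.
    """
    nx = int(round((x_range[1] - x_range[0]) / pillar_size[0]))
    ny = int(round((y_range[1] - y_range[0]) / pillar_size[1]))

    # Drop points outside the (assumed) detection range.
    x, y = points[:, 0], points[:, 1]
    in_range = ((x >= x_range[0]) & (x < x_range[1]) &
                (y >= y_range[0]) & (y < y_range[1]))
    pts = points[in_range]

    # Integer pillar index of every remaining point.
    ix = ((pts[:, 0] - x_range[0]) / pillar_size[0]).astype(np.int64)
    iy = ((pts[:, 1] - y_range[0]) / pillar_size[1]).astype(np.int64)
    ix = np.clip(ix, 0, nx - 1)
    iy = np.clip(iy, 0, ny - 1)
    flat = iy * nx + ix

    # Group points by pillar and aggregate a simple per-pillar feature.
    ids, inverse, counts = np.unique(flat, return_inverse=True,
                                     return_counts=True)
    z_sum = np.bincount(inverse, weights=pts[:, 2], minlength=len(ids))
    z_mean = z_sum / counts

    # Enforce the pillar budget by keeping the most populated pillars.
    if len(ids) > max_pillars:
        keep = np.argsort(-counts, kind="stable")[:max_pillars]
        ids, counts, z_mean = ids[keep], counts[keep], z_mean[keep]

    # Scatter the pillar features back onto the dense 2D grid.
    canvas = np.zeros((2, ny * nx), dtype=np.float32)
    canvas[0, ids] = counts
    canvas[1, ids] = z_mean
    return canvas.reshape(2, ny, nx)
```

With these assumed ranges the grid has 432 x 496 cells, so the 2D backbone would see a tensor of shape (C, 496, 432), which is why the rest of the detector can be an ordinary image CNN.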

import torch
import torch_npu
from torch.utils.data import DataLoader
from models.pointpillars import PointPillars
from data.kitti import KITTIDataset
from train.trainer import Trainer

device = torch.device("npu")

# 1. Data loading
train_dataset = KITTIDataset(
    data_root="data/kitti/",
    split="train",
    max_points=40000,   # at most 40000 points per point cloud
    num_pillars=12000,  # at most 12000 pillars
    pillar_size=(0.16, 0.16, 4)  # pillar size in meters (x, y, z)
)
train_loader = DataLoader(
    train_dataset, batch_size=4, shuffle=True, num_workers=8
)

# 2. Model definition
model = PointPillars(
    num_classes=3,       # 3 classes: car, pedestrian, cyclist
    pillar_dim=9,        # pillar feature dimension (x, y, z, ...)
    hidden_dim=64,
    num_pillars=12000,
    backbone="second"    # SECOND backbone
).to(device).half()

# 3. Optimizer
# AdamW is an assumption here; swap in the optimizer your recipe uses.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-3, weight_decay=1e-4
)

# 4. Training loop
trainer = Trainer(
    model=model,
    train_loader=train_loader,
    optimizer=optimizer,
    device=device,
    max_epochs=80,
    save_interval=10,
    log_interval=50
)

trainer.train()

# 5. Validation (compute mAP)
# Import evaluate_map from the repository's evaluation utilities before running this step.

map_scores = evaluate_map(
    model, "data/kitti/val", device=device
)
print(f"mAP@0.5: {map_scores['map_50']:.3f}")
print(f"mAP@0.7: {map_scores['map_70']:.3f}")
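
To make the two printed metrics concrete, here is a simplified sketch of how average precision at an IoU threshold can be computed. It is an illustration, not the repository's `evaluate_map`: the helper names `bev_iou` and `average_precision` are hypothetical, boxes are treated as axis-aligned `(x1, y1, x2, y2)` rectangles in BEV, and the example covers a single class in a single frame with non-interpolated AP. The official KITTI evaluation uses rotated boxes and precision interpolated at fixed recall positions, so its numbers will differ.

```python
import numpy as np


def bev_iou(a, b):
    """IoU of two axis-aligned BEV boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def average_precision(dets, gts, iou_thr=0.5):
    """Non-interpolated AP for one class.

    dets: list of (score, box); gts: list of ground-truth boxes.
    A detection is a true positive if its best-matching ground truth
    has IoU >= iou_thr and has not been claimed by a higher-scoring
    detection.
    """
    if not gts:
        return 0.0
    dets = sorted(dets, key=lambda d: -d[0])
    matched = [False] * len(gts)
    tp = np.zeros(len(dets))
    for i, (_, box) in enumerate(dets):
        ious = [bev_iou(box, g) for g in gts]
        j = int(np.argmax(ious))
        if ious[j] >= iou_thr and not matched[j]:
            matched[j] = True
            tp[i] = 1.0
    # Precision at each rank; AP averages it over the true positives.
    precision = np.cumsum(tp) / (np.arange(len(dets)) + 1)
    return float(np.sum(precision * tp) / len(gts))


gts = [(0, 0, 2, 2), (10, 10, 12, 12)]
dets = [(0.9, (0, 0, 2, 2)), (0.8, (5, 5, 6, 6)), (0.7, (10, 10, 12, 12))]
ap_50 = average_precision(dets, gts, iou_thr=0.5)
```

mAP is then the mean of this per-class AP over all classes. Raising the threshold from 0.5 to 0.7 only changes which detections count as true positives, which is why mAP@0.7 is never higher than mAP@0.5 for the same predictions.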

Training for 80 epochs takes about 8 hours on 8x Ascend 910 with batch_size=4. The same configuration takes 11.5 hours on 8x NVIDIA A100.

Inference: Visualizing BEV Detections

After training, the detection results can be visualized on a BEV (bird's-eye view) map:

import torch
import torch_npu
import numpy as np
import cv2
import matplotlib.pyplot as plt
from models.pointpillars import PointPillars
from data.utils import load_point_cloud

device = torch.device("npu")

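# 1. Load the model
# Constructor arguments are assumed to match the training configuration.
model = PointPillars(
    num_classes=3,
    pillar_dim=9,
    hidden_dim=64,
    num_pillars=12000,
    backbone="second"
)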
model.load_state_dict(torch.load("checkpoints/pointpillars_ep79.pth"))
model = model.to(device).half()
model.eval()

# 2. Load the point cloud
points = load_point_cloud("path/to/point_cloud.bin")  # placeholder path
# points: (N, 4), columns are (x, y, z, intensity)

# 3. Inference
with torch.no_grad():
    points_npu = torch.from_numpy(points).to(device).half()
    detections = model(points_npu.unsqueeze(0))
    # detections: List[Dict]; each Dict has 'bbox' (BEV coordinates), 'score', and 'cls'

# 4. Visualization (BEV map)
bev_map = np.zeros((400, 400), dtype=np.uint8)

# Draw the point cloud (BEV projection)
for pt in points:
    x, y = int(pt[0] / 0.2 + 200), int(pt[1] / 0.2 + 200)  # 0.2 m per pixel
    if 0 <= x < 400 and 0 <= y < 400:
        bev_map[y, x] = 255

# Draw the detection boxes
for det in detections:
    x1, y1, x2, y2 = det['bbox']  # BEV coordinates
    score, cls = det['score'], det['cls']
    if score > 0.5:
        # Convert to pixel coordinates
        px1, py1 = int(x1 / 0.2 + 200), int(y1 / 0.2 + 200)
        px2, py2 = int(x2 / 0.2 + 200), int(y2 / 0.2 + 200)
        cv2.rectangle(bev_map, (px1, py1), (px2, py2), 255, 2)
        cv2.putText(bev_map, f"{cls}:{score:.2f}", (px1, py1-5),
                     cv2.FONT_HERSHEY_SIMPLEX, 0.5, 255, 1)

plt.imshow(bev_map, cmap='gray')
plt.title("BEV Detection Result")
plt.savefig("bev_result.png")
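
The per-point Python loop in the visualization step becomes slow for point clouds with tens of thousands of points. As a sketch of one possible alternative, the same projection can be written with NumPy indexing. The function name `rasterize_bev` is hypothetical; the 0.2 m per pixel resolution and the 400 x 400 canvas come from the script above. `astype(np.int64)` truncates toward zero just like `int()`, so the result should match the loop pixel for pixel.

```python
import numpy as np


def rasterize_bev(points, resolution=0.2, size=400):
    """Project points of shape (N, >=2) onto a size x size BEV occupancy image."""
    half = size // 2
    # Metric coordinates -> pixel coordinates, ego vehicle at the image center.
    px = (points[:, 0] / resolution + half).astype(np.int64)
    py = (points[:, 1] / resolution + half).astype(np.int64)
    # Keep only the points that land inside the canvas.
    inside = (px >= 0) & (px < size) & (py >= 0) & (py < size)
    bev = np.zeros((size, size), dtype=np.uint8)
    bev[py[inside], px[inside]] = 255
    return bev
```

Calling `bev_map = rasterize_bev(points)` would replace both the canvas allocation and the point loop; the detection boxes can then be drawn on `bev_map` as before.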

Performance Data

Test environment: Atlas 800T A2 (8x Ascend 910), CANN 8.0.

Model	Dataset	8x Ascend 910 (FPS)	8x A100 (FPS)	Speedup
PointPillars	KITTI	145	112	1.29x
CenterPoint	nuScenes	98	75	1.31x
BEVFormer	nuScenes	42	32	1.31x
BEVFusion	KITTI	68	52	1.31x

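On these spatial intelligence models, 8x Ascend 910 delivers 29-31% higher throughput than 8x A100. The main reason given is the NPU's stronger sparse compute capability (point clouds are sparse data).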
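
The speedup column and the 29-31% range follow directly from the FPS figures. The short check below recomputes them from the table; the variable names are only illustrative.

```python
# FPS figures copied from the table: (8x Ascend 910, 8x A100)
fps = {
    "PointPillars": (145, 112),
    "CenterPoint": (98, 75),
    "BEVFormer": (42, 32),
    "BEVFusion": (68, 52),
}

# Speedup is the ratio of NPU throughput to GPU throughput.
speedup = {name: round(npu / gpu, 2) for name, (npu, gpu) in fps.items()}

# The same ratio expressed as a percentage gain over the GPU baseline.
gain_percent = {name: round((s - 1) * 100) for name, s in speedup.items()}

for name in fps:
    print(f"{name}: {speedup[name]:.2f}x (+{gain_percent[name]}%)")
```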

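Application Scenarios for Spatial Intelligence

Autonomous driving: 3D object detection (cars, pedestrians, cyclists), trajectory prediction, and scene understanding. A PointPillars model trained with cann-recipes-spatial-intelligence reaches 72.3% mAP@0.7 on the KITTI test set.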

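Robot navigation: SLAM (simultaneous localization and mapping), obstacle avoidance, and path planning. ORB-SLAM3 runs in real time on Ascend 910 at 30 FPS, which meets the real-time requirement of robot navigation.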

AR/VR: scene reconstruction (NeRF, 3D Gaussian Splatting), gesture interaction, and spatial audio. NeRF training accelerated on the NPU is reported as 2.1x faster than on GPU.

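cann-recipes-spatial-intelligence is Ascend CANN's one-stop recipe collection for spatial intelligence. From data loading and model training to inference deployment, all scripts are ready to use. The code is at https://atomgit.com/cann/cann-recipes-spatial-intelligence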