NuScenes: matching multi-camera images and point clouds from the same frame (example + runnable script)

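This article explains how to start from a single on-disk NuScenes file and find the other camera images and point-cloud files that belong to the same frame (same sample). The example uses data paths and real records from this repository, and the full source of the script tools/nuscenes_find_synced_samples.py is included at the end of the article so you can copy and run it directly.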

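Key takeaway (in one sentence)

In NuScenes, "the same frame" is identified by sample_token. To link an on-disk file (samples/...) to the other sensors in that frame, look up its sample_token in sample_data.json, then use calibrated_sensor.json and sensor.json to map each record to a concrete channel (such as CAM_FRONT / LIDAR_TOP / RADAR_FRONT) and read that record's filename.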

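1. Metadata tables and fields (the minimum you need)

sample_data.json (one record per sensor reading)

Important fields:
- token: unique id of the sample_data record (effectively the record id of an image or point cloud)
- filename: relative path on disk, e.g. samples/CAM_BACK/...jpg
- sample_token: id of the frame (same frame / same sample) the record belongs to
- calibrated_sensor_token: used to trace which physical sensor produced the record

calibrated_sensor.json (calibration table)

A calibrated_sensor_token resolves to a sensor_token.

sensor.json (sensor table)

A sensor_token resolves to a channel (e.g. CAM_BACK, LIDAR_TOP, RADAR_FRONT).

Note: in the official devkit, a sample object usually carries a data field (channel -> sample_data_token), but in some data distributions sample.json may be a reduced version, or may simply lack the data field. In that case, grouping records by sample_data.sample_token is an equivalent and reliable alternative.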
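The chain of lookups described above can be sketched on toy data. This is a minimal illustration, not the devkit's implementation: the rows below contain only the fields discussed in this section, every token value is made up, and `resolve_channel` is a hypothetical helper name.

```python
from typing import Dict, Optional

# Toy rows that mimic only the fields discussed above; all tokens are invented.
sample_data_rows = [
    {"token": "sd1", "filename": "samples/CAM_BACK/a.jpg",
     "sample_token": "s1", "calibrated_sensor_token": "cs1"},
]
calibrated_sensor_rows = [{"token": "cs1", "sensor_token": "sen1"}]
sensor_rows = [{"token": "sen1", "channel": "CAM_BACK"}]


def resolve_channel(sd_row: dict,
                    calib_by_token: Dict[str, dict],
                    sensor_by_token: Dict[str, dict]) -> Optional[str]:
    """Follow sample_data -> calibrated_sensor -> sensor and return the channel."""
    calib = calib_by_token.get(sd_row.get("calibrated_sensor_token"))
    if calib is None:
        return None  # dangling calibrated_sensor_token
    sensor = sensor_by_token.get(calib.get("sensor_token"))
    return sensor.get("channel") if sensor else None


# Index both tables by token so each lookup is a dict access, not a scan.
calib_by_token = {r["token"]: r for r in calibrated_sensor_rows}
sensor_by_token = {r["token"]: r for r in sensor_rows}

print(resolve_channel(sample_data_rows[0], calib_by_token, sensor_by_token))
```

Returning None for a dangling token, rather than raising, lets a caller skip incomplete records and continue with the rest of the frame.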

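2. Steps (from one file to the other sensors in the same frame)

Suppose you have an image file on disk (relative path):
samples/CAM_BACK/n008-...__1533151603537558.jpg

1) In data/nuscenes/<version>/sample_data.json, find the record whose filename == 'samples/CAM_BACK/xxx.jpg'. This is the matching sample_data record.
2) Read sample_token from that record (this is the "frame id").
3) In sample_data.json, select every record whose sample_token == <that sample_token>. These are the sensor records associated with that frame.
4) For each record, use calibrated_sensor_token to look up sensor_token in calibrated_sensor.json, then look up channel in sensor.json. This tells you which sensor (CAM/LIDAR/RADAR) the record belongs to.
5) Print the filename for each channel. If needed, verify on disk with (dataroot / filename).exists().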
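The five steps can be sketched as one function over in-memory rows. This is a simplified illustration under stated assumptions: `same_frame_files` is a hypothetical name, the rows are toy data, and `channel_by_calib` stands in for the calibrated_sensor.json + sensor.json join, pre-computed as a calibrated_sensor_token -> channel mapping.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def same_frame_files(filename: str,
                     sample_data_rows: List[dict],
                     channel_by_calib: Dict[str, str]) -> Optional[Dict[str, List[str]]]:
    """Return channel -> filenames for the frame that `filename` belongs to."""
    # Steps 1-2: locate the record and read its frame id.
    target = next((r for r in sample_data_rows if r["filename"] == filename), None)
    if target is None:
        return None
    frame_id = target["sample_token"]

    # Steps 3-5: collect every record of that frame and label it by channel.
    result: Dict[str, List[str]] = defaultdict(list)
    for r in sample_data_rows:
        if r["sample_token"] != frame_id:
            continue
        channel = channel_by_calib.get(r["calibrated_sensor_token"])
        if channel is not None:
            result[channel].append(r["filename"])
    return dict(result)


# Toy data; tokens and filenames are invented for illustration.
rows = [
    {"filename": "samples/CAM_BACK/a.jpg", "sample_token": "s1", "calibrated_sensor_token": "cs_cam"},
    {"filename": "samples/LIDAR_TOP/a.pcd.bin", "sample_token": "s1", "calibrated_sensor_token": "cs_lidar"},
    {"filename": "samples/CAM_BACK/b.jpg", "sample_token": "s2", "calibrated_sensor_token": "cs_cam"},
]
channel_by_calib = {"cs_cam": "CAM_BACK", "cs_lidar": "LIDAR_TOP"}

frame = same_frame_files("samples/CAM_BACK/a.jpg", rows, channel_by_calib)
for channel in sorted(frame):
    print(channel, frame[channel])
```

Each channel maps to a list rather than a single filename, so an unexpected second record for a channel shows up in the output instead of silently overwriting the first.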

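3. Example (using the image you provided)

Given the (relative) filename:
samples/CAM_BACK/n008-2018-08-01-15-16-36-0400__CAM_BACK__1533151603537558.jpg

In this repository's data (data/nuscenes/v1.0-mini), the matching sample_data record contains:

token = 1908fe7d...
sample_token = 3e8750f3...

Filtering on that sample_token yields the following same-frame files (for this example):

3.1 The 6 cameras in the same frame (CAM_*)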

CAM_BACK
samples/CAM_BACK/n008-2018-08-01-15-16-36-0400__CAM_BACK__1533151603537558.jpg
CAM_BACK_LEFT
samples/CAM_BACK_LEFT/n008-2018-08-01-15-16-36-0400__CAM_BACK_LEFT__1533151603547405.jpg
CAM_BACK_RIGHT
samples/CAM_BACK_RIGHT/n008-2018-08-01-15-16-36-0400__CAM_BACK_RIGHT__1533151603528113.jpg
CAM_FRONT
samples/CAM_FRONT/n008-2018-08-01-15-16-36-0400__CAM_FRONT__1533151603512404.jpg
CAM_FRONT_LEFT
samples/CAM_FRONT_LEFT/n008-2018-08-01-15-16-36-0400__CAM_FRONT_LEFT__1533151603504799.jpg
CAM_FRONT_RIGHT
samples/CAM_FRONT_RIGHT/n008-2018-08-01-15-16-36-0400__CAM_FRONT_RIGHT__1533151603520482.jpg

3.2 Point clouds in the same frame (LIDAR / RADAR)

LIDAR_TOP
samples/LIDAR_TOP/n008-2018-08-01-15-16-36-0400__LIDAR_TOP__1533151603547590.pcd.bin
RADAR_FRONT
samples/RADAR_FRONT/n008-2018-08-01-15-16-36-0400__RADAR_FRONT__1533151603555991.pcd
RADAR_FRONT_LEFT
samples/RADAR_FRONT_LEFT/n008-2018-08-01-15-16-36-0400__RADAR_FRONT_LEFT__1533151603526348.pcd
RADAR_FRONT_RIGHT
samples/RADAR_FRONT_RIGHT/n008-2018-08-01-15-16-36-0400__RADAR_FRONT_RIGHT__1533151603512881.pcd
RADAR_BACK_LEFT
samples/RADAR_BACK_LEFT/n008-2018-08-01-15-16-36-0400__RADAR_BACK_LEFT__1533151603522238.pcd
RADAR_BACK_RIGHT
samples/RADAR_BACK_RIGHT/n008-2018-08-01-15-16-36-0400__RADAR_BACK_RIGHT__1533151603576423.pcd
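The filenames listed above end in different numbers even though they belong to one frame. Assuming that trailing number is a Unix timestamp in microseconds (which its 16-digit magnitude suggests), the sketch below parses it and prints each sensor's offset from LIDAR_TOP. `timestamp_us` is a hypothetical helper; the filenames are a subset copied from the lists above.

```python
from pathlib import PurePosixPath


def timestamp_us(filename: str) -> int:
    """Parse the trailing number from a name like <log>__<CHANNEL>__<number>.<ext>."""
    name = PurePosixPath(filename).name
    # Take the last "__"-separated field, then drop extensions such as ".jpg" or ".pcd.bin".
    return int(name.rsplit("__", 1)[1].split(".", 1)[0])


files = {
    "LIDAR_TOP": "samples/LIDAR_TOP/n008-2018-08-01-15-16-36-0400__LIDAR_TOP__1533151603547590.pcd.bin",
    "CAM_BACK": "samples/CAM_BACK/n008-2018-08-01-15-16-36-0400__CAM_BACK__1533151603537558.jpg",
    "CAM_FRONT_LEFT": "samples/CAM_FRONT_LEFT/n008-2018-08-01-15-16-36-0400__CAM_FRONT_LEFT__1533151603504799.jpg",
    "RADAR_BACK_RIGHT": "samples/RADAR_BACK_RIGHT/n008-2018-08-01-15-16-36-0400__RADAR_BACK_RIGHT__1533151603576423.pcd",
}

reference = timestamp_us(files["LIDAR_TOP"])
for channel, fn in sorted(files.items()):
    offset_ms = (timestamp_us(fn) - reference) / 1000  # microseconds -> milliseconds
    print(f"{channel}: {offset_ms:+.3f} ms")
```

Under that assumption, the sensors in this frame were captured within a few tens of milliseconds of each other rather than at one instant, which is worth keeping in mind when projecting points into images.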

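4. Additional notes (sweeps / samples)

samples/: usually key-frames (used for annotation and evaluation). The layout is regular, and the same key-frames serve several purposes.
sweeps/: densely sampled intermediate frames (non-key-frames), often used for temporal fusion or augmentation.

Whether the filename sits under samples/ or sweeps/, as long as you can locate its record in sample_data.json, the same method finds the other data with the same sample_token. Note that sweeps/ records carry the sample_token of the key-frame they are associated with, so a channel can map to several files unless you also filter on the is_key_frame field.

5. Tool script (full source; can be saved directly as `tools/nuscenes_find_synced_samples.py`)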
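If only the key-frame files are wanted, the rows that share a sample_token can be filtered further. The sketch below assumes each sample_data row has the is_key_frame boolean from the official NuScenes schema, and falls back to the samples/ path prefix when the flag is missing. `keyframes_only` is a hypothetical helper and the rows are toy data.

```python
from typing import List


def keyframes_only(rows: List[dict]) -> List[dict]:
    """Keep rows flagged as key-frames; use the samples/ prefix if the flag is absent."""
    kept = []
    for r in rows:
        flag = r.get("is_key_frame")
        if flag is None:
            # Assumption: key-frame files live under samples/, intermediate frames under sweeps/.
            flag = str(r.get("filename", "")).startswith("samples/")
        if flag:
            kept.append(r)
    return kept


# Toy rows sharing one sample_token; filenames are invented.
rows = [
    {"filename": "samples/LIDAR_TOP/a.pcd.bin", "sample_token": "s1", "is_key_frame": True},
    {"filename": "sweeps/LIDAR_TOP/b.pcd.bin", "sample_token": "s1", "is_key_frame": False},
    {"filename": "sweeps/LIDAR_TOP/c.pcd.bin", "sample_token": "s1"},  # flag missing
]

for r in keyframes_only(rows):
    print(r["filename"])
```

Applied to the output of a sample_token filter, this should leave at most one file per channel, matching the lists in section 3.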


#!/usr/bin/env python3
"""Find synchronized NuScenes sensor files for a given sample file.

This repo's sample.json appears to be a reduced schema
(without the usual `sample['data']` mapping). So we recover the "same frame"
("same sample") relationship by joining tables:

- sample_data.filename -> sample_data.sample_token
- sample_data.calibrated_sensor_token -> calibrated_sensor.sensor_token
- sensor.channel -> the sensor name (e.g. CAM_FRONT, LIDAR_TOP, RADAR_FRONT)

Given an input filename like:
    samples/CAM_BACK/xxx.jpg
we:
1) Look up the matching `sample_data` row by `filename`.
2) Grab its `sample_token`.
3) Collect *all* sample_data rows with that `sample_token`.
4) Map each row to a channel name via calibrated_sensor + sensor.
5) Print the filenames for all channels in that frame.

Works for images and point clouds as long as the file exists in sample_data.json.

Example:
    python3 nuscenes_find_synced_samples.py \
      --dataroot data/nuscenes --version v1.0-mini \
      --filename samples/CAM_BACK/n008-2018-08-01-15-16-36-0400__CAM_BACK__1533151603537558.jpg
"""

from __future__ import annotations

import argparse
import json
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def _load_json(path: Path):
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


def _build_channel_mapper(dataroot: Path, version: str):
    base = dataroot / version
    calib_rows = _load_json(base / "calibrated_sensor.json")
    sensor_rows = _load_json(base / "sensor.json")

    calib_by_token: Dict[str, dict] = {r["token"]: r for r in calib_rows}
    sensor_by_token: Dict[str, dict] = {r["token"]: r for r in sensor_rows}

    def channel_of(sample_data_row: dict) -> Optional[str]:
        calib = calib_by_token.get(sample_data_row.get("calibrated_sensor_token"))
        if not calib:
            return None
        s = sensor_by_token.get(calib.get("sensor_token")) if calib else None
        if not s:
            return None
        return s.get("channel")

    return channel_of


def find_sample_data_by_filename(sample_data_rows: List[dict], filename: str) -> Optional[dict]:
    # Filenames in NuScenes tables are POSIX-like relative paths.
    # We compare as-is; caller should provide relative `samples/...` path.
    for r in sample_data_rows:
        if r.get("filename") == filename:
            return r
    return None


def collect_same_sample(
    sample_data_rows: List[dict],
    sample_token: str,
) -> List[dict]:
    return [r for r in sample_data_rows if r.get("sample_token") == sample_token]


def main(argv: Optional[List[str]] = None) -> int:
    p = argparse.ArgumentParser(description="Find other NuScenes sensor files in the same frame.")
    p.add_argument("--dataroot", type=Path, default=Path("data/nuscenes"), help="NuScenes root dir")
    p.add_argument(
        "--version",
        type=str,
        default="v1.0-mini",
        choices=["v1.0-mini", "v1.0-trainval", "v1.0-test"],
        help="NuScenes metadata version folder under dataroot",
    )
    p.add_argument(
        "--filename",
        type=str,
        required=True,
        help="Relative filename in sample_data.json, e.g. samples/CAM_BACK/xxx.jpg",
    )
    p.add_argument(
        "--check-exists",
        action="store_true",
        help="Also check whether each output file exists under dataroot.",
    )
    p.add_argument(
        "--only",
        type=str,
        default="",
        help="Optional comma-separated channel prefixes to keep, e.g. 'CAM_,LIDAR_,RADAR_'",
    )

    args = p.parse_args(argv)

    base = args.dataroot / args.version
    sample_data_path = base / "sample_data.json"
    if not sample_data_path.exists():
        raise SystemExit(f"Missing: {sample_data_path}")

    sample_data_rows = _load_json(sample_data_path)

    target = find_sample_data_by_filename(sample_data_rows, args.filename)
    if not target:
        raise SystemExit(
            "Could not find filename in sample_data.json. "
            "Make sure you pass a relative path like 'samples/CAM_BACK/xxx.jpg'."
        )

    sample_token = target.get("sample_token")
    sd_token = target.get("token")

    channel_of = _build_channel_mapper(args.dataroot, args.version)

    rows = collect_same_sample(sample_data_rows, sample_token)

    # Build channel -> (filename, token)
    channel_to_files: Dict[str, List[Tuple[str, str]]] = {}
    for r in rows:
        ch = channel_of(r)
        if not ch:
            continue
        channel_to_files.setdefault(ch, []).append((r.get("filename"), r.get("token")))

    prefixes: Tuple[str, ...] = tuple([x for x in (s.strip() for s in args.only.split(",")) if x])

    def keep_channel(ch: str) -> bool:
        if not prefixes:
            return True
        return any(ch.startswith(pref) for pref in prefixes)

    print("Input:")
    print(f"  filename     : {args.filename}")
    print(f"  sample_token : {sample_token}")
    print(f"  sample_data  : {sd_token}")
    print("")
    print("Same-frame channels:")

    for ch in sorted(channel_to_files.keys()):
        if not keep_channel(ch):
            continue
        items = channel_to_files[ch]
        # Normally one file per channel per sample, but we keep the list just in case.
        for fn, tok in items:
            line = f"  {ch}: {fn}  token={tok}"
            if args.check_exists and fn:
                exists = (args.dataroot / fn).exists()
                line += f"  exists={exists}"
            print(line)

    return 0


if __name__ == "__main__":
    raise SystemExit(main())
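The script compares --filename against sample_data.json as-is, so a path that still includes the dataroot prefix will not match. A small normalization step, sketched below, could convert such a path into the relative POSIX form before the lookup. `to_table_filename` is a hypothetical helper that is not part of the script above, and it works purely on the path text without touching the filesystem.

```python
from pathlib import Path, PurePosixPath


def to_table_filename(path: str, dataroot: str) -> str:
    """Strip a leading dataroot (if present) and return a '/'-separated relative path."""
    p = Path(path)
    try:
        p = p.relative_to(Path(dataroot))
    except ValueError:
        # `path` does not start with `dataroot`; assume it is already relative to it.
        pass
    # Rebuild from parts so the result uses forward slashes on every platform.
    return PurePosixPath(*p.parts).as_posix()


print(to_table_filename("data/nuscenes/samples/CAM_BACK/xxx.jpg", "data/nuscenes"))
print(to_table_filename("samples/CAM_BACK/xxx.jpg", "data/nuscenes"))
```

Both calls are meant to produce the same samples/CAM_BACK/xxx.jpg string, which could then be passed as --filename.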

6. Running the example