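This article explains how to start from a single on-disk NuScenes file and find the other camera images and point-cloud files that belong to the same frame (same sample). The example uses the data paths and real records in this repository. The full source of the script `tools/nuscenes_find_synced_samples.py` is included at the end so you can copy and run it directly.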
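Key takeaway (in one sentence)
In NuScenes, a "frame" is identified by its `sample_token`. To link an on-disk file (`samples/...`) to the other sensors of the same frame, look up its `sample_token` in `sample_data.json`, collect the key-frame records that share that token, then resolve each record's channel (e.g. `CAM_FRONT` / `LIDAR_TOP` / `RADAR_FRONT`) through `calibrated_sensor.json` and `sensor.json` and read its `filename`.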
1. Metadata tables and fields (the minimum you need)
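- `sample_data.json` (one record per sensor reading). Key fields:
  - `token`: unique id of the sample_data record (effectively the record id of an image or point cloud)
  - `filename`: relative path on disk, e.g. `samples/CAM_BACK/...jpg`
  - `sample_token`: id of the frame (same frame / same sample) the record belongs to
  - `calibrated_sensor_token`: used to trace which physical sensor produced the record
  - `is_key_frame`: `true` for key frames (files under `samples/`), `false` for sweeps (files under `sweeps/`)
- `calibrated_sensor.json` (calibration table): maps `calibrated_sensor_token` to `sensor_token`.
- `sensor.json` (sensor table): maps `sensor_token` to `channel` (e.g. `CAM_BACK`, `LIDAR_TOP`, `RADAR_FRONT`).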
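Here is a minimal sketch of the join chain above. It resolves every sample_data record to its channel and assumes the three tables are already loaded (e.g. with `json.load`) as lists of dicts. The helper name `channel_by_sample_data` and the toy tokens are hypothetical. Only the field names come from the real tables.

```python
from typing import Dict, List


def channel_by_sample_data(
    sample_data: List[dict],
    calibrated_sensor: List[dict],
    sensor: List[dict],
) -> Dict[str, str]:
    """Map sample_data.token -> sensor.channel via calibrated_sensor and sensor."""
    # Index the two small tables by their primary key.
    calib_to_sensor = {c["token"]: c["sensor_token"] for c in calibrated_sensor}
    sensor_to_channel = {s["token"]: s["channel"] for s in sensor}
    result: Dict[str, str] = {}
    for sd in sample_data:
        # A missing link at either step yields None and the record is skipped.
        sensor_token = calib_to_sensor.get(sd["calibrated_sensor_token"])
        channel = sensor_to_channel.get(sensor_token)
        if channel is not None:
            result[sd["token"]] = channel
    return result


# Toy records (hypothetical tokens) using the real tables' field names.
sensor = [{"token": "s1", "channel": "CAM_BACK"}, {"token": "s2", "channel": "LIDAR_TOP"}]
calibrated_sensor = [{"token": "c1", "sensor_token": "s1"}, {"token": "c2", "sensor_token": "s2"}]
sample_data = [
    {"token": "sd1", "calibrated_sensor_token": "c1"},
    {"token": "sd2", "calibrated_sensor_token": "c2"},
]
print(channel_by_sample_data(sample_data, calibrated_sensor, sensor))
# {'sd1': 'CAM_BACK', 'sd2': 'LIDAR_TOP'}
```

Building the two dictionaries once keeps each record's lookup constant-time. This matters when `sample_data.json` holds hundreds of thousands of rows.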
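Note: in the official devkit (nuscenes-devkit), `nusc.get('sample', token)['data']` maps channel -> sample_data_token. That field is not stored in the raw `sample.json`. The devkit adds it when it loads the tables, by grouping the key-frame `sample_data` records by `sample_token`. So if your `sample.json` has no `data` field, aggregating by `sample_data.sample_token` (keeping only `is_key_frame == true`) is equivalent and reliable.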
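The following sketch rebuilds that `data` mapping from `sample_data` in the same way. It assumes each row has already been given a `channel` key through the `calibrated_sensor` → `sensor` join from section 1, because the raw `sample_data.json` has no `channel` field. The function name and toy tokens are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_sample_data_index(sample_data: List[dict]) -> Dict[str, Dict[str, str]]:
    """Rebuild sample_token -> {channel: sample_data_token}, key frames only."""
    index: Dict[str, Dict[str, str]] = defaultdict(dict)
    for sd in sample_data:
        # Sweeps share the sample_token; skipping them keeps one entry per channel.
        if sd["is_key_frame"]:
            index[sd["sample_token"]][sd["channel"]] = sd["token"]
    return dict(index)


# Toy rows (hypothetical tokens); "channel" is assumed to be precomputed.
sample_data = [
    {"token": "sd1", "sample_token": "smp1", "channel": "CAM_BACK", "is_key_frame": True},
    {"token": "sd2", "sample_token": "smp1", "channel": "LIDAR_TOP", "is_key_frame": True},
    {"token": "sd3", "sample_token": "smp1", "channel": "LIDAR_TOP", "is_key_frame": False},
]
print(build_sample_data_index(sample_data))
# {'smp1': {'CAM_BACK': 'sd1', 'LIDAR_TOP': 'sd2'}}
```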
2. Steps (from one file to the other sensors in the same frame)
Suppose you have an image file on disk (relative path):
`samples/CAM_BACK/n008-...__1533151603537558.jpg`
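- In `data/nuscenes/<version>/sample_data.json`, find the record with `filename == 'samples/CAM_BACK/xxx.jpg'`.
- Read that record's `sample_token` (the "frame id").
- Select every record in `sample_data.json` with `sample_token == <that sample_token>` and `is_key_frame == true`. These are the key-frame readings of all sensors in the same frame. Without the `is_key_frame` filter you also get the sweeps attached to that sample.
- For each record, resolve `calibrated_sensor_token` to `sensor_token` via `calibrated_sensor.json`, then `sensor_token` to `channel` via `sensor.json`. This tells you which sensor (CAM/LIDAR/RADAR) the record came from.
- Output the `filename` for each channel. Optionally verify each file on disk with `(dataroot / filename).exists()`.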
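The five steps above fit in one function over already-loaded tables. This sketch assumes the tables are lists of dicts parsed from the JSON files. `find_synced_files` and the toy records are hypothetical, and the on-disk check from the last step is left to the caller.

```python
from typing import Dict, List


def find_synced_files(
    sample_data: List[dict],
    calibrated_sensor: List[dict],
    sensor: List[dict],
    filename: str,
) -> Dict[str, str]:
    """Return {channel: filename} for key frames sharing filename's sample_token."""
    # Step 1: find the sample_data record for the on-disk file.
    target = next((sd for sd in sample_data if sd["filename"] == filename), None)
    if target is None:
        raise KeyError(f"{filename!r} not found in sample_data")
    # Step 2: its sample_token is the frame id.
    sample_token = target["sample_token"]
    # Step 4 lookup tables: calibrated_sensor_token -> sensor_token -> channel.
    calib_to_sensor = {c["token"]: c["sensor_token"] for c in calibrated_sensor}
    sensor_to_channel = {s["token"]: s["channel"] for s in sensor}
    result: Dict[str, str] = {}
    for sd in sample_data:
        # Step 3: same frame, key frames only (sweeps share the sample_token).
        if sd["sample_token"] != sample_token or not sd["is_key_frame"]:
            continue
        channel = sensor_to_channel[calib_to_sensor[sd["calibrated_sensor_token"]]]
        # Step 5: collect the filename per channel.
        result[channel] = sd["filename"]
    return result


# Toy tables (hypothetical tokens and paths).
sensor = [{"token": "s1", "channel": "CAM_BACK"}, {"token": "s2", "channel": "LIDAR_TOP"}]
calibrated_sensor = [{"token": "c1", "sensor_token": "s1"}, {"token": "c2", "sensor_token": "s2"}]
sample_data = [
    {"token": "sd1", "sample_token": "smp1", "calibrated_sensor_token": "c1",
     "is_key_frame": True, "filename": "samples/CAM_BACK/a.jpg"},
    {"token": "sd2", "sample_token": "smp1", "calibrated_sensor_token": "c2",
     "is_key_frame": True, "filename": "samples/LIDAR_TOP/b.pcd.bin"},
    {"token": "sd3", "sample_token": "smp1", "calibrated_sensor_token": "c2",
     "is_key_frame": False, "filename": "sweeps/LIDAR_TOP/c.pcd.bin"},
]
print(find_synced_files(sample_data, calibrated_sensor, sensor, "samples/CAM_BACK/a.jpg"))
# {'CAM_BACK': 'samples/CAM_BACK/a.jpg', 'LIDAR_TOP': 'samples/LIDAR_TOP/b.pcd.bin'}
```

To verify the results on disk, check `(Path(dataroot) / fn).exists()` for each returned filename.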
3. Example (using the image you provided)
Given the (relative) filename:
`samples/CAM_BACK/n008-2018-08-01-15-16-36-0400__CAM_BACK__1533151603537558.jpg`
In this repository's data (`data/nuscenes/v1.0-mini`), the matching sample_data record contains:
- `token = 1908fe7d...`
- `sample_token = 3e8750f3...`
After selecting the key frames with that `sample_token`, the files in the same frame (for this example) are:
3.1 The 6 cameras in the same frame (CAM_*)
- `CAM_BACK`: `samples/CAM_BACK/n008-2018-08-01-15-16-36-0400__CAM_BACK__1533151603537558.jpg`
- `CAM_BACK_LEFT`: `samples/CAM_BACK_LEFT/n008-2018-08-01-15-16-36-0400__CAM_BACK_LEFT__1533151603547405.jpg`
- `CAM_BACK_RIGHT`: `samples/CAM_BACK_RIGHT/n008-2018-08-01-15-16-36-0400__CAM_BACK_RIGHT__1533151603528113.jpg`
- `CAM_FRONT`: `samples/CAM_FRONT/n008-2018-08-01-15-16-36-0400__CAM_FRONT__1533151603512404.jpg`
- `CAM_FRONT_LEFT`: `samples/CAM_FRONT_LEFT/n008-2018-08-01-15-16-36-0400__CAM_FRONT_LEFT__1533151603504799.jpg`
- `CAM_FRONT_RIGHT`: `samples/CAM_FRONT_RIGHT/n008-2018-08-01-15-16-36-0400__CAM_FRONT_RIGHT__1533151603520482.jpg`
3.2 Point clouds in the same frame (LIDAR / RADAR)
- `LIDAR_TOP`: `samples/LIDAR_TOP/n008-2018-08-01-15-16-36-0400__LIDAR_TOP__1533151603547590.pcd.bin`
- `RADAR_FRONT`: `samples/RADAR_FRONT/n008-2018-08-01-15-16-36-0400__RADAR_FRONT__1533151603555991.pcd`
- `RADAR_FRONT_LEFT`: `samples/RADAR_FRONT_LEFT/n008-2018-08-01-15-16-36-0400__RADAR_FRONT_LEFT__1533151603526348.pcd`
- `RADAR_FRONT_RIGHT`: `samples/RADAR_FRONT_RIGHT/n008-2018-08-01-15-16-36-0400__RADAR_FRONT_RIGHT__1533151603512881.pcd`
- `RADAR_BACK_LEFT`: `samples/RADAR_BACK_LEFT/n008-2018-08-01-15-16-36-0400__RADAR_BACK_LEFT__1533151603522238.pcd`
- `RADAR_BACK_RIGHT`: `samples/RADAR_BACK_RIGHT/n008-2018-08-01-15-16-36-0400__RADAR_BACK_RIGHT__1533151603576423.pcd`
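"Same frame" means "same sample", not "same instant". You can check this by parsing the timestamp embedded in each filename listed above and measuring the spread. This sketch assumes the trailing number in a nuScenes filename is the capture timestamp in microseconds. The helper `timestamp_us` is hypothetical.

```python
import re

# Key-frame filenames from sections 3.1 and 3.2 (same sample_token).
FILES = [
    "samples/CAM_BACK/n008-2018-08-01-15-16-36-0400__CAM_BACK__1533151603537558.jpg",
    "samples/CAM_BACK_LEFT/n008-2018-08-01-15-16-36-0400__CAM_BACK_LEFT__1533151603547405.jpg",
    "samples/CAM_BACK_RIGHT/n008-2018-08-01-15-16-36-0400__CAM_BACK_RIGHT__1533151603528113.jpg",
    "samples/CAM_FRONT/n008-2018-08-01-15-16-36-0400__CAM_FRONT__1533151603512404.jpg",
    "samples/CAM_FRONT_LEFT/n008-2018-08-01-15-16-36-0400__CAM_FRONT_LEFT__1533151603504799.jpg",
    "samples/CAM_FRONT_RIGHT/n008-2018-08-01-15-16-36-0400__CAM_FRONT_RIGHT__1533151603520482.jpg",
    "samples/LIDAR_TOP/n008-2018-08-01-15-16-36-0400__LIDAR_TOP__1533151603547590.pcd.bin",
    "samples/RADAR_FRONT/n008-2018-08-01-15-16-36-0400__RADAR_FRONT__1533151603555991.pcd",
    "samples/RADAR_FRONT_LEFT/n008-2018-08-01-15-16-36-0400__RADAR_FRONT_LEFT__1533151603526348.pcd",
    "samples/RADAR_FRONT_RIGHT/n008-2018-08-01-15-16-36-0400__RADAR_FRONT_RIGHT__1533151603512881.pcd",
    "samples/RADAR_BACK_LEFT/n008-2018-08-01-15-16-36-0400__RADAR_BACK_LEFT__1533151603522238.pcd",
    "samples/RADAR_BACK_RIGHT/n008-2018-08-01-15-16-36-0400__RADAR_BACK_RIGHT__1533151603576423.pcd",
]


def timestamp_us(filename: str) -> int:
    """Parse the '__<digits>.' suffix (assumed to be microseconds)."""
    match = re.search(r"__(\d+)\.", filename)
    if match is None:
        raise ValueError(f"no timestamp in {filename!r}")
    return int(match.group(1))


# The directory name after samples/ is the channel.
stamps = {f.split("/")[1]: timestamp_us(f) for f in FILES}
cams = [t for ch, t in stamps.items() if ch.startswith("CAM_")]
spread_cam_ms = (max(cams) - min(cams)) / 1000
spread_all_ms = (max(stamps.values()) - min(stamps.values())) / 1000
print(f"cameras: {spread_cam_ms:.1f} ms, all sensors: {spread_all_ms:.1f} ms")
# cameras: 42.6 ms, all sensors: 71.6 ms
```

A spread of tens of milliseconds is why geometric fusion code usually goes through each record's own ego pose and calibration rather than treating a sample as a single instant.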
4. Additional notes (sweeps / samples)
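- `samples/`: key frames, captured at 2 Hz. These are the annotated frames used for labeling and evaluation.
- `sweeps/`: the intermediate (non-key-frame) readings recorded between key frames at each sensor's native rate. They are not annotated and are commonly used for temporal fusion or data augmentation.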
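Whether `filename` lives under `samples/` or `sweeps/`, you can use the same method to find the other data with the same `sample_token`, as long as the file has a record in `sample_data.json`. For a sweep, however, `sample_token` points to the sample that follows it most closely in time. The result is therefore the key frames of that following sample, not readings taken at the same instant as the sweep.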
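To make the sweep case concrete, this sketch walks a sweep's `next` pointers until it reaches a key frame of the same sensor. `next` is a real `sample_data` field and is an empty string at the end of a scene. Given the schema rule just described, the key frame reached this way should normally carry the same `sample_token` as the sweep. The function name and toy rows are hypothetical.

```python
from typing import Dict, Optional


def next_key_frame(sd_by_token: Dict[str, dict], token: str) -> Optional[dict]:
    """Follow sample_data.next from a record until a key frame is reached."""
    record: Optional[dict] = sd_by_token[token]
    while record is not None and not record["is_key_frame"]:
        # An empty "next" marks the end of the scene: no later key frame.
        record = sd_by_token.get(record["next"]) if record["next"] else None
    return record


# Toy chain for one sensor (hypothetical tokens): two sweeps, then a key frame.
rows = [
    {"token": "sw1", "is_key_frame": False, "next": "sw2", "sample_token": "smp2"},
    {"token": "sw2", "is_key_frame": False, "next": "kf2", "sample_token": "smp2"},
    {"token": "kf2", "is_key_frame": True, "next": "", "sample_token": "smp2"},
]
sd_by_token = {r["token"]: r for r in rows}
key = next_key_frame(sd_by_token, "sw1")
print(key["token"], key["sample_token"])  # kf2 smp2
```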
5. Tool script (full source, can be saved as-is to `tools/nuscenes_find_synced_samples.py`)
```python
#!/usr/bin/env python3
r"""Find synchronized NuScenes sensor files for a given sample file.

The raw sample.json does not store the `sample['data']` mapping
(channel -> sample_data_token); the official devkit builds it at load time.
So we recover the "same frame" ("same sample") relationship by joining tables:
  - sample_data.filename -> sample_data.sample_token
  - sample_data.calibrated_sensor_token -> calibrated_sensor.sensor_token
  - sensor.channel -> the sensor name (e.g. CAM_FRONT, LIDAR_TOP, RADAR_FRONT)

Given an input filename like:
    samples/CAM_BACK/xxx.jpg
we:
  1) Look up the matching `sample_data` row by `filename`.
  2) Grab its `sample_token`.
  3) Collect the key-frame sample_data rows with that `sample_token`
     (sweeps share the same sample_token; pass --include-sweeps to keep them).
  4) Map each row to a channel name via calibrated_sensor + sensor.
  5) Print the filenames for all channels in that frame.

Works for images and point clouds as long as the file exists in sample_data.json.

Example:
    python3 tools/nuscenes_find_synced_samples.py \
        --dataroot data/nuscenes --version v1.0-mini \
        --filename samples/CAM_BACK/n008-2018-08-01-15-16-36-0400__CAM_BACK__1533151603537558.jpg
"""
from __future__ import annotations

import argparse
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple


def _load_json(path: Path):
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


def _build_channel_mapper(dataroot: Path, version: str) -> Callable[[dict], Optional[str]]:
    base = dataroot / version
    calib_rows = _load_json(base / "calibrated_sensor.json")
    sensor_rows = _load_json(base / "sensor.json")
    calib_by_token: Dict[str, dict] = {r["token"]: r for r in calib_rows}
    sensor_by_token: Dict[str, dict] = {r["token"]: r for r in sensor_rows}

    def channel_of(sample_data_row: dict) -> Optional[str]:
        # sample_data -> calibrated_sensor -> sensor -> channel
        calib = calib_by_token.get(sample_data_row.get("calibrated_sensor_token"))
        if not calib:
            return None
        s = sensor_by_token.get(calib.get("sensor_token"))
        if not s:
            return None
        return s.get("channel")

    return channel_of


def find_sample_data_by_filename(sample_data_rows: List[dict], filename: str) -> Optional[dict]:
    # Filenames in NuScenes tables are POSIX-like relative paths.
    # We compare as-is; caller should provide relative `samples/...` path.
    for r in sample_data_rows:
        if r.get("filename") == filename:
            return r
    return None


def collect_same_sample(
    sample_data_rows: List[dict],
    sample_token: str,
    include_sweeps: bool = False,
) -> List[dict]:
    # Sweeps (is_key_frame == False) also carry a sample_token, so drop them
    # unless requested; otherwise each channel would list many files.
    return [
        r
        for r in sample_data_rows
        if r.get("sample_token") == sample_token
        and (include_sweeps or r.get("is_key_frame"))
    ]


def main(argv: Optional[List[str]] = None) -> int:
    p = argparse.ArgumentParser(description="Find other NuScenes sensor files in the same frame.")
    p.add_argument("--dataroot", type=Path, default=Path("data/nuscenes"), help="NuScenes root dir")
    p.add_argument(
        "--version",
        type=str,
        default="v1.0-mini",
        choices=["v1.0-mini", "v1.0-trainval", "v1.0-test"],
        help="NuScenes metadata version folder under dataroot",
    )
    p.add_argument(
        "--filename",
        type=str,
        required=True,
        help="Relative filename in sample_data.json, e.g. samples/CAM_BACK/xxx.jpg",
    )
    p.add_argument(
        "--check-exists",
        action="store_true",
        help="Also check whether each output file exists under dataroot.",
    )
    p.add_argument(
        "--only",
        type=str,
        default="",
        help="Optional comma-separated channel prefixes to keep, e.g. 'CAM_,LIDAR_,RADAR_'",
    )
    p.add_argument(
        "--include-sweeps",
        action="store_true",
        help="Also list non-key-frame sweeps that share the same sample_token.",
    )
    args = p.parse_args(argv)

    base = args.dataroot / args.version
    sample_data_path = base / "sample_data.json"
    if not sample_data_path.exists():
        raise SystemExit(f"Missing: {sample_data_path}")
    sample_data_rows = _load_json(sample_data_path)

    target = find_sample_data_by_filename(sample_data_rows, args.filename)
    if not target:
        raise SystemExit(
            "Could not find filename in sample_data.json. "
            "Make sure you pass a relative path like 'samples/CAM_BACK/xxx.jpg'."
        )

    sample_token = target.get("sample_token")
    sd_token = target.get("token")

    channel_of = _build_channel_mapper(args.dataroot, args.version)
    rows = collect_same_sample(sample_data_rows, sample_token, args.include_sweeps)

    # Build channel -> [(filename, token), ...]
    channel_to_files: Dict[str, List[Tuple[str, str]]] = {}
    for r in rows:
        ch = channel_of(r)
        if not ch:
            continue
        channel_to_files.setdefault(ch, []).append((r.get("filename"), r.get("token")))

    prefixes: Tuple[str, ...] = tuple(s.strip() for s in args.only.split(",") if s.strip())

    def keep_channel(ch: str) -> bool:
        if not prefixes:
            return True
        return any(ch.startswith(pref) for pref in prefixes)

    print("Input:")
    print(f"  filename     : {args.filename}")
    print(f"  sample_token : {sample_token}")
    print(f"  sample_data  : {sd_token}")
    print("")
    print("Same-frame channels:")
    for ch in sorted(channel_to_files.keys()):
        if not keep_channel(ch):
            continue
        # Key frames give one file per channel per sample; with
        # --include-sweeps a channel can list several files.
        for fn, tok in channel_to_files[ch]:
            line = f"  {ch}: {fn}  token={tok}"
            if args.check_exists and fn:
                exists = (args.dataroot / fn).exists()
                line += f"  exists={exists}"
            print(line)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```
6. Example run
```bash
python3 tools/nuscenes_find_synced_samples.py \
    --dataroot data/nuscenes --version v1.0-mini \
    --filename samples/CAM_BACK/n008-2018-08-01-15-16-36-0400__CAM_BACK__1533151603537558.jpg \
    --check-exists
```
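The script scans the whole `sample_data` list linearly on every query, once by filename and once by `sample_token`. That is fine for a single lookup. If you need to resolve many files, for example across `v1.0-trainval`, you can build the two indexes once and answer each query with dictionary lookups. The sketch below shows this approach. The function names are hypothetical and not part of the script above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_indexes(sample_data: List[dict]) -> Tuple[Dict[str, dict], Dict[str, List[dict]]]:
    """One pass: filename -> row, and sample_token -> key-frame rows."""
    by_filename: Dict[str, dict] = {}
    key_frames_by_sample: Dict[str, List[dict]] = defaultdict(list)
    for sd in sample_data:
        by_filename[sd["filename"]] = sd
        if sd["is_key_frame"]:
            key_frames_by_sample[sd["sample_token"]].append(sd)
    return by_filename, dict(key_frames_by_sample)


def synced_filenames(
    by_filename: Dict[str, dict],
    key_frames_by_sample: Dict[str, List[dict]],
    filename: str,
) -> List[str]:
    """All key-frame filenames in the same sample as `filename`."""
    sample_token = by_filename[filename]["sample_token"]
    return sorted(sd["filename"] for sd in key_frames_by_sample.get(sample_token, []))


# Toy rows (hypothetical paths).
sample_data = [
    {"filename": "samples/CAM_BACK/a.jpg", "sample_token": "smp1", "is_key_frame": True},
    {"filename": "samples/LIDAR_TOP/b.pcd.bin", "sample_token": "smp1", "is_key_frame": True},
    {"filename": "sweeps/LIDAR_TOP/c.pcd.bin", "sample_token": "smp1", "is_key_frame": False},
]
by_filename, key_frames_by_sample = build_indexes(sample_data)
print(synced_filenames(by_filename, key_frames_by_sample, "samples/CAM_BACK/a.jpg"))
# ['samples/CAM_BACK/a.jpg', 'samples/LIDAR_TOP/b.pcd.bin']
```

The trade-off is memory: the filename index holds a reference to every row. Because it stores the same dict objects already loaded from JSON, the extra cost is mostly the keys.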