文章目录
-
- KITTI
- SemanticKITTI
-
-
- [Semantic Segmentation and Panoptic Segmentation](#Semantic Segmentation and Panoptic Segmentation)
- [Semantic Scene Completion](#Semantic Scene Completion)
-
- 数据转换
- 语义标签和点云拼接
KITTI
The odometry benchmark consists of 22 stereo sequences, saved in loss less png format: We provide 11 sequences (00-10) with ground truth trajectories for training and 11 sequences (11-21) without ground truth for evaluation. For this benchmark you may provide results using monocular or stereo visual odometry, laser-based SLAM or algorithms that combine visual and LIDAR information. The only restriction we impose is that your method is fully automatic (e.g., no manual loop-closure tagging is allowed) and that the same parameter set is used for all sequences. A development kit provides details about the data format.
- Download odometry data set (grayscale, 22 GB)
- Download odometry data set (color, 65 GB)
- Download odometry data set (velodyne laser data, 80 GB)
- Download odometry data set (calibration files, 1 MB)
- Download odometry ground truth poses (4 MB)
- Download odometry development kit (1 MB)
官网说了只有前十一个序列是用于训练的,所以里程计真实位姿转换矩阵在 ground_truth_poses 中只有前十一个序列的位姿转换。下载数据需要进行邮箱注册,等待审核,不过一般很快的......
SemanticKITTI
SemanticKITTI 在基于 KITTI 的基础上对车载激光雷达获取的点云数据进行了进一步的标注,用于三维目标识别检测以及场景点云补全任务,SemanticKITTI 和 KITTI 一样提供了比赛打榜的服务,各家可以拿自己的模型去评测 SemanticKITTI 任务。SemanticKITTI 提供了标注数据以供下载,标注数据不仅包含了 KITTI 的训练集当然也包含了 KITTI 的测试集(也即 11-21 十一个序列),不过这一部分的数据由于缺少真实的转换位姿,所以不能合成比较大的场景了,但是应该没什么关系因为 SemanticKITTI 的初衷就是做三维目标检测和分割的,原来的 KITTI 更多的是 SLAM 的数据集,做光流预测以及位姿估计的,用于自动驾驶任务方面的测评,那么序列的真实转换位姿当然不能给出来,不然大家就直接拿到测试集的真实地图了(但万一有人实地去测呢?)
Semantic Segmentation and Panoptic Segmentation
We only provide the label files and the remaining files must be downloaded from the KITTI Vision Benchmark. In particular, the following steps are needed to get the complete data:
- Download KITTI Odometry Benchmark Velodyne point clouds (80 GB)
- Download KITTI Odometry Benchmark calibration data (1 MB)
- Download SemanticKITTI label data (179 MB)
- Extract everything into the same folder. The folder structure inside the zip files of our labels matches the folder structure of the original data.
Semantic Scene Completion
Note: On August 24, 2020, we updated the data according to an issue with the voxelizer. Ensure that you have version 1.1 of the data!
We provide the voxel grids for learning and inference, which you must download to get the SemanticKITTI voxel data (700 MB). This archive contains the training (all files) and test data (only bin files). Refer to the development kit to see how to read our binary files.
We additionally provide all extracted data for the training set, which can be download here (3.3 GB). This does not contain the test bin files.
数据转换
See also our development kit for further information on the labels and the reading of the labels using Python. The development kit also provides tools for visualizing the point clouds.
SemanticKITTI 提供的官方的转换脚本仓库,但是我是在服务器上做实验,用不到数据可视化的功能,只想把目前主流的各个室外场景数据集(SemanticKITTI、Waymo、NuScenes)转换成 numpy 容易读取的 npy 或者 npz 格式文件,这样方便训练,统一了格式。可以 git clone 把仓库克隆到本地看看实际上是怎么做的,然后复制核心代码自己写一个转换脚本把 22 个序列都转换成带标签的 npz 文件。
SemanticKITTI 的转换脚本转换之前,数据集必须按如下方式排列组织:
txt
/kitti/dataset/
└── sequences/
├── 00/
│ ├── poses.txt
│ ├── image_2/
│ ├── image_3/
│ ├── labels/
│ │ ├ 000000.label
│ │ └ 000001.label
| ├── voxels/
| | ├ 000000.bin
| | ├ 000000.label
| | ├ 000000.occluded
| | ├ 000000.invalid
| | ├ 000001.bin
| | ├ 000001.label
| | ├ 000001.occluded
| | ├ 000001.invalid
│ └── velodyne/
│ ├ 000000.bin
│ └ 000001.bin
├── 01/
├── 02/
.
.
.
└── 21/
因为我打算自己写一个转换的 notebook ,所以就没必要提前把数据组织成指定的格式了,反正只要知道怎么把 label 文件中的数据和 velodyne 点云数据对应起来就行。我自己的目录组织是每个压缩包解压出来的数据分门别类放到各自的目录下:
txt
┌[hm@hm 16:00:13] ~/datasets/SemanticKITTI
└> ls
total 7.1G
-rw-rw-r-- 1 hm hm 6.3G 5月 6 22:37 00.zip
drwxrwxr-x 3 hm hm 4.0K 6月 7 14:48 calib/
-rw-rw-r-- 1 hm hm 1.7K 5月 20 01:34 convert.ipynb
-rw-rw-r-- 1 hm hm 587K 5月 6 17:18 data_odometry_calib.zip
-rw-rw-r-- 1 hm hm 171M 5月 6 17:18 data_odometry_label.zip
-rw-rw-r-- 1 hm hm 1.3M 6月 7 15:59 data_odometry_poses.zip
-rw-rw-r-- 1 hm hm 662M 8月 24 2020 data_odometry_voxel.zip
drwxrwxr-x 3 hm hm 4.0K 6月 25 2019 label/
drwxrwxr-x 2 hm hm 4.0K 6月 7 16:00 numpy/
drwxrwxr-x 3 hm hm 4.0K 6月 6 21:14 point/
drwxrwxr-x 3 hm hm 4.0K 6月 7 15:59 poses/
drwxrwxr-x 3 hm hm 4.0K 6月 7 14:37 voxel/
语义标签和点云拼接
从 Semantic KITTI 开放的标签和 KITTI 开放的 velodyne 文件中提取点云以及语义标签然后通过 semantic_kitti.yaml
配置文件中的颜色标签映射做可视化。
这里关键的是不能直接使用 KITTI 对每个序列给出的 poses.txt
中的每一行的转移矩阵,因为这个矩阵是基于车载设备的左边第一个灰度相机的,而激光雷达的坐标系不是这个,需要根据 calib.txt
中给出的 Tr
行(左边的灰度相机到激光雷达的位置变换外参,前面四个 P1、P2、P3、P4 相机的内参,用于三维点云和二维像素的互相映射)
txt
P0: 7.188560000000e+02 0.000000000000e+00 6.071928000000e+02 0.000000000000e+00 0.000000000000e+00 7.188560000000e+02 1.852157000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P1: 7.188560000000e+02 0.000000000000e+00 6.071928000000e+02 -3.861448000000e+02 0.000000000000e+00 7.188560000000e+02 1.852157000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P2: 7.188560000000e+02 0.000000000000e+00 6.071928000000e+02 4.538225000000e+01 0.000000000000e+00 7.188560000000e+02 1.852157000000e+02 -1.130887000000e-01 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 3.779761000000e-03
P3: 7.188560000000e+02 0.000000000000e+00 6.071928000000e+02 -3.372877000000e+02 0.000000000000e+00 7.188560000000e+02 1.852157000000e+02 2.369057000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 4.915215000000e-03
Tr: 4.276802385584e-04 -9.999672484946e-01 -8.084491683471e-03 -1.198459927713e-02 -7.210626507497e-03 8.081198471645e-03 -9.999413164504e-01 -5.403984729748e-02 9.999738645903e-01 4.859485810390e-04 -7.206933692422e-03 -2.921968648686e-01
对于位姿文件的解析这里走了一些弯路,因为之前做过双目相机生成深度图然后再根据每一帧的相对位姿去拼接,这里想直接套用以前的老方法,发现拼接出来的结果很奇怪,之前我是这么解析 poses.txt
的每一行转成变换矩阵然后执行深度图点云的变换的:
python
# homogeneous 4x4 transformation matrix
# r11 r12 r13 tx
# r21 r22 r23 ty
# r31 r32 r33 tz
# 0 0 0 1
T = np.array([float(str_val) for str_val in gt_T.strip().split(' ')], dtype=np.float32).reshape((3, 4))
R = T[:3, :3]
t = T[:3, 3]
# raxis, angle_rad = resolve_axis_angle(R)
# angle_deg = angle_rad / np.pi * 180.0
# 从局部坐标系转换到相对一第一帧的世界坐标系,旋转+平移
points_list[:, :3] = points_list[:, :3] @ R.T + t
这里看了一下 SemanticKITTI 的 issues 看作者是怎么解释他们代码中用于提取每一帧点云的位姿变换矩阵的两个函数的:
python
def parse_calibration(filename):
""" read calibration file with given filename
Returns
-------
dict
Calibration matrices as 4x4 numpy arrays.
"""
calib = {}
calib_file = open(filename)
for line in calib_file:
key, content = line.strip().split(":")
values = [float(v) for v in content.strip().split()]
pose = np.zeros((4, 4))
pose[0, 0:4] = values[0:4]
pose[1, 0:4] = values[4:8]
pose[2, 0:4] = values[8:12]
pose[3, 3] = 1.0
calib[key] = pose
calib_file.close()
return calib
def parse_poses(filename, calibration):
""" read poses file with per-scan poses from given filename
Returns
-------
list
list of poses as 4x4 numpy arrays.
"""
file = open(filename)
poses = []
Tr = calibration["Tr"]
Tr_inv = np.linalg.inv(Tr)
for line in file:
values = [float(v) for v in line.strip().split()]
pose = np.zeros((4, 4))
pose[0, 0:4] = values[0:4]
pose[1, 0:4] = values[4:8]
pose[2, 0:4] = values[8:12]
pose[3, 3] = 1.0
poses.append(np.matmul(Tr_inv, np.matmul(pose, Tr)))
return poses
根据 issue 里作者提到的一些信息,也就是说 SemanticKITTI 的点云序列合成利用位姿的时候,还利用了 calib.txt
标定文件给出的几个相机的外参和内参,代码中用到了 Tr
外参把 poses.txt
中每一行代表的灰度相机相对于第一帧的变换矩阵转换成了基于激光雷达坐标系的相对变换矩阵。
那么现在就好办了,利用源代码给出的这两个转换函数生成正确的基于激光雷达的变换矩阵,然后再应用于每一帧点云就可以拼接了。
写一个 notebook 快速验证拼接的想法叭。首先需要的是从 .bin
和 .label
文件中解析出原始点云数据和语义标签数据:
python
def parse_point_label(seq_id: int):
lidars = np.fromfile(f"data/sequences/00/velodyne/{seq_id:06d}.bin", dtype=np.float32).reshape(-1, 4)
points = np.ones(lidars.shape)
points[:, :3] = lidars[:, :3]
labels = np.fromfile(f"data/sequences/00/labels/{seq_id:06d}.label", dtype=np.uint32).reshape(-1)
upper_half = labels >> 16 # get upper half for instances
lower_half = labels & 0xFFFF # get lower half for semantics
return points, lower_half
def convert_colors(labels):
colors = np.zeros((len(labels), 3))
for i in range(len(colors)):
colors[i] = np.array(palatte[labels[i]], dtype=np.float32)
colors /= 255.0
return colors
def transform(points: np.ndarray, pose: np.ndarray):
# points [n, 4]
# pose [4, 4]
return np.matmul(pose, points.T).T
这里解析点云和语义标签都是直接照搬 semantic-kitti-api 仓库里的代码的,这个仓库如果只是做数据提取的话只用看 generate_sequential.py
就行了,从里面找相关的代码截取出来写 notebook 验证。我自己写的代码中数据集是通过软链接连接到当前目录下的,所以路径是写死的,大家可以根据自己实际情况替换函数中的读取目录路径。
然后就是指定要拼接哪些帧的点云:
python
calib = parse_calibration("data/sequences/00/calib.txt")
poses = parse_poses("data/sequences/00/poses.txt", calibration=calib)
# scan_file_list = sorted(f for f in os.listdir("data/point/sequences/00/velodyne") if f.endswith(".bin"))
scan_file_list = [
0, 50, 100
]
concated_points = []
concated_colors = []
for scan_idx in scan_file_list:
points, labels = parse_point_label(scan_idx)
pose = poses[scan_idx]
points = transform(points, pose)[:, :3]
colors = convert_colors(labels)
# np.savez(
# osp.join(output_dir, osp.splitext(scan_file_list[scan_idx])[0]+".npz"),
# points=points,
# labels=labels,
# pose=pose
# )
concated_points.append(points)
concated_colors.append(colors)
points = np.concatenate(concated_points, axis=0)
colors = np.concatenate(concated_colors, axis=0)
ply = o3d.geometry.PointCloud()
ply.points = o3d.utility.Vector3dVector(points)
ply.colors = o3d.utility.Vector3dVector(colors)
o3d.io.write_point_cloud("output.ply", ply)
比如我这里以拼接 0、50、100 这三个序列帧为例,生成 .ply
文件可以在 MeshLab 中查看拼接可视化结果。