Table of Contents
[P1 2D Detection and Segmentation](#P1 2D Detection and Segmentation)
[P2 Video = 2D + time series](#P2 Video = 2D + time series)
[P3 Focus on Two Problems](#P3 Focus on Two Problems)
[P4 Many more topics in 3D Vision](#P4 Many more topics in 3D Vision)
[P5-10 Multi-View CNN](#P5-10 Multi-View CNN)
[P11 Experiments -- Classification & Retrieval](#P11 Experiments – Classification & Retrieval)
[P12 3D Shape Representations](#P12 3D Shape Representations)
[P13--17 3D Shape Representations: Depth Map](#P13--17 3D Shape Representations: Depth Map)
[P18--26 3D Shape Representations: Surface Normals](#P18--26 3D Shape Representations: Surface Normals)
[P27--34 3D Shape Representations: Point Cloud](#P27--34 3D Shape Representations: Point Cloud)
[P35--66 3D Shape Representations: Triangle Mesh](#P35--66 3D Shape Representations: Triangle Mesh)
P1 2D Detection and Segmentation
data:image/s3,"s3://crabby-images/678fc/678fc1f9b6bbdb0cccfa6bd2c029ff25d26c3a36" alt=""
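Classification: no spatial information; a single category label is assigned to the whole image.
Semantic Segmentation: no notion of objects, only pixels; every pixel is assigned a category label.
Object Detection: objects are located (with bounding boxes) and each one is classified.
Instance Segmentation: instance segmentation = object detection + semantic segmentation (the first time I had heard of this).
Semantic segmentation only needs to separate different classes; different individuals of the same class are not told apart. Instance segmentation goes one step further and also separates different instances of the same class: after objects are detected, each detected region is segmented at the pixel level.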
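To make the distinction concrete, here is a tiny sketch on a hypothetical 4x4 label map (the image, the instance ids and the class names are all made up for illustration). An instance mask keeps a separate id for every object; collapsing it to a semantic mask keeps only the class, so the two cats become indistinguishable.

```python
# Hypothetical 4x4 instance mask: 0 = background; 1 and 2 are two different cats, 3 is a dog.
instance_mask = [
    [1, 1, 0, 2],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
    [3, 3, 0, 0],
]
# Class of each instance id (also hypothetical).
instance_class = {1: "cat", 2: "cat", 3: "dog"}

def to_semantic(mask, classes):
    """Drop instance identity: every pixel keeps only its class label."""
    return [[classes.get(i, "background") for i in row] for row in mask]

def count_instances(mask):
    """Number of distinct non-background instance ids."""
    return len({i for row in mask for i in row if i != 0})

semantic_mask = to_semantic(instance_mask, instance_class)
print(semantic_mask[0])                # ['cat', 'cat', 'background', 'cat']
print(count_instances(instance_mask))  # 3
```

In the semantic mask the pixels of cat 1 and cat 2 carry the same label, which is exactly the information instance segmentation keeps.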
P2 Video = 2D + time series
A video is simply a sequence of 2D images ordered in time.
P3 Focus on Two Problems
The two problems to solve today:
① Predicting a 3D model from a single input image
② Recognizing a 3D model, i.e., classifying its category
P4 Many more topics in 3D Vision
3D Representations
Computing Correspondences
Multi-view stereo
Structure from Motion
Simultaneous Localization and Mapping (SLAM)
View Synthesis
Differentiable Graphics
3D Sensors
P5-10 Multi-View CNN
data:image/s3,"s3://crabby-images/68c8e/68c8e30e1372783c414045329c557003fe1b22d9" alt=""
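CNN1: a convolutional network that extracts image features from each rendered view; the per-view features are then combined by view pooling (an element-wise max across views).
CNN2: a convolutional network that turns the pooled features into the shape descriptor.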
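A minimal sketch of the view-pooling step between CNN1 and CNN2, assuming each rendered view has already been turned into a feature vector by CNN1 (the numbers below are hypothetical stand-ins for those features). Because it is an element-wise max, the pooled feature does not depend on the order of the views.

```python
def view_pool(view_features):
    """Element-wise max over views: one pooled vector of the same length."""
    return [max(values) for values in zip(*view_features)]

# Hypothetical CNN1 outputs for 3 rendered views, 4-dim features each.
views = [
    [0.1, 0.9, 0.3, 0.0],
    [0.5, 0.2, 0.3, 0.7],
    [0.4, 0.1, 0.8, 0.2],
]
pooled = view_pool(views)
print(pooled)  # [0.5, 0.9, 0.8, 0.7]

# Reversing the view order gives the same pooled feature.
assert view_pool(views[::-1]) == pooled
```

The pooled vector is what CNN2 would consume; in a real model the features are high-dimensional CNN activations rather than four numbers.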
P11 Experiments -- Classification & Retrieval
data:image/s3,"s3://crabby-images/46a78/46a7855e694840a119ddd9713740996d073a4b08" alt=""
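Q: MVCNN? SPH? LFD? 3D ShapeNets? FV?
A: These are the methods compared in the table. MVCNN is the multi-view CNN above; SPH (spherical harmonics descriptor) and LFD (Light Field Descriptor) are hand-crafted 3D shape descriptors; 3D ShapeNets is a deep model that works on voxel grids; FV refers to Fisher-vector encodings of image features.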
P12 3D Shape Representations
data:image/s3,"s3://crabby-images/5d2ee/5d2ee2db5e31ac217420c89f731749c578b723e7" alt=""
Q: Voxel Grid? Pointcloud? Mesh? Surface?
A: Each of these is covered in detail below.
P13--17 3D Shape Representations: Depth Map
RGB image + Depth image = RGB-D Image (2.5D)
data:image/s3,"s3://crabby-images/f7439/f7439c733fa3393cd2af5b2c49e804306069046b" alt=""
Q: Is H the height and W the width?
A: Yes, H x W is the image size in pixels.
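A fully convolutional network can predict the depth map directly from the RGB image: the predicted depth map (1 x H x W) is compared with the ground-truth depth map (1 x H x W) using a per-pixel loss.
Per-Pixel Loss (L2 Distance)
Q: What is the L2 distance?
A: The Euclidean distance: the square root of the sum of squared differences. As a per-pixel loss, the squared difference between predicted and true depth is summed (or averaged) over all pixels.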
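A minimal sketch of the per-pixel L2 loss on tiny hypothetical 2x2 depth maps. Averaging over pixels (rather than summing) is an assumed normalization choice; the helper names are illustrative.

```python
import math

def per_pixel_l2(pred, gt):
    """Mean over pixels of the squared difference between two H x W depth maps."""
    h, w = len(gt), len(gt[0])
    total = sum((pred[i][j] - gt[i][j]) ** 2 for i in range(h) for j in range(w))
    return total / (h * w)

def l2_distance(pred, gt):
    """Euclidean (L2) distance between the two maps viewed as flat vectors."""
    return math.sqrt(sum((p - g) ** 2
                         for p_row, g_row in zip(pred, gt)
                         for p, g in zip(p_row, g_row)))

gt = [[1.0, 2.0], [3.0, 4.0]]    # hypothetical ground-truth depths
pred = [[1.0, 2.5], [2.0, 4.0]]  # hypothetical prediction

print(per_pixel_l2(pred, gt))  # 0.3125
print(l2_distance(pred, gt))   # sqrt(1.25), about 1.118
```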
Problem: Scale / Depth Ambiguity
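Q: What exactly does this mean, and how is it solved?
A: Roughly, a single (monocular) image carries limited information: a small object close to the camera can look exactly like a larger object farther away, so absolute scale and depth cannot be recovered from one image alone.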
Predicting Depth Maps
data:image/s3,"s3://crabby-images/08297/08297ba69e184c770b8c2f70e81e525bec53e4a6" alt=""
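Scale-invariant loss: the loss does not change when the whole predicted depth map is multiplied by a constant, so the network is not penalized for the global scale it cannot infer.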
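A minimal sketch of a scale-invariant depth loss, assuming the log-space form of Eigen et al. (2014): with d_i = log(pred_i) − log(gt_i), the loss is mean(d²) − λ·mean(d)². With λ = 1 it is the variance of d, and scaling every prediction by s only adds log s to every d_i, so the loss is unchanged. Depths are flattened into plain lists here for brevity.

```python
import math

def scale_invariant_loss(pred, gt, lam=1.0):
    """Scale-invariant log loss: mean(d^2) - lam * mean(d)^2, d = log(pred) - log(gt).

    pred, gt: flat lists of positive depths. lam = 1 gives full scale invariance.
    """
    d = [math.log(p) - math.log(g) for p, g in zip(pred, gt)]
    n = len(d)
    mean_d = sum(d) / n
    return sum(x * x for x in d) / n - lam * mean_d ** 2

gt = [1.0, 2.0, 4.0]     # hypothetical true depths
pred = [1.2, 1.8, 4.5]   # hypothetical predicted depths
scaled = [3.0 * p for p in pred]

# Same loss for the prediction and a uniformly scaled copy of it.
print(scale_invariant_loss(pred, gt), scale_invariant_loss(scaled, gt))
```

With `lam=0.0` the loss reduces to a plain squared log error and is no longer scale invariant; the original paper also trains with an intermediate value of λ.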
P18--26 3D Shape Representations: Surface Normals
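For each pixel, the surface normal map gives a vector: the normal direction of the surface of the object in the world seen at that pixel.
If the RGB image is 3 x H x W, the normal map is also 3 x H x W (one 3D vector per pixel).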
Predicting Normals
data:image/s3,"s3://crabby-images/728b8/728b89993e85fca3d9bffa3221854e82be2a9e90" alt=""
3D Shape Representations: Voxels
• Represent a shape with a V x V x V grid of occupancies 网格表示形状
• Just like segmentation masks in Mask R-CNN, but in 3D! 分割掩码
• (+) Conceptually simple: just a 3D grid! 只是一个3D网格
• (-) Need high spatial resolution to capture fine structures 需要高空间分辨率捕捉精细结构
• (-) Scaling to high resolutions is nontrivial ! 缩放到高分辨率并不容易
Processing Voxel Inputs: 3D Convolution
data:image/s3,"s3://crabby-images/6b01c/6b01c9466b8c41b31653b18913d13b17167c18c4" alt=""
Generating Voxel Shapes: 3D Convolution
data:image/s3,"s3://crabby-images/8ed0f/8ed0f4c9dd86196c085928bc342b0a0934b491e7" alt=""
Voxel Problems: Memory Usage
Storing 1024(3次方) voxel grid takes 4GB of memory
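Scaling Voxels: Oct-Trees (octrees)
Q: I didn't quite understand Oct-Trees.
A: An octree stores the grid hierarchically: a cube is split into 8 child cubes only where more detail is needed, so large empty (or completely full) regions are stored as a single coarse cell instead of many fine voxels.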
data:image/s3,"s3://crabby-images/a783c/a783c4b9d6c7f8d68d7ac7eb231e5a51354b0d43" alt=""
P27--34 3D Shape Representations: Point Cloud
• Represent shape as a set of P points in 3D space
• (+) Can represent fine structures without huge numbers of points
• ( ) Requires new architecture, losses, etc
• (-) Doesn't explicitly represent the surface of the shape: extracting a mesh for rendering or other applications requires post-processing
提取网格为渲染或其他应用提取网格需要进行后处理
Proessing Pointcloud Inputs: PointNet
data:image/s3,"s3://crabby-images/aa467/aa467e6dbfe2a5af6cbd60c414d0a71f33fbcf62" alt=""
MLP ?
Max-Pool?
Generating Pointcloud Outputs
data:image/s3,"s3://crabby-images/71e32/71e3211c1077daae16bf1df527a25f8c627b8a2b" alt=""
Predicting Point Clouds: Loss Function
data:image/s3,"s3://crabby-images/80b23/80b232eade619b4c791ce04c06361adb39caea83" alt=""
data:image/s3,"s3://crabby-images/7b32a/7b32ac24886c550afa8540f24858786fceee178d" alt=""
P35--66 3D Shape Representations: Triangle Mesh
data:image/s3,"s3://crabby-images/2de67/2de673a95f92223521770ab69f54d7da9b950aa4" alt=""
Predicting Meshes: Pixel2Mesh
data:image/s3,"s3://crabby-images/83c1f/83c1fb818e2c2569598dd995342a4a71d4481d97" alt=""
Idea #1: Iterative mesh refinement
Start from initial ellipsoid mesh Network predicts offsets for each vertex Repeat.
Predicting Triangle Meshes: Graph Convolution
data:image/s3,"s3://crabby-images/2c58b/2c58b7c75f66d6359274866c120c0503985883e0" alt=""
data:image/s3,"s3://crabby-images/422cf/422cfff54d0287e4d7aa99885fddb03b65765f46" alt=""
Problem: How to incorporate image features?
Predicting Triangle Meshes: Vertex-Aligned Features
data:image/s3,"s3://crabby-images/a39d1/a39d15199b4f9f18be4ce7c59e319e6118d2a9a6" alt=""
Predicting Meshes: Loss Function
The same shape can be represented with different meshes -- how can we define a loss between predicted and ground-truth mesh?
**Idea:**Convert meshes to pointclouds, then compute loss
data:image/s3,"s3://crabby-images/df4be/df4be3ec4d0bef3300aadae1ad7cad65d0d4e5f5" alt=""
3D Shape Prediction: Mesh R-CNN
Mesh R-CNN: Hybrid 3D shape representation
To be continued