[Paper Reading] DebSDF: Delving into the Details and Bias of Neural Indoor Scene Reconstruction

  • Abstract
  • 3 METHOD
    • [3.1 Preliminaries](#3.1 Preliminaries)
    • [3.2 Uncertainty Guided Prior Filtering](#3.2 Uncertainty Guided Prior Filtering)
    • [3.3 Uncertainty-Guided Ray Sampling](#3.3 Uncertainty-Guided Ray Sampling)
    • [3.4 Uncertainty-Guided Smooth Regularization](#3.4 Uncertainty-Guided Smooth Regularization)
    • [3.5 Bias-aware SDF to Density Transformation](#3.5 Bias-aware SDF to Density Transformation)
      • [3.5.1 Problem Formulation](#3.5.1 Problem Formulation)
      • [3.5.2 SDF to Density Mapping for Bias Reduction](#3.5.2 SDF to Density Mapping for Bias Reduction)
      • [3.5.3 Curvature Radius Estimation](#3.5.3 Curvature Radius Estimation)
      • [3.5.4 Progressive Warm-up](#3.5.4 Progressive Warm-up)
  • [4 EXPERIMENTS](#4 EXPERIMENTS)
    • [4.1 Implementation Details](#4.1 Implementation Details)
    • [4.2 Performance Comparison with Other Baselines](#4.2 Performance Comparison with Other Baselines)
    • [4.3 Performance Evaluation of Different Regions](#4.3 Performance Evaluation of Different Regions)
    • [4.4 Ablation Studies](#4.4 Ablation Studies)
  • [5 CONCLUSION](#5 CONCLUSION)

Note: the formulas have not been adjusted...

Abstract

In recent years, the neural implicit surface has emerged as a powerful representation for multi-view surface reconstruction due to its simplicity and state-of-the-art performance. However, reconstructing smooth and detailed surfaces in indoor scenes from multi-view images presents unique challenges. Indoor scenes typically contain large texture-less regions, making the photometric loss unreliable for optimizing the implicit surface. Previous work utilizes monocular geometry priors to improve the reconstruction in indoor scenes. However, monocular priors often contain substantial errors in thin structure regions due to domain gaps and the inherent inconsistencies when derived independently from different views. This paper presents DebSDF to address these challenges, focusing on the utilization of uncertainty in monocular priors and the bias in SDF-based volume rendering. We propose an uncertainty modeling technique that associates larger uncertainties with larger errors in the monocular priors. High-uncertainty priors are then excluded from optimization to prevent bias. This uncertainty measure also informs an importance-guided ray sampling and adaptive smoothness regularization, enhancing the learning of fine structures. We further introduce a bias-aware signed distance function to density transformation that takes into account the curvature and the angle between the view direction and the SDF normals to reconstruct fine details better. Our approach has been validated through extensive experiments on several challenging datasets, demonstrating improved qualitative and quantitative results in reconstructing thin structures in indoor scenes, thereby outperforming previous work. The source code and more visualizations can be found at https://davidxu-jj.github.io/pubs/DebSDF/.


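Figure 3: Overview of our method. We propose masked uncertainty learning to adaptively filter the geometry priors and to localize detailed and thin regions in 5D space, so that small and thin structures are not lost because of erroneous priors. The localized uncertainty map is then used to guide ray sampling and smooth regularization, which improves the reconstructed geometric detail. In addition, we analyze the volume rendering bias caused by the SDF-to-density transformation, which has a significant negative effect on small and thin structures when geometry priors are used. We therefore propose a bias-aware SDF-to-density transformation for the signed distance field that significantly reduces the bias when reconstructing small and thin objects in indoor scenes.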

3 METHOD

Previous works [1], [9], [10] have shown that regularizing the optimization with geometry priors can significantly improve the reconstruction quality in texture-less areas such as walls and floors, within the framework of implicit neural surface representation and SDF-based volume rendering. However, it is still difficult to reconstruct complex and detailed surfaces, especially those that are observed less often in the indoor scene, such as the legs of a chair. We attribute this problem to several causes: (i) The obtained geometry priors have significantly larger errors in these regions than in planar regions. (ii) Areas of fine detail occupy a small part of the indoor scene, so unbalanced sampling harms the reconstruction quality. (iii) Applying smooth regularization indiscriminately degrades the reconstruction of high-frequency signals. (iv) SDF-based volume rendering has a geometry bias resulting from the curvature of the SDF, which leads to the elimination of fine and detailed thin geometry structures, especially under regularization from a monocular geometry prior. To tackle these problems, we propose DebSDF, which filters inaccurate monocular priors and uses a bias-aware transformation from SDF to density to reduce the ambiguity of the density representation, so that fine structures are no longer eliminated.

3.1 Preliminaries

Following previous works [1], [2], we use an implicit neural network to represent the geometry and radiance field and optimize it by differentiable volume rendering. Suppose a ray r is cast from the camera location o and passes through a pixel along the ray direction v. N points are sampled on the ray, and the i-th point is defined as r(t_i) = o + t_i v, where t_i is its distance to the camera. The SDF value s_i and the color c_i at r(t_i) are predicted by the implicit network.

To apply volume rendering, the transformation from the SDF value s to density can be defined through the Laplace CDF (cumulative distribution function) [2]:

or through the Logistic CDF [43], which has been shown to introduce less bias than the former:
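As a minimal sketch of the two mappings, the functions below evaluate a Laplace-CDF density in the form used by VolSDF [2] and a logistic (sigmoid) variant. The SDF is taken to be positive outside the surface. The 1/β scaling of the logistic variant and its exact form are assumptions made for illustration; the formulas in [2] and [43] remain authoritative, and the function names are hypothetical.

```python
import math

def laplace_density(s: float, beta: float) -> float:
    """Laplace CDF evaluated at -s, scaled by 1/beta (VolSDF-style)."""
    if s > 0:
        # Outside the surface: density decays exponentially towards 0.
        return 0.5 * math.exp(-s / beta) / beta
    # Inside the surface: density saturates at 1/beta.
    return (1.0 - 0.5 * math.exp(s / beta)) / beta

def logistic_density(s: float, beta: float) -> float:
    """Logistic CDF (sigmoid) evaluated at -s, scaled by 1/beta (assumed form)."""
    x = s / beta
    if x >= 0:
        # Numerically stable sigmoid(-x) for non-negative x.
        e = math.exp(-x)
        return e / (1.0 + e) / beta
    return 1.0 / (1.0 + math.exp(x)) / beta
```

Both mappings give the same value, 1/(2β), on the zero level set; they differ in how quickly the density falls off on either side of the surface.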


Here β is a learnable parameter. The rendered color of the ray r is computed as [6]:

where T_i and α_i denote the transparency and the alpha value at the i-th point on the ray r, respectively [6]:

The rendered depth D̂(r) and normal N̂(r) of the surface intersected by the ray r can be computed as [1], [36], [48]:
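The compositing of color, depth and normal along one ray can be sketched in a few lines. The snippet assumes the standard discretization α_i = 1 − exp(−σ_i δ_i) and T_i = Π_{j<i} (1 − α_j), where δ_i is the spacing between neighbouring samples; the function name and the argument layout are illustrative only.

```python
import math

def render_ray(sigmas, deltas, ts, colors, normals):
    """Alpha-composite per-sample color, depth and normal along one ray."""
    transmittance = 1.0          # T_i: nothing has been absorbed yet
    color = [0.0, 0.0, 0.0]
    normal = [0.0, 0.0, 0.0]
    depth = 0.0
    weights = []
    for sigma, delta, t, c, n in zip(sigmas, deltas, ts, colors, normals):
        alpha = 1.0 - math.exp(-sigma * delta)
        w = transmittance * alpha            # compositing weight T_i * alpha_i
        weights.append(w)
        color = [acc + w * ch for acc, ch in zip(color, c)]
        normal = [acc + w * ch for acc, ch in zip(normal, n)]
        depth += w * t
        transmittance *= 1.0 - alpha
    return color, depth, normal, weights
```

A single opaque sample receives all of the weight, so the rendered depth collapses to the distance of that sample and everything behind it is ignored.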

Here n_i is the SDF gradient at the i-th point on ray r.

The depth estimation network predicts depth only up to scale, so a scale w and a shift q computed by the least-squares method [53] are applied to normalize the depth prior, which is denoted as D(r) = wD′(r) + q. The monocular depth and normal loss functions used for regularization in previous works [1], [10] are:

Here D(r) and N(r) are the depth and normal priors obtained from the pre-trained Omnidata model [49].
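The scale-and-shift normalization has a closed-form least-squares solution, sketched below. Two choices here are assumptions: the raw prior D′ is aligned to the rendered depth of the same batch of rays, and the depth loss is a mean squared error. Function names are hypothetical.

```python
def solve_scale_shift(prior, rendered):
    """Least-squares w, q minimising sum((w * prior_k + q - rendered_k) ** 2)."""
    n = len(prior)
    mean_p = sum(prior) / n
    mean_r = sum(rendered) / n
    var_p = sum((p - mean_p) ** 2 for p in prior)
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(prior, rendered))
    if var_p == 0.0:
        # A constant prior carries no scale information; only the shift is defined.
        return 0.0, mean_r
    w = cov / var_p
    q = mean_r - w * mean_p
    return w, q

def depth_loss(prior, rendered):
    """Mean squared error between the normalized prior and the rendered depth."""
    w, q = solve_scale_shift(prior, rendered)
    return sum((w * p + q - r) ** 2 for p, r in zip(prior, rendered)) / len(prior)
```

Because w and q are re-solved for every batch, a prior that differs from the rendered depth only by an affine map incurs zero loss.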

3.2 Uncertainty Guided Prior Filtering

Since the monocular priors provided by pre-trained models such as Omnidata [49] and SNU [54] are not perfectly accurate, an adaptive strategy is needed to filter the monocular geometry prior. NeuRIS [10] filters the monocular prior under the assumption that the regions where the priors are not faithful typically consist of high-frequency features or irregular shapes with relatively rich visual features in the input images. However, this assumption does not generalize well: the monocular prior can be faithful in simple planar regions, and some planar surfaces also have high-frequency appearance features, such as a wall with rich texture. Instead of using image features for prior filtering, we use the multi-view uncertainty of the prior to decide which priors are faithful. Specifically, the monocular prior from one viewpoint is considered inaccurate if it deviates strongly from the priors of other viewpoints. Besides, a direct comparison across views cannot guarantee occlusion awareness, since it is unknown whether a point on the ray is visible to the other views. Based on the observation that inaccurate priors usually show a large variance across viewpoints, we introduce the masked uncertainty learning loss function to model this variance.

Following [1], [2], the SDF values are predicted by a coordinate-based implicit network f_g:

Here z is the feature vector and x is the point coordinate on the ray r.

To model the prior uncertainty of a pixel, a straightforward approach is to take the variance of the geometry priors from different viewpoints at the same surface point as the prior uncertainty. However, this approach models the uncertainty as view-independent, which is its main drawback, since only the priors of some viewpoints are unreliable and need to be filtered out. Besides, it is still unknown whether the points on a queried ray are visible to the other views because of occlusion. The uncertainty computed by the pre-trained depth or normal estimation network [48], [54] has low accuracy because of the domain gap. For these reasons, we model the uncertainty of the prior as a view-dependent quantity obtained by volume rendering [55], [56].

Suppose the per-point uncertainty scores that model the variance of the depth and normal priors are u_d ∈ R and u_n ∈ R^3, respectively. We use the view-dependent color network to predict the uncertainty scores:

We then compute the depth uncertainty score Û_d(r) and the normal uncertainty vector Û_n(r) of the pixel corresponding to ray r by volume rendering:
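Assuming the uncertainty scores are composited with the same weights T_i α_i as color, depth and normal (these notes state only that volume rendering is used), the rendered uncertainties would take the following form, where u_{d,i} and u_{n,i} are the per-point scores at the i-th sample:

```latex
\hat{U}_d(\mathbf{r}) = \sum_{i=1}^{N} T_i \, \alpha_i \, u_{d,i},
\qquad
\hat{\mathbf{U}}_n(\mathbf{r}) = \sum_{i=1}^{N} T_i \, \alpha_i \, \mathbf{u}_{n,i}
```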


We compute the normal uncertainty score Û_n(r) as the mean of the three components of the normal uncertainty vector Û_n(r).

Moreover, we design a loss function based on uncertainty learning with a mask to optimize the implicitly represented uncertainty field. The mask filters out the negative impact of monocular priors with large uncertainty, which indicates a high probability of being inaccurate.

Masked Depth loss.

D′(r) is the predicted geometry prior. We design the loss function by assuming a Laplace distribution over the depth [57].


Here Ω(U, τ) ⊙ F is an adaptive gradient detach operation: the gradients from F are detached if U > τ and kept if U ≤ τ. As a result, the geometry prior is not used in regions with high uncertainty, while the uncertainty score itself is still optimized.
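Written with a stop-gradient operator sg(·), which is the identity in the forward pass and has zero gradient in the backward pass, the detach operation described above can be stated compactly. The symbol sg is introduced here for illustration and is not the paper's notation.

```latex
\Omega(U, \tau) \odot F =
\begin{cases}
\operatorname{sg}(F), & U > \tau, \\
F, & U \le \tau .
\end{cases}
```

In PyTorch, one way to express this is `torch.where(U > tau, F.detach(), F)`, where `F` holds only the term whose gradient should not reach the geometry network.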

Masked Normal loss.

Similar to the depth loss, the normal prior is transformed into world coordinate space for regularization and for computing the uncertainty.

Specifically, the inconsistency can be regarded as an adaptive weight that adjusts the monocular prior at each pixel: a low weight is applied to a prior with a large inconsistency.

Color reconstruction loss.

The scene representation is optimized through the observations in 2D space.


Eikonal loss.

Following [58], the Eikonal loss is applied as a regularizer so that the network output satisfies the properties of an SDF.

Here X is the set of points sampled uniformly in 3D space and in the regions near the surface.

3.3 Uncertainty-Guided Ray Sampling

Although applying the geometry prior with the uncertainty-based filter improves the 3D reconstruction quality, because inconsistent priors for some pixels are filtered out, fine and detailed structures are still hard to reconstruct. We observe that fine and small object structures occupy only a small area in the image, so their probability of being sampled is low. In contrast, texture-less and planar regions occupy most of the room's area and can already be reconstructed with high fidelity from a small number of ray samples, thanks to the supervision from the geometry prior. Sampling more rays on these simple surfaces than on complex and fine geometry wastes computation.

Localizing the fine and detailed geometry surface, which usually corresponds to the high-frequency surface, is itself a perception problem. A straightforward approach is to apply a high-pass filter or a keypoint extractor to locate high-frequency signals for sampling, but this produces incorrect results on planar surfaces with a high-frequency color appearance. Even with auxiliary information such as the geometry prior [49], directly using the high-frequency part of the geometry prior is still inaccurate, since the prior may miss some detailed structures, that is, it predicts these regions as smooth and planar surfaces. This can be observed in Fig. 1, where the chair legs are lost.

Based on this analysis, we compute the blend uncertainty score A(r) of each pixel by combining the inconsistency representations for depth and normal:


Here λ is a hyper-parameter. The blend uncertainty score is used as the guidance for ray sampling:

Compared with inferring confidence maps for monocular prior filtering and ray sampling only by detecting high-frequency image features, our method infers the uncertainty maps by implicitly using information from multiple viewpoints. For each ray r, the probability of being sampled is calculated as:
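A minimal sketch of uncertainty-guided ray sampling follows. Both the convex combination used for the blend score and the plain normalization used for the sampling probability are assumptions made for illustration, because these notes do not reproduce the paper's formulas; which term λ weights is likewise assumed. λ = 0.9 is the value reported in Section 4.1, and all function names are hypothetical.

```python
import random

def blend_uncertainty(u_depth, u_normal, lam=0.9):
    """Per-pixel blend score A(r); the convex combination is an assumption."""
    return [lam * ud + (1.0 - lam) * un for ud, un in zip(u_depth, u_normal)]

def sampling_probabilities(scores, eps=1e-8):
    """Normalise non-negative scores into a probability distribution over rays."""
    total = sum(scores) + eps * len(scores)
    return [(a + eps) / total for a in scores]

def sample_rays(scores, k, seed=0):
    """Draw k ray indices (with replacement) in proportion to their score."""
    rng = random.Random(seed)
    probs = sampling_probabilities(scores)
    return rng.choices(range(len(scores)), weights=probs, k=k)
```

The small `eps` keeps every ray reachable, so low-uncertainty planar regions are sampled less often but never starved completely.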

3.4 Uncertainty-Guided Smooth Regularization

To prevent the reconstructed surfaces from being too noisy, smooth regularization [7] is widely applied to reduce floaters on the surfaces. The smooth loss [7] requires the SDF gradients to be the same within a local region, which reduces the floaters near the surface. However, this regularization also damages fine details, since not all surfaces in an indoor scene need to be smooth. Based on this analysis, we apply the smooth regularization term adaptively, which keeps simple surfaces smooth while preserving the fine details of complex geometry. For each sampled ray r, S(r) denotes the points sampled near the surface along this ray. The smoothness loss term is:

Here ε is a random offset sampled from a Gaussian distribution N(0, ξ) with a small variance ξ. M(A(r, τ_s)) is a mask function:
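The adaptive smoothness term can be sketched as below. The sketch assumes that the mask keeps the term only for rays whose blend score does not exceed τ_s and that the penalty is the Euclidean norm of the gradient difference; neither is confirmed by these notes. `grad_fn` is a hypothetical callable returning the SDF gradient at a point.

```python
import math
import random

def smooth_mask(a_score, tau_s=0.3):
    """Mask value: 1 keeps the smoothness term, 0 switches it off (assumed rule)."""
    return 1.0 if a_score <= tau_s else 0.0

def masked_smooth_loss(grad_fn, points, a_score, tau_s=0.3, xi=1e-2, seed=0):
    """Mean ||grad f(x) - grad f(x + eps)|| over the near-surface points S(r)."""
    rng = random.Random(seed)
    sigma = math.sqrt(xi)        # xi is a variance, random.gauss expects a std dev
    total = 0.0
    for x in points:
        shifted = [c + rng.gauss(0.0, sigma) for c in x]
        g0 = grad_fn(x)
        g1 = grad_fn(shifted)
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(g0, g1)))
    return smooth_mask(a_score, tau_s) * total / len(points)
```

For a plane the gradient is constant and the loss vanishes; for a curved surface the loss is positive unless the ray lies in a high-uncertainty region, where the mask removes it.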

Apart from the points sampled on the rays, a small batch of points is sampled uniformly at random in the indoor space, and the aforementioned smoothness regularization is also applied to them. Since the probability that such points fall near the surface is very low, the uncertainty-based adaptive scheme is not used for them. Besides, we do not apply the adaptive scheme to the Eikonal loss, since the SDF around small and thin structures still satisfies this property.

3.5 Bias-aware SDF to Density Transformation

Previous works [2], [4], [5] use a transformation from SDF to volumetric density together with volume rendering to model the connection between the 3D geometry and the rendered images. All these methods are designed to model a weight function that is required to be unbiased. However, methods such as VolSDF [2] and NeuS [4] still suffer from biased volume rendering. TUVR [39] analyzes the problem of these methods and proposes to scale the SDF by the angle between the normal and the ray direction, which reduces the bias but still ignores the bias caused by the curvature of the SDF. We therefore argue that the curvature of the SDF should be taken into account for unbiased volume rendering. In this section, we analyze the weight function of volume rendering near a "small object" and design a bias-aware transformation from SDF to density to improve the reconstruction quality of small and fine structures.

3.5.1 Problem Formulation

To illustrate the limitation of the transformation from the SDF representation to volumetric density, we design 2 simple toy cases in Fig. 4. There is a small object A and a plane B in space. A ray is cast toward plane B and either passes close to object A without intersecting it, or passes directly through object A at an angle. In sub-figure (a), although the ray does not intersect object A, the weight functions of VolSDF [2] and TUVR [39] along the ray are still influenced by object A: multiple peaks can be observed in the weight function near object A. In sub-figure (b), previous works show larger depth errors than our method. The case in sub-figure (a) in particular leads to significant ambiguity for the regularization with the monocular prior. Taking depth as an example, the rendered depth has a large bias with respect to the ground truth depth. Because of this, optimizing the masked depth loss, which minimizes the error between the depth prior and the rendered depth, reduces the density around the peaks near object A. The masked normal loss suffers from the same problem. Although the points sampled around the multiple peaks along the ray are not inside object A, they still influence the SDF representation of object A because of the local continuity of the implicit neural network and the regularization from the Eikonal loss. This ambiguity makes the optimization of the SDF inefficient for fine and small objects in the indoor scene. Specifically, it shrinks small and fine structures or even makes them disappear.

3.5.2 SDF to Density Mapping for Bias Reduction

The aforementioned problem is caused by biased volume rendering. Previous works such as VolSDF [2], NeuS [4], and Unisurf [5] design transformations from the SDF to the density field that cannot guarantee the rendered depth point lies on the zero level set of the SDF, even if the ray intersects the surface perpendicularly. NeuS [4] applies the cosine of the angle between the normal vector and the ray direction to eliminate the bias, based on an analysis of a ray intersecting a planar surface. Subsequently, TUVR [39] also uses the cosine of this angle to decrease the bias further. However, these methods do not consider the influence of the curvature of the SDF, which can be useful for unbiased volume rendering. To tackle this problem, we design a bias-aware transformation from SDF to density based on an analysis of the SDF curvature and the extreme point of the weight function. Specifically, we consider a simple scenario: a ray r intersects a circle of radius a at point L. As shown in Fig. 5, the SDF of the point r(t) is s, but the ray reaches the circle surface after a distance y. Our target is to design an SDF mapping function that replaces s(t) in σ(s(t)) (Eq. 1) with y(t). We propose a function that maps the SDF value s to the distance y in the transformation to density, which aims to reduce the negative influence of the curvature of the SDF and of the ray direction. Here a is the curvature radius at the point D on the surface of object A.

Here θ(t) is the angle between the view direction and the SDF normal. By the Pythagorean theorem, the distance l is:
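For reference, the circle construction can be worked out explicitly. The derivation below is a reconstruction from the geometry described in these notes (a point r(t) at signed distance s from a circle of radius a > 0, with the ray making an angle θ with the SDF normal). It is not a copy of the paper's equations, and the sign conventions in particular are an assumption.

```latex
% Centre c, point p = r(t) with \|p - c\| = a + s, unit normal n = (p - c)/\|p - c\|,
% unit ray direction v with n \cdot v = \cos\theta.
% The ray p + y v meets the circle when
\| p + y\,v - c \|^2 = a^2
\;\Longrightarrow\;
y^2 + 2\,(a + s)\cos\theta \; y + (a + s)^2 - a^2 = 0 .

% Half chord length from the Pythagorean theorem:
l = \sqrt{a^2 - (a + s)^2 \sin^2\theta} .

% First intersection for a ray heading towards the surface (\cos\theta < 0):
y = (a + s)\,\lvert\cos\theta\rvert - l .

% Planar limit a \to \infty, using
% l \approx a\,\lvert\cos\theta\rvert - s\,\sin^2\theta / \lvert\cos\theta\rvert :
y \;\longrightarrow\; s\,\lvert\cos\theta\rvert + \frac{s\,\sin^2\theta}{\lvert\cos\theta\rvert}
  = \frac{s}{\lvert\cos\theta\rvert} .
```

The square root is real only when |a| ≥ |a + s||sin θ|, which is the condition for the ray to actually hit the circle.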

Like the SDF, the distance function y(t) is defined so that it can be negative, so the above equations still hold when r(t) is inside the circle. This means we assume that a small local area of the surface can be approximated by a circular arc. For a planar surface, the absolute value of the curvature radius can be regarded as very large, and the distance function simplifies to y(t) = s(t)/|cos θ(t)|, which is the same as in TUVR [39]. We apply the distance y(t) as a kind of calibration of the SDF s(t) for volume rendering, so the density at each point is computed as σ(y(t)). Further, Eq. 19 cannot ensure that the quantity under the square root is positive, because the ray may not intersect the circle, which corresponds to the situation |a| < |a + s||sin θ|. For these points, a naive solution is to set y(t) to infinity, which makes the density field discontinuous. To keep the density field continuous, we design the distance y(t) in this case as:

This choice corresponds to tangency with another circle whose radius is larger than a. An illustration is shown in the third sub-figure of Fig. 5.

Further, we use the normal curvature radius of the SDF at r(t) to estimate the aforementioned curvature radius a. The curvature radius a(t) corresponding to the point r(t) is computed as:

Here R(r(t), v) [59], [60] is the normal curvature radius of the SDF at the point r(t) in the ray direction v.
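The mapping from the SDF value s to the ray distance y can be sketched numerically under the circle assumption with a > 0. The hit case follows the circle geometry directly. For the miss case the sketch returns infinity, which is the naive choice mentioned in the text; the paper's continuous fallback is not reproduced in these notes and is deliberately not guessed here. The function name is hypothetical.

```python
import math

def bias_aware_distance(s, theta, a):
    """Distance y along the ray to a circle of radius a > 0, given SDF value s.

    theta is the angle between the ray direction and the SDF normal.
    """
    cos_t = abs(math.cos(theta))
    sin_t = abs(math.sin(theta))
    disc = a * a - (a + s) ** 2 * sin_t ** 2
    if disc < 0.0:
        # Ray misses the circle. Naive choice only: the paper replaces this
        # with a continuous fallback that is not reproduced here.
        return math.inf
    # Distance to the foot of the perpendicular minus the half chord length l.
    return (a + s) * cos_t - math.sqrt(disc)
```

Two sanity checks follow from the geometry: a head-on ray gives y = s, and for a very large radius the result approaches s/|cos θ|, the planar form used by TUVR [39].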

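Figure 5: Illustration of our bias-aware SDF-to-density transformation. A curvature radius a > 0 indicates an intersection with an inward-curved surface, while a < 0 indicates an intersection with an outward-curved surface. If |a| < |a + s||sin θ| holds, the ray has no intersection with the nearby surface. The fourth sub-figure illustrates the estimation of the curvature radius by finite differences: two adjacent points on the ray are used to estimate the curvature radius at the former point.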

3.5.3 Curvature Radius Estimation

Computing the curvature radius analytically requires the Hessian matrix, which is computationally expensive and redundant. We instead estimate the curvature radius numerically. As shown in the fourth sub-figure of Fig. 5, suppose there are 2 points A and B at distance d from each other on the ray r. The normals at A and B are n_A and n_B, respectively, and n′_B is the projection of n_B onto the plane determined by the normal n_A and the ray direction v. α is the angle between n_A and n′_B. The curvature radius R at point A can be estimated based on the Law of Sines:

where χ(n_A, n′_B) is the indicator function:
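One way to instantiate the Law of Sines estimate is sketched below. In the triangle formed by the circle centre O and the points A and B, the angle at O is α and the angle at B equals the angle between n′_B and the ray, so |OA| = d · sin(∠B) / sin α. Whether the paper uses exactly this form is an assumption. The sketch returns an unsigned radius; the sign indicator χ is not reproduced. Helper names are hypothetical.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _norm(a):
    return math.sqrt(_dot(a, a))

def _unit(a):
    n = _norm(a)
    return [x / n for x in a]

def estimate_curvature_radius(n_a, n_b, v, d, eps=1e-8):
    """Unsigned radius of the level-set circle through A (Law of Sines).

    n_a, n_b: SDF normals at A and at B = A + d * v; v: ray direction.
    """
    n_a, n_b, v = _unit(n_a), _unit(n_b), _unit(v)
    m = _cross(n_a, v)                  # normal of the plane spanned by n_a and v
    if _norm(m) < eps:                  # ray parallel to n_a: plane is undefined
        return math.inf
    m = _unit(m)
    k = _dot(n_b, m)
    n_bp = [x - k * y for x, y in zip(n_b, m)]   # projection of n_b onto the plane
    if _norm(n_bp) < eps:
        return math.inf
    n_bp = _unit(n_bp)
    sin_alpha = _norm(_cross(n_a, n_bp))         # alpha: angle between n_A and n'_B
    sin_b = _norm(_cross(n_bp, v))               # triangle angle at B
    if sin_alpha < eps:                          # parallel normals: locally planar
        return math.inf
    return d * sin_b / sin_alpha
```

For a circle centred at the origin, the estimate recovers the distance from A to the centre exactly, and it returns infinity when both normals agree, as on a plane.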

3.5.4 Progressive Warm-up

We design a progressive warm-up strategy to stabilize training, since the numerical estimate of the curvature radius is not stable at the beginning of training. Specifically, we replace cos θ(t) with cos^{p(t)} θ(t), where p(t) is a parameter that grows progressively from 0 to 1 with the training iterations. Correspondingly, sin θ is adjusted according to sin^2 θ + cos^2 θ = 1. Moreover, since the sampling probability of texture-less regions is low, the growing p(t) is designed as the product of a progressively increasing number p′ and the point-wise uncertainty score, so that p increases more slowly in texture-less and planar regions than in detailed and important regions. The p(t) corresponding to the point r(t) is denoted as:
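The effect of replacing cos θ by cos^p θ can be illustrated with a small sketch. Taking the absolute value of the cosine before raising it to a fractional power, and the linear schedule for the global factor p′, are assumptions for illustration; the paper's expression for p(t), which also involves the point-wise uncertainty, is not reproduced here.

```python
import math

def warmup_trig(theta, p):
    """Return (|cos theta|^p, matching sine) so that sin^2 + cos^2 = 1 still holds."""
    cos_p = abs(math.cos(theta)) ** p
    sin_p = math.sqrt(max(0.0, 1.0 - cos_p * cos_p))
    return cos_p, sin_p

def linear_schedule(step, warmup_steps):
    """Hypothetical schedule for the global factor p', growing from 0 to 1."""
    return min(1.0, max(0.0, step / warmup_steps))
```

With p = 0 the cosine term is 1 and the sine term is 0, so under the circle construction with a > 0 the mapped distance reduces to the plain SDF value s; with p = 1 the full angle-dependent correction is active.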


In the expression for p(t), u_n(t) is the mean of the normal uncertainty score vector u_n(t) at each point, rather than the rendered Û_n.

Optimization.

The total loss function used to optimize the neural implicit geometry and color appearance networks is:

where the λ_i are hyper-parameters that weight the terms of the loss function. We set λ1 = 0.05, λ2 = 0.005, λ3 = 0.006, and λ4 = 0.0025.

4 EXPERIMENTS

Datasets.

We evaluate our method on the following 4 datasets: ScanNet [11], ICL-NUIM [12], Replica [13], and Tanks and Temples [14]. Among them, ScanNet, Replica, and Tanks and Temples are real-world indoor scene datasets, and Tanks and Temples contains the largest-scale indoor scenes. We select 4 scenes from ScanNet for performance evaluation, following the setting of Manhattan-SDF [9]. ICL-NUIM is a synthetic indoor scene dataset with both surface and trajectory ground truth.

Baselines.

We compare our method with the following methods: (i) neural volume rendering methods with priors, including MonoSDF [1], NeuRIS [10], and Manhattan-SDF [9]; (ii) neural volume rendering methods without priors, including VolSDF [2], NeuS [4], and Unisurf [5]; and (iii) a classical MVS reconstruction method, COLMAP [15].

Metrics.

Following the evaluation protocol of [1], [9], [10], we mainly use the following evaluation metrics: Chamfer Distance (L2), F-score with a 5 cm threshold, and Normal Consistency.
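The point-based metrics can be sketched with a brute-force nearest-neighbour search. The sketch assumes that the Chamfer distance is the mean of accuracy (prediction to ground truth) and completeness (ground truth to prediction), and that the F-score is the harmonic mean of precision and recall at the 5 cm threshold; the exact protocol is defined in [1], [9], [10]. Points are in metres and the function names are hypothetical.

```python
import math

def _nearest(p, cloud):
    """Distance from point p to its nearest neighbour in cloud."""
    return min(math.dist(p, q) for q in cloud)

def chamfer_and_fscore(pred, gt, threshold=0.05):
    """Chamfer distance and F-score between two point sets."""
    d_pred = [_nearest(p, gt) for p in pred]     # prediction -> ground truth
    d_gt = [_nearest(q, pred) for q in gt]       # ground truth -> prediction
    accuracy = sum(d_pred) / len(d_pred)
    completeness = sum(d_gt) / len(d_gt)
    chamfer = 0.5 * (accuracy + completeness)
    precision = sum(d < threshold for d in d_pred) / len(d_pred)
    recall = sum(d < threshold for d in d_gt) / len(d_gt)
    if precision + recall == 0:
        return chamfer, 0.0
    return chamfer, 2 * precision * recall / (precision + recall)
```

The quadratic search is only meant for small illustrative inputs; real evaluations sample the meshes densely and use a spatial index.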

4.1 Implementation Details

Following MonoSDF [1], we use an MLP with 8 hidden layers as the geometry network to predict the SDF and an MLP with 2 layers as the color network to predict the color field. Each layer has 256 hidden nodes. The network is optimized with the Adam optimizer, starting from a learning rate of 5 × 10^-4 that decays exponentially in each iteration. The input images have a size of 384 × 384, and the pre-trained Omnidata model [49] is used to generate the geometry prior. We implement our model in PyTorch [61] and train it on one NVIDIA GeForce RTX 2080Ti GPU. We train our model for 200,000 iterations with 1024 rays sampled in each iteration. We set λ = 0.9 to balance the uncertainty maps localized from the depth and normal priors. We do not use the estimated uncertainty regions to guide ray sampling and smoothing during the first 40,000 iterations, since the uncertainty localization is not stable at the initial stage. To filter out wrong priors, we set τ_d = 0.25, τ_n = 0.4, and τ_s = 0.3.

4.2 Performance Comparison with Other Baselines

...

4.3 Performance Evaluation of Different Regions

The aforementioned blend uncertainty map can be used to localize thin and detailed structures in the image, so we divide each image into masked and unmasked regions by setting different thresholds τ_s for the blend uncertainty mask obtained from Eq. 17. The masked and unmasked regions correspond to the detailed surfaces and the simple planar surfaces in the indoor scene, respectively. Visualizations of both parts are shown in Fig. 9. We then use the depth abs rel, normal cos similarity, and normal L1 similarity metrics to evaluate the improvements in the different regions for each viewpoint. Instead of generating the depth and normal maps by volume rendering, we generate the predicted depth and normal maps from the reconstructed meshes obtained with Marching Cubes [63].
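The region-wise evaluation can be sketched as follows for the depth metric. The sketch assumes that a pixel belongs to the masked (detailed) region when its blend uncertainty score exceeds τ_s, and it uses the common definition of absolute relative error, mean(|pred − gt| / gt); function names are hypothetical.

```python
def split_by_uncertainty(scores, tau_s):
    """Return boolean lists for the masked (detailed) and unmasked (planar) pixels."""
    masked = [a > tau_s for a in scores]
    unmasked = [not m for m in masked]
    return masked, unmasked

def depth_abs_rel(pred, gt, region):
    """Mean |pred - gt| / gt over pixels in the region with valid ground truth."""
    errs = [abs(p - g) / g for p, g, keep in zip(pred, gt, region) if keep and g > 0]
    return sum(errs) / len(errs) if errs else float("nan")
```

Evaluating the same metric once per region makes it visible whether an improvement comes from thin structures or from the planar background.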

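Figure 9: Visualization of the blend uncertainty map and mask. The mask localizes the fine-detail regions in the indoor scene, which benefits ray sampling and smooth regularization.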

As shown in Table 5, the reconstruction quality improvements of our method on complex detailed surfaces are more significant than those on simple planar surfaces. For the bias-aware SDF-to-density transformation, the improvements in the masked regions are also more significant than in the unmasked regions. This further validates the effectiveness of our method for reconstructing small objects and detailed regions.

4.4 Ablation Studies


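Figure 8: Ablation study of our method. "UF" denotes uncertainty-guided prior filtering with masked depth and normal losses; "US" denotes uncertainty-guided ray sampling; "S" denotes uncertainty-guided smoothing. Compared with the baseline, the reconstruction of thin and detailed structures benefits from our proposed modules.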

5 CONCLUSION

In this work, we have introduced DebSDF, which improves the detail and quality of indoor 3D reconstruction by localizing uncertainty regions and introducing a bias-aware SDF-to-density transformation for SDF-based volume rendering. Based on the observation that a prior is correct if it is consistent with other priors, we propose an uncertainty modeling approach that effectively identifies large-error regions in monocular geometric priors, which usually correspond to fine-detailed regions in the indoor scene. Accordingly, we selectively filter out geometry priors in these regions to avoid their potentially negative effect. We also assign a higher sampling probability to these regions and apply adaptive smooth regularization, further improving reconstruction quality. Furthermore, we found that the volume rendering technique for neural implicit surfaces used in previous work has a strong bias that tends to eliminate fine-detailed surfaces. Consequently, we propose a progressively growing, bias-aware SDF-to-density transformation to reduce the impact of this bias, enhancing the reconstruction of thin, detailed structures in indoor environments. DebSDF demonstrates improved reconstruction compared to previous work, as evidenced by both qualitative visualizations and quantitative evaluations across four challenging datasets.

Limitation

While DebSDF improves reconstruction quality significantly over previous work, several limitations remain. First, our method depends on the quality of the monocular priors predicted by a pre-trained model and could benefit from future developments in monocular priors. Second, we use images and monocular priors at a resolution of 384 × 384 because the Omnidata [49] model is trained on images at this resolution and performs poorly on high-resolution images. An alternative is to resize the monocular priors to a higher resolution, which we empirically found ineffective because it produces wrong priors for many pixels. More advanced methods of using high-resolution priors [1] are left as future work. Additionally, our model's increased time consumption is another limitation, as we compute normals within the VolSDF [2] point sampling strategy. We believe modifications to this strategy could help reduce computational complexity, and we aim to investigate this further.
