[Paper Reading] Applying Neural Radiance Fields to Large-Scale Aerial Images

Enabling Neural Radiance Fields (NeRF) for Large-scale Aerial Images -- A Multi-tiling Approach and the Geometry Assessment of NeRF

These notes focus mainly on the paper's evaluation: which evaluation methods were used and which targets were evaluated.

ABSTRACT

Neural Radiance Fields (NeRF) offer the potential to benefit 3D reconstruction tasks, including aerial photogrammetry. However, the scalability and accuracy of the inferred geometry are not well documented for large-scale aerial assets, since such datasets usually result in very high memory consumption and slow convergence. In this paper, we aim to scale NeRF to large-scale aerial datasets and provide a thorough geometry assessment of NeRF. Specifically, we introduce a location-specific sampling technique as well as a multi-camera tiling (MCT) strategy to reduce RAM consumption during image loading and GPU memory consumption during representation training, and to increase the convergence rate within tiles. MCT decomposes a large-frame image into multiple tiled images with different camera models, allowing these small-frame images to be fed into the training process as needed for specific locations without a loss of accuracy. We implement our method on a representative approach, Mip-NeRF, and compare its geometry performance with three photogrammetric MVS pipelines on two typical aerial datasets against LiDAR reference data. Both qualitative and quantitative results suggest that the proposed NeRF approach produces better completeness and object details than traditional approaches, although, as of now, it still falls short in terms of accuracy. The code is available at https://github.com/GDAOSU/MCT_NERF.
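Why do the MCT tiles need "different camera models"? Consider what cropping does to a pinhole camera. The focal lengths stay the same, the pose is shared, and only the principal point shifts by the tile offset. The sketch below assumes undistorted images (the images are undistorted before training, as described in Section V) and a regular, non-overlapping grid. `tile_cameras` and `project` are hypothetical helpers, not functions from the MCT_NERF repository, and the actual implementation may tile differently (for example, with overlap).

```python
import numpy as np


def tile_cameras(K, width, height, tile_w, tile_h):
    """Split one pinhole camera into per-tile cameras (hypothetical helper).

    Assumes an undistorted pinhole model with 3x3 intrinsics K and a regular,
    non-overlapping grid of tiles; edge tiles may be smaller. Cropping keeps
    the focal lengths and skew and only moves the principal point, while the
    extrinsics (R, t) stay shared by every tile of the same image.
    """
    tiles = []
    for y0 in range(0, height, tile_h):
        for x0 in range(0, width, tile_w):
            K_tile = np.array(K, dtype=float)  # independent copy per tile
            K_tile[0, 2] -= x0  # cx' = cx - x0
            K_tile[1, 2] -= y0  # cy' = cy - y0
            size = (min(tile_w, width - x0), min(tile_h, height - y0))
            tiles.append({"offset": (x0, y0), "size": size, "K": K_tile})
    return tiles


def project(K, X_cam):
    """Project a 3D point given in camera coordinates to pixel coordinates."""
    uvw = K @ X_cam
    return uvw[:2] / uvw[2]


K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 400.0], [0.0, 0.0, 1.0]])
tiles = tile_cameras(K, width=1000, height=800, tile_w=512, tile_h=512)
X = np.array([0.3, 0.3, 2.0])
uv_full = project(K, X)              # pixel (650, 550) in the full image
uv_tile = project(tiles[3]["K"], X)  # same point in the tile at offset (512, 512)
# uv_full - uv_tile equals the tile offset, so each tile is a valid camera on its own.
```

Because every tile is a self-contained camera, it can be loaded and sampled independently. This is what lets only the tiles relevant to a location be kept in memory.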

I. INTRODUCTION

NeRF is known to be particularly successful in view rendering of traditionally difficult and non-cooperative objects, such as texture-less, transparent, and reflective surfaces [57]. However, it is still unclear how NeRF-derived 3D geometry performs on large-sized aerial images, partly due to the scale of the problem. An earlier work [44] evaluated the performance of various NeRF methods on close-range heritage assets, showing that NeRF-derived 3D geometry can be robust to reflective and transparent surfaces. Therefore, we expect a thorough evaluation of aerial scenarios to be particularly useful for assessing NeRF-derived 3D geometry for mapping.

V. EXPERIMENTS

We evaluated the accuracy of the generated geometry at the point-cloud level. For both datasets, aerial triangulation was performed with Agisoft Metashape, and the images were undistorted using the estimated lens distortion parameters. All experiments were run at full image resolution. To evaluate the quality of the resulting point clouds, we used the cloud-to-cloud distance measurement [51], with the available LiDAR data as the reference. Additionally, we evaluated the completeness and accuracy of the point clouds by determining the percentage of points falling within varying distance thresholds [35, 36]. All experiments were conducted on an Intel® Xeon® W-2235 CPU @ 3.8GHz, 64GB of RAM, and an NVIDIA GeForce RTX 3090 with 24GB of VRAM.

We first evaluated the memory consumption and convergence performance of our approach against two SOTA NeRF methods: Mip-NeRF [17] and Mega-NeRF [23]. Then, the best-performing NeRF method was compared in terms of geometric reconstruction performance with three traditional MVS software packages:

Agisoft Metashape (https://www.agisoft.com/) is professional photogrammetry software for generating 3D models from a set of 2D images. It employs structure-from-motion (SfM) and MVS to analyze overlapping images and extract 3D information. Since it is a commercial package, details of the SfM and MVS algorithms it uses are not available.

OpenMVS (https://github.com/cdcseacave/openMVS) is an open-source software package for patch-based MVS. It builds on the concept introduced in [8]: stereo pairs are first selected for each image based on factors such as the viewing angle of visible points and the distance between camera centers. Depth maps are then computed for each pair using a patch-based method [3], followed by a depth refinement step that enforces consistency across neighboring views, or uses priors to improve completeness in texture-less areas [9]. Finally, a depth-merging step performs redundancy and occlusion checks among neighboring images to generate the final photogrammetric point cloud. An extension based on plane priors was presented in [9].
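The pair-selection idea above can be sketched as a simple filter. Candidate partners are kept when their triangulation angle to the shared points is neither too small (the baseline is too short to give good depth) nor too large (the views look too different to match). Survivors are then ranked by camera-center distance. This is a hypothetical heuristic written for illustration. It is not OpenMVS's actual scoring function. The angle limits, `select_neighbors`, and the assumption that every point is visible in every camera are all choices made for this sketch.

```python
import numpy as np


def select_neighbors(centers, points, ref, min_angle=5.0, max_angle=45.0, k=2):
    """Rank candidate stereo partners for camera `ref` (hypothetical heuristic).

    For simplicity every sparse point is assumed visible in every camera.
    A candidate is kept if the median triangulation angle over the points
    lies in [min_angle, max_angle] degrees; survivors are ranked by the
    distance between camera centers (closest first).
    """
    centers = np.asarray(centers, dtype=float)
    points = np.asarray(points, dtype=float)
    r_ref = points - centers[ref]  # rays from the reference camera to the points
    ranked = []
    for j, c in enumerate(centers):
        if j == ref:
            continue
        r_j = points - c
        cos = np.sum(r_ref * r_j, axis=1) / (
            np.linalg.norm(r_ref, axis=1) * np.linalg.norm(r_j, axis=1)
        )
        angle = np.degrees(np.median(np.arccos(np.clip(cos, -1.0, 1.0))))
        if min_angle <= angle <= max_angle:
            ranked.append((float(np.linalg.norm(c - centers[ref])), j))
    return [j for _, j in sorted(ranked)[:k]]


# Cameras 100 m above the ground along the x axis; points near the origin.
centers = [[0, 0, 100], [5, 0, 100], [20, 0, 100], [40, 0, 100], [200, 0, 100]]
points = [[-1, -1, 0], [1, -1, 0], [0, 0, 0], [-1, 1, 0], [1, 1, 0]]
neighbors = select_neighbors(centers, points, ref=0)
# Camera 1 (about 3 degrees) and camera 4 (about 63 degrees) are rejected.
```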

Multi-view Stereo Processor (MSP) [25] is an SGM-based MVS algorithm. A set of stereo pairs is first selected for each image based on criteria including camera poses and the number of correspondences. Then, for each image, a Census-based SGM method [4], [54] is applied to generate pairwise depth maps for the corresponding stereo pairs, followed by median filtering to derive a high-quality per-view depth map. The final photogrammetric point cloud is obtained by merging the per-view depth maps.
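Two of the MSP building blocks named above can be sketched in a few lines: the Census-based matching cost and the median fusion of pairwise depth maps. This is a minimal illustration, not MSP's code. The SGM path aggregation, disparity search, and depth-map merging are omitted, and `census`, `hamming`, and `fuse_median` are hypothetical names. The Census transform encodes each pixel by which neighbors are darker than it, so the cost depends only on intensity order.

```python
import numpy as np


def census(img, r=1):
    """Census transform over a (2r+1)x(2r+1) window (r <= 3 fits in 64 bits).

    Each neighbour contributes one bit: 1 if it is darker than the centre.
    Borders use edge padding, which is an assumption of this sketch.
    """
    img = np.asarray(img)
    h, w = img.shape
    pad = np.pad(img, r, mode="edge")
    code = np.zeros((h, w), dtype=np.uint64)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            neigh = pad[r + dy : r + dy + h, r + dx : r + dx + w]
            code = (code << np.uint64(1)) | (neigh < img).astype(np.uint64)
    return code


def hamming(a, b):
    """Per-pixel Hamming distance between two census code arrays."""
    x = np.ascontiguousarray(np.bitwise_xor(a, b), dtype=np.uint64)
    bits = np.unpackbits(x.view(np.uint8).reshape(x.shape + (8,)), axis=-1)
    return bits.sum(axis=-1)


def fuse_median(depth_maps):
    """Fuse the pairwise depth maps of one view; NaN marks invalid pixels."""
    return np.nanmedian(np.stack(depth_maps), axis=0)


img = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]], dtype=np.uint8)
codes = census(img)
# Centre pixel: 10, 20, 30, 40 are darker, 60..90 are not -> 0b11110000 = 240.
brighter = census(img * 2)        # census depends only on intensity order
cost = hamming(codes, brighter)   # all zeros: the codes are identical
fused = fuse_median([
    np.array([[1.0, np.nan]]),
    np.array([[2.0, 4.0]]),
    np.array([[10.0, 6.0]]),
])  # [[2.0, 5.0]]
```

The invariance to a global brightness change is the usual reason for choosing Census over raw intensity differences in aerial matching. The median discards depth values from pairs that disagree with the majority.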

A. Evaluations on the entire datasets

A.1. State-of-the-Art comparison

The proposed approach was compared to two SOTA NeRF methods (Mip-NeRF [17] and Mega-NeRF [23]) to demonstrate the effectiveness of the proposed NeRF variant for reconstructing large-scale aerial scenes. We adjusted the essential hyper-parameters related to RAM and VRAM so that they were as consistent as possible across all three methods (Table 2). To avoid the memory issues of Mega-NeRF (which crashed in our experiments at the original resolution), we down-sampled the images by a factor of two for this particular comparison; the full resolution was used in the accuracy analysis in Section V.B.
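A back-of-the-envelope calculation shows why down-sampling by a factor of two relieves memory so much. Halving both image dimensions divides the pixel count by four. That in turn divides the per-image cache and the number of candidate training rays by four. The frame size and the float32 RGB storage below are assumptions for illustration: 80 MP is the upper end of the 50-80 MP range mentioned in the conclusions, and the paper does not state how images are cached. `image_bytes` is a hypothetical helper.

```python
def image_bytes(width, height, channels=3, bytes_per_value=4):
    """RAM needed to cache one image (float32 RGB by default)."""
    return width * height * channels * bytes_per_value


full = image_bytes(10_000, 8_000)  # hypothetical 80 MP frame: 960,000,000 bytes
half = image_bytes(5_000, 4_000)   # down-sampled by two: 240,000,000 bytes
ratio = full // half               # 4, because both dimensions are halved
```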

The main VRAM consumers were the forward propagation of the training batch and the backward pass of the gradients of learnable parameters. With the same model architecture and training batch size, our method consumed significantly less VRAM thanks to the MCT strategy. RAM, in turn, played an important role in caching the training data. As shown in Table 3, during training the proposed method and Mega-NeRF required much less RAM than Mip-NeRF, because both have a data-partition module that enables out-of-core processing. In the data-partition step itself, Mega-NeRF's memory consumption was dominated by its ray-casting procedure, which creates 3D volumes for each full-size image, whereas our method performs these steps locally, as the "Data Partition" results in Table 3 show. A substantial difference also arose in loading images into RAM: unlike Mip-NeRF, which must load the full set of full-size images, our method loads only the sub-image set. Table 4 shows that, for the same training time, our method achieved a higher Peak Signal-to-Noise Ratio (PSNR) than Mip-NeRF and Mega-NeRF. This improvement in PSNR suggests that our method converged faster, even though it uses the same network architecture as Mip-NeRF.
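PSNR, used in Table 4 to compare the methods at equal training time, is defined as 10·log10(MAX²/MSE) between rendered and ground-truth pixels. The sketch below assumes intensities scaled to [0, 1]. Each halving of the MSE adds about 3 dB. A higher PSNR after the same training time therefore means the model has fitted the images further, which is how the text reads it as faster convergence.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB; images assumed scaled to [0, max_val]."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val**2 / mse))


target = np.zeros((4, 4, 3))
p_coarse = psnr(target + 0.1, target)   # MSE = 0.01 -> about 20 dB
p_fine = psnr(target + 0.05, target)    # MSE quartered -> about 26 dB
```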

A.2. Cloud-to-Cloud comparison

A cloud-to-cloud comparison measures the absolute Euclidean distances between the 3D points of a dataset and the reference data [51], [55]. For both aerial datasets, dense point clouds were derived with three photogrammetry methods (OpenMVS, MSP, Metashape) as well as the proposed NeRF variant, co-registered to the available ground-truth LiDAR data (Figure 4), and compared using the metrics in Table 5. Note that OpenMVS failed to generate a point cloud for the Bordeaux dataset due to memory overflow in the depth-fusion step. Figure 4 shows that our NeRF method yielded less accurate results; in the Dortmund dataset, the inaccurate points lie mainly near the boundary, where few images were collected. This is a known issue with NeRF under sparse views, which causes inaccurate points near the camera centers [56], [57]. Table 5 confirms that NeRF had the highest mean errors (1.4392 m in Dortmund and 0.9232 m in Bordeaux). In comparison, on the Dortmund dataset MSP had the lowest mean error (0.7610 m), followed by OpenMVS (0.8620 m) and Metashape (1.1642 m).
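A minimal sketch of the comparison uses the common nearest-neighbor definition: each reconstructed point's distance to the closest LiDAR point. The exact variant is defined in [51] and [55]. The sketch also assumes the clouds are already co-registered. The brute-force search is only suitable for toy data; for clouds with millions of points, a k-d tree (for example, `scipy.spatial.cKDTree`) is the usual tool. `nn_distances` and `c2c_stats` are hypothetical names.

```python
import numpy as np


def nn_distances(src, ref, chunk=1024):
    """Distance from every point in `src` to its nearest point in `ref`.

    Brute force in chunks, O(len(src) * len(ref)); fine for small examples.
    """
    src = np.asarray(src, dtype=float)
    ref = np.asarray(ref, dtype=float)
    out = np.empty(len(src))
    for i in range(0, len(src), chunk):
        diff = src[i : i + chunk, None, :] - ref[None, :, :]
        out[i : i + chunk] = np.sqrt((diff**2).sum(axis=-1).min(axis=1))
    return out


def c2c_stats(recon, lidar):
    """Mean and standard deviation of reconstruction-to-LiDAR distances."""
    d = nn_distances(recon, lidar)
    return float(d.mean()), float(d.std())


# Toy example: a flat LiDAR patch and a reconstruction lifted by 0.5 m.
xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
lidar = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)])
recon = lidar + np.array([0.0, 0.0, 0.5])
mean_err, std_err = c2c_stats(recon, lidar)  # 0.5 m mean, 0.0 std
```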

A.3. Accuracy and completeness

The accuracy and completeness of these 3D reconstructions were evaluated at varying distance thresholds. Accuracy is the percentage of points in the examined point cloud that lie within a given distance threshold of the LiDAR data, while completeness is the percentage of LiDAR points that lie within that distance of the examined point cloud. The results in Figure 5 confirm that NeRF consistently yielded less accurate results than any of the photogrammetry methods, regardless of the threshold. On the completeness curve for the Dortmund dataset, NeRF achieved completeness comparable to MSP and OpenMVS for thresholds below 3 meters, and outperformed both photogrammetry methods for thresholds above 3 meters. Notably, the points contributing to this higher completeness were primarily those labeled "yellow" and "red" in Figure 4-d.
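The two definitions are mirror images of each other. Accuracy measures distances from the reconstruction to the LiDAR cloud, and completeness measures distances from the LiDAR cloud to the reconstruction. The sketch below assumes a nearest-neighbor distance and a "≤ threshold" test, which may differ in detail from the implementation behind [35, 36]. The function names and the toy clouds are hypothetical. In the toy case, a single outlier lowers accuracy at every threshold, while completeness rises as the threshold grows.

```python
import numpy as np


def nn_distances(src, ref):
    """Brute-force distance from each point in `src` to its nearest point in `ref`."""
    src = np.asarray(src, dtype=float)
    ref = np.asarray(ref, dtype=float)
    diff = src[:, None, :] - ref[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1).min(axis=1))


def accuracy_completeness(recon, lidar, thresholds):
    """Percent accuracy and completeness for each distance threshold (metres).

    accuracy:     share of reconstructed points within t of the LiDAR cloud
    completeness: share of LiDAR points within t of the reconstructed cloud
    """
    d_acc = nn_distances(recon, lidar)
    d_comp = nn_distances(lidar, recon)
    return [
        (t, 100.0 * np.mean(d_acc <= t), 100.0 * np.mean(d_comp <= t))
        for t in thresholds
    ]


lidar = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]]
recon = [[0, 0, 0.1], [1, 0, 0.1], [1, 5, 0]]  # two good points, one outlier
results = accuracy_completeness(recon, lidar, [0.25, 1.5])
for t, acc, comp in results:
    print(f"t={t} m  accuracy={acc:.1f}%  completeness={comp:.1f}%")
```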

B. Evaluations on the selected regions

B.1. Fine structures

Traditional MVS pipelines often struggle to reconstruct small and/or thin objects because of their smoothness constraints, which penalize depth discontinuities within local surfaces. This penalty ultimately leads to a loss of fine details. Figure 6 presents several tiny objects whose widths range from 2 to 11 pixels in image space; the results show that the proposed NeRF method outperformed traditional MVS pipelines in terms of completeness. For instance, the NeRF method successfully reconstructed the thin light pole in the second row, the horizontal steel bars in the fourth row, and the six stone pillars in the fifth row, none of which could be reconstructed by the traditional MVS pipelines. When object width increases to 7-11 pixels in the images, both the photogrammetry and NeRF methods reconstruct the objects completely (Figure 6, first row). A quantitative analysis of the object in the fourth row of Figure 6 is shown in Figure 7: NeRF exhibited the highest completeness while maintaining accuracy comparable to the traditional MVS pipelines. Within a tolerance of 0.25 m, NeRF reconstructed 86.3% of the LiDAR data, followed by OpenMVS (80.7%), MSP (71.6%), and Metashape (60.4%). Notably, the largest discrepancy appears in the completeness curve and in the lower part of the object.
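The smoothness argument can be worked through on a single scanline with an SGM-style energy. P1 penalizes small label changes and P2 penalizes large depth jumps, as in semi-global matching. All cost values below are hypothetical. A thin pole always introduces two large jumps, costing 2·P2 regardless of its width, while the matching evidence for it grows with width. Below a width of about 2·P2/margin pixels, flattening the pole into the background is therefore the cheaper labeling. The per-pixel photometric loss that NeRF uses has no such pairwise term, so this threshold does not arise. `scanline_energy` and `pole_survives` are hypothetical helpers.

```python
import numpy as np


def scanline_energy(labels, data_cost, P1, P2):
    """SGM-style energy of one scanline labelling.

    data_cost[p, d] is the matching cost of pixel p at depth label d; P1
    penalises label changes of 1 and P2 penalises larger jumps.
    """
    labels = np.asarray(labels)
    data = data_cost[np.arange(len(labels)), labels].sum()
    jumps = np.abs(np.diff(labels))
    return float(data + P1 * np.sum(jumps == 1) + P2 * np.sum(jumps > 1))


def pole_survives(width, margin, P2=8.0, n=20, levels=8, bg=0, pole=5):
    """Is a `width`-pixel pole cheaper than flattening it into the background?

    Hypothetical costs: background pixels cost 0 at label `bg`; pole pixels
    cost 0 at label `pole` but only `margin` more at `bg` (weak evidence);
    every other label costs 10.
    """
    start = (n - width) // 2
    on_pole = np.zeros(n, dtype=bool)
    on_pole[start : start + width] = True
    cost = np.full((n, levels), 10.0)
    cost[~on_pole, bg] = 0.0
    cost[on_pole, pole] = 0.0
    cost[on_pole, bg] = margin
    truth = np.where(on_pole, pole, bg)
    flat = np.full(n, bg)
    e_truth = scanline_energy(truth, cost, P1=1.0, P2=P2)  # 2 * P2 = 16
    e_flat = scanline_energy(flat, cost, P1=1.0, P2=P2)    # width * margin
    return bool(e_truth < e_flat)


for w in (2, 5, 6, 11):
    print(w, pole_survives(w, margin=3.0))  # kept only once w * 3 > 16
```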

B.2. Shadow areas

Real-world aerial datasets often contain shadow areas, which are of particular interest for our analysis. To represent such regions, we selected two distinctive areas: the first is an open square, half of which is in shadow and the other half in direct sunlight; the second is a church building whose north side is in shadow, as depicted in Figure 8. The 3D results of our NeRF method showed notable accuracy inconsistencies between the shadowed and sunlit areas, whereas the traditional methods yielded more consistent results. The images in Figure 8-a show that the shadow area has a texture pattern similar to the sunlit area but a different level of illumination. Moreover, even in the non-shaded area, our NeRF method performed worse on such flat surfaces, especially those with uniform color patterns. To assess accuracy quantitatively, the mean and standard deviation of the errors with respect to the LiDAR data were computed (Table 6). Combining the visual and quantitative findings, we observed that NeRF produced less accurate geometry on flat surfaces, and its performance deteriorated further as image intensity decreased, primarily because the loss is purely photometric and therefore less informative for low-intensity pixels.
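A simple model shows why low intensity weakens a purely photometric objective. The model is an assumption of this sketch: shadow is treated as a uniform attenuation factor k, and a depth error is treated as a small shift of the reprojected texture. The squared photometric error that penalizes the wrong depth then shrinks by a factor of k². At k = 0.3, the same geometric mistake is penalized roughly eleven times less. Sensor noise, gamma, and exposure are ignored, and `misalignment_loss` is a hypothetical helper.

```python
import numpy as np


def misalignment_loss(texture, shift, k=1.0):
    """Photometric MSE between a texture and a copy shifted by `shift` pixels.

    `k` scales intensity (k < 1 mimics shadow as pure attenuation); the shift
    stands in for the reprojection offset caused by a wrong depth.
    """
    t = k * np.asarray(texture, dtype=float)
    return float(np.mean((t - np.roll(t, shift)) ** 2))


rng = np.random.default_rng(0)
texture = rng.uniform(0.0, 1.0, size=256)
sunlit = misalignment_loss(texture, shift=2, k=1.0)
shaded = misalignment_loss(texture, shift=2, k=0.3)
ratio = shaded / sunlit  # 0.09 up to rounding, i.e. k**2
```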

B.3. Texture-less areas

In traditional MVS pipelines, 3D results commonly have low completeness in texture-less regions such as building facades and water surfaces, owing to the strong matching ambiguity in such areas. We selected a building with a uniform color pattern on its roof, captured in more than 30 images. Figure 9-a shows that the NeRF method effectively filled the holes in the texture-less surface, whereas the traditional MVS pipelines, except OpenMVS, failed to do so. Although our NeRF method produced visually complete results (fewer holes), the completeness curve in Figure 9-b indicates that it underperforms OpenMVS, suggesting that the NeRF method generally introduced more erroneous points. The non-hole area was mainly a flat surface in shadow, where the NeRF method performs worse, as explained in the previous section.
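The matching ambiguity can be made concrete on a rectified scanline using a plain sum-of-squared-differences (SSD) cost. This is a hypothetical stand-in: the pipelines above use other costs, such as Census in MSP. On a textured patch, the cost has a single minimum at the true disparity. On a uniform patch, every disparity matches equally well, so no depth can be chosen with confidence. This is why such regions are often left as holes after filtering. `ssd_costs` and the synthetic scanlines are assumptions of this sketch.

```python
import numpy as np


def ssd_costs(left, right, x, half, max_disp):
    """SSD cost of the left patch centred at x for disparities 0..max_disp."""
    patch = left[x - half : x + half + 1]
    return np.array([
        np.sum((patch - right[x - d - half : x - d + half + 1]) ** 2)
        for d in range(max_disp + 1)
    ])


rng = np.random.default_rng(1)
scanlines = {"textured": rng.uniform(0.0, 1.0, size=40), "uniform": np.full(40, 0.5)}
true_disp = 3
results = {}
for name, left in scanlines.items():
    right = np.roll(left, -true_disp)  # right[i] = left[i + 3]
    c = ssd_costs(left, right, x=20, half=3, max_disp=8)
    n_best = int(np.sum(np.isclose(c, c.min())))  # number of tied minima
    results[name] = (int(np.argmin(c)), n_best)
    print(name, "best disparity:", results[name][0], "tied minima:", n_best)
```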

C. Analysis and summary

In summary, the proposed strategy enables NeRF (hereafter called "our NeRF method" for simplicity) to achieve a better convergence rate while requiring less RAM and VRAM than the original versions of SOTA methods on aerial data. Its 3D results underperform traditional MVS pipelines in terms of accuracy, particularly in shadow areas. However, it performed better at reconstructing small objects (i.e., complex structures with parts that occupy only a few pixels in the image) and texture-less regions. Specifically:

Advantages:
• Our NeRF method demonstrates improved training convergence rates due to the location-specific sampling strategy. Additionally, it requires significantly less RAM during the data-partition process, due to the proposed multi-camera tiling technique.
• Our NeRF method excels at reconstructing intricate geometric structures that are challenging for traditional MVS pipelines, such as thin light poles and steel bars. Unlike traditional pipelines, NeRF applies a per-pixel photometric loss function that does not penalize depth discontinuities between neighboring pixels.
• When processing aerial blocks, our NeRF method (and NeRF approaches in general) generates denser and visually more complete geometry. Traditional MVS pipelines leave holes because depth is estimated from limited observations in neighboring images and then filtered by outlier removal. In contrast, NeRF estimates geometry by optimizing a single cost function that incorporates all multi-view observations.

Disadvantages:
• Geometry produced by our NeRF method often exhibits errors in shadow areas. This can be attributed to unstable backpropagation during optimization, since the loss is constructed from pixel intensities. Moreover, the depth generation process of NeRF (Eqs. 1, 2 and 4) is a function of the color intensity. The depth is therefore correlated with pixel brightness, which introduces unwanted errors that are often visible in depth maps of flat regions where the texture brightness varies.
• The geometry generated by NeRF is generally less accurate than that produced by traditional MVS pipelines. Experimental results indicate that NeRF-generated results have larger geometric uncertainties, often observed in flat regions. NeRF optimizes its cost over distributions based on photo-consistency, while traditional MVS pipelines enforce the consistency of various costs under multi-view constraints. This is, however, understandable, since NeRF was originally designed for view synthesis, not for geometry.
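The paper's Eqs. 1, 2 and 4 are not reproduced in these notes. Assuming they follow the standard NeRF volume-rendering quadrature, the coupling between depth and color looks as follows. The same weights w_i = T_i·α_i produce both the rendered color Σ w_i c_i and the expected depth Σ w_i t_i. Only the color is supervised, so the densities that shape the depth are learned solely through photometric residuals. The sketch is the textbook formulation, not the MCT_NERF code. The very long final interval is a common convention in NeRF implementations, and `render_ray` is a hypothetical name.

```python
import numpy as np


def render_ray(sigma, rgb, t):
    """Standard NeRF quadrature along one ray.

    sigma: (N,) densities, rgb: (N, 3) colours, t: (N,) increasing sample depths.
    Colour and depth share the same weights w_i = T_i * alpha_i.
    """
    sigma = np.asarray(sigma, dtype=float)
    t = np.asarray(t, dtype=float)
    delta = np.append(np.diff(t), 1e10)  # very long last interval
    alpha = 1.0 - np.exp(-sigma * delta)
    # T_i = prod_{j<i} (1 - alpha_j): transmittance up to sample i
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))
    w = trans * alpha
    color = (w[:, None] * np.asarray(rgb, dtype=float)).sum(axis=0)
    depth = float((w * t).sum())
    return color, depth, w


t = np.linspace(1.0, 5.0, 9)  # samples every 0.5 m
sigma = np.zeros(9)
sigma[4] = 50.0               # one opaque surface at t = 3.0
rgb = np.tile([0.2, 0.4, 0.6], (9, 1))
color, depth, w = render_ray(sigma, rgb, t)  # depth is very close to 3.0
```

If the photometric residual is weak, as in the shadowed or uniformly colored regions discussed above, the density along the ray is only loosely constrained. The weights then spread out, and the expected depth drifts.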

VI. CONCLUSIONS

This study presents a thorough evaluation of NeRF in comparison with three traditional MVS pipelines on two aerial photogrammetry datasets. NeRF was developed mainly for close-range cases with small- to medium-format images, often focusing on small scenes. Because of its high demand for both RAM and GPU memory, it faces computational challenges in processing large-format images and typical aerial photogrammetric blocks. To enable standard NeRF approaches for such large-format images (i.e., 50-80 megapixels), we presented a memory-efficient strategy for large-scale 3D scene reconstruction with NeRF. This approach reduces memory demands by partitioning the training images into sub-image sets and employing efficient sampling within smaller sub-regions during training, and it can adapt any NeRF method to large-format data.
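One plausible way to assemble the sub-image set for a single sub-region is to keep only the tiles whose view overlaps the region. The sketch below is an assumption made for illustration and is not taken from the paper. It projects the eight corners of the region's axis-aligned bounding box into each tile's pinhole camera and tests the result against the tile's pixel extent. The bounding-rectangle test is conservative: it may keep a tile that sees only the projected rectangle, not the region itself. `tile_sees_region` is a hypothetical helper.

```python
import numpy as np
from itertools import product


def tile_sees_region(K, R, t, size, lo, hi):
    """Does an axis-aligned 3D region project into a tile's pixel extent?

    Hypothetical helper: K is the tile's 3x3 intrinsics, (R, t) map world to
    camera coordinates (X_cam = R @ X + t), size = (width, height) of the
    tile, and lo/hi are the region's min/max corners. Regions partly behind
    the camera are rejected to keep the sketch simple.
    """
    corners = np.array(list(product(*zip(lo, hi))), dtype=float)  # 8 corners
    X_cam = corners @ R.T + t
    if np.any(X_cam[:, 2] <= 0):
        return False
    uvw = X_cam @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    (umin, vmin), (umax, vmax) = uv.min(axis=0), uv.max(axis=0)
    w, h = size
    return bool(umax >= 0 and vmax >= 0 and umin < w and vmin < h)


# Nadir camera 100 m above the origin; image split into left and right halves.
R = np.diag([1.0, -1.0, -1.0])
t = np.array([0.0, 0.0, 100.0])
K_left = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 400.0], [0.0, 0.0, 1.0]])
K_right = K_left.copy()
K_right[0, 2] -= 500.0  # right tile starts at x0 = 500
region_lo, region_hi = (10.0, -5.0, 0.0), (20.0, 5.0, 2.0)
left = tile_sees_region(K_left, R, t, (500, 800), region_lo, region_hi)    # False
right = tile_sees_region(K_right, R, t, (500, 800), region_lo, region_hi)  # True
```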

Applying our strategy to a typical NeRF architecture such as Mip-NeRF (our proposed NeRF method), we compared it against traditional MVS pipelines and performed a thorough experimental analysis to evaluate its potential for photogrammetric 3D reconstruction. Generally, we observe that NeRF recovers the scene with better completeness and, specifically, outperforms traditional MVS methods in reconstructing small objects (objects with small pixel footprints in the image). However, its performance on flat regions and large scenes is still not on par with typical MVS methods. This is because the NeRF architecture and rendering process were not designed for geometry, whereas MVS methods are designed solely to derive accurate geometry by ray triangulation. A more specific technical analysis of these pros and cons is given in Section V.

The findings of this study shed light on future research to improve NeRF for photogrammetric 3D reconstruction, as well as on ways to incorporate its advantages into 3D reconstruction workflows. First, NeRF's unique advantage in reconstructing small objects can be particularly useful and should be researched further for incorporation into the photogrammetric 3D reconstruction process. Second, the depth rendering equation and the intensity-based loss in NeRF are sub-optimal for 3D reconstruction, and future work adapting them to 3D reconstruction would be valuable. Lastly, enforcing multi-view consistency in the NeRF architecture may further improve 3D geometric reconstruction.
