[3DV Advanced-12] TRELLIS.2 Data Processing Script Details

A detailed walkthrough of the TRELLIS.2 data processing code (/path/TRELLIS.2/data_toolkit/)

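1-Extract raw data

1.1 dump_mesh.py extracts geometry only (a triangulated, untextured mesh) for the later shape processing and voxelization steps. It saves a .pickle file containing the vertices and faces arrays.

1.2 dump_pbr.py extracts the full PBR material information (the colored model with its texture maps) for the later texture voxelization step. This step is comparatively slow.

Material information extracted for each material:
- baseColorFactor + baseColorTexture: base color
- metallicFactor + metallicTexture: metallic
- roughnessFactor + roughnessTexture: roughness
- alphaFactor + alphaTexture: opacity
- alphaMode: alpha mode (OPAQUE/BLEND/MASK)
Saved as a .pickle file containing the materials and objects dictionaries.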
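
As a minimal sketch of what a geometry dump could look like, the snippet below writes and reads back a pickle holding `vertices` and `faces`. The single-dict layout and the in-memory buffer are assumptions for illustration; the layout actually written by dump_mesh.py may differ.

```python
import io
import pickle

import numpy as np

# Hypothetical dump layout: one dict with two arrays (an assumption, not the
# confirmed format written by dump_mesh.py). The mesh is a small tetrahedron.
mesh_dump = {
    "vertices": np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float32),
    "faces": np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]], dtype=np.int64),
}

# Serialize to an in-memory buffer instead of a real .pickle file.
buffer = io.BytesIO()
pickle.dump(mesh_dump, buffer)
buffer.seek(0)

loaded = pickle.load(buffer)
vertices, faces = loaded["vertices"], loaded["faces"]

# Sanity checks a loader might run before voxelization.
assert vertices.ndim == 2 and vertices.shape[1] == 3
assert faces.ndim == 2 and faces.shape[1] == 3   # triangulated mesh
assert faces.max() < len(vertices)               # face indices stay in range
```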

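2- ⭐️Voxelization (convert to O-voxel)

| | `dual_grid.py` | `voxelize_pbr.py` |
|---|---|---|
| Input | mesh_dumps (geometry only) | pbr_dumps (geometry + materials + textures) |
| Output | dual_vertices + intersected | baseColor + metallic + roughness + alpha |
| Purpose | geometry/shape reconstruction | texture/appearance reconstruction |
| Training target | Sparse Structure Flow | Tex Latent Flow |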

2.1 dual_grid.py

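dual_grid.py converts a mesh into the Flexible Dual Grid representation, the core data structure TRELLIS uses for geometry reconstruction.
- Step 1: load the mesh dump (lines 36-49)
- Step 2: normalize to the unit cube (lines 50-56): compute the bounding box, center it at the origin, and scale to the [-0.5, 0.5] range
- Step 3: generate the Flexible Dual Grid (lines 58-71):
  - Calls o_voxel.convert.mesh_to_flexible_dual_grid()
  - Outputs:
    - voxel_indices: coordinate indices of the occupied voxels
    - dual_vertices: position of the dual vertex inside each voxel (quantized to 0-255)
    - intersected: how the voxel intersects the mesh surface (encoded as 0-7)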
      
      Traditional voxels:          Flexible Dual Grid:
      ┌───┬───┬───┐                ┌───┬───┬───┐
      │ 1 │ 0 │ 0 │                │ • │   │   │  ← each occupied voxel holds one
      ├───┼───┼───┤                ├───┼───┼───┤    movable "dual vertex"
      │ 1 │ 1 │ 0 │   ──────►      │ • │ • │   │
      ├───┼───┼───┤                ├───┼───┼───┤  the • can move continuously
      │ 1 │ 1 │ 1 │                │ • │ • │ • │    inside its voxel (0~1)
      └───┴───┴───┘                └───┴───┴───┘

      Can only represent           Moving each • lets the grid
      coarse, blocky surfaces      approximate smooth surfaces
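
The normalization step and the idea of a quantized in-voxel position can be sketched as follows. This is only an illustration: the uniform scale by the longest bounding-box edge is an assumption, and `point_to_voxel` is a hypothetical helper. The real dual vertices come from o_voxel.convert.mesh_to_flexible_dual_grid(), whose internals are not reproduced here.

```python
import numpy as np


def normalize_to_unit_cube(vertices: np.ndarray) -> np.ndarray:
    """Center the bounding box at the origin and fit it inside [-0.5, 0.5]^3."""
    vmin, vmax = vertices.min(axis=0), vertices.max(axis=0)
    center = (vmin + vmax) / 2.0
    # Assumption: one uniform scale (longest bounding-box edge) keeps proportions.
    extent = (vmax - vmin).max()
    return (vertices - center) / extent


def point_to_voxel(point: np.ndarray, resolution: int = 256):
    """Hypothetical helper: voxel index plus 0-255 quantized in-voxel offset."""
    grid = (point + 0.5) * resolution          # [-0.5, 0.5] -> [0, resolution]
    index = np.clip(np.floor(grid), 0, resolution - 1).astype(np.int64)
    offset = grid - index                      # position inside the voxel, in [0, 1]
    return index, np.round(offset * 255).astype(np.uint8)


verts = np.array([[0.0, 0.0, 0.0], [2.0, 4.0, 8.0]])
normalized = normalize_to_unit_cube(verts)
index, quantized = point_to_voxel(np.array([0.0, 0.001, 0.5]))
```

Storing the offset as a byte keeps each dual vertex at three bytes per voxel while still allowing 256 positions per axis inside the cell.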

2.2 voxelize_pbr.py

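voxelize_pbr.py converts the PBR material data into voxelized texture attributes, the core data structure TRELLIS uses for texture generation.
1. Load the PBR dump (lines 37-59):
  - Read the material and geometry data from the .pickle file
2. Normalize the vertices (lines 60-70):
  - Compute the bounding box and center it at the origin
  - Scale to the [-0.5, 0.5] range
  - Map invalid mat_ids (-1) to the default material
3. Voxelize the PBR attributes (lines 72-76):
  - Calls o_voxel.convert.blender_dump_to_volumetric_attr()
  - Outputs:
    - coord: coordinate indices of the occupied voxels
    - attr: PBR attributes of each voxel (color, metallic, roughness, opacity)
  - Drops the attributes that are not needed (normal, emissive)
4. Save format (line 76):
  - Saved as a .vxz file (a compressed voxel format)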
  PBR Dump (textured mesh)           PBR Voxels (voxelized texture)
  ┌──────────────────────┐           ┌───┬───┬───┐
  │ vertices (N, 3)      │           │🟥│🟦│   │   each voxel stores:
  │ faces (F, 3)         │   ────►   ├───┼───┼───┤   - baseColor (RGB)
  │ uvs (F, 3, 2)        │           │🟨│🟩│   │   - metallic
  │ mat_ids (F,)         │           ├───┼───┼───┤   - roughness
  │ materials [...]      │           │🟪│🟫│🟧│   - alpha
  │ textures [...]       │           └───┴───┴───┘
  └──────────────────────┘
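
Two of the bookkeeping steps above, remapping invalid material ids and dropping unused attributes, might look like the sketch below. The sample data, the contents of the default material, and the choice to append it at the end of the list are all assumptions for illustration.

```python
import numpy as np

# Hypothetical inputs: per-face material ids, where -1 marks "no material".
mat_ids = np.array([0, 1, -1, 1, -1])
materials = [{"name": "wood"}, {"name": "metal"}]

# Append a default material and point every invalid id at it.
# The default values below are illustrative assumptions.
default_material = {
    "name": "default",
    "baseColorFactor": [1.0, 1.0, 1.0],
    "metallicFactor": 0.0,
    "roughnessFactor": 1.0,
}
materials = materials + [default_material]
default_id = len(materials) - 1
mat_ids = np.where(mat_ids < 0, default_id, mat_ids)

# Stand-in for the voxelizer output: one (N, C) uint8 array per attribute.
channels = {"base_color": 3, "metallic": 1, "roughness": 1, "alpha": 1, "normal": 3, "emissive": 3}
attr = {name: np.zeros((4, c), dtype=np.uint8) for name, c in channels.items()}

# Keep only the attributes used later in the pipeline.
for unused in ("normal", "emissive"):
    attr.pop(unused, None)
```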

3-Encode latents

3.1 encode_shape_latent.py

encode_shape_latent.py passes the Dual Grid data through the Shape Encoder to produce the shape latent, the supervision signal TRELLIS needs to train the Shape SLAT Flow model.

Load the Dual Grid data (lines 30-46):
- Read coords and attr from the .vxz file
- Build the vertices SparseTensor: dual vertex positions (normalized to 0-1)
- Build the intersected SparseTensor: boundary intersection flags (3 bits decoded into booleans)
Encoder forward pass (line 165):
- Calls the pretrained shape_enc_next_dc_f16c32_fp16 encoder
- Input: vertices + intersected (SparseTensor)
- Output: z (the Shape Latent, also a SparseTensor)
Save format (lines 173-177):
- Saved as an .npz file
- Contains feats (float32) and coords (uint8)
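
The two decoding steps in the loading stage can be sketched with NumPy. The sample values are made up, and both the divisor of 255 and the bit-to-axis order are assumptions; the script itself may use a different convention and builds SparseTensors rather than plain arrays.

```python
import numpy as np

# Hypothetical raw attributes for 3 voxels, as they might be stored on disk.
dual_vertices_u8 = np.array([[0, 128, 255], [64, 64, 64], [255, 0, 32]], dtype=np.uint8)
intersected_u8 = np.array([0, 5, 7], dtype=np.uint8)  # one 3-bit code (0-7) per voxel

# 0-255 quantized offsets -> floats in [0, 1] (assumed divisor: 255).
vertices = dual_vertices_u8.astype(np.float32) / 255.0

# Unpack the 3-bit code into one boolean per axis.
# Assumed bit order: bit 0 -> x, bit 1 -> y, bit 2 -> z.
bits = np.arange(3, dtype=np.uint8)
intersected = ((intersected_u8[:, None] >> bits) & 1).astype(bool)
```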

Encoding process overview

Dual Grid (.vxz)                    Shape Latent (.npz)
┌─────────────────────┐             ┌─────────────────────┐
│ coords: (N, 3)      │             │ coords: (M, 3)      │  M << N
│ vertices: (N, 3)    │  ────►      │ feats: (M, C)       │  C = 32 (latent dim)
│ intersected: (N, 3) │  Encoder    │                     │
└─────────────────────┘             └─────────────────────┘
      High-resolution voxels            Low-resolution latent
      (256³ grid)                       (16³ or 32³ grid)

Key point: the Encoder compresses the sparse, high-resolution geometry representation into an even sparser, low-resolution latent.
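
To see why the latent has far fewer occupied cells (M << N), it helps to look at the coordinate bookkeeping alone. The sketch below uses synthetic coordinates and plain integer division by 16; it is not the encoder, which is a learned network, only an illustration of how many high-resolution voxels fall into each latent cell.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical occupied voxels on a 256^3 grid: a thin slab around z = 128.
n = 5000
coords_hi = np.stack(
    [rng.integers(0, 256, n), rng.integers(0, 256, n), rng.integers(126, 130, n)],
    axis=1,
)
coords_hi = np.unique(coords_hi, axis=0)        # N occupied voxels

# 16x spatial downsampling: every 16^3 block collapses into one latent cell.
coords_lo = np.unique(coords_hi // 16, axis=0)  # M occupied latent cells

print(f"N = {len(coords_hi)}, M = {len(coords_lo)}")
```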

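Why is this step needed?

| Training directly on the Dual Grid | Training on the Shape Latent |
|---|---|
| High-dimensional data (256³) | Low-dimensional data (16³ ~ 32³) |
| Slow training, high GPU memory use | Fast training, low GPU memory use |
| Flow learns in the original space | Flow learns in the latent space |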

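Core idea: first compress the geometry with the Encoder, then let the Flow model learn to generate that compressed representation. At inference time the Flow generates a latent, and the Decoder decodes it back into a Dual Grid.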

3.2 encode_pbr_latent.py

encode_pbr_latent.py passes the PBR voxel data through the Texture Encoder to produce the texture latent (PBR/Texture Latent), the supervision signal (GT) TRELLIS needs to train the Texture SLAT Flow model.

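Load the PBR voxel data (lines 30-38):
```python
# Simplified excerpt: `attr` is the attribute dict read from the .vxz file,
# and `concat` stands for concatenation along the channel axis.
attrs = ["base_color", "metallic", "roughness", "alpha"]
feats = concat([attr[k] for k in attrs]) / 255.0 * 2 - 1  # normalize to [-1, 1]
```
- Read the 4 PBR attributes from the .vxz file
- Concatenate them into a 6-channel feature (base_color 3 + metallic 1 + roughness 1 + alpha 1)
- Normalize to the [-1, 1] range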
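
A self-contained NumPy version of the same concatenation and normalization is sketched below. The sample voxel values are invented, and the use of NumPy is an assumption; the script itself may work on tensors from another library.

```python
import numpy as np

attrs = ["base_color", "metallic", "roughness", "alpha"]

# Hypothetical voxel attributes for N = 2 voxels, stored as uint8 (0-255).
attr = {
    "base_color": np.array([[255, 0, 0], [0, 128, 255]], dtype=np.uint8),
    "metallic": np.array([[0], [255]], dtype=np.uint8),
    "roughness": np.array([[128], [64]], dtype=np.uint8),
    "alpha": np.array([[255], [255]], dtype=np.uint8),
}

# Concatenate along the channel axis: 3 + 1 + 1 + 1 = 6 channels.
feats_u8 = np.concatenate([attr[k] for k in attrs], axis=-1)

# Map [0, 255] to [-1, 1]: 0 -> -1 and 255 -> 1.
feats = feats_u8.astype(np.float32) / 255.0 * 2.0 - 1.0
```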
Encoder forward pass (line 170):
- Calls the pretrained tex_enc_next_dc_f16c32_fp16 encoder
- Input: PBR voxels (SparseTensor, 6 channels)
- Output: z (the Texture Latent, also a SparseTensor)
Save format (lines 178-182):
- Saved as an .npz file
- Contains feats (float32) and coords (uint8)
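
The save format can be illustrated with a round trip through `np.savez`. The latent values here are random placeholders and the in-memory buffer stands in for a real file path; only the key names and dtypes follow the description above.

```python
import io

import numpy as np

# Hypothetical latent for M = 4 occupied cells with 32 channels.
feats = np.random.default_rng(0).standard_normal((4, 32)).astype(np.float32)
coords = np.array([[0, 1, 2], [3, 4, 5], [15, 15, 15], [7, 0, 9]], dtype=np.uint8)

# Write to an in-memory buffer; a real script would pass a file path instead.
buffer = io.BytesIO()
np.savez(buffer, feats=feats, coords=coords)
buffer.seek(0)

# Read the archive back and pull out both arrays.
with np.load(buffer) as data:
    loaded_feats = data["feats"]
    loaded_coords = data["coords"]
```

uint8 is enough for the coordinates because latent grid indices stay well below 256 (the largest grid mentioned in this post is 64³).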

Encoding process overview

PBR Voxels (.vxz)                    Texture Latent (.npz)
┌─────────────────────┐              ┌─────────────────────┐
│ coords: (N, 3)      │              │ coords: (M, 3)      │  M << N
│ base_color: (N, 3)  │              │ feats: (M, 32)      │  32 = latent_channels
│ metallic: (N, 1)    │  ────►       │                     │
│ roughness: (N, 1)   │  Encoder     │                     │
│ alpha: (N, 1)       │  (÷16)       │                     │
└─────────────────────┘              └─────────────────────┘
      High-resolution voxels            Low-resolution latent
      (256³ / 1024³)                    (16³ / 64³)

3.3 encode_ss_latent.py

encode_ss_latent.py encodes the sparse coordinates of the Shape Latent into the SS Latent (Sparse Structure Latent), which serves as the GT (supervision signal) for training the Sparse Structure Flow.

First stage at inference time:
  image ──► Sparse Structure Flow ──► SS Latent ──► SS Decoder ──► 64³ occupancy grid
                                          ▲
                                          │
                      this script produces its GT for training

Role of the SS Latent: it encodes roughly where the object is, which is the coarsest level of structural information.

Processing flow

┌─────────────────────────────────────────────────────────────────────────┐
│                        encode_ss_latent_ours.py                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Input: shape_latents_256/*.npz                                          │
│         └── coords: (M, 3)  ← sparse coordinates, range 0~15             │
│                                                                          │
│  Step 1: convert to a dense occupancy grid                               │
│          coords → dense tensor (1, 16, 16, 16)                           │
│          occupied cells = 1, all others = 0                              │
│                                                                          │
│  Step 2: encode with the SS Encoder                                      │
│          (1, 16, 16, 16) → ss_enc → (8, 4, 4, 4)                         │
│          4x downsampling, 8 channels                                     │
│                                                                          │
│  Output: ss_latents_16/*.npz                                             │
│          └── z: (8, 4, 4, 4)  ← dense SS Latent                          │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
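
Step 1 of the flow, turning sparse coordinates into a dense occupancy grid, can be sketched in NumPy as follows. The function name and sample coordinates are hypothetical, and the script itself presumably builds a tensor for the encoder rather than a NumPy array.

```python
import numpy as np


def coords_to_dense(coords: np.ndarray, resolution: int = 16) -> np.ndarray:
    """Turn (M, 3) integer coordinates into a (1, R, R, R) occupancy grid."""
    dense = np.zeros((1, resolution, resolution, resolution), dtype=np.float32)
    # Integer-array indexing marks every listed cell as occupied.
    dense[0, coords[:, 0], coords[:, 1], coords[:, 2]] = 1.0
    return dense


coords = np.array([[0, 0, 0], [15, 15, 15], [3, 7, 11]], dtype=np.uint8)
occupancy = coords_to_dense(coords)
```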

Resolution correspondence

| Raw data | Shape Latent coords | SS Encoder input | SS Latent output |
|---|---|---|---|
| 256³ | 16³ | 16³ | 4³ |
| 512³ | 32³ | 32³ | 8³ |
| 1024³ | 64³ | 64³ | 16³ |
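
The table follows from two fixed downsampling factors: 16 for the shape encoder and 4 for the SS encoder. A small helper (hypothetical, not part of the toolkit) reproduces the rows:

```python
def latent_resolutions(data_res: int) -> tuple[int, int]:
    """Return (shape latent grid size, SS latent grid size) for a data resolution."""
    shape_latent_res = data_res // 16       # shape encoder downsamples by 16
    ss_latent_res = shape_latent_res // 4   # SS encoder downsamples by 4
    return shape_latent_res, ss_latent_res


# One entry per row of the table: data resolution -> (shape latent, SS latent).
table = {res: latent_resolutions(res) for res in (256, 512, 1024)}
```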

Inputs and outputs

| Type | Path | Content |
|---|---|---|
| Input | `shape_latents_{data_res}/{name}.npz` | `coords`: sparse coordinates |
| Output | `ss_latents_{ss_res}/{name}.npz` | `z`: dense SS Latent |

Usage

# Process the 256 data (default)
python encode_ss_latent_ours.py

# Process the 1024 data
python encode_ss_latent_ours.py --data_resolution 1024
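
For readers writing a similar script, the command-line handling could be sketched with `argparse` as below. Only the `--data_resolution` flag and its default of 256 come from the usage lines; the `choices` list, the helper names, and the rule that the output folder is tagged with `data_res // 16` (matching `shape_latents_256` to `ss_latents_16`) are assumptions.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Encode SS latents (sketch)")
    # Flag name and default follow the usage lines; `choices` is an assumption.
    parser.add_argument("--data_resolution", type=int, default=256, choices=[256, 512, 1024])
    return parser.parse_args(argv)


def resolve_dirs(data_res: int) -> tuple[str, str]:
    # Assumed naming: the output folder carries the shape-latent grid size.
    ss_res = data_res // 16
    return f"shape_latents_{data_res}", f"ss_latents_{ss_res}"


args = parse_args([])  # an empty argument list falls back to the defaults
input_dir, output_dir = resolve_dirs(args.data_resolution)
```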
