Hardcore 3,000-word guide! A digital-human build workflow that even beginners can follow: RAD-NeRF in practice

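Introduction

RAD-NeRF: real-time neural radiance talking portrait synthesis via audio-spatial decomposition. (I recognize every word, but strung together they make no sense to me.)

Source code: github.com/ashawkey/RA...

Project page: me.kiui.moe/radnerf/

Actual result: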

Environment setup

Using a cloud image (AutoDL):

1. Clone the code

# 1. Clone the repository
git clone https://github.com/ashawkey/RAD-NeRF.git
# 2. Enter the project directory
cd RAD-NeRF

2. Create a virtual environment

# 1. Create the virtual environment
conda create --name jmaat666 python=3.10
# 2. Switch to it
conda activate jmaat666

3. Install the CUDA dependencies

# PyTorch must be installed as the build that matches your CUDA version; otherwise training cannot use the GPU
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia
conda install -c fvcore -c iopath -c conda-forge fvcore iopath

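Check the CUDA version

# 1. nvidia-smi, NVIDIA's command-line tool, shows the installed driver version and the highest CUDA version that driver supports.
nvidia-smi

# 2. nvcc is the CUDA compiler driver; it reports the version of the installed CUDA toolkit.
nvcc --version

# 3. Inspect the CUDA installation directory (the location depends on the system):
ls /usr/local/cuda
ls /opt/cuda

# Check that the CUDA dependencies installed correctly; save as testCUDA.py
import torch

print(torch.cuda.is_available())  # True if PyTorch can use the GPU
print(torch.version.cuda)  # CUDA version this PyTorch build was compiled with

Install the PyTorch build that matches your CUDA version: pytorch.org/get-started...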
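To compare what `nvcc --version` reports with the CUDA version PyTorch was built against, the `release X.Y` field can be parsed with a few lines of Python. This is only a minimal sketch: the function names `parse_nvcc_release` and `same_cuda_release` and the sample line are illustrative assumptions, not part of RAD-NeRF.

```python
import re


def parse_nvcc_release(nvcc_output: str):
    """Return the (major, minor) CUDA release found in `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def same_cuda_release(nvcc_output: str, torch_cuda: str) -> bool:
    """True when nvcc and the PyTorch build (torch.version.cuda) agree on major.minor."""
    release = parse_nvcc_release(nvcc_output)
    if release is None or not torch_cuda:
        return False
    major, minor = torch_cuda.split(".")[:2]
    return release == (int(major), int(minor))


# Illustrative sample of the line nvcc prints; the real output has more lines.
sample = "Cuda compilation tools, release 11.7, V11.7.99"
print(parse_nvcc_release(sample))         # (11, 7)
print(same_cuda_release(sample, "11.7"))  # True
```

In a real check you would pass the captured output of `nvcc --version` and the value of `torch.version.cuda`. The comparison matters mostly when an extension such as pytorch3d is compiled from source, because that build uses the local toolkit.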

4. Install the dependencies

# Install the required packages; run this inside the RAD-NeRF directory
pip install -r requirements.txt

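5. Install the pytorch3d extension

# 1. Install pytorch3d
# Option A: install directly with pip. Run only one of the two commands: the first installs the latest code, the second pins v0.7.3.
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
pip install "git+https://github.com/facebookresearch/pytorch3d.git@v0.7.3"

# Option B (recommended): build from a local clone
# a. Clone the repository (you can do this from the parent directory)
git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d
# b. Look up the release that matches your PyTorch version: https://github.com/facebookresearch/pytorch3d?tab=readme-ov-file#news
# c. Check out that release and install it
git checkout v0.7.3
pip install .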
# 2. Get the models from the AD-NeRF repository (you can do this from the parent directory)
# Recommended method:
# a. Clone the repository
git clone https://github.com/YudongGuo/AD-NeRF.git
# b. Copy the models and the test video
cp AD-NeRF/data_util/face_parsing/79999_iter.pth RAD-NeRF/data_utils/face_parsing/
cp -R AD-NeRF/data_util/face_tracking/3DMM RAD-NeRF/data_utils/face_tracking/
mkdir -p RAD-NeRF/data/obama/
cp AD-NeRF/dataset/vids/Obama.mp4 RAD-NeRF/data/obama/

# Alternatively, download the files one by one from inside the RAD-NeRF directory:

## Prepare the face-parsing model
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_parsing/79999_iter.pth?raw=true -O data_utils/face_parsing/79999_iter.pth
## Prepare the Basel face model
# Download 01_MorphableModel.mat from https://faces.dmi.unibas.ch/bfm/main.php?nav=1-2&id=downloads and put it in RAD-NeRF/data_utils/face_tracking/3DMM.
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/exp_info.npy?raw=true -O data_utils/face_tracking/3DMM/exp_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/keys_info.npy?raw=true -O data_utils/face_tracking/3DMM/keys_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/sub_mesh.obj?raw=true -O data_utils/face_tracking/3DMM/sub_mesh.obj
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/topology_info.npy?raw=true -O data_utils/face_tracking/3DMM/topology_info.npy
mkdir -p data/obama/
# Saved as Obama.mp4 so that the path matches the preprocessing command in step 6
wget https://github.com/YudongGuo/AD-NeRF/blob/master/dataset/vids/Obama.mp4?raw=true -O data/obama/Obama.mp4
# 3. Download the face model and put 01_MorphableModel.mat in data_utils/face_tracking/3DMM/

The directory should end up containing the following files:

Face model download page: faces.dmi.unibas.ch/bfm/main.ph...

The download listing looks like this:

# 4. Run convert_BFM.py
cd RAD-NeRF/data_utils/face_tracking
python convert_BFM.py
# 5. Return to the RAD-NeRF directory
cd ../..

6. Preprocess the video

# Preprocess the video in one go
python data_utils/process.py data/obama/Obama.mp4

# Or run it step by step; there are nine steps in total

python data_utils/process.py data/obama/Obama.mp4 --task 1  # extract the audio
...
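Running the steps one at a time is handy when a single step fails and you want to resume from it instead of starting over. A minimal sketch of building the nine `--task` commands in Python follows; the helper name `task_commands` is an assumption made for illustration, and only the `--task N` flag itself comes from the commands shown here.

```python
import sys


def task_commands(video_path: str, first: int = 1, last: int = 9):
    """Build one `process.py --task N` command per preprocessing step."""
    if not 1 <= first <= last <= 9:
        raise ValueError("tasks are numbered 1 through 9")
    return [
        [sys.executable, "data_utils/process.py", video_path, "--task", str(n)]
        for n in range(first, last + 1)
    ]


# To actually run them from the RAD-NeRF directory, stopping at the first failure:
#   import subprocess
#   for cmd in task_commands("data/obama/Obama.mp4"):
#       subprocess.run(cmd, check=True)
for cmd in task_commands("data/obama/Obama.mp4", first=1, last=3):
    print(" ".join(cmd[1:]))
```

Using `sys.executable` keeps the subprocess in the same conda environment as the calling script. To resume after a failure, pass the failing step number as `first`.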

Preprocessing downloads four models. Without a proxy or VPN these downloads are very slow, or they fail partway through.

Model URLs:

download.pytorch.org/models/resn...

www.adrianbulat.com/downloads/p...

www.adrianbulat.com/downloads/p...

download.pytorch.org/models/alex...

You can download the four models ahead of time into the following directory:

Linux: /root/.cache/torch/hub/checkpoints/

Windows: C:\Users\<username>\.cache\torch\hub\checkpoints
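Before starting a long preprocessing run, it can help to confirm that the manually downloaded files really landed in the cache directory. The sketch below is one way to do that. The helper `missing_checkpoints` is hypothetical, and because the model URLs in this post are truncated, the file names are left as placeholders that you must replace with the real ones.

```python
from pathlib import Path


def missing_checkpoints(filenames, cache_dir=None):
    """Return the names from `filenames` that are not yet in the checkpoint cache."""
    if cache_dir is None:
        # Default location used in this guide (~ is /root for the root user).
        cache_dir = Path.home() / ".cache" / "torch" / "hub" / "checkpoints"
    cache_dir = Path(cache_dir)
    return [name for name in filenames if not (cache_dir / name).is_file()]


# Placeholder names only: substitute the four real file names from the URLs.
# print(missing_checkpoints(["<model-1>.pth", "<model-2>.pth"]))
```

An empty list means every file is in place. If the `TORCH_HOME` environment variable is set, PyTorch uses a different cache root, so pass that location as `cache_dir`.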

Preprocessing output and steps:

./data/<ID>
├──<ID>.mp4 # original video
├──ori_imgs # original images from video
│  ├──0.jpg
│  ├──0.lms # 2D landmarks
│  ├──...
├──gt_imgs # ground truth images (static background)
│  ├──0.jpg
│  ├──...
├──parsing # semantic segmentation
│  ├──0.png
│  ├──...
├──torso_imgs # inpainted torso images
│  ├──0.png
│  ├──...
├──aud.wav # original audio 
├──aud_eo.npy # audio features (wav2vec)
├──aud.npy # audio features (deepspeech)
├──bc.jpg # default background
├──track_params.pt # raw head tracking results
├──transforms_train.json # head poses (train split)
├──transforms_val.json # head poses (test split)

--task 1
Extract the audio into aud.wav
===== extract audio from data/obama/Obama.mp4 to data/obama/aud.wav =====
===== extracted audio =====
--task 2
Generate aud_eo.npy
===== extract audio labels for data/obama/aud.wav =====
===== extracted audio labels =====
--task 3
Split the video into frames
===== extract images from data/obama/Obama.mp4 to data/obama/ori_imgs =====
===== extracted images =====
--task 4 to --task 8
Semantic parsing, background extraction, torso and ground-truth images, face landmarks and face tracking (see the full log below)
--task 9
===== save transforms =====
===== finished saving transforms =====
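Since a step can fail while the script keeps going, it is worth checking the data directory against the layout listed above once preprocessing ends. The following is a minimal sketch; `EXPECTED` and `missing_outputs` are names made up for this example, and the list only covers the wav2vec (`_eo`) workflow used in this post.

```python
from pathlib import Path

# Entries copied from the directory layout above. aud.npy is left out because
# it is only produced when DeepSpeech features are used.
EXPECTED = [
    "ori_imgs", "gt_imgs", "parsing", "torso_imgs",
    "aud.wav", "aud_eo.npy", "bc.jpg",
    "track_params.pt", "transforms_train.json", "transforms_val.json",
]


def missing_outputs(data_dir, expected=EXPECTED):
    """Return the expected files or folders that do not exist under data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).exists()]


# Example: print(missing_outputs("data/obama"))
```

Anything the function returns points at the step to rerun with `--task`.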

Actual run log:

[INFO] ===== extract audio from data/obama/Obama.mp4 to data/obama/aud.wav =====
ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
  configuration: --prefix=/root/miniconda3/envs/vrh3.8 --cc=/opt/conda/conda-bld/ffmpeg_1597178665428/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'data/obama/Obama.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    date            : 2021/06/24 23:54:51
    encoder         : Lavf58.29.100
  Duration: 00:05:20.00, start: 0.000000, bitrate: 635 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 450x450 [SAR 1:1 DAR 1:1], 480 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 149 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
File 'data/obama/aud.wav' already exists. Overwrite? [y/N] y'^H^H^H^H^H
Stream mapping:
  Stream #0:1 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'data/obama/aud.wav':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    ICRD            : 2021/06/24 23:54:51
    ISFT            : Lavf58.45.100
    Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, stereo, s16, 512 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      encoder         : Lavc58.91.100 pcm_s16le
size=   19997kB time=00:05:19.95 bitrate= 512.0kbits/s speed=1.22e+03x    
video:0kB audio:19997kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000518%
[INFO] ===== extracted audio =====
[INFO] ===== extract audio labels for data/obama/aud.wav =====
Traceback (most recent call last):
  File "nerf/asr.py", line 5, in <module>
    from transformers import AutoModelForCTC, AutoProcessor
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/site-packages/transformers/__init__.py", line 26, in <module>
    from . import dependency_versions_check
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/site-packages/transformers/dependency_versions_check.py", line 16, in <module>
    from .utils.versions import require_version, require_version_core
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/site-packages/transformers/utils/__init__.py", line 18, in <module>
    from huggingface_hub import get_full_repo_name  # for backward compatibility
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/site-packages/huggingface_hub/__init__.py", line 379, in __getattr__
    submod = importlib.import_module(submod_path)
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 50, in <module>
    from ._commit_api import (
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/site-packages/huggingface_hub/_commit_api.py", line 20, in <module>
    from .file_download import hf_hub_url
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 22, in <module>
    from filelock import FileLock
ModuleNotFoundError: No module named 'filelock'
[INFO] ===== extracted audio labels =====
[INFO] ===== extract images from data/obama/Obama.mp4 to data/obama/ori_imgs =====
ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
  configuration: --prefix=/root/miniconda3/envs/vrh3.8 --cc=/opt/conda/conda-bld/ffmpeg_1597178665428/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'data/obama/Obama.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    date            : 2021/06/24 23:54:51
    encoder         : Lavf58.29.100
  Duration: 00:05:20.00, start: 0.000000, bitrate: 635 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 450x450 [SAR 1:1 DAR 1:1], 480 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 149 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native))
Press [q] to stop, [?] for help
[swscaler @ 0x55e2d83ad7c0] deprecated pixel format used, make sure you did set range correctly
Output #0, image2, to 'data/obama/ori_imgs/%d.jpg':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    date            : 2021/06/24 23:54:51
    encoder         : Lavf58.45.100
    Stream #0:0(und): Video: mjpeg, yuvj420p(pc), 450x450 [SAR 1:1 DAR 1:1], q=1-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc (default)
    Metadata:
      handler_name    : VideoHandler
      encoder         : Lavc58.91.100 mjpeg
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: N/A
frame= 8000 fps=1437 q=1.0 Lsize=N/A time=00:05:20.00 bitrate=N/A speed=57.5x    
video:264207kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[INFO] ===== extracted images =====
[INFO] ===== extract semantics from data/obama/ori_imgs to data/obama/parsing =====
[INFO] loading model...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8000/8000 [05:50<00:00, 22.83it/s]
[INFO] ===== extracted semantics =====
[INFO] ===== extract background image from data/obama/ori_imgs =====
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 400/400 [01:52<00:00,  3.54it/s]
[INFO] ===== extracted background image =====
[INFO] ===== extract torso and gt images for data/obama =====
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8000/8000 [05:13<00:00, 25.50it/s]
[INFO] ===== extracted torso and gt images =====
[INFO] ===== extract face landmarks from data/obama/ori_imgs =====
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8000/8000 [02:19<00:00, 57.48it/s]
[INFO] ===== extracted face landmarks =====
[INFO] ===== perform face tracking =====
[INFO] fitting focal length...
600 2.126291513442993 -3.927321195602417
700 2.1062731742858887 -4.606115818023682
800 2.0988805294036865 -5.333498954772949
900 2.085597276687622 -5.986014366149902
1000 2.0794363021850586 -6.6705193519592285
1100 2.0748462677001953 -7.362542152404785
1200 2.074572801589966 -8.087048530578613
1300 2.077897548675537 -8.850865364074707
1400 2.0787346363067627 -9.6022310256958
[INFO] find best focal: 1200
[INFO] coarse fitting...
2.075936794281006 -8.418652534484863
[INFO] fitting light...
[INFO] fine frame-wise fitting...
0 of 125 done
1 of 125 done
2 of 125 done
...
124 of 125 done
params saved
[INFO] ===== finished face tracking =====
[INFO] ===== save transforms =====
[INFO] ===== finished saving transforms =====

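Note:

The source video strongly affects how good the digital human looks (check the segmentation results in data/obama/parsing). For example, if a woman's hair hangs down in front of her body, that region may be misclassified as background, leaving a piece of the torso missing.

The segmented frames, shown together:

(Blue: face; green: neck; red: torso; white: background)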

7. Train the model

# 1. Train the head
python main.py data/obama/ --workspace trial_obama/ -O --iters 200000  # result video: trial_obama/results/ngp_ep0028.mp4

# 2. Finetune the lips
python main.py data/obama/ --workspace trial_obama/ -O --iters 250000 --finetune_lips  # result video: trial_obama/results/ngp_ep0035.mp4

# 3. Train the torso
## <head>.pth should be the latest checkpoint in trial_obama
## This step takes a long time (more than 40 minutes)
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --head_ckpt <head>.pth --iters 200000
# Example:
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --head_ckpt trial_obama/checkpoints/ngp_ep0035.pth --iters 200000

# Evaluate on the test split
python main.py data/obama/ --workspace trial_obama/ -O --test  # uses the head checkpoint only; the ground-truth torso is loaded
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test

# Test with the GUI (needs a Windows environment)
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --gui

# Test with the GUI and the speech-recognition (ASR) model loaded, for real-time use (needs a Windows environment)
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --gui --asr
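The torso step needs the newest head checkpoint, and looking it up by hand is easy to get wrong after several runs. Here is a small sketch that picks the highest-numbered file, assuming checkpoints follow the `ngp_ep<NNNN>.pth` naming seen in the example path; `latest_checkpoint` is a hypothetical helper, not something RAD-NeRF ships.

```python
import re
from pathlib import Path


def latest_checkpoint(workspace):
    """Return the ngp_ep*.pth file with the highest epoch number, or None."""
    ckpt_dir = Path(workspace) / "checkpoints"
    best, best_epoch = None, -1
    for path in ckpt_dir.glob("ngp_ep*.pth"):
        match = re.fullmatch(r"ngp_ep(\d+)\.pth", path.name)
        if match and int(match.group(1)) > best_epoch:
            best, best_epoch = path, int(match.group(1))
    return best


# Example: print(latest_checkpoint("trial_obama"))
```

The returned path can be passed straight to `--head_ckpt`.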

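8. Generate the digital-human video

# Test with a specific audio file and pose sequence
# --test_train: use the training split for testing
# --data_range: use this range of the pose and eye sequences (mirrored and repeated automatically if shorter than the audio)
# For data/intro_eo.npy, see step 9, or generate one from another audio file
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --test_train --data_range 0 100 --aud data/intro_eo.npy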
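The "mirrored and repeated" behaviour of `--data_range` is easier to picture with a tiny example. The sketch below shows one common way to map an audio frame index onto a shorter pose sequence by playing it forward, then backward, and so on. It illustrates the idea only; whether RAD-NeRF's loader computes the index exactly this way is an assumption.

```python
def mirror_index(index: int, size: int) -> int:
    """Map an ever-growing frame index onto a sequence of `size` poses, ping-pong style."""
    if size <= 0:
        raise ValueError("size must be positive")
    turn, rest = divmod(index, size)
    # Even passes play forward, odd passes play backward.
    return rest if turn % 2 == 0 else size - rest - 1


print([mirror_index(i, 4) for i in range(10)])
# [0, 1, 2, 3, 3, 2, 1, 0, 0, 1]
```

Playing the poses back and forth avoids the visible jump that a plain loop would cause when it wraps from the last pose to the first.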

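Notes:

1. Digital humans generated from different audio differ only in mouth shape, and the lip sync is mostly accurate.

2. If the model's video is shorter than the audio, the video sequence is repeated.

3. The generated video has no audio track; you have to merge the audio into it yourself.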
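For the last note, ffmpeg can add the audio track without re-encoding the video. A minimal sketch that builds such a command follows. The function name `mux_command` and the file names in the usage comment are placeholders, while the ffmpeg options themselves (`-map`, `-c:v copy`, `-c:a aac`, `-shortest`) are standard.

```python
def mux_command(video, audio, output):
    """Build an ffmpeg command that copies the video stream and adds the audio track."""
    return [
        "ffmpeg", "-y",
        "-i", str(video),   # input 0: the silent generated video
        "-i", str(audio),   # input 1: the driving audio
        "-map", "0:v:0",    # take video from input 0
        "-map", "1:a:0",    # take audio from input 1
        "-c:v", "copy",     # do not re-encode the video
        "-c:a", "aac",      # encode audio as AAC for MP4
        "-shortest",        # stop at the end of the shorter stream
        str(output),
    ]


# To run it (placeholder file names):
#   import subprocess
#   subprocess.run(mux_command("result.mp4", "intro.wav", "result_with_audio.mp4"), check=True)
print(" ".join(mux_command("result.mp4", "intro.wav", "result_with_audio.mp4")))
```

`-shortest` keeps the output from running past the end of the video when the audio file is longer.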

9. Quick start (skip training and go straight to generating a digital human)

Download the pretrained models: drive.google.com/drive/folde...

# 1. Generate the audio feature file (.npy)
# If the model is `<ID>_eo.pth`, generate it like this
python nerf/asr.py --wav data/aud.wav --save_feats  # the features are saved in the data directory as data/<name>_eo.npy

# If the model is `<ID>.pth`, generate it like this
python data_utils/deepspeech_features/extract_ds_features.py --input data/<name>.wav  # saved to data/<name>.npy

# 2. Generate the digital human
## --pose: pose sequence file | --ckpt: pretrained model
python test.py --pose data/demo/obama.json --ckpt data/demo/obama_eo.pth --aud data/intro_eo.npy --workspace data/demo/trial_obama/ -O --torso
python test.py --pose data/demo/obama.json --ckpt data/demo/obama_eo.pth --aud data/aud_eo.npy --workspace data/demo/trial_obama/ -O --torso --bg_img data/demo/bg.jpg
python test.py --pose data/demo/obama.json --ckpt data/demo/obama_eo.pth --aud data/intro_eo.npy --workspace data/demo/trial_obama/ -O --torso --bg_img data/demo/bg.jpg --gui  # did not run successfully for me; needs a Windows environment

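Troubleshooting

1. Error when switching conda environments

Fix:

# 1. Initialize the shell for conda
conda init bash
# 2. If that does not take effect, reload the shell configuration
source ~/.bashrc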

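2. Error when running: python data_utils/process.py data/obama/Obama.mp4

AssertionError: Torch not compiled with CUDA enabled

Cause: the installed pytorch3d version does not match the installed PyTorch.

Fix:

# 1. Uninstall the pytorch3d extension (use `pip uninstall pytorch3d` if it was installed with pip, as in step 5)
conda uninstall pytorch3d

# 2. Reinstall the matching version
# See step 5 of the environment setup

Release list: github.com/facebookres...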

3. Error:

ModuleNotFoundError: No module named 'sklearn'

Fix: install the missing package

pip install -U scikit-learn

4. Error:

raise AttributeError(name) from None AttributeError: _2D

Fix:

In data_utils/process.py, line 50, change

fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=False)

to

fa = face_alignment.FaceAlignment(face_alignment.LandmarksType.TWO_D, flip_input=False)

Reference: blog.csdn.net/qq_37160051...
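If the same script has to run against both older and newer face_alignment releases, the enum member can be looked up by name instead of being hard-coded. The sketch below takes this approach; `landmarks_2d` is a hypothetical helper, and it relies only on the two member names that appear in the fix (`_2D` and `TWO_D`).

```python
def landmarks_2d(landmarks_type):
    """Return the 2D landmark member under whichever name this version defines."""
    for name in ("TWO_D", "_2D"):
        member = getattr(landmarks_type, name, None)
        if member is not None:
            return member
    raise AttributeError("no 2D landmarks member found (tried TWO_D and _2D)")


# Usage inside process.py (assuming face_alignment is installed):
#   import face_alignment
#   fa = face_alignment.FaceAlignment(
#       landmarks_2d(face_alignment.LandmarksType), flip_input=False)
```

The three-argument `getattr` returns `None` instead of raising, so the lookup falls through to the older name without a try/except.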

5. Error when running: python data_utils/process.py data/obama/Obama.mp4 --task 2

OSError: We couldn't connect to 'huggingface.co' to load this file, couldn't find it in the cached files and it looks like cpierse/wav2vec2-large-xlsr-53-esperanto is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'huggingface.co/docs/transf...'.

Fix: download the model huggingface.co/cpierse/wav... manually (all files) and put it in a new directory, cpierse/wav2vec2-large-xlsr-53-esperanto/, under the RAD-NeRF directory.
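The error message says the loader looks for a local directory with that name containing `config.json`, so a quick check before rerunning step 2 can save a wasted attempt. This is a minimal sketch; `local_model_ready` is a hypothetical helper and it verifies only the file that the error message names.

```python
from pathlib import Path

MODEL_ID = "cpierse/wav2vec2-large-xlsr-53-esperanto"


def local_model_ready(repo_root=".", model_id=MODEL_ID):
    """True when <repo_root>/<model_id>/config.json exists."""
    return (Path(repo_root) / model_id / "config.json").is_file()


# Run from the RAD-NeRF directory:
print(local_model_ready("."))
```

A `True` result only shows that the directory is where the loader expects it. The remaining model files from the download page must be in the same directory as well.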

6. Error when running: python data_utils/process.py data/p/p.mp4 --task 8

torch.cuda.OutOfMemoryError: CUDA out of memory.

Fix:

In data_utils/face_tracking/face_tracker.py, line 180, change

batch_size = 64

to

batch_size = 24  # pick a value that fits your GPU memory

Reference: blog.csdn.net/Acmer_futur...
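Choosing the value by trial and error can also be automated. The sketch below halves the batch size until a trial run stops failing with an out-of-memory error. It is a generic illustration and not what face_tracker.py does; `largest_fitting_batch` and the `run_batch` callback are names made up for this example.

```python
def largest_fitting_batch(run_batch, start=64, minimum=1):
    """Halve the batch size until run_batch(size) no longer runs out of memory."""
    size = start
    while size >= minimum:
        try:
            run_batch(size)
            return size
        except RuntimeError as err:
            # CUDA out-of-memory errors surface as RuntimeError subclasses whose
            # message contains "out of memory"; re-raise anything else.
            if "out of memory" not in str(err).lower():
                raise
            size //= 2
    raise RuntimeError("no batch size fits in memory")


# Stand-in for a real GPU step: pretend anything above 24 does not fit.
def fake_step(size):
    if size > 24:
        raise RuntimeError("CUDA out of memory.")


print(largest_fitting_batch(fake_step))  # 16
```

Halving tries 64, 32 and then 16 here, so it settles slightly below the hand-picked 24. A finer search is possible, but a power-of-two step keeps the number of failed attempts small.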
