Hardcore! A 3,000-word complete guide: a digital-human setup walkthrough even beginners can follow (RAD-NeRF hands-on edition)

Introduction

Real-time neural radiance talking portrait synthesis via audio-spatial decomposition. (Every word is familiar on its own, but strung together the title is hard to parse.)

Source code: github.com/ashawkey/RAD-NeRF

Project page: me.kiui.moe/radnerf/

What it looks like in practice:

Setting up the environment

Using a cloud image (AutoDL):

1. Pull the code

# 1. Clone the repository
git clone https://github.com/ashawkey/RAD-NeRF.git
# 2. Enter the project directory
cd RAD-NeRF

2. Create a virtual environment

# 1. Create the virtual environment
conda create --name jmaat666 python=3.10
# 2. Activate the environment
conda activate jmaat666

3. Install the CUDA dependencies

# PyTorch must be installed against the matching CUDA version, otherwise the GPU cannot be used during training
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia
conda install -c fvcore -c iopath -c conda-forge fvcore iopath

Check your CUDA version

# 1. nvidia-smi is NVIDIA's command-line tool; it shows the installed driver and the CUDA version it supports.
nvidia-smi

# 2. nvcc is the CUDA compiler driver; it also reports the CUDA toolkit version.
nvcc --version

# 3. Check the CUDA installation directory:
ls /usr/local/cuda    # or: ls /opt/cuda

# Test whether the CUDA-enabled PyTorch is installed correctly, testCUDA.py
import torch

print(torch.cuda.is_available())  # should return True
print(torch.version.cuda)         # CUDA version

Install the PyTorch build that matches your CUDA version: pytorch.org/get-started...

4. Install dependencies

# Install the required dependencies (run inside the RAD-NeRF directory)
pip install -r requirements.txt

5. Install the pytorch3d extension

# 1. Install pytorch3d
Direct install:
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
pip install "git+https://github.com/facebookresearch/pytorch3d.git@v0.7.3"

Or (recommended):

a. Clone the repository first (you can do this one directory above RAD-NeRF)
git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d
b. Check which pytorch3d release matches your PyTorch version: https://github.com/facebookresearch/pytorch3d?tab=readme-ov-file#news
c. Check out that version and install it
git checkout v0.7.3
pip install .
# 2. Get the models and test video from the AD-NeRF repository (you can do this one directory above RAD-NeRF)
Recommended approach:
a. Clone the repository
git clone https://github.com/YudongGuo/AD-NeRF.git
b. Copy the models and the test video
cp AD-NeRF/data_util/face_parsing/79999_iter.pth RAD-NeRF/data_utils/face_parsing/
cp -R AD-NeRF/data_util/face_tracking/3DMM RAD-NeRF/data_utils/face_tracking/
mkdir -p RAD-NeRF/data/obama/
cp AD-NeRF/dataset/vids/Obama.mp4 RAD-NeRF/data/obama/

Or:

## Prepare the face parsing model
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_parsing/79999_iter.pth?raw=true -O data_utils/face_parsing/79999_iter.pth
## Prepare the Basel face model
Download 01_MorphableModel.mat from https://faces.dmi.unibas.ch/bfm/main.php?nav=1-2&id=downloads and put it in RAD-NeRF/data_utils/face_tracking/3DMM/.
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/exp_info.npy?raw=true -O data_utils/face_tracking/3DMM/exp_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/keys_info.npy?raw=true -O data_utils/face_tracking/3DMM/keys_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/sub_mesh.obj?raw=true -O data_utils/face_tracking/3DMM/sub_mesh.obj
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/topology_info.npy?raw=true -O data_utils/face_tracking/3DMM/topology_info.npy
mkdir -p data/obama/
wget https://github.com/YudongGuo/AD-NeRF/blob/master/dataset/vids/Obama.mp4?raw=true -O data/obama/Obama.mp4
# 3. Download the face model and place 01_MorphableModel.mat in data_utils/face_tracking/3DMM/

Once everything is in place, the files in that directory should match the listing below.
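A rough sketch of what to expect, based on the files copied/downloaded above (run from the RAD-NeRF directory; the exact listing may differ slightly):

ls data_utils/face_tracking/3DMM/
# 01_MorphableModel.mat  exp_info.npy  keys_info.npy  sub_mesh.obj  topology_info.npy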

Face model download page (download 01_MorphableModel.mat from there): faces.dmi.unibas.ch/bfm/main.ph...

# 4. Run convert_BFM.py
cd RAD-NeRF/data_utils/face_tracking
python convert_BFM.py
# 5. Return to the project root
cd ../..

6. Video preprocessing

# Video preprocessing (runs all steps in one go)
python data_utils/process.py data/obama/Obama.mp4

# Or run it step by step; there are nine tasks in total (see the loop sketch after this block)

python data_utils/process.py data/obama/Obama.mp4 --task 1  # task 1: extract the audio
...
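If a single step fails (for example, a model download times out), it can be convenient to rerun the tasks one at a time. A minimal bash sketch, assuming the --task numbers simply run from 1 to 9:

for t in 1 2 3 4 5 6 7 8 9; do
    python data_utils/process.py data/obama/Obama.mp4 --task $t
done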

During execution, four models are downloaded. Without a proxy these downloads are very slow, or simply die halfway through.

Model URLs:

download.pytorch.org/models/resn...

www.adrianbulat.com/downloads/p...

www.adrianbulat.com/downloads/p...

download.pytorch.org/models/alex...

You can download the four models in advance and place them in the cache directory (a sketch follows below):

Linux: /root/.cache/torch/hub/checkpoints/

Windows: C:\Users\<username>\.cache\torch\hub\checkpoints
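A minimal sketch of pre-seeding the cache on Linux; <model-url> is a placeholder, since the four links above are truncated here:

mkdir -p /root/.cache/torch/hub/checkpoints/
wget -P /root/.cache/torch/hub/checkpoints/ <model-url>   # repeat for each of the four model files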

The preprocessing produces the following data layout:

./data/<ID>
├──<ID>.mp4 # original video
├──ori_imgs # original images from video
│  ├──0.jpg
│  ├──0.lms # 2D landmarks
│  ├──...
├──gt_imgs # ground truth images (static background)
│  ├──0.jpg
│  ├──...
├──parsing # semantic segmentation
│  ├──0.png
│  ├──...
├──torso_imgs # inpainted torso images
│  ├──0.png
│  ├──...
├──aud.wav # original audio 
├──aud_eo.npy # audio features (wav2vec)
├──aud.npy # audio features (deepspeech)
├──bc.jpg # default background
├──track_params.pt # raw head tracking results
├──transforms_train.json # head poses (train split)
├──transforms_val.json # head poses (test split)

--task 1
Extracts the audio (aud.wav)
===== extract audio from data/obama/Obama.mp4 to data/obama/aud.wav =====
===== extracted audio =====
--task 2
Generates aud_eo.npy
===== extract audio labels for data/obama/aud.wav =====
===== extracted audio labels =====
--task 3
Splits the video into individual frames
===== extract images from data/obama/Obama.mp4 to data/obama/ori_imgs =====
===== extracted images =====
--task 4
===== save transforms =====
===== finished saving transforms =====

Actual execution log:

[INFO] ===== extract audio from data/obama/Obama.mp4 to data/obama/aud.wav =====
ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
  configuration: --prefix=/root/miniconda3/envs/vrh3.8 --cc=/opt/conda/conda-bld/ffmpeg_1597178665428/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'data/obama/Obama.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    date            : 2021/06/24 23:54:51
    encoder         : Lavf58.29.100
  Duration: 00:05:20.00, start: 0.000000, bitrate: 635 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 450x450 [SAR 1:1 DAR 1:1], 480 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 149 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
File 'data/obama/aud.wav' already exists. Overwrite? [y/N] y
Stream mapping:
  Stream #0:1 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'data/obama/aud.wav':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    ICRD            : 2021/06/24 23:54:51
    ISFT            : Lavf58.45.100
    Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, stereo, s16, 512 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      encoder         : Lavc58.91.100 pcm_s16le
size=   19997kB time=00:05:19.95 bitrate= 512.0kbits/s speed=1.22e+03x    
video:0kB audio:19997kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000518%
[INFO] ===== extracted audio =====
[INFO] ===== extract audio labels for data/obama/aud.wav =====
Traceback (most recent call last):
  File "nerf/asr.py", line 5, in <module>
    from transformers import AutoModelForCTC, AutoProcessor
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/site-packages/transformers/__init__.py", line 26, in <module>
    from . import dependency_versions_check
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/site-packages/transformers/dependency_versions_check.py", line 16, in <module>
    from .utils.versions import require_version, require_version_core
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/site-packages/transformers/utils/__init__.py", line 18, in <module>
    from huggingface_hub import get_full_repo_name  # for backward compatibility
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/site-packages/huggingface_hub/__init__.py", line 379, in __getattr__
    submod = importlib.import_module(submod_path)
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/site-packages/huggingface_hub/hf_api.py", line 50, in <module>
    from ._commit_api import (
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/site-packages/huggingface_hub/_commit_api.py", line 20, in <module>
    from .file_download import hf_hub_url
  File "/root/miniconda3/envs/vrh3.8/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 22, in <module>
    from filelock import FileLock
ModuleNotFoundError: No module named 'filelock'
[INFO] ===== extracted audio labels =====
[INFO] ===== extract images from data/obama/Obama.mp4 to data/obama/ori_imgs =====
ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
  configuration: --prefix=/root/miniconda3/envs/vrh3.8 --cc=/opt/conda/conda-bld/ffmpeg_1597178665428/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'data/obama/Obama.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    date            : 2021/06/24 23:54:51
    encoder         : Lavf58.29.100
  Duration: 00:05:20.00, start: 0.000000, bitrate: 635 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 450x450 [SAR 1:1 DAR 1:1], 480 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 149 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native))
Press [q] to stop, [?] for help
[swscaler @ 0x55e2d83ad7c0] deprecated pixel format used, make sure you did set range correctly
Output #0, image2, to 'data/obama/ori_imgs/%d.jpg':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    date            : 2021/06/24 23:54:51
    encoder         : Lavf58.45.100
    Stream #0:0(und): Video: mjpeg, yuvj420p(pc), 450x450 [SAR 1:1 DAR 1:1], q=1-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc (default)
    Metadata:
      handler_name    : VideoHandler
      encoder         : Lavc58.91.100 mjpeg
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: N/A
frame= 8000 fps=1437 q=1.0 Lsize=N/A time=00:05:20.00 bitrate=N/A speed=57.5x    
video:264207kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[INFO] ===== extracted images =====
[INFO] ===== extract semantics from data/obama/ori_imgs to data/obama/parsing =====
[INFO] loading model...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8000/8000 [05:50<00:00, 22.83it/s]
[INFO] ===== extracted semantics =====
[INFO] ===== extract background image from data/obama/ori_imgs =====
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 400/400 [01:52<00:00,  3.54it/s]
[INFO] ===== extracted background image =====
[INFO] ===== extract torso and gt images for data/obama =====
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8000/8000 [05:13<00:00, 25.50it/s]
[INFO] ===== extracted torso and gt images =====
[INFO] ===== extract face landmarks from data/obama/ori_imgs =====
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8000/8000 [02:19<00:00, 57.48it/s]
[INFO] ===== extracted face landmarks =====
[INFO] ===== perform face tracking =====
[INFO] fitting focal length...
600 2.126291513442993 -3.927321195602417
700 2.1062731742858887 -4.606115818023682
800 2.0988805294036865 -5.333498954772949
900 2.085597276687622 -5.986014366149902
1000 2.0794363021850586 -6.6705193519592285
1100 2.0748462677001953 -7.362542152404785
1200 2.074572801589966 -8.087048530578613
1300 2.077897548675537 -8.850865364074707
1400 2.0787346363067627 -9.6022310256958
[INFO] find best focal: 1200
[INFO] coarse fitting...
2.075936794281006 -8.418652534484863
[INFO] fitting light...
[INFO] fine frame-wise fitting...
0 of 125 done
1 of 125 done
2 of 125 done
...
124 of 125 done
params saved
[INFO] ===== finished face tracking =====
[INFO] ===== save transforms =====
[INFO] ===== finished saving transforms =====

Note:

The source video strongly affects how the digital human turns out (check data/obama/parsing). For example, if the subject has hair hanging down in front of the body, it may be misclassified as background, leaving a chunk of the body missing.

Overall view after segmentation:

(blue: face; green: neck; red: body; white: background)

7. Train the model

# 1. Head training
python main.py data/obama/ --workspace trial_obama/ -O --iters 200000 # resulting video: trial_obama/results/ngp_ep0028.mp4

# 2. Lip fine-tuning
python main.py data/obama/ --workspace trial_obama/ -O --iters 250000 --finetune_lips # resulting video: trial_obama/results/ngp_ep0035.mp4

# 3. Torso training
## <head>.pth should be the latest checkpoint in trial_obama (see the sketch after this block for one way to pick it automatically)
## This takes a long time (40+ minutes)
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --head_ckpt <head>.pth --iters 200000
Example: python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --head_ckpt trial_obama/checkpoints/ngp_ep0035.pth --iters 200000

# Test on the test split
python main.py data/obama/ --workspace trial_obama/ -O --test # uses the head checkpoint; the GT torso is loaded
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test

# GUI test (requires a Windows/desktop environment)
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --gui

# GUI test (real-time mode that also loads the ASR model; requires a Windows/desktop environment)
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --gui --asr
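One way to pick up the newest head checkpoint automatically when starting torso training; a minimal bash sketch, assuming the checkpoints live under trial_obama/checkpoints/ as in the example above:

HEAD_CKPT=trial_obama/checkpoints/$(ls -t trial_obama/checkpoints/ | head -n 1)
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --head_ckpt "$HEAD_CKPT" --iters 200000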

8. Generate a digital-human video

# Test with a specific audio clip and pose sequence
# --test_train: test on the training split
# --data_range: use this range of the pose and eye sequences (if it is shorter than the audio, it is automatically mirrored and repeated)
# See step 9 for how to obtain data/intro_eo.npy, or generate one from another audio file
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --test_train --data_range 0 100 --aud data/intro_eo.npy

Notes:

1. Digital humans generated from different audio clips differ only in the mouth shapes, which generally match the audio.

2. If the model video is shorter than the audio, the model video is repeated.

3. The generated video has no audio track; you need to merge the audio into the video yourself (see the ffmpeg sketch below).
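The merge can be done with ffmpeg. A minimal sketch; the file names are assumptions based on the commands above, so substitute the actual video produced under trial_obama_torso/results/ and the audio you drove it with:

ffmpeg -i trial_obama_torso/results/ngp_ep0028.mp4 -i data/aud.wav -c:v copy -c:a aac -shortest result_with_audio.mp4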

9. Quick start (skip training and try generating a digital human right away)

Download the pretrained models: drive.google.com/drive/folde...

# 1. Generate the audio feature file (.npy)
# If the model is `<ID>_eo.pth`, generate it like this
python nerf/asr.py --wav data/aud.wav --save_feats # the resulting .npy is saved to the data directory as data/<name>_eo.npy

# If the model is `<ID>.pth`, generate it like this
python data_utils/deepspeech_features/extract_ds_features.py --input data/<name>.wav # saved to data/<name>.npy

# 2. Generate the digital human
## --pose: pose sequence file | --ckpt: pretrained model
python test.py --pose data/demo/obama.json --ckpt data/demo/obama_eo.pth --aud data/intro_eo.npy --workspace data/demo/trial_obama/ -O --torso
python test.py --pose data/demo/obama.json --ckpt data/demo/obama_eo.pth --aud data/aud_eo.npy --workspace data/demo/trial_obama/ -O --torso --bg_img data/demo/bg.jpg
python test.py --pose data/demo/obama.json --ckpt data/demo/obama_eo.pth --aud data/intro_eo.npy --workspace data/demo/trial_obama/ -O --torso --bg_img data/demo/bg.jpg --gui # this one did not run successfully for me; it needs a Windows/desktop environment

Troubleshooting

1. Error when switching conda environments

Fix:

# 1. Initialize conda for your shell
conda init bash
# 2. If that does not take effect, also run
source ~/.bashrc

2. Error when running: python data_utils/process.py data/obama/Obama.mp4

AssertionError: Torch not compiled with CUDA enabled

Cause: the installed pytorch3d version does not match your PyTorch build.

Fix:

# 1. Uninstall the pytorch3d extension
conda uninstall pytorch3d

# 2. Reinstall the matching version
See Environment Setup, step 5.

Version list: github.com/facebookres...

3. Error:

ModuleNotFoundError: No module named 'sklearn'

Fix: install the dependency

pip install -U scikit-learn

4. Error:

raise AttributeError(name) from None AttributeError: _2D

Fix:

In data_utils/process.py, line 50:

fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=False)

change to:

fa = face_alignment.FaceAlignment(face_alignment.LandmarksType.TWO_D, flip_input=False)

Reference: blog.csdn.net/qq_37160051...

5. Error when running: python data_utils/process.py data/obama/Obama.mp4 --task 2

OSError: We couldn't connect to 'huggingface.co' to load this file, couldn't find it in the cached files and it looks like cpierse/wav2vec2-large-xlsr-53-esperanto is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'huggingface.co/docs/transf...'.

Fix: manually download the model from huggingface.co/cpierse/wav... (all of its files) and place it in a cpierse/wav2vec2-large-xlsr-53-esperanto/ directory under RAD-NeRF (you need to create this directory), as in the sketch below.
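A minimal sketch of doing that with git; it assumes git-lfs is installed, since the model weights on huggingface.co are stored with Git LFS (run from the RAD-NeRF directory):

git lfs install
git clone https://huggingface.co/cpierse/wav2vec2-large-xlsr-53-esperanto cpierse/wav2vec2-large-xlsr-53-esperanto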

6. Error when running: python data_utils/process.py data/p/p.mp4 --task 8

torch.cuda.OutOfMemoryError: CUDA out of memory.

Fix:

In data_utils/face_tracking/face_tracker.py, line 180:

batch_size = 64

change to

batch_size = 24  # pick a value that fits your actual GPU memory (see the check below)
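To see how much GPU memory is actually available before picking a batch size, one quick check (a sketch; the batch size that fits also depends on the video resolution):

nvidia-smi --query-gpu=memory.total,memory.used --format=csv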

Reference: blog.csdn.net/Acmer_futur...

References

  1. RAD-NeRF: source code and training method for reconstructing a 3D digital human from real-person video: blog.csdn.net/matt45m/art...
  2. Training a digital human with RAD-NeRF: www.bilibili.com/video/BV18w...