Mobile Digital Human: Ultralight-Digital-Human Algorithm Notes

AI视觉网奇 · 2025-12-10 16:52

[Ultralight-Digital-Human digital human algorithm](#Ultralight-Digital-Human 数字人算法)

[MNN TaoAvatar: open-source digital human that runs offline on a phone](#MNN TaoAvatar 无网手机运行数字人开源)

[metahuman-stream renamed to livetalking](#metahuman-stream 改名为livetalking)

References

Face detection (SCRFD): Sample and Computation Redistribution for Efficient Face Detection

Model: scrfd_2.5g_kps.onnx

Facial landmark detection (PFLD): A Practical Facial Landmark Detector

Model: checkpoint_epoch_335.pth.tar (in practice it outputs 110 coordinate points)
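The 110-point output noted above usually needs to be turned into pixel coordinates before it can be used for cropping or alignment. The following is only a sketch: it assumes the network returns a flat vector `[x0, y0, x1, y1, ...]` normalized to `[0, 1]` relative to the face crop, which is common for PFLD-style models but is not confirmed by the source. The function name `to_pixel_landmarks` is hypothetical.

```python
from typing import List, Sequence, Tuple

NUM_POINTS = 110  # point count reported in the note above


def to_pixel_landmarks(
    flat: Sequence[float],
    crop_w: float,
    crop_h: float,
    num_points: int = NUM_POINTS,
) -> List[Tuple[float, float]]:
    """Pair up a flat [x0, y0, x1, y1, ...] vector and scale it to pixels.

    Assumption: values are normalized to [0, 1] relative to the face crop.
    """
    if len(flat) != 2 * num_points:
        raise ValueError(
            f"expected {2 * num_points} values, got {len(flat)}"
        )
    return [
        (flat[2 * i] * crop_w, flat[2 * i + 1] * crop_h)
        for i in range(num_points)
    ]
```

If the actual checkpoint emits absolute coordinates or a different layout, only the scaling line needs to change; the length check will catch a mismatched point count early.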

Audio feature extraction, two options

HuBert：

WeNet：

First we need to extract audio features. I used two different feature extractors, wenet and hubert; thanks to both projects.

If you use wenet, you must make sure your video frame rate is 20 fps; if you use hubert, the video frame rate must be 25 fps.
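The frame-rate requirement is easy to get wrong, so it can help to encode it once and derive the re-encoding command from it. The sketch below is not part of the project: the helper names are hypothetical, the 16 kHz sample rate is an assumption (it is the usual input rate for both extractors), and the ffmpeg `fps` video filter is used as one possible way to resample a clip.

```python
import shlex
from typing import List

# Frame rates required by each audio feature extractor (from the text above).
REQUIRED_FPS = {"wenet": 20, "hubert": 25}

# Assumption: audio is sampled at 16 kHz.
SAMPLE_RATE = 16000


def required_fps(asr: str) -> int:
    """Return the video frame rate the chosen extractor expects."""
    try:
        return REQUIRED_FPS[asr]
    except KeyError:
        raise ValueError(f"unknown asr mode: {asr!r}") from None


def samples_per_frame(asr: str, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of audio samples that correspond to one video frame."""
    return sample_rate // required_fps(asr)


def ffmpeg_resample_cmd(src: str, dst: str, asr: str) -> List[str]:
    """Build an ffmpeg command that re-encodes src at the required fps."""
    return ["ffmpeg", "-y", "-i", src, "-vf", f"fps={required_fps(asr)}", dst]


if __name__ == "__main__":
    for mode in REQUIRED_FPS:
        cmd = ffmpeg_resample_cmd("input.mp4", f"input_{mode}.mp4", mode)
        print(mode, samples_per_frame(mode), shlex.join(cmd))
```

Under the 16 kHz assumption, one video frame covers 800 audio samples at 20 fps and 640 at 25 fps, which is why a clip recorded at another frame rate should be re-encoded before feature extraction.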

In my experiments, hubert gives better results, but wenet is faster and can run in real time on mobile devices.

The remaining steps are all in data_utils/process.py, and there is nothing special to watch out for. Just run it like this:


cd data_utils
python process.py YOUR_VIDEO_PATH --asr hubert
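When several clips need preprocessing, the same command can be driven from Python. This is a minimal sketch with hypothetical helper names; it assumes that `--asr wenet` is accepted alongside the `--asr hubert` value shown above, and that the script is launched from the `data_utils` directory as in the command.

```python
import subprocess
import sys
from typing import Iterable, List

# Assumption: these are the two values accepted by --asr.
ASR_MODES = ("hubert", "wenet")


def process_cmd(video_path: str, asr: str = "hubert") -> List[str]:
    """Build the argument list for data_utils/process.py."""
    if asr not in ASR_MODES:
        raise ValueError(f"asr must be one of {ASR_MODES}, got {asr!r}")
    return [sys.executable, "process.py", video_path, "--asr", asr]


def run_all(video_paths: Iterable[str], asr: str = "hubert") -> None:
    """Run process.py on each video, stopping at the first failure."""
    for path in video_paths:
        # cwd mirrors the `cd data_utils` step; relative video paths
        # are therefore resolved from inside data_utils.
        subprocess.run(process_cmd(path, asr), cwd="data_utils", check=True)
```

Calling `run_all(["/abs/path/a.mp4", "/abs/path/b.mp4"], asr="hubert")` would process the clips one after another; absolute paths avoid surprises from the changed working directory.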

To avoid confusion with 3D digital humans, the original project metahuman-stream has been renamed livetalking. The original links remain valid.