Digital-Human Lip Sync in Pure C#: A Complete Wav2Lip + YOLOv8 + GFPGAN Implementation

Contents

Overview

Results

Core Features

Technical Approach

1. YOLOv8 Face Landmarks

2. Mel Spectrogram in C#

3. Wav2Lip ONNX Inference

4. Face Blending and Edge Feathering

5. GFPGAN Face Restoration

6. Merging Audio and Video with FFmpeg

Project Highlights

Use Cases

wav2lip.onnx Model Information

C# Code

Download


Overview

With just one frontal face photo and a speech clip, you can generate a video of that person speaking naturally.

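This post shares a digital-human lip-sync demo built with C# WinForms. It has no Python dependency and integrates face detection, Mel spectrogram extraction, Wav2Lip inference, face restoration, video generation, and audio/video merging, so it runs directly on Windows.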

Results

C# OnnxRuntime Wav2Lip

Core Features

The project covers the complete digital-human lip-sync pipeline:

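  • Select a frontal photo of a person

  • Import PCM WAV audio

  • Automatically detect and crop the face region

  • Generate matching mouth shapes frame by frame from the audio

  • Blend the generated face back into the original image

  • Optionally restore the face with GFPGAN

  • Write an AVI video

  • Merge audio and video by calling FFmpeg

  • Preview the result with one click using FFplay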

Everything is done through the graphical interface: there are no commands to type and no Python environment to configure.

Technical Approach

1. YOLOv8 Face Landmarks

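First, YOLOv8 Face Landmarks detects the faces in the image.

The program automatically picks the face with the largest area and, based on its detection box, crops a region suitable for Wav2Lip inference. This avoids the face distortion and neck artifacts that imprecise manual cropping tends to cause.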
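The selection step can be sketched as below. This is a minimal illustration under stated assumptions, not the project's YoloFaceDetector (whose source is not shown): it assumes the YOLOv8 outputs have already been decoded into boxes, keeps the one with the largest area, and pads it by hypothetical ratios (slightly more at the bottom to keep the chin) before clamping it to the image so that later pixel access stays in bounds. The Box type is a stand-in for OpenCvSharp.Rect.

```csharp
using System;
using System.Collections.Generic;

// Minimal integer box (x, y, width, height); a stand-in for OpenCvSharp.Rect.
public readonly struct Box
{
    public readonly int X, Y, Width, Height;
    public Box(int x, int y, int width, int height) { X = x; Y = y; Width = width; Height = height; }
    public int Area => Width * Height;
}

public static class FaceCrop
{
    // Returns the detection with the largest area; throws if nothing was detected.
    public static Box SelectLargest(IReadOnlyList<Box> detections)
    {
        if (detections == null || detections.Count == 0)
            throw new InvalidOperationException("No face detected.");
        Box best = detections[0];
        foreach (Box b in detections)
            if (b.Area > best.Area) best = b;
        return best;
    }

    // Pads the box by hypothetical ratios (more at the bottom to include the chin)
    // and clamps it to the image bounds.
    public static Box ExpandAndClamp(Box face, int imageWidth, int imageHeight,
        double sidePad = 0.05, double topPad = 0.05, double bottomPad = 0.10)
    {
        int left = face.X - (int)(face.Width * sidePad);
        int right = face.X + face.Width + (int)(face.Width * sidePad);
        int top = face.Y - (int)(face.Height * topPad);
        int bottom = face.Y + face.Height + (int)(face.Height * bottomPad);

        left = Math.Max(0, left);
        top = Math.Max(0, top);
        right = Math.Min(imageWidth, right);
        bottom = Math.Min(imageHeight, bottom);
        return new Box(left, top, right - left, bottom - top);
    }
}
```

For example, a 60×80 face at (50, 40) in a 100×100 image becomes the 53×64 crop at (47, 36): the padding pushes the right and bottom edges past the image, and clamping pulls them back to 100.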

2. Mel Spectrogram in C#

The project reads the PCM WAV file directly in C# and performs:

  • Mono conversion

  • Resampling to 16 kHz

  • Pre-emphasis

  • STFT spectrum computation

  • 80-band Mel filtering

  • Standard Wav2Lip normalization

  • Slicing into 80×16 Mel windows
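The simpler of these steps can be sketched in plain C#. The helpers below are hypothetical (the project's WavMelProcessor source is not shown), and the constants are assumptions taken from the hparams of the original Python Wav2Lip repository: pre-emphasis 0.97, min_level_db -100, ref_level_db 20, and symmetric normalization to [-4, 4]. The linear resampler is a simple stand-in, not necessarily what the project uses; the STFT and the 80-band Mel filterbank are omitted.

```csharp
using System;

// Hypothetical audio helpers; constants assumed from the original Wav2Lip hparams.
public static class AudioPrep
{
    const float PreemphasisCoefficient = 0.97f;
    const double MinLevelDb = -100.0;
    const double RefLevelDb = 20.0;
    const double MaxAbsValue = 4.0;

    // Averages interleaved 16-bit PCM channels into mono samples in [-1, 1).
    public static float[] ToMono(short[] interleaved, int channels)
    {
        int frames = interleaved.Length / channels;
        var mono = new float[frames];
        for (int f = 0; f < frames; f++)
        {
            int sum = 0;
            for (int c = 0; c < channels; c++) sum += interleaved[f * channels + c];
            mono[f] = sum / (float)channels / 32768f;
        }
        return mono;
    }

    // Linear-interpolation resampler. Stand-in only: it applies no anti-aliasing
    // low-pass filter before downsampling, which a production resampler would.
    public static float[] ResampleLinear(float[] input, int fromRate, int toRate)
    {
        if (fromRate == toRate) return (float[])input.Clone();
        int outLength = (int)((long)input.Length * toRate / fromRate);
        var output = new float[outLength];
        double step = (double)fromRate / toRate;
        for (int i = 0; i < outLength; i++)
        {
            double pos = i * step;
            int i0 = (int)pos;
            int i1 = Math.Min(i0 + 1, input.Length - 1);
            double frac = pos - i0;
            output[i] = (float)(input[i0] * (1.0 - frac) + input[i1] * frac);
        }
        return output;
    }

    // y[0] = x[0], y[n] = x[n] - k * x[n - 1]; boosts high frequencies before the STFT.
    public static float[] PreEmphasis(float[] x, float k = PreemphasisCoefficient)
    {
        var y = new float[x.Length];
        if (x.Length == 0) return y;
        y[0] = x[0];
        for (int n = 1; n < x.Length; n++) y[n] = x[n] - k * x[n - 1];
        return y;
    }

    // Maps one Mel magnitude to [-4, 4]: dB = 20*log10(max(1e-5, m)) - ref_level_db,
    // then the range [min_level_db, 0] dB is scaled linearly onto [-4, 4] and clipped.
    public static float NormalizeMel(float magnitude)
    {
        double minLevel = Math.Pow(10.0, MinLevelDb / 20.0); // 1e-5
        double db = 20.0 * Math.Log10(Math.Max(minLevel, magnitude)) - RefLevelDb;
        double scaled = 2.0 * MaxAbsValue * ((db - MinLevelDb) / -MinLevelDb) - MaxAbsValue;
        return (float)Math.Max(-MaxAbsValue, Math.Min(MaxAbsValue, scaled));
    }
}
```

With these constants a Mel magnitude of 10 maps to 4, a magnitude of 1 maps to 2.4, and silence is clipped to -4.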

The audio is sampled with a sliding window at 25 FPS, producing one model input for each video frame.
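The frame-to-window mapping can be sketched as follows, assuming the project follows the original Wav2Lip inference script: with a 200-sample hop at 16 kHz the spectrogram has 80 columns per second, so video frame i starts at column floor(i × 80 / fps) (3.2 columns per frame at 25 FPS), and once a window would run past the end, the last 16 columns are used for the final frame. The class and method names are hypothetical counterparts of WavMelProcessor.GetWindowStartColumns and GetMelWindow.

```csharp
using System;
using System.Collections.Generic;

public static class MelWindowing
{
    const int WindowColumns = 16;             // Wav2Lip mel_step_size
    const double MelColumnsPerSecond = 80.0;  // 16000 Hz / 200-sample hop (assumed)

    // Start column of each 80x16 window, one per video frame.
    public static List<int> GetWindowStarts(int totalColumns, double fps)
    {
        if (fps <= 0) throw new ArgumentOutOfRangeException(nameof(fps));
        if (totalColumns < WindowColumns)
            throw new ArgumentException("Audio too short for a single 16-column window.");
        var starts = new List<int>();
        double columnsPerFrame = MelColumnsPerSecond / fps; // 3.2 at 25 FPS
        for (int i = 0; ; i++)
        {
            int start = (int)(i * columnsPerFrame);
            if (start + WindowColumns > totalColumns)
            {
                // As in the reference script: the final window is the last 16 columns.
                starts.Add(totalColumns - WindowColumns);
                break;
            }
            starts.Add(start);
        }
        return starts;
    }

    // Copies mel[:, start .. start + 16) into a new [bands, 16] array.
    public static float[,] GetWindow(float[,] mel, int start)
    {
        int bands = mel.GetLength(0);
        var window = new float[bands, WindowColumns];
        for (int m = 0; m < bands; m++)
            for (int c = 0; c < WindowColumns; c++)
                window[m, c] = mel[m, start + c];
        return window;
    }
}
```

For 40 Mel columns at 25 FPS this yields 9 windows starting at columns 0, 3, 6, 9, 12, 16, 19, 22 and 24.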

3. Wav2Lip ONNX Inference

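Inference uses Microsoft ONNX Runtime and supports both the standard Wav2Lip model and the Wav2Lip GAN model.

The program prefers wav2lip_gan.onnx, which gives better image quality, and falls back to wav2lip.onnx if it is missing. One frame is processed per call, so the inputs and output are:

  • Mel input: [1, 1, 80, 16]

  • Face input: [1, 6, 96, 96]

  • Predicted output: [1, 3, 96, 96]

The generated mouth shape changes frame by frame with the audio content.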
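The 6-channel face input can be sketched as below. In the original Python Wav2Lip, channels 0 to 2 hold the target face with its lower half zeroed (the mouth region the model must generate) and channels 3 to 5 hold an unmasked reference face, both scaled to [0, 1]; with a single still photo, both copies come from the same 96×96 crop. The interleaved BGR byte layout and the class name are assumptions; the project's Wav2LipInference.Run is not shown.

```csharp
using System;

public static class Wav2LipTensors
{
    const int Side = 96;

    // face: 96x96 pixels, 3 interleaved bytes per pixel (assumed BGR, row-major, no padding).
    // Returns a flat NCHW buffer for shape [1, 6, 96, 96]:
    // channels 0-2 = face with lower half zeroed, channels 3-5 = full reference face.
    public static float[] BuildFaceInput(byte[] face)
    {
        if (face.Length != Side * Side * 3) throw new ArgumentException("Expected 96x96x3 bytes.");
        int plane = Side * Side;
        var input = new float[6 * plane];
        for (int y = 0; y < Side; y++)
            for (int x = 0; x < Side; x++)
                for (int c = 0; c < 3; c++)
                {
                    int idx = y * Side + x;
                    float v = face[idx * 3 + c] / 255f;
                    input[c * plane + idx] = y < Side / 2 ? v : 0f; // masked copy
                    input[(c + 3) * plane + idx] = v;               // reference copy
                }
        return input;
    }

    // Converts a [1, 3, 96, 96] output in [0, 1] back to interleaved bytes.
    public static byte[] OutputToBytes(float[] output)
    {
        int plane = Side * Side;
        var bytes = new byte[plane * 3];
        for (int idx = 0; idx < plane; idx++)
            for (int c = 0; c < 3; c++)
            {
                float v = output[c * plane + idx] * 255f;
                bytes[idx * 3 + c] = (byte)Math.Round(Math.Max(0f, Math.Min(255f, v)));
            }
        return bytes;
    }
}
```

With ONNX Runtime, a flat buffer like this is typically wrapped as new DenseTensor<float>(input, new[] { 1, 6, 96, 96 }) and passed through NamedOnnxValue.CreateFromTensor using the input name video_frames listed in the model information.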

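4. Face Blending and Edge Feathering

Instead of outputting the 96×96 patch directly, the program resizes it and pastes it back at the face position in the original image.

Because a rectangular seam tends to appear at the chin and neck, the project fades all four edges of the pasted region and uses a wider feather band at the bottom, so the generated area transitions smoothly back into the original image.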
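The fade amounts to a per-pixel alpha mask, with each output pixel computed as original + (generated - original) × alpha. The sketch below builds that mask on its own, using the same ratios as ComposeFrame in the code below (8% of the width and height on the sides and top, 22% at the bottom, with minimums of 4 and 8 pixels) and the same smoothstep curve; the FeatherMask name is hypothetical.

```csharp
using System;

public static class FeatherMask
{
    // Linear ramp: 0 at the edge, 1 once `feather` pixels inside.
    static float EdgeWeight(int distance, int feather) =>
        Math.Max(0f, Math.Min(1f, (float)distance / feather));

    // Smoothstep 3t^2 - 2t^3 flattens the ramp at both ends, hiding the fade boundary.
    static float SmoothStep(float t) => t * t * (3f - 2f * t);

    // alpha[y, x] is the weight of the generated face (0 keeps the original pixel).
    public static float[,] Build(int width, int height)
    {
        int side = Math.Max(4, (int)(width * 0.08));
        int top = Math.Max(4, (int)(height * 0.08));
        int bottom = Math.Max(8, (int)(height * 0.22)); // wider band over chin and neck
        var alpha = new float[height, width];
        for (int y = 0; y < height; y++)
        {
            float vertical = Math.Min(EdgeWeight(y, top), EdgeWeight(height - 1 - y, bottom));
            for (int x = 0; x < width; x++)
            {
                float horizontal = Math.Min(EdgeWeight(x, side), EdgeWeight(width - 1 - x, side));
                alpha[y, x] = SmoothStep(Math.Min(horizontal, vertical));
            }
        }
        return alpha;
    }
}
```

For a 100×100 box, the top fade is 8 pixels but the bottom fade is 22, so a pixel 4 rows from the bottom edge keeps far more of the original image than one 4 rows from the top.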

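5. GFPGAN Face Restoration

If the result is not sharp enough, you can tick the GFPGAN option.

GFPGAN improves facial texture and detail, but running a 512×512 model on every frame increases processing time and may introduce slight frame-to-frame variation, so the project makes it optional.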
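For reference, GFPGAN ONNX exports commonly take a [1, 3, 512, 512] RGB input normalized to [-1, 1] and return an output in the same range. The sketch below shows that conversion and the BGR-to-RGB swap; the normalization, channel order, and class name are assumptions about GFPGANv1.4.onnx, since the project's GfpganEnhancer source is not shown. The 96×96 Wav2Lip output would first need to be upscaled to 512×512.

```csharp
using System;

public static class GfpganNormalization
{
    // Assumed convention: pixel / 255 -> [0, 1], then (v - 0.5) / 0.5 -> [-1, 1].
    public static float ToModel(byte pixel) => (pixel / 255f - 0.5f) / 0.5f;

    // Inverse: clip to [-1, 1], map back to [0, 255] and round.
    public static byte FromModel(float value)
    {
        float clipped = Math.Max(-1f, Math.Min(1f, value));
        return (byte)Math.Round((clipped + 1f) * 0.5f * 255f);
    }

    // Builds a planar [1, 3, side, side] RGB buffer from interleaved BGR bytes.
    public static float[] BuildInput(byte[] bgr, int side = 512)
    {
        int plane = side * side;
        if (bgr.Length != plane * 3) throw new ArgumentException("Expected side x side x 3 bytes.");
        var input = new float[3 * plane];
        for (int i = 0; i < plane; i++)
        {
            input[0 * plane + i] = ToModel(bgr[i * 3 + 2]); // R
            input[1 * plane + i] = ToModel(bgr[i * 3 + 1]); // G
            input[2 * plane + i] = ToModel(bgr[i * 3 + 0]); // B
        }
        return input;
    }
}
```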

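6. Merging Audio and Video with FFmpeg

After Wav2Lip inference finishes, the program first writes a silent AVI, then runs FFmpeg from C# to combine it with the original WAV into an MP4 with sound.

Once that is done, the result can be previewed directly with FFplay.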
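An alternative to hand-quoting the command line is to pass each FFmpeg argument separately. The sketch below assumes .NET Core 2.1 or later, where ProcessStartInfo.ArgumentList quotes each entry; a WinForms project targeting .NET Framework lacks that property and must keep building the Arguments string as buttonMerge_Click does. The options are the same ones the demo uses; the FfmpegMerge class is hypothetical.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

public static class FfmpegMerge
{
    // Same options as the demo: overwrite output, re-encode video as MPEG-4 (quality 2),
    // encode audio as AAC at 192 kbit/s, and stop at the end of the shorter stream.
    public static List<string> BuildArguments(string silentVideo, string wav, string output)
    {
        return new List<string>
        {
            "-y",
            "-i", silentVideo,
            "-i", wav,
            "-c:v", "mpeg4", "-q:v", "2",
            "-c:a", "aac", "-b:a", "192k",
            "-shortest",
            output
        };
    }

    // Requires .NET Core 2.1+ (ProcessStartInfo.ArgumentList). Spaces in paths need no manual quoting.
    public static ProcessStartInfo CreateStartInfo(string ffmpegPath, IEnumerable<string> arguments)
    {
        var info = new ProcessStartInfo
        {
            FileName = ffmpegPath,
            UseShellExecute = false,
            RedirectStandardError = true,
            CreateNoWindow = true
        };
        foreach (string argument in arguments) info.ArgumentList.Add(argument);
        return info;
    }
}
```

Passing the result to Process.Start and reading StandardError before WaitForExit works the same way as RunCommand in the code below.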

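Project Highlights

  • Pure C#, no Python dependency

  • Built on WinForms, simple to operate

  • CPU inference with ONNX Runtime

  • Automatic face detection, no manual cropping

  • Wav2Lip GAN support

  • GFPGAN restoration support

  • Long-running tasks can be cancelled

  • Processing progress is displayed automatically

  • Audio/video merging and one-click preview

  • Each function lives in its own class, making further development easy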

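Use Cases

This approach can be used for:

  • Digital-human news presenting

  • Virtual streamers

  • Short-video voiceovers

  • Educational content production

  • Corporate promotional videos

  • Game character dialogue

  • Prototyping AI digital employees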

wav2lip.onnx Model Information

Model Properties
-------------------------
---------------------------------------------------------------

Inputs
-------------------------
name:mel_spectrogram
tensor:Float[-1, 1, 80, 16]
name:video_frames
tensor:Float[-1, 6, 96, 96]
---------------------------------------------------------------

Outputs
-------------------------
name:predicted_frames
tensor:Float[-1, 3, 96, 96]
---------------------------------------------------------------
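Every tensor's leading dimension is -1, so the model accepts a dynamic batch size. The demo runs one frame per call; as a hypothetical optimization, several Mel windows could be packed into one [N, 1, 80, 16] buffer (and N face inputs into [N, 6, 96, 96]) so that a single Run call covers N frames. The sketch only does the packing, and any speed-up on a given CPU is untested; the MelBatch name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class MelBatch
{
    // Packs N [80, 16] windows into a flat buffer laid out as [N, 1, 80, 16].
    public static float[] Pack(IReadOnlyList<float[,]> windows)
    {
        const int Bands = 80, Columns = 16, PerWindow = Bands * Columns;
        var batch = new float[windows.Count * PerWindow];
        for (int n = 0; n < windows.Count; n++)
        {
            float[,] w = windows[n];
            if (w.GetLength(0) != Bands || w.GetLength(1) != Columns)
                throw new ArgumentException("Each window must be 80x16.");
            for (int m = 0; m < Bands; m++)
                for (int c = 0; c < Columns; c++)
                    batch[n * PerWindow + m * Columns + c] = w[m, c];
        }
        return batch;
    }

    // Dimensions to pair with the buffer, e.g. when constructing a DenseTensor<float>.
    public static int[] Shape(int count) => new[] { count, 1, 80, 16 };
}
```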

C# Code

using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using OpenCvSharp;
using OpenCvSharp.Dnn;
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Windows.Forms;

namespace Onnx_Demo
{

    public partial class Form1 : Form
    {
        private string imagePath;
        private string wavPath;
        private string gfpganModelPath;
        private Wav2LipInference wav2Lip;
        private YoloFaceDetector faceDetector;
        private GfpganEnhancer gfpgan;
        private bool isGenerating;
        private bool cancelRequested;
        private string lastSilentVideoPath;
        private string lastMergedVideoPath;
        private const double VideoFps = 25.0;

        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            try
            {
                string ganPath = Path.Combine(Application.StartupPath, "model", "wav2lip_gan.onnx");
                string wav2LipPath = File.Exists(ganPath)
                    ? ganPath
                    : Path.Combine(Application.StartupPath, "model", "wav2lip.onnx");
                string detectorPath = Path.Combine(Application.StartupPath, "model", "yolov8-face-landmarks.onnx");
                gfpganModelPath = Path.Combine(Application.StartupPath, "model", "GFPGANv1.4.onnx");

                if (!File.Exists(wav2LipPath)) throw new FileNotFoundException("Wav2Lip model not found.", wav2LipPath);
                if (!File.Exists(detectorPath)) throw new FileNotFoundException("Face detection model not found.", detectorPath);

                using (var options = new SessionOptions())
                {
                    options.LogSeverityLevel = OrtLoggingLevel.ORT_LOGGING_LEVEL_WARNING;
                    options.AppendExecutionProvider_CPU(0);
                    wav2Lip = new Wav2LipInference(wav2LipPath, options);
                }
                faceDetector = new YoloFaceDetector(detectorPath);
                checkGfpgan.Enabled = File.Exists(gfpganModelPath);
                textBox1.Text = "Loaded " + Path.GetFileName(wav2LipPath) +
                    " and YOLOv8 Face Landmarks. GFPGAN can be enabled as needed.";

                LoadBundledSamples();
                string sampleVideo = Path.Combine(Application.StartupPath, "test_data", "wav2lip_result.avi");
                if (File.Exists(sampleVideo)) lastSilentVideoPath = sampleVideo;
            }
            catch (Exception ex)
            {
                button2.Enabled = false;
                MessageBox.Show(ex.Message, "Failed to load models", MessageBoxButtons.OK, MessageBoxIcon.Error);
            }
        }

        private void LoadBundledSamples()
        {
            string sampleImage = Path.Combine(Application.StartupPath, "test_data", "front_face.png");
            string sampleAudio = Path.Combine(Application.StartupPath, "test_data", "speech_16k_pcm.wav");
            if (File.Exists(sampleImage))
            {
                imagePath = sampleImage;
                ReplacePicture(pictureBox1, new Bitmap(sampleImage));
                labelImage.Text = Path.GetFileName(sampleImage);
            }
            if (File.Exists(sampleAudio))
            {
                wavPath = sampleAudio;
                labelAudio.Text = Path.GetFileName(sampleAudio);
            }
        }

        private void button1_Click(object sender, EventArgs e)
        {
            using (var dialog = new OpenFileDialog())
            {
                dialog.Filter = "Images|*.bmp;*.jpg;*.jpeg;*.png;*.tif;*.tiff";
                if (dialog.ShowDialog() != DialogResult.OK) return;
                imagePath = dialog.FileName;
                ReplacePicture(pictureBox1, new Bitmap(imagePath));
                labelImage.Text = Path.GetFileName(imagePath);
            }
        }

        private void buttonAudio_Click(object sender, EventArgs e)
        {
            using (var dialog = new OpenFileDialog())
            {
                dialog.Filter = "PCM WAV audio|*.wav";
                if (dialog.ShowDialog() != DialogResult.OK) return;
                wavPath = dialog.FileName;
                labelAudio.Text = Path.GetFileName(wavPath);
            }
        }

        private void button2_Click(object sender, EventArgs e)
        {
            if (isGenerating)
            {
                cancelRequested = true;
                button2.Enabled = false;
                button2.Text = "Cancelling...";
                return;
            }
            if (wav2Lip == null || faceDetector == null || string.IsNullOrEmpty(imagePath) || string.IsNullOrEmpty(wavPath))
            {
                MessageBox.Show("Please select an image and a WAV audio file first.", "Missing input");
                return;
            }

            string videoPath;
            using (var dialog = new SaveFileDialog())
            {
                dialog.Title = "Save Wav2Lip video";
                dialog.Filter = "AVI video|*.avi";
                dialog.DefaultExt = "avi";
                dialog.FileName = "wav2lip_result.avi";
                if (dialog.ShowDialog() != DialogResult.OK) return;
                videoPath = dialog.FileName;
            }

            isGenerating = true;
            cancelRequested = false;
            button2.Text = "Cancel";
            Cursor = Cursors.WaitCursor;
            Application.DoEvents();

            try
            {
                DateTime started = DateTime.Now;
                textBox1.Text = "Detecting the face and extracting the Mel spectrogram...";
                Application.DoEvents();

                float[,] fullMel = WavMelProcessor.LoadMelSpectrogram(wavPath);
                var starts = WavMelProcessor.GetWindowStartColumns(fullMel, VideoFps);
                // Read the checkbox once: Application.DoEvents() lets the user toggle it mid-run,
                // which would otherwise reach gfpgan.Run while gfpgan is still null.
                bool useGfpgan = checkGfpgan.Checked;
                if (useGfpgan && gfpgan == null) gfpgan = new GfpganEnhancer(gfpganModelPath);

                using (var original = new Bitmap(imagePath))
                using (Bitmap source = EnsureEvenSize(original))
                using (Mat sourceMat = BitmapToMat(source))
                {
                    Rect faceRect = faceDetector.DetectLargest(sourceMat);
                    using (Bitmap detectedFace = CropBitmap(source, faceRect))
                    using (Bitmap face96 = ResizeBitmap(detectedFace, 96, 96))
                    using (var writer = new VideoWriter(videoPath, FourCC.MJPG, VideoFps,
                        new OpenCvSharp.Size(source.Width, source.Height)))
                    {
                        if (!writer.IsOpened()) throw new InvalidOperationException("Unable to create the AVI video file.");
                        for (int i = 0; i < starts.Count; i++)
                        {
                            if (cancelRequested) break;
                            float[,] melWindow = WavMelProcessor.GetMelWindow(fullMel, starts[i]);
                            using (Bitmap generated = wav2Lip.Run(face96, melWindow))
                            {
                                Bitmap restored = null;
                                try
                                {
                                    Bitmap faceOutput = generated;
                                    if (useGfpgan)
                                    {
                                        restored = gfpgan.Run(generated);
                                        faceOutput = restored;
                                    }
                                    using (Bitmap composed = ComposeFrame(source, faceOutput, faceRect))
                                    using (Mat frame = BitmapToMat(composed))
                                    {
                                        writer.Write(frame);
                                        ReplacePicture(pictureBox2, new Bitmap(composed));
                                    }
                                }
                                finally
                                {
                                    if (restored != null) restored.Dispose();
                                }
                            }

                            textBox1.Text = string.Format(
                                "Generating: frame {0}/{1} ({2:F1}%){3}\r\nFace box: ({4},{5},{6},{7})\r\nOutput: {8}",
                                i + 1, starts.Count, 100.0 * (i + 1) / starts.Count,
                                useGfpgan ? ", GFPGAN enabled" : string.Empty,
                                faceRect.X, faceRect.Y, faceRect.Width, faceRect.Height, videoPath);
                            Application.DoEvents();
                        }
                    }
                }

                textBox1.Text = cancelRequested
                    ? "Cancelled; the frames generated so far were saved to: " + videoPath
                    : string.Format("Done: {0} frames, {1:F1} s of video; elapsed {2:F1} s.\r\nOutput: {3}\r\nThe AVI has no audio track yet.",
                        starts.Count, starts.Count / VideoFps, (DateTime.Now - started).TotalSeconds, videoPath);
                lastSilentVideoPath = videoPath;
                lastMergedVideoPath = null;
            }
            catch (Exception ex)
            {
                textBox1.Text = "Generation failed: " + ex.Message;
                MessageBox.Show(ex.ToString(), "Generation failed", MessageBoxButtons.OK, MessageBoxIcon.Error);
            }
            finally
            {
                Cursor = Cursors.Default;
                isGenerating = false;
                cancelRequested = false;
                button2.Enabled = true;
                button2.Text = "Generate video";
            }
        }

        private void buttonMerge_Click(object sender, EventArgs e)
        {
            if (string.IsNullOrEmpty(lastSilentVideoPath) || !File.Exists(lastSilentVideoPath))
            {
                MessageBox.Show("Please generate the silent AVI video first.", "Missing video");
                return;
            }
            if (string.IsNullOrEmpty(wavPath) || !File.Exists(wavPath))
            {
                MessageBox.Show("Please select a WAV audio file first.", "Missing audio");
                return;
            }

            string outputPath;
            using (var dialog = new SaveFileDialog())
            {
                dialog.Title = "Save video with audio track";
                dialog.Filter = "MP4 video|*.mp4";
                dialog.DefaultExt = "mp4";
                dialog.FileName = "wav2lip_result_with_audio.mp4";
                if (dialog.ShowDialog() != DialogResult.OK) return;
                outputPath = dialog.FileName;
            }

            try
            {
                string ffmpeg = GetFfmpegTool("ffmpeg.exe");
                string arguments = "-y -i " + Quote(lastSilentVideoPath) + " -i " + Quote(wavPath) +
                    " -c:v mpeg4 -q:v 2 -c:a aac -b:a 192k -shortest " + Quote(outputPath);
                textBox1.Text = "Merging audio and video with FFmpeg...";
                Application.DoEvents();
                string error = RunCommand(ffmpeg, arguments, true);
                if (!File.Exists(outputPath)) throw new InvalidOperationException("FFmpeg did not produce an output file.\r\n" + error);
                lastMergedVideoPath = outputPath;
                textBox1.Text = "Audio and video merged: " + outputPath;
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message, "Merge failed", MessageBoxButtons.OK, MessageBoxIcon.Error);
                textBox1.Text = "Merge failed: " + ex.Message;
            }
        }

        private void buttonPreview_Click(object sender, EventArgs e)
        {
            string video = !string.IsNullOrEmpty(lastMergedVideoPath) && File.Exists(lastMergedVideoPath)
                ? lastMergedVideoPath : lastSilentVideoPath;
            if (string.IsNullOrEmpty(video) || !File.Exists(video))
            {
                MessageBox.Show("There is no video to preview.", "Preview");
                return;
            }
            try
            {
                string ffplay = GetFfmpegTool("ffplay.exe");
                var startInfo = new ProcessStartInfo
                {
                    FileName = ffplay,
                    Arguments = "-autoexit -window_title \"Wav2Lip Preview\" " + Quote(video),
                    UseShellExecute = false,
                    CreateNoWindow = false,
                    WorkingDirectory = Path.GetDirectoryName(ffplay)
                };
                Process.Start(startInfo);
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message, "Preview failed", MessageBoxButtons.OK, MessageBoxIcon.Error);
            }
        }

        private static string GetFfmpegTool(string fileName)
        {
            string path = Path.Combine(Application.StartupPath, "ffmpeg", fileName);
            if (!File.Exists(path)) throw new FileNotFoundException("FFmpeg tool not found.", path);
            return path;
        }

        private static string Quote(string value)
        {
            return "\"" + value.Replace("\"", "\\\"") + "\"";
        }

        private static string RunCommand(string executable, string arguments, bool hidden)
        {
            var startInfo = new ProcessStartInfo
            {
                FileName = executable,
                Arguments = arguments,
                UseShellExecute = false,
                RedirectStandardError = true,
                RedirectStandardOutput = false,
                CreateNoWindow = hidden,
                WorkingDirectory = Path.GetDirectoryName(executable)
            };
            using (Process process = Process.Start(startInfo))
            {
                string error = process.StandardError.ReadToEnd();
                process.WaitForExit();
                if (process.ExitCode != 0)
                    throw new InvalidOperationException("FFmpeg exit code: " + process.ExitCode + "\r\n" + error);
                return error;
            }
        }

        private static unsafe Bitmap ComposeFrame(Bitmap source, Bitmap face, Rect target)
        {
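            // Wav2Lip/GFPGAN slightly shift the colors of the whole face box. With a plain
            // rectangular overlay the bottom edge usually lands on the chin or neck and leaves a
            // visible seam. All four edges are therefore feathered smoothly, with a wider
            // transition band at the bottom, so the generated face fades back into the original.
            // The unsafe pointer loop below has no bounds checks (and needs AllowUnsafeBlocks),
            // so reject face boxes that are not fully inside the frame.
            if (target.X < 0 || target.Y < 0 ||
                target.X + target.Width > source.Width || target.Y + target.Height > source.Height)
                throw new ArgumentOutOfRangeException(nameof(target), "The face box must lie inside the image.");
            var result = new Bitmap(source.Width, source.Height, PixelFormat.Format24bppRgb);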
            using (Graphics graphics = Graphics.FromImage(result))
            {
                graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;
                graphics.DrawImageUnscaled(source, 0, 0);
            }

            using (Bitmap resizedFace = ResizeBitmap(face, target.Width, target.Height))
            {
                Rectangle resultArea = new Rectangle(0, 0, result.Width, result.Height);
                Rectangle faceArea = new Rectangle(0, 0, resizedFace.Width, resizedFace.Height);
                BitmapData resultData = result.LockBits(resultArea, ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
                BitmapData faceData = resizedFace.LockBits(faceArea, ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
                try
                {
                    int sideFeather = Math.Max(4, (int)(target.Width * 0.08));
                    int topFeather = Math.Max(4, (int)(target.Height * 0.08));
                    int bottomFeather = Math.Max(8, (int)(target.Height * 0.22));

                    for (int y = 0; y < target.Height; y++)
                    {
                        byte* destinationRow = (byte*)resultData.Scan0 + (target.Y + y) * resultData.Stride + target.X * 3;
                        byte* faceRow = (byte*)faceData.Scan0 + y * faceData.Stride;
                        float vertical = Math.Min(EdgeWeight(y, topFeather),
                            EdgeWeight(target.Height - 1 - y, bottomFeather));
                        for (int x = 0; x < target.Width; x++)
                        {
                            float horizontal = Math.Min(EdgeWeight(x, sideFeather),
                                EdgeWeight(target.Width - 1 - x, sideFeather));
                            float alpha = SmoothStep(Math.Min(horizontal, vertical));
                            int offset = x * 3;
                            for (int channel = 0; channel < 3; channel++)
                            {
                                float original = destinationRow[offset + channel];
                                float generated = faceRow[offset + channel];
                                destinationRow[offset + channel] = (byte)Math.Round(
                                    original + (generated - original) * alpha);
                            }
                        }
                    }
                }
                finally
                {
                    resizedFace.UnlockBits(faceData);
                    result.UnlockBits(resultData);
                }
            }
            return result;
        }

        private static float EdgeWeight(int distance, int feather)
        {
            return Math.Max(0f, Math.Min(1f, (float)distance / feather));
        }

        private static float SmoothStep(float value)
        {
            return value * value * (3f - 2f * value);
        }

        private static Bitmap CropBitmap(Bitmap source, Rect crop)
        {
            var result = new Bitmap(crop.Width, crop.Height);
            using (Graphics graphics = Graphics.FromImage(result))
                graphics.DrawImage(source, new Rectangle(0, 0, crop.Width, crop.Height),
                    new Rectangle(crop.X, crop.Y, crop.Width, crop.Height), GraphicsUnit.Pixel);
            return result;
        }

        private static Bitmap EnsureEvenSize(Bitmap source)
        {
            int width = source.Width - source.Width % 2;
            int height = source.Height - source.Height % 2;
            return ResizeBitmap(source, width, height);
        }

        private static Bitmap ResizeBitmap(Bitmap source, int width, int height)
        {
            var result = new Bitmap(width, height, PixelFormat.Format24bppRgb);
            using (Graphics graphics = Graphics.FromImage(result))
            {
                graphics.InterpolationMode = InterpolationMode.HighQualityBicubic;
                graphics.DrawImage(source, new Rectangle(0, 0, width, height));
            }
            return result;
        }

        private static Mat BitmapToMat(Bitmap bitmap)
        {
            using (var formatted = new Bitmap(bitmap.Width, bitmap.Height, PixelFormat.Format24bppRgb))
            {
                using (Graphics graphics = Graphics.FromImage(formatted)) graphics.DrawImageUnscaled(bitmap, 0, 0);
                Rectangle area = new Rectangle(0, 0, formatted.Width, formatted.Height);
                BitmapData data = formatted.LockBits(area, ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
                try
                {
                    using (var view = new Mat(formatted.Height, formatted.Width, MatType.CV_8UC3,
                        data.Scan0, data.Stride))
                        return view.Clone();
                }
                finally
                {
                    formatted.UnlockBits(data);
                }
            }
        }

        private void pictureBox1_DoubleClick(object sender, EventArgs e) { ShowImage(pictureBox1.Image); }
        private void pictureBox2_DoubleClick(object sender, EventArgs e) { ShowImage(pictureBox2.Image); }

        private static void ShowImage(Image image)
        {
            if (image == null) return;
            using (var window = new Form())
            using (var view = new PictureBox())
            {
                window.Text = "Image preview";
                window.StartPosition = FormStartPosition.CenterParent;
                window.ClientSize = new System.Drawing.Size(Math.Min(1000, image.Width), Math.Min(800, image.Height));
                view.Dock = DockStyle.Fill;
                view.SizeMode = PictureBoxSizeMode.Zoom;
                view.Image = image;
                window.Controls.Add(view);
                window.ShowDialog();
                view.Image = null;
            }
        }

        private void button3_Click(object sender, EventArgs e)
        {
            if (pictureBox2.Image == null) return;
            using (var dialog = new SaveFileDialog())
            {
                dialog.Filter = "PNG image|*.png|JPEG image|*.jpg";
                dialog.DefaultExt = "png";
                if (dialog.ShowDialog() != DialogResult.OK) return;
                pictureBox2.Image.Save(dialog.FileName,
                    dialog.FilterIndex == 2 ? ImageFormat.Jpeg : ImageFormat.Png);
            }
        }

        protected override void OnFormClosed(FormClosedEventArgs e)
        {
            cancelRequested = true;
            if (gfpgan != null) gfpgan.Dispose();
            if (faceDetector != null) faceDetector.Dispose();
            if (wav2Lip != null) wav2Lip.Dispose();
            if (pictureBox1.Image != null) pictureBox1.Image.Dispose();
            if (pictureBox2.Image != null) pictureBox2.Image.Dispose();
            base.OnFormClosed(e);
        }

        private static void ReplacePicture(PictureBox box, Image image)
        {
            Image old = box.Image;
            box.Image = image;
            if (old != null) old.Dispose();
        }
    }

}

Download

Source code download