Automatic Sound Effects for Silent Videos: Reproducing the Open-Source Models ThinkSound and MMAudio

请站在我身后 · 2025-07-19 23:44

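Friends, I hadn't checked CSDN in a long time and found that people are still reading my articles, so I've come back to post an update.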

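ASMR videos have been pretty popular lately, the kind where cutting through anything makes a satisfying sound. But every tutorial I found just has you auto-generate the sound effects in an app, which felt boring, so I decided to reproduce it myself.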

1. ThinkSound

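This project has consistently ranked quite high lately, but in my experience the results really aren't great, probably because of how it was trained.
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing (https://thinksound-project.github.io/). It uses chain-of-thought (CoT) reasoning to generate and edit audio for a video step by step and interactively. A nice application of CoT.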

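Here are the results first. I tried many clips and none came out very well; if that's down to a mistake on my part, corrections are welcome.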

[Demo video: 5_thinksound]

1. Open the GitHub page:

https://github.com/FunAudioLLM/ThinkSound

2. In the local folder where you keep your projects, Shift + right-click to open a command prompt, then git clone the repository:

git clone https://github.com/liuhuadai/ThinkSound.git

Then open the project in your own editor or IDE.

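3. For Windows users, I recommend simply double-clicking setup_windows.bat to set up the environment (conda must already be installed). I'm on Windows and the installation went through without problems. The last step downloads the models from Hugging Face; if you have network problems or can't reach Hugging Face, you can download them yourself: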

git lfs install
git clone https://huggingface.co/liuhuadai/ThinkSound ckpts
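If `git lfs` is not available, the same download can be done from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed (`pip install huggingface_hub`). It reuses the repository `liuhuadai/ThinkSound` and the `ckpts` folder from the commands above. The helper names `ckpts_ready` and `fetch_thinksound_ckpts` are hypothetical and not part of ThinkSound, and the sketch does not reproduce anything else setup_windows.bat may do.

```python
from pathlib import Path

# Hypothetical helpers for illustration; they are not part of the ThinkSound repo.


def ckpts_ready(ckpt_dir: str = "ckpts") -> bool:
    """Return True if the checkpoint folder exists and contains at least one file."""
    path = Path(ckpt_dir)
    return path.is_dir() and any(p.is_file() for p in path.rglob("*"))


def fetch_thinksound_ckpts(ckpt_dir: str = "ckpts") -> Path:
    """Download the ThinkSound weights into ckpt_dir unless files are already there.

    Assumes `huggingface_hub` is installed; it is imported lazily so that
    ckpts_ready() works even without it.
    """
    if ckpts_ready(ckpt_dir):
        return Path(ckpt_dir)
    from huggingface_hub import snapshot_download

    # Same Hugging Face repository as the `git clone ... ckpts` command.
    local = snapshot_download(repo_id="liuhuadai/ThinkSound", local_dir=ckpt_dir)
    return Path(local)
```

Call `fetch_thinksound_ckpts()` from the ThinkSound project root so that `ckpts` ends up where the git clone command above would have put it. The existence check only looks for at least one file, so an interrupted download may still need to be rerun.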

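4. After activating the environment, just launch the web UI. It downloads some files once on first launch, but there were no major issues: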

python app.py

2. MMAudio

In my opinion this project, although not a new one, actually performs quite well.

[Demo video: 5_mmaudio]

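1. Go to GitHub and clone the repository in the same way:

git clone https://github.com/hkchengrex/MMAudio.git

2. Create and activate a conda environment:

conda create -n mmaudio python=3.9
conda activate mmaudio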

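Next, go to the PyTorch website and install the torch build that matches your CUDA version. Version 2.5.1 or later is required. For example, for CUDA 11.8: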

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --upgrade
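After installing, you may want to confirm that the environment meets the requirements above. The following is a small sketch of such a check. All helper names are hypothetical and not part of MMAudio. The version comparison is a simplified stand-in for `packaging.version.Version`. `torch_index_url` only reproduces the `cu118`-style URL pattern from the command above, so check the PyTorch site to see which CUDA versions actually have wheels.

```python
import sys

# Hypothetical helpers for illustration; not part of MMAudio or PyTorch.


def parse_version(text: str) -> tuple:
    """Turn a version string like '2.5.1+cu118' into a comparable tuple (2, 5, 1)."""
    core = text.split("+")[0]  # drop the local build tag, e.g. '+cu118'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:  # keep only the leading digits, so '0rc1' -> 0
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed: str, required: str) -> bool:
    """True if installed >= required, comparing components numerically."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so (3, 9) compares like (3, 9, 0)
    b += (0,) * (width - len(b))
    return a >= b


def torch_index_url(cuda_version: str) -> str:
    """Map a CUDA version such as '11.8' to the wheel index URL pattern .../whl/cu118."""
    return "https://download.pytorch.org/whl/cu" + cuda_version.replace(".", "")


def check_environment() -> None:
    """Print whether Python >= 3.9 and torch >= 2.5.1 are present (torch imported lazily)."""
    py = ".".join(map(str, sys.version_info[:3]))
    print("python", py, "OK" if meets_minimum(py, "3.9") else "too old, need >= 3.9")
    try:
        import torch
    except ImportError:
        print("torch not installed")
        return
    ok = meets_minimum(torch.__version__, "2.5.1")
    print("torch", torch.__version__, "OK" if ok else "too old, need >= 2.5.1")
    print("CUDA available:", torch.cuda.is_available())
```

Call `check_environment()` inside the activated `mmaudio` environment. If `CUDA available` prints `False`, a CPU-only torch build was most likely installed instead of the CUDA one.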

Finally, install the project:

cd MMAudio
pip install -e .

If this fails, upgrade pip first with pip install --upgrade pip and then try again.
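To confirm that `pip install -e .` registered the package, you can ask Python's standard `importlib.metadata` for its version. This is a sketch. The distribution name `mmaudio` is an assumption based on the folder name, so adjust it if the project's metadata uses a different name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "mmaudio" is an assumed distribution name, not confirmed from the project metadata.
print("mmaudio:", installed_version("mmaudio") or "not installed")
```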

3. Again, I recommend simply launching the web UI:

python gradio_demo.py

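The models are downloaded on the first run, and I didn't run into any problems. If you do, leave a comment and I'll check back occasionally. Oh, and the videos were generated locally with Wan2.1.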