GitHub repository: https://github.com/openai/whisper
Environment setup:
pip install -U openai-whisper
Whisper also requires the ffmpeg command-line tool. Install it with your package manager:
on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
on Arch Linux
sudo pacman -S ffmpeg
on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg
on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
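Whisper calls the ffmpeg executable to decode audio, so it can help to confirm that it is on PATH before transcribing. A minimal sketch using only the standard library; the helper name `ffmpeg_available` is hypothetical:

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found:", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found; install it with one of the commands above")
```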
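If you have a GPU, also install a CUDA build of PyTorch for acceleration (replace cu121 with the tag that matches your CUDA version):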
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
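After installing, one way to check that PyTorch can actually see the GPU is `torch.cuda.is_available()`. The sketch below picks a device and a matching `fp16` setting. `pick_device` is a hypothetical helper, and falling back to CPU when torch cannot be imported is an assumption made for illustration:

```python
def pick_device():
    """Return (device, use_fp16) based on whether PyTorch can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        # torch is not installed: assume a CPU-only setup
        return "cpu", False
    if torch.cuda.is_available():
        return "cuda", True
    return "cpu", False


if __name__ == "__main__":
    device, use_fp16 = pick_device()
    print("device:", device, "fp16:", use_fp16)
```

The two values can then be passed on as `whisper.load_model(name, device=device)` and `model.transcribe(path, fp16=use_fp16)`.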
Command-line usage:
whisper D:\video\test.mp4 --model medium --language Chinese --output_format txt
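The whisper command accepts several input files in one call, which is handy for a folder of videos. The following is only a sketch of how such a call could be assembled from Python. `whisper_cli_args` and the file names are hypothetical, and `--output_dir` is added here so that all transcripts land in one place:

```python
import subprocess


def whisper_cli_args(files, model="medium", language="Chinese",
                     output_format="txt", output_dir="."):
    """Build the argument list for one `whisper` CLI call over several files."""
    args = ["whisper", *[str(f) for f in files]]
    args += ["--model", model, "--language", language,
             "--output_format", output_format, "--output_dir", str(output_dir)]
    return args


if __name__ == "__main__":
    # Hypothetical input files; replace with real paths.
    cmd = whisper_cli_args(["a.mp4", "b.mp4"], output_dir="transcripts")
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run whisper
```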
Python usage:
import whisper
from pathlib import Path


def video_to_txt(video_path, txt_path=None, model_name="medium"):
    # Load the model
    model = whisper.load_model(model_name)
    # Transcribe the video's audio track (decoded automatically)
    result = model.transcribe(
        video_path,
        language="zh",  # Chinese
        fp16=False,  # must be False on CPU
    )
    text = result["text"]
    # Default to a .txt file next to the video
    if txt_path is None:
        txt_path = Path(video_path).with_suffix(".txt")
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(text)
    print("Done →", txt_path)


if __name__ == "__main__":
    video_to_txt("test.mp4")
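Besides `result["text"]`, the transcription result also carries `result["segments"]`, a list of entries with `start`, `end` (in seconds) and `text`. If timestamps are wanted, they could be written out as subtitles roughly like this. The helpers `format_timestamp` and `segments_to_srt` are hypothetical names and the demo segments are made up:

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Turn a list of {"start", "end", "text"} dicts into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up segments for illustration; real ones come from result["segments"].
    demo = [
        {"start": 0.0, "end": 2.5, "text": " 你好"},
        {"start": 2.5, "end": 61.04, "text": " 世界"},
    ]
    print(segments_to_srt(demo))
```

On the command line, `--output_format srt` produces subtitle files directly, so this is mainly useful when the segments need further processing in Python.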
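If the video's audio quality is poor, first extract the audio as 16 kHz mono WAV, then transcribe the WAV file: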
ffmpeg -i test.mp4 -ar 16000 -ac 1 test.wav
whisper test.wav --model medium --language zh --output_format txt
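The same extraction step can be driven from Python with `subprocess`, for example before calling `model.transcribe` on the WAV file. This is a sketch under assumptions: `ffmpeg_extract_args` and `extract_audio` are hypothetical helpers, and the `-y` flag (overwrite the output without asking) is an addition not present in the command above:

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_args(video_path, wav_path=None):
    """Build the ffmpeg command that converts a video to 16 kHz mono WAV."""
    video_path = Path(video_path)
    if wav_path is None:
        wav_path = video_path.with_suffix(".wav")
    return ["ffmpeg", "-y", "-i", str(video_path),
            "-ar", "16000", "-ac", "1", str(wav_path)]


def extract_audio(video_path, wav_path=None):
    """Run ffmpeg and return the path of the WAV file it wrote."""
    cmd = ffmpeg_extract_args(video_path, wav_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])


if __name__ == "__main__":
    print(" ".join(ffmpeg_extract_args("test.mp4")))
```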