Converting MP3 Speech to Text with OpenAI Whisper

susu1083018911 2026-02-11 14:04

GitHub repository: https://github.com/openai/whisper

Environment setup:

pip install -U openai-whisper

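Whisper also needs the ffmpeg command-line tool, which it uses to decode audio and video files. Install it with your system's package manager:

on Ubuntu or Debian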

sudo apt update && sudo apt install ffmpeg

on Arch Linux

sudo pacman -S ffmpeg

on macOS using Homebrew (https://brew.sh/)

brew install ffmpeg

on Windows using Chocolatey (https://chocolatey.org/)

choco install ffmpeg

on Windows using Scoop (https://scoop.sh/)

scoop install ffmpeg
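
Before going further, it can help to confirm that ffmpeg is actually reachable from Python, since Whisper calls it as an external program. The following is a minimal sketch using only the standard library; the function name ffmpeg_available is a hypothetical helper, not part of Whisper.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable can be found on PATH.

    Hypothetical helper for illustration; not part of the whisper package.
    """
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found:", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found; install it with one of the commands above")
```

If the check fails right after installation, opening a new terminal so that PATH is reloaded is usually worth trying first.
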

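If you have an NVIDIA GPU, it is recommended to additionally install a CUDA build of PyTorch for acceleration. Replace cu121 in the URL with the tag that matches your own CUDA version.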

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
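
To check whether the CUDA build is actually being used, one option is to ask PyTorch directly. This is a rough sketch: pick_device and transcribe_kwargs are hypothetical helper names, and the code falls back to "cpu" when PyTorch is missing or sees no GPU.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu".

    Hypothetical helper for illustration; not part of the whisper package.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU path is possible
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def transcribe_kwargs(device: str) -> dict:
    """Enable half precision only on GPU; the CPU path runs in FP32."""
    return {"fp16": device == "cuda"}


if __name__ == "__main__":
    device = pick_device()
    print("device:", device, "options:", transcribe_kwargs(device))
```

The result can then be passed on as whisper.load_model("medium", device=device) and model.transcribe(path, **transcribe_kwargs(device)).
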

Command-line usage

whisper D:\video\test.mp4 --model medium --language Chinese --output_format txt

Python usage

import whisper
from pathlib import Path


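def video_to_txt(video_path, txt_path=None, model_name="medium"):
    # Load the model (the weights are downloaded on first use)
    model = whisper.load_model(model_name)

    # Whisper decodes the file through ffmpeg, so a video can be passed directly
    result = model.transcribe(
        str(video_path),
        language="zh",  # Chinese
        fp16=False      # FP16 is not supported on CPU; set True when running on a GPU
    )

    text = result["text"]

    # Default to a .txt file next to the input video
    if txt_path is None:
        txt_path = Path(video_path).with_suffix(".txt")

    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(text)

    print("Done →", txt_path)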


if __name__ == "__main__":
    video_to_txt("test.mp4")
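
Besides result["text"], the dictionary returned by transcribe also contains result["segments"], a list of entries with "start", "end" (in seconds) and "text". As a sketch of how those could be turned into subtitles, the hypothetical helpers below format them as SRT. The command-line tool can produce the same kind of file with --output_format srt, so this is only meant to show what the segment data looks like.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from a list of dicts with "start", "end" and "text".

    Hypothetical helper for illustration; not part of the whisper package.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle entries
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [{"start": 0.0, "end": 2.5, "text": " 你好"}]
    print(segments_to_srt(demo))
```

In the function above, segments_to_srt(result["segments"]) could be written to a .srt file in the same way the plain text is written to the .txt file.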

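If the video's audio quality is poor, it is suggested to first extract the audio as a 16 kHz mono WAV file and then transcribe that file: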

ffmpeg -i test.mp4 -ar 16000 -ac 1 test.wav

whisper test.wav --model medium --language zh --output_format txt
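
The same conversion can be driven from Python, for example when it should run right before transcription in one script. This is a minimal sketch that wraps the ffmpeg command above with subprocess; build_ffmpeg_command and to_wav_16k_mono are hypothetical helper names, and the -y flag (overwrite the output without asking) is an addition not present in the original command.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst=None):
    """Return the ffmpeg argument list for a 16 kHz mono WAV conversion."""
    src = Path(src)
    # Default to a .wav file next to the input
    dst = Path(dst) if dst is not None else src.with_suffix(".wav")
    # -y (assumption, not in the original command) overwrites an existing output file
    return ["ffmpeg", "-y", "-i", str(src), "-ar", "16000", "-ac", "1", str(dst)]


def to_wav_16k_mono(src, dst=None) -> Path:
    """Run ffmpeg and return the path of the converted file.

    Raises subprocess.CalledProcessError if ffmpeg exits with an error.
    """
    cmd = build_ffmpeg_command(src, dst)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])


if __name__ == "__main__":
    print("converted:", to_wav_16k_mono("test.mp4"))
```

The returned path can then be handed to model.transcribe or to the whisper command shown above.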