WhisperLiveKit: Getting Started and a Subjective Evaluation

Project Introduction

Project URL: https://github.com/QuentinFuxa/WhisperLiveKit

This post is a quick start: set up the environment, explore what the model server can do, and run a simple subjective evaluation.

  • Keywords: transcription, speaker diarization, machine translation, voice activity detection (VAD)
  • Goals: environment setup, quick start, quick subjective evaluation
  • Difficulty: medium

WhisperLiveKit is a real-time transcription tool. So why not just use Whisper directly?

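Start with the use cases. Meetings, live streams, and online education all need transcripts as the audio arrives, which means transcribing short windows of audio. They may also need speaker identification when several people talk, and possibly real-time (simultaneous) translation.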

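Whisper, however, is designed for complete utterances, not real-time chunks. Processing small chunks loses context, cuts words off mid-syllable, and produces poor transcripts. WhisperLiveKit uses state-of-the-art simultaneous speech research for intelligent buffering and incremental processing.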

WhisperLiveKit builds on several pieces of prior research, including:

  • SimulStreaming (SOTA 2025) - Ultra-low latency transcription using the AlignAtt policy
  • NLLB (distilled) (2024) - Translation to more than 100 languages
  • WhisperStreaming (SOTA 2023) - Low latency transcription using the LocalAgreement policy
  • Streaming Sortformer (SOTA 2025) - Advanced real-time speaker diarization
  • Diart (SOTA 2021) - Real-time speaker diarization
  • Silero VAD (2024) - Enterprise-grade Voice Activity Detection
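To give a feel for what "incremental processing" means, here is a minimal sketch of the idea behind a LocalAgreement-style policy: a word is shown as final only once two consecutive hypotheses for the growing audio buffer agree on it. This is a simplified illustration, not WhisperLiveKit's or WhisperStreaming's actual code. The class name `LocalAgreement`, the word-list input, and the assumption that every hypothesis re-transcribes the whole buffer are all made up for the example.

```python
from typing import List


def common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Return the longest run of leading words shared by a and b."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


class LocalAgreement:
    """Hypothetical helper: commit only words that two consecutive hypotheses agree on."""

    def __init__(self) -> None:
        self.committed: List[str] = []  # words already emitted as final
        self.previous: List[str] = []   # unconfirmed tail of the last hypothesis

    def update(self, hypothesis: List[str]) -> List[str]:
        """Take the full hypothesis for the current buffer; return newly committed words.

        Assumption: the hypothesis starts with the words committed so far.
        """
        tail = hypothesis[len(self.committed):]
        agreed = common_prefix(self.previous, tail)
        self.committed.extend(agreed)
        self.previous = tail[len(agreed):]
        return agreed


if __name__ == "__main__":
    policy = LocalAgreement()
    # The last word of a short chunk is often cut off, so it is not trusted yet.
    for words in (["the", "quick", "brow"],
                  ["the", "quick", "brown", "fox"],
                  ["the", "quick", "brown", "fox", "jumps"]):
        print(policy.update(words), "| final so far:", policy.committed)
```

The trade-off is latency against stability: waiting for agreement delays each word by one update, but a truncated word such as "brow" never reaches the final transcript.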

According to the project's architecture diagram, it supports multiple concurrent users.

Next, we install and deploy it and run a simple hands-on evaluation.

Installation and Running

Environment Setup

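Use conda to create an isolated environment. Since I have an RTX 5090 with a matching torch build, I cloned my earlier whisper environment into a new conda environment rather than starting from scratch.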

(base) PS C:\Users\Jacob> conda env list

# conda environments:
#
base                 * C:\Users\Jacob\miniconda3
fireredtts             C:\Users\Jacob\miniconda3\envs\fireredtts
pytorch_nightly_env    C:\Users\Jacob\miniconda3\envs\pytorch_nightly_env
qwen_rtx5090           C:\Users\Jacob\miniconda3\envs\qwen_rtx5090
rtx50_comfyui          C:\Users\Jacob\miniconda3\envs\rtx50_comfyui
whisper                C:\Users\Jacob\miniconda3\envs\whisper

(base) PS C:\Users\Jacob> conda
usage: conda-script.py [-h] [-v] [--no-plugins] [-V] COMMAND ...

conda is a tool for managing and deploying applications, environments and packages.

options:
  -h, --help            Show this help message and exit.
  -v, --verbose         Can be used multiple times. Once for detailed output, twice for INFO logging, thrice for DEBUG
                        logging, four times for TRACE logging.
  --no-plugins          Disable all plugins that are not built into conda.
  -V, --version         Show the conda version number and exit.

commands:
  The following built-in and plugins subcommands are available.

  COMMAND
    activate            Activate a conda environment.
    clean               Remove unused packages and caches.
    commands            List all available conda subcommands (including those from plugins). Generally only used by
                        tab-completion.
    compare             Compare packages between conda environments.
    config              Modify configuration values in .condarc.
    content-trust       Signing and verification tools for Conda
    create              Create a new conda environment from a list of specified packages.
    deactivate          Deactivate the current active conda environment.
    doctor              Display a health report for your environment.
    env                 Create and manage conda environments.
    export              Export a given environment
    info                Display information about current conda install.
    init                Initialize conda for shell interaction.
    install             Install a list of packages into a specified conda environment.
    list                List installed packages in a conda environment.
    notices             Retrieve latest channel notifications.
    package             Create low-level conda packages. (EXPERIMENTAL)
    remove (uninstall)  Remove a list of packages from a specified conda environment.
    rename              Rename an existing environment.
    repoquery           Advanced search for repodata.
    run                 Run an executable in a conda environment.
    search              Search for packages and display associated information using the MatchSpec format.
    tos                 A subcommand for viewing, accepting, rejecting, and otherwise interacting with a channel's
                        Terms of Service (ToS). This plugin periodically checks for updated Terms of Service for the
                        active/selected channels. Channels with a Terms of Service will need to be accepted or
                        rejected prior to use. Conda will only allow package installation from channels without a
                        Terms of Service or with an accepted Terms of Service. Attempting to use a channel with a
                        rejected Terms of Service will result in an error.
    update (upgrade)    Update conda packages to the latest compatible version.
(base) PS C:\Users\Jacob> conda create -n whisperlivekit --clone whisper
3 channel Terms of Service accepted
Retrieving notices: done
Source:      C:\Users\Jacob\miniconda3\envs\whisper
Destination: C:\Users\Jacob\miniconda3\envs\whisperlivekit
Packages: 19
Files: 35845

Downloading and Extracting Packages:


## Package Plan ##

  environment location: C:\Users\Jacob\miniconda3\envs\whisperlivekit

  added / updated specs:
    - conda-forge/noarch::ca-certificates==2025.8.3=h4c7d964_0
    - conda-forge/win-64::ffmpeg==4.3.1=ha925a31_0
    - conda-forge/win-64::openssl==3.5.2=h725018a_0
    - defaults/noarch::pip==25.1=pyhc872135_2
    - defaults/noarch::tzdata==2025b=h04d1e81_0
    - defaults/win-64::bzip2==1.0.8=h2bbff1b_6
    - defaults/win-64::expat==2.7.1=h8ddb27b_0
    - defaults/win-64::libffi==3.4.4=hd77b12b_1
    - defaults/win-64::python==3.12.11=h716150d_0
    - defaults/win-64::setuptools==78.1.1=py312haa95532_0
    - defaults/win-64::sqlite==3.50.2=hda9a48d_1
    - defaults/win-64::tk==8.6.14=h5e9d12e_1
    - defaults/win-64::ucrt==10.0.22621.0=haa95532_0
    - defaults/win-64::vc14_runtime==14.44.35208=h4927774_10
    - defaults/win-64::vc==14.3=h2df5915_10
    - defaults/win-64::vs2015_runtime==14.44.35208=ha6b5a95_10
    - defaults/win-64::wheel==0.45.1=py312haa95532_0
    - defaults/win-64::xz==5.6.4=h4754444_1
    - defaults/win-64::zlib==1.2.13=h8cc25b3_1

done
#
# To activate this environment, use
#
#     $ conda activate whisperlivekit
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) PS C:\Users\Jacob> conda activate whisperlivekit
(whisperlivekit) PS C:\Users\Jacob> pip install whisperlivekit
Collecting whisperlivekit
  Downloading whisperlivekit-0.2.9-py3-none-any.whl.metadata (18 kB)
Collecting fastapi (from whisperlivekit)
  Downloading fastapi-0.117.1-py3-none-any.whl.metadata (28 kB)
Collecting librosa (from whisperlivekit)
  Downloading librosa-0.11.0-py3-none-any.whl.metadata (8.7 kB)
Requirement already satisfied: soundfile in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (0.13.1)
Collecting faster-whisper (from whisperlivekit)
  Downloading faster_whisper-1.2.0-py3-none-any.whl.metadata (16 kB)
Collecting uvicorn (from whisperlivekit)
  Downloading uvicorn-0.36.0-py3-none-any.whl.metadata (6.6 kB)
Collecting websockets (from whisperlivekit)
  Downloading websockets-15.0.1-cp312-cp312-win_amd64.whl.metadata (7.0 kB)
Requirement already satisfied: torchaudio>=2.0.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (2.8.0.dev20250810+cu128)
Requirement already satisfied: torch>=2.0.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (2.9.0.dev20250810+cu128)
Requirement already satisfied: tqdm in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (4.67.1)
Requirement already satisfied: tiktoken in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (0.11.0)
Requirement already satisfied: filelock in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.18.0)
Requirement already satisfied: typing-extensions>=4.10.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (4.14.1)
Requirement already satisfied: sympy>=1.13.3 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (1.14.0)
Requirement already satisfied: networkx>=2.5.1 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.5)
Requirement already satisfied: jinja2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.1.6)
Requirement already satisfied: fsspec>=0.8.5 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (2025.7.0)
Requirement already satisfied: setuptools in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (78.1.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from sympy>=1.13.3->torch>=2.0.0->whisperlivekit) (1.3.0)
Collecting starlette<0.49.0,>=0.40.0 (from fastapi->whisperlivekit)
  Downloading starlette-0.48.0-py3-none-any.whl.metadata (6.3 kB)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from fastapi->whisperlivekit) (2.11.7)
Requirement already satisfied: annotated-types>=0.6.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (0.7.0)
Requirement already satisfied: pydantic-core==2.33.2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (2.33.2)
Requirement already satisfied: typing-inspection>=0.4.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (0.4.1)
Collecting anyio<5,>=3.6.2 (from starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit)
  Downloading anyio-4.10.0-py3-none-any.whl.metadata (4.0 kB)
Requirement already satisfied: idna>=2.8 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from anyio<5,>=3.6.2->starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit) (3.10)
Collecting sniffio>=1.1 (from anyio<5,>=3.6.2->starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit)
  Downloading sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting ctranslate2<5,>=4.0 (from faster-whisper->whisperlivekit)
  Downloading ctranslate2-4.6.0-cp312-cp312-win_amd64.whl.metadata (10 kB)
Requirement already satisfied: huggingface-hub>=0.13 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (0.34.4)
Requirement already satisfied: tokenizers<1,>=0.13 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (0.21.4)
Collecting onnxruntime<2,>=1.14 (from faster-whisper->whisperlivekit)
  Downloading onnxruntime-1.22.1-cp312-cp312-win_amd64.whl.metadata (5.1 kB)
Requirement already satisfied: av>=11 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (15.0.0)
Requirement already satisfied: numpy in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from ctranslate2<5,>=4.0->faster-whisper->whisperlivekit) (2.2.6)
Requirement already satisfied: pyyaml<7,>=5.3 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from ctranslate2<5,>=4.0->faster-whisper->whisperlivekit) (6.0.2)
Collecting coloredlogs (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)
Collecting flatbuffers (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Downloading flatbuffers-25.2.10-py2.py3-none-any.whl.metadata (875 bytes)
Requirement already satisfied: packaging in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit) (25.0)
Collecting protobuf (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Downloading protobuf-6.32.1-cp310-abi3-win_amd64.whl.metadata (593 bytes)
Requirement already satisfied: requests in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2.32.4)
Requirement already satisfied: colorama in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from tqdm->whisperlivekit) (0.4.6)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Downloading humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)
Collecting pyreadline3 (from humanfriendly>=9.1->coloredlogs->onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Downloading pyreadline3-3.5.4-py3-none-any.whl.metadata (4.7 kB)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from jinja2->torch>=2.0.0->whisperlivekit) (3.0.2)
Collecting audioread>=2.1.9 (from librosa->whisperlivekit)
  Using cached audioread-3.0.1-py3-none-any.whl.metadata (8.4 kB)
Requirement already satisfied: numba>=0.51.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from librosa->whisperlivekit) (0.61.2)
Requirement already satisfied: scipy>=1.6.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from librosa->whisperlivekit) (1.16.1)
Collecting scikit-learn>=1.1.0 (from librosa->whisperlivekit)
  Using cached scikit_learn-1.7.2-cp312-cp312-win_amd64.whl.metadata (11 kB)
Collecting joblib>=1.0 (from librosa->whisperlivekit)
  Using cached joblib-1.5.2-py3-none-any.whl.metadata (5.6 kB)
Collecting decorator>=4.3.0 (from librosa->whisperlivekit)
  Using cached decorator-5.2.1-py3-none-any.whl.metadata (3.9 kB)
Collecting pooch>=1.1 (from librosa->whisperlivekit)
  Using cached pooch-1.8.2-py3-none-any.whl.metadata (10 kB)
Collecting soxr>=0.3.2 (from librosa->whisperlivekit)
  Using cached soxr-1.0.0-cp312-abi3-win_amd64.whl.metadata (5.6 kB)
Collecting lazy_loader>=0.1 (from librosa->whisperlivekit)
  Using cached lazy_loader-0.4-py3-none-any.whl.metadata (7.6 kB)
Collecting msgpack>=1.0 (from librosa->whisperlivekit)
  Using cached msgpack-1.1.1-cp312-cp312-win_amd64.whl.metadata (8.6 kB)
Requirement already satisfied: llvmlite<0.45,>=0.44.0dev0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from numba>=0.51.0->librosa->whisperlivekit) (0.44.0)
Collecting platformdirs>=2.5.0 (from pooch>=1.1->librosa->whisperlivekit)
  Using cached platformdirs-4.4.0-py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: charset_normalizer<4,>=2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (3.4.3)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2.5.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2025.8.3)
Collecting threadpoolctl>=3.1.0 (from scikit-learn>=1.1.0->librosa->whisperlivekit)
  Using cached threadpoolctl-3.6.0-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: cffi>=1.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from soundfile->whisperlivekit) (1.17.1)
Requirement already satisfied: pycparser in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from cffi>=1.0->soundfile->whisperlivekit) (2.22)
Requirement already satisfied: regex>=2022.1.18 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from tiktoken->whisperlivekit) (2025.7.34)
Collecting click>=7.0 (from uvicorn->whisperlivekit)
  Downloading click-8.3.0-py3-none-any.whl.metadata (2.6 kB)
Collecting h11>=0.8 (from uvicorn->whisperlivekit)
  Downloading h11-0.16.0-py3-none-any.whl.metadata (8.3 kB)
Downloading whisperlivekit-0.2.9-py3-none-any.whl (876 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 876.5/876.5 kB 163.8 kB/s eta 0:00:00
Downloading fastapi-0.117.1-py3-none-any.whl (95 kB)
Downloading starlette-0.48.0-py3-none-any.whl (73 kB)
Downloading anyio-4.10.0-py3-none-any.whl (107 kB)
Downloading sniffio-1.3.1-py3-none-any.whl (10 kB)
Downloading faster_whisper-1.2.0-py3-none-any.whl (1.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 124.6 kB/s eta 0:00:00
Downloading ctranslate2-4.6.0-cp312-cp312-win_amd64.whl (19.5 MB)
   ━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.2/19.5 MB 124.3 kB/s eta 0:01:55
ERROR: Operation cancelled by user
(whisperlivekit) PS C:\Users\Jacob> pip install whisperlivekit
Collecting whisperlivekit
  Using cached whisperlivekit-0.2.9-py3-none-any.whl.metadata (18 kB)
Collecting fastapi (from whisperlivekit)
  Using cached fastapi-0.117.1-py3-none-any.whl.metadata (28 kB)
Collecting librosa (from whisperlivekit)
  Using cached librosa-0.11.0-py3-none-any.whl.metadata (8.7 kB)
Requirement already satisfied: soundfile in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (0.13.1)
Collecting faster-whisper (from whisperlivekit)
  Using cached faster_whisper-1.2.0-py3-none-any.whl.metadata (16 kB)
Collecting uvicorn (from whisperlivekit)
  Using cached uvicorn-0.36.0-py3-none-any.whl.metadata (6.6 kB)
Collecting websockets (from whisperlivekit)
  Using cached websockets-15.0.1-cp312-cp312-win_amd64.whl.metadata (7.0 kB)
Requirement already satisfied: torchaudio>=2.0.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (2.8.0.dev20250810+cu128)
Requirement already satisfied: torch>=2.0.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (2.9.0.dev20250810+cu128)
Requirement already satisfied: tqdm in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (4.67.1)
Requirement already satisfied: tiktoken in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (0.11.0)
Requirement already satisfied: filelock in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.18.0)
Requirement already satisfied: typing-extensions>=4.10.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (4.14.1)
Requirement already satisfied: sympy>=1.13.3 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (1.14.0)
Requirement already satisfied: networkx>=2.5.1 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.5)
Requirement already satisfied: jinja2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.1.6)
Requirement already satisfied: fsspec>=0.8.5 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (2025.7.0)
Requirement already satisfied: setuptools in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (78.1.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from sympy>=1.13.3->torch>=2.0.0->whisperlivekit) (1.3.0)
Collecting starlette<0.49.0,>=0.40.0 (from fastapi->whisperlivekit)
  Using cached starlette-0.48.0-py3-none-any.whl.metadata (6.3 kB)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from fastapi->whisperlivekit) (2.11.7)
Requirement already satisfied: annotated-types>=0.6.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (0.7.0)
Requirement already satisfied: pydantic-core==2.33.2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (2.33.2)
Requirement already satisfied: typing-inspection>=0.4.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (0.4.1)
Collecting anyio<5,>=3.6.2 (from starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit)
  Using cached anyio-4.10.0-py3-none-any.whl.metadata (4.0 kB)
Requirement already satisfied: idna>=2.8 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from anyio<5,>=3.6.2->starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit) (3.10)
Collecting sniffio>=1.1 (from anyio<5,>=3.6.2->starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit)
  Using cached sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting ctranslate2<5,>=4.0 (from faster-whisper->whisperlivekit)
  Using cached ctranslate2-4.6.0-cp312-cp312-win_amd64.whl.metadata (10 kB)
Requirement already satisfied: huggingface-hub>=0.13 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (0.34.4)
Requirement already satisfied: tokenizers<1,>=0.13 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (0.21.4)
Collecting onnxruntime<2,>=1.14 (from faster-whisper->whisperlivekit)
  Using cached onnxruntime-1.22.1-cp312-cp312-win_amd64.whl.metadata (5.1 kB)
Requirement already satisfied: av>=11 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (15.0.0)
Requirement already satisfied: numpy in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from ctranslate2<5,>=4.0->faster-whisper->whisperlivekit) (2.2.6)
Requirement already satisfied: pyyaml<7,>=5.3 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from ctranslate2<5,>=4.0->faster-whisper->whisperlivekit) (6.0.2)
Collecting coloredlogs (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Using cached coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)
Collecting flatbuffers (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Using cached flatbuffers-25.2.10-py2.py3-none-any.whl.metadata (875 bytes)
Requirement already satisfied: packaging in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit) (25.0)
Collecting protobuf (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Using cached protobuf-6.32.1-cp310-abi3-win_amd64.whl.metadata (593 bytes)
Requirement already satisfied: requests in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2.32.4)
Requirement already satisfied: colorama in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from tqdm->whisperlivekit) (0.4.6)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Using cached humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)
Collecting pyreadline3 (from humanfriendly>=9.1->coloredlogs->onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Using cached pyreadline3-3.5.4-py3-none-any.whl.metadata (4.7 kB)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from jinja2->torch>=2.0.0->whisperlivekit) (3.0.2)
Collecting audioread>=2.1.9 (from librosa->whisperlivekit)
  Using cached audioread-3.0.1-py3-none-any.whl.metadata (8.4 kB)
Requirement already satisfied: numba>=0.51.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from librosa->whisperlivekit) (0.61.2)
Requirement already satisfied: scipy>=1.6.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from librosa->whisperlivekit) (1.16.1)
Collecting scikit-learn>=1.1.0 (from librosa->whisperlivekit)
  Using cached scikit_learn-1.7.2-cp312-cp312-win_amd64.whl.metadata (11 kB)
Collecting joblib>=1.0 (from librosa->whisperlivekit)
  Using cached joblib-1.5.2-py3-none-any.whl.metadata (5.6 kB)
Collecting decorator>=4.3.0 (from librosa->whisperlivekit)
  Using cached decorator-5.2.1-py3-none-any.whl.metadata (3.9 kB)
Collecting pooch>=1.1 (from librosa->whisperlivekit)
  Using cached pooch-1.8.2-py3-none-any.whl.metadata (10 kB)
Collecting soxr>=0.3.2 (from librosa->whisperlivekit)
  Using cached soxr-1.0.0-cp312-abi3-win_amd64.whl.metadata (5.6 kB)
Collecting lazy_loader>=0.1 (from librosa->whisperlivekit)
  Using cached lazy_loader-0.4-py3-none-any.whl.metadata (7.6 kB)
Collecting msgpack>=1.0 (from librosa->whisperlivekit)
  Using cached msgpack-1.1.1-cp312-cp312-win_amd64.whl.metadata (8.6 kB)
Requirement already satisfied: llvmlite<0.45,>=0.44.0dev0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from numba>=0.51.0->librosa->whisperlivekit) (0.44.0)
Collecting platformdirs>=2.5.0 (from pooch>=1.1->librosa->whisperlivekit)
  Using cached platformdirs-4.4.0-py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: charset_normalizer<4,>=2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (3.4.3)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2.5.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2025.8.3)
Collecting threadpoolctl>=3.1.0 (from scikit-learn>=1.1.0->librosa->whisperlivekit)
  Using cached threadpoolctl-3.6.0-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: cffi>=1.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from soundfile->whisperlivekit) (1.17.1)
Requirement already satisfied: pycparser in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from cffi>=1.0->soundfile->whisperlivekit) (2.22)
Requirement already satisfied: regex>=2022.1.18 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from tiktoken->whisperlivekit) (2025.7.34)
Collecting click>=7.0 (from uvicorn->whisperlivekit)
  Using cached click-8.3.0-py3-none-any.whl.metadata (2.6 kB)
Collecting h11>=0.8 (from uvicorn->whisperlivekit)
  Using cached h11-0.16.0-py3-none-any.whl.metadata (8.3 kB)
Using cached whisperlivekit-0.2.9-py3-none-any.whl (876 kB)
Using cached fastapi-0.117.1-py3-none-any.whl (95 kB)
Using cached starlette-0.48.0-py3-none-any.whl (73 kB)
Using cached anyio-4.10.0-py3-none-any.whl (107 kB)
Using cached sniffio-1.3.1-py3-none-any.whl (10 kB)
Using cached faster_whisper-1.2.0-py3-none-any.whl (1.1 MB)
Downloading ctranslate2-4.6.0-cp312-cp312-win_amd64.whl (19.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.5/19.5 MB 8.4 MB/s eta 0:00:00
Downloading onnxruntime-1.22.1-cp312-cp312-win_amd64.whl (12.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.7/12.7 MB 12.0 MB/s eta 0:00:00
Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Downloading flatbuffers-25.2.10-py2.py3-none-any.whl (30 kB)
Downloading librosa-0.11.0-py3-none-any.whl (260 kB)
Using cached audioread-3.0.1-py3-none-any.whl (23 kB)
Using cached decorator-5.2.1-py3-none-any.whl (9.2 kB)
Using cached joblib-1.5.2-py3-none-any.whl (308 kB)
Using cached lazy_loader-0.4-py3-none-any.whl (12 kB)
Using cached msgpack-1.1.1-cp312-cp312-win_amd64.whl (72 kB)
Using cached pooch-1.8.2-py3-none-any.whl (64 kB)
Using cached platformdirs-4.4.0-py3-none-any.whl (18 kB)
Using cached scikit_learn-1.7.2-cp312-cp312-win_amd64.whl (8.7 MB)
Using cached soxr-1.0.0-cp312-abi3-win_amd64.whl (172 kB)
Using cached threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Downloading protobuf-6.32.1-cp310-abi3-win_amd64.whl (435 kB)
Downloading pyreadline3-3.5.4-py3-none-any.whl (83 kB)
Downloading uvicorn-0.36.0-py3-none-any.whl (67 kB)
Downloading click-8.3.0-py3-none-any.whl (107 kB)
Downloading h11-0.16.0-py3-none-any.whl (37 kB)
Downloading websockets-15.0.1-cp312-cp312-win_amd64.whl (176 kB)
Installing collected packages: flatbuffers, websockets, threadpoolctl, soxr, sniffio, pyreadline3, protobuf, platformdirs, msgpack, lazy_loader, joblib, h11, decorator, ctranslate2, click, audioread, uvicorn, scikit-learn, pooch, humanfriendly, anyio, starlette, librosa, coloredlogs, onnxruntime, fastapi, faster-whisper, whisperlivekit
Successfully installed anyio-4.10.0 audioread-3.0.1 click-8.3.0 coloredlogs-15.0.1 ctranslate2-4.6.0 decorator-5.2.1 fastapi-0.117.1 faster-whisper-1.2.0 flatbuffers-25.2.10 h11-0.16.0 humanfriendly-10.0 joblib-1.5.2 lazy_loader-0.4 librosa-0.11.0 msgpack-1.1.1 onnxruntime-1.22.1 platformdirs-4.4.0 pooch-1.8.2 protobuf-6.32.1 pyreadline3-3.5.4 scikit-learn-1.7.2 sniffio-1.3.1 soxr-1.0.0 starlette-0.48.0 threadpoolctl-3.6.0 uvicorn-0.36.0 websockets-15.0.1 whisperlivekit-0.2.9

Installation completed with no errors (the first attempt above was cancelled manually and rerun).
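As an optional sanity check after installing, a short script can confirm which versions actually landed in the active environment. This is a minimal sketch using only the standard library. The package names are the ones that appear in the pip log above, and the helper names `installed_version` and `report` are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version (None when not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Names taken from the pip output above; extend the list as needed.
    wanted = ["whisperlivekit", "faster-whisper", "torch", "torchaudio"]
    for name, ver in report(wanted).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Run it inside the activated whisperlivekit environment; any line that prints NOT INSTALLED points at a package that pip did not install there.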

Running

Now start the server, choosing the model size you want (all examples below use the base model). The log below contains two runs of the command.

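In the first run I passed the wrong language code: I wrote ch for Chinese, where it should be zh. We will test with English first, using the code en.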
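A typo like ch for zh is easy to make, so it can help to check the code before launching the server. The sketch below is a hypothetical pre-flight helper, not part of WhisperLiveKit. `KNOWN_CODES` is a small hand-written subset of the two-letter codes Whisper accepts, and `COMMON_MISTAKES` is an assumed list of likely typos.

```python
# Hand-written subset for illustration; Whisper itself supports many more languages.
KNOWN_CODES = {
    "en": "English",
    "zh": "Chinese",
    "ja": "Japanese",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
}

# Assumed typos mapped to the code that was probably intended.
COMMON_MISTAKES = {"ch": "zh", "cn": "zh", "jp": "ja"}


def check_language(code: str) -> str:
    """Return the normalized code, or raise ValueError with a hint for likely typos."""
    normalized = code.strip().lower()
    if normalized in KNOWN_CODES:
        return normalized
    message = f"unknown language code {normalized!r}"
    hint = COMMON_MISTAKES.get(normalized)
    if hint is not None:
        message += f"; did you mean {hint!r} ({KNOWN_CODES[hint]})?"
    raise ValueError(message)


if __name__ == "__main__":
    for candidate in ("en", "zh", "ch"):
        try:
            print(candidate, "->", check_language(candidate))
        except ValueError as err:
            print(candidate, "->", err)
```

The validated code can then be passed to the --language option shown in the command below.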

(whisperlivekit) PS C:\Users\Jacob> whisperlivekit-server --model base --language ch
INFO:     Started server process [7696]
INFO:     Waiting for application startup.
WARNING:whisperlivekit.basic_server:
==================================================
WhisperLiveKit 0.2.8 has introduced a new fast encoder feature using MLX Whisper or Faster Whisper for improved speed. Use --disable-fast-encoder to disable if you encounter issues.
==================================================

C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\torch\hub.py:330: UserWarning: You are about to download and run code from an untrusted repository. In a future release, this won't be allowed. To add the repository to your trusted list, change the command to {calling_fn}(..., trust_repo=False) and a command prompt will appear asking for an explicit confirmation of trust, or load(..., trust_repo=True), which will assume that the prompt is to be answered with 'yes'. You can also use load(..., trust_repo='check') which will only prompt for confirmation if the repo is not already trusted. This will eventually be the default behaviour
  warnings.warn(
Downloading: "https://github.com/snakers4/silero-vad/zipball/master" to C:\Users\Jacob/.cache\torch\hub\master.zip
WARNING:whisperlivekit.simul_whisper.backend:
SimulStreaming backend is dual-licensed:
• Non-Commercial Use: PolyForm Noncommercial License 1.0.0.
• Commercial Use: Check SimulStreaming README (github.com/ufal/SimulStreaming) for more details.

Simulstreaming will use Faster Whisper for the encoder.
config.json: 2.31kB [00:00, 11.4kB/s]
C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\huggingface_hub\file_download.py:143: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\Jacob\.cache\huggingface\hub\models--Systran--faster-whisper-base. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
vocabulary.txt: 460kB [00:00, 637kB/s]
tokenizer.json: 0.00B [00:00, ?B/s]   Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
WARNING:huggingface_hub.file_download:Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
tokenizer.json: 2.20MB [00:01, 1.75MB/s]
model.bin: 100%|████████████████████████████████████████████████████████████████████| 145M/145M [00:11<00:00, 12.9MB/s]
100%|███████████████████████████████████████| 139M/139M [00:10<00:00, 13.9MiB/s]
Downloading warmup file from https://github.com/ggerganov/whisper.cpp/raw/master/samples/jfk.wav
ERROR:    Traceback (most recent call last):
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\starlette\routing.py", line 694, in lifespan
    async with self.lifespan_context(app) as maybe_state:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\basic_server.py", line 32, in lifespan
    transcription_engine = TranscriptionEngine(
                           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\core.py", line 113, in __init__
    self.asr = SimulStreamingASR(
               ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\backend.py", line 312, in __init__
    self.models = [self.load_model() for i in range(self.preload_model_count)]
                   ^^^^^^^^^^^^^^^^^
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\backend.py", line 321, in load_model
    temp_model = PaddedAlignAttWhisper(
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\simul_whisper.py", line 80, in __init__
    self.create_tokenizer(cfg.language if cfg.language != "auto" else None)
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\simul_whisper.py", line 190, in create_tokenizer
    self.tokenizer = tokenizer.get_tokenizer(
                     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\whisper\tokenizer.py", line 380, in get_tokenizer
    raise ValueError(f"Unsupported language: {language}")
ValueError: Unsupported language: ch

ERROR:    Application startup failed. Exiting.

(whisperlivekit) PS C:\Users\Jacob> whisperlivekit-server --model base --language en
INFO:     Started server process [2764]
INFO:     Waiting for application startup.
WARNING:whisperlivekit.basic_server:
==================================================
WhisperLiveKit 0.2.8 has introduced a new fast encoder feature using MLX Whisper or Faster Whisper for improved speed. Use --disable-fast-encoder to disable if you encounter issues.
==================================================

Using cache found in C:\Users\Jacob/.cache\torch\hub\snakers4_silero-vad_master
WARNING:whisperlivekit.simul_whisper.backend:
SimulStreaming backend is dual-licensed:
• Non-Commercial Use: PolyForm Noncommercial License 1.0.0.
• Commercial Use: Check SimulStreaming README (github.com/ufal/SimulStreaming) for more details.

Simulstreaming will use Faster Whisper for the encoder.
C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\whisper\timing.py:42: UserWarning: Failed to launch Triton kernels, likely due to missing CUDA toolkit; falling back to a slower median kernel implementation...
  warnings.warn(
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
INFO:     ::1:50387 - "GET / HTTP/1.1" 200 OK
INFO:     ::1:50387 - "GET /favicon.ico HTTP/1.1" 404 Not Found
INFO:     ::1:51078 - "GET / HTTP/1.1" 200 OK
INFO:     ::1:51084 - "WebSocket /asr" [accepted]
INFO:whisperlivekit.basic_server:WebSocket connection opened.
INFO:     connection open
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.60s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.55s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.49s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.55s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.49s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.55s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=2.17s | + Silence of = 2.16s | last_end = 2.12 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=2.17s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.01s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.01s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=3.79s | + Silence of = 4.32s | last_end = 6.64 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=3.79s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.49s | + Silence of = 1.56s | last_end = 12.88 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.49s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=1.09s | + Silence of = 2.64s | last_end = 12.88 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=1.09s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s | + Silence of = 1.56s | last_end = 20.020000000000003 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s | + Silence of = 1.08s | last_end = 24.1 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |

The first run downloads and caches the model files, as the log above shows.

Based on the log, the program first downloaded the following components during its initial run:

  • Silero-VAD: A voice activity detection model from a GitHub repository.
  • Faster-Whisper-Base Model: A more efficient version of the Whisper model for transcription, including its associated files:
    • config.json
    • vocabulary.txt
    • tokenizer.json
    • model.bin
  • Warmup file: A sample audio file (jfk.wav) from the whisper.cpp repository, used to test and initialize the model.

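The downloaded files are stored in two places:

  • Hugging Face cache directory

Most of the model files, namely the Faster-Whisper-Base files config.json, vocabulary.txt, tokenizer.json, and model.bin, are downloaded to the default Hugging Face cache directory.

  • Path: C:\Users\Jacob\.cache\huggingface\hub

This cache directory generally stores models and datasets downloaded from the Hugging Face Hub.

  • Torch Hub cache directory

The Silero-VAD model is downloaded from GitHub and stored in the Torch Hub cache directory.

  • Path: C:\Users\Jacob\.cache\torch\hub\master.zip

On the second start, the log reports the cached copy under C:\Users\Jacob/.cache\torch\hub\snakers4_silero-vad_master.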
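To see how much disk space these two caches take, a small script along the following lines can help. It is only a sketch: the two directory paths are taken from the log above, while the function names `dir_size_bytes` and `cache_report` are made up for this example.

```python
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root (0 if root is missing)."""
    if not root.exists():
        return 0
    # Skip symlinks so that a Hugging Face cache that uses them is not counted twice.
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


def cache_report(home: Path) -> dict[str, int]:
    """Map each cache directory seen in the log to its size in bytes."""
    caches = {
        "huggingface": home / ".cache" / "huggingface" / "hub",
        "torch_hub": home / ".cache" / "torch" / "hub",
    }
    return {name: dir_size_bytes(path) for name, path in caches.items()}


if __name__ == "__main__":
    for name, size in cache_report(Path.home()).items():
        print(f"{name}: {size / 1e6:.1f} MB")
```

Deleting either directory forces a fresh download on the next start.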

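Start the server with whisperlivekit-server --model base --language en, which loads the base model and transcribes English (en). The first attempt in the log above failed with ValueError: Unsupported language: ch, because ch is not a language code the tokenizer accepts; the code for Chinese is zh.

Open http://localhost:8000/ in a browser.

As the log shows (WebSocket /asr), the page talks to the backend over a WebSocket. The front end is a fairly simple page. The transcript is split into segments, but with the default parameters no speakers are identified.

To test further, the model service needs to be configured.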

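Model configuration

To understand each command-line parameter, start with the README at https://github.com/QuentinFuxa/WhisperLiveKit.

For model size there are many models to choose from; the smaller ones (tiny, base, small, medium) also come in English-only .en variants:

https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/available_models.md

The accepted language codes are listed here:

https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/simul_whisper/whisper/tokenizer.py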
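The Unsupported language: ch failure in the startup log could be caught before the server is launched. The sketch below shows one way to do that. `KNOWN_CODES` is a hand-picked subset of Whisper language codes rather than the full table from tokenizer.py, and `COMMON_MISTAKES`, `normalize_language`, and `server_command` are hypothetical helpers written for this example, not part of WhisperLiveKit.

```python
# Hand-picked subset of Whisper language codes (an assumption for illustration);
# the authoritative table lives in tokenizer.py.
KNOWN_CODES = {"en", "zh", "ja", "ko", "de", "fr", "es", "ru"}

# Hypothetical map from common slips to the code Whisper expects.
COMMON_MISTAKES = {"ch": "zh", "cn": "zh", "jp": "ja", "kr": "ko"}


def normalize_language(code: str) -> str:
    """Return a usable language code, or raise ValueError like the tokenizer does."""
    code = code.strip().lower()
    # The traceback shows that "auto" is handled separately by the backend.
    if code == "auto" or code in KNOWN_CODES:
        return code
    if code in COMMON_MISTAKES:
        return COMMON_MISTAKES[code]
    raise ValueError(f"Unsupported language: {code}")


def server_command(model: str, language: str) -> list[str]:
    """Build the argument list for the launch command used in this post."""
    return [
        "whisperlivekit-server",
        "--model", model,
        "--language", normalize_language(language),
    ]


if __name__ == "__main__":
    print(" ".join(server_command("base", "ch")))
```

The resulting list can be passed to `subprocess.run` once the code has been validated.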

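Summary

  • The project is not very mature yet, but it is under rapid development; see the release log at https://github.com/QuentinFuxa/WhisperLiveKit/releases
  • The front-end page is minimal and meant only as an example. You can build your own front end for your use case; the GitHub repository includes front-end examples.
  • Speaker identification is off by default and has to be enabled explicitly (whisperlivekit-server --model base --language en --diarization). In a quick test, an English podcast with one male and one female speaker was separated fairly easily, but with two female reporters speaking Chinese the speakers were essentially not distinguished.
  • Real-time transcription latency is around 1 s.
  • Accuracy: a few common words are still misrecognized, typically as similar-sounding words.
  • Chinese is output in Traditional characters by default. If no language is configured, English and Chinese are detected automatically and transcribed.
  • Resource usage: about 10% GPU utilization on the RTX5090, with about 20 GB of GPU memory in use.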
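The latency figure can be cross-checked against the audio_processor lines in the server log. The sketch below pulls out every `lag=` value and summarizes them. It takes the `lag` field at face value, since the log does not say exactly what it measures. `lag_values` and `lag_summary` are names made up for this example, and `SAMPLE` holds four lines copied from the log shown earlier.

```python
import re
from statistics import mean

# Matches the "lag=0.60s" field printed by whisperlivekit.audio_processor.
LAG_RE = re.compile(r"lag=(\d+(?:\.\d+)?)s")


def lag_values(log_text: str) -> list[float]:
    """Return every lag value (in seconds) found in the log text."""
    return [float(m.group(1)) for m in LAG_RE.finditer(log_text)]


def lag_summary(log_text: str) -> dict[str, float]:
    """Count, mean, and maximum of the lag values; zeros if none are found."""
    lags = lag_values(log_text)
    if not lags:
        return {"count": 0, "mean": 0.0, "max": 0.0}
    return {"count": len(lags), "mean": mean(lags), "max": max(lags)}


SAMPLE = """\
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.60s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.55s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=2.17s | + Silence of = 2.16s | last_end = 2.12 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
"""

if __name__ == "__main__":
    print(lag_summary(SAMPLE))
```

In the log above, the largest lag values fall on lines that also report a silence interval, so a mean over a whole session may overstate the delay during continuous speech.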

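The first run with the base model lasted more than 10 minutes and was fairly stable.

On the second and third runs, GPU memory use climbed from 9 GB to 20 GB and then to about 30 GB. After one to a few minutes, GPU utilization reached 100%, recognition results started to be dropped on a large scale, and the service stopped working properly.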
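To capture this kind of memory growth more systematically, GPU memory and utilization can be sampled while the server runs. The sketch below polls `nvidia-smi` with its CSV query options. It assumes `nvidia-smi` is on the PATH and reads only the first GPU. `parse_sample`, `poll`, and `is_growing` are hypothetical helpers, and the 1024 MiB threshold is an arbitrary choice.

```python
import subprocess
import time

# Ask nvidia-smi for used memory (MiB) and utilization (%) as bare CSV numbers.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_sample(line: str) -> tuple[int, int]:
    """Parse one 'memory_MiB, util_percent' line into two integers."""
    mem, util = (field.strip() for field in line.split(","))
    return int(mem), int(util)


def poll(interval_s: float = 5.0, samples: int = 12) -> list[tuple[int, int]]:
    """Collect (memory_MiB, util_percent) readings from the first GPU."""
    readings = []
    for _ in range(samples):
        out = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
        readings.append(parse_sample(out.splitlines()[0]))
        time.sleep(interval_s)
    return readings


def is_growing(readings: list[tuple[int, int]], threshold_mib: int = 1024) -> bool:
    """True if used memory rose by more than threshold_mib from first to last sample."""
    return len(readings) >= 2 and readings[-1][0] - readings[0][0] > threshold_mib
```

Calling `poll()` in a second terminal during a transcription session and passing the result to `is_growing` gives a rough signal of whether memory keeps climbing, as it did on the second and third runs.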
