Getting Started with WhisperLiveKit and a Subjective Evaluation

Project Introduction

Project repository: https://github.com/QuentinFuxa/WhisperLiveKit

This post is a quick start: set up the environment, learn what the model server can do, and run a simple subjective evaluation.

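  • Keywords: transcription, speaker diarization, machine translation, voice activity detection (VAD)
  • Goal: set up the environment, get started quickly, and run a quick subjective evaluation
  • Difficulty: medium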

WhisperLiveKit is a real-time transcription tool. So why not just use Whisper directly?

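Start with the use cases. Meetings, live streams, online education and similar scenarios need transcripts delivered in real time, which means transcribing short windows of audio. When several people speak, speaker identification is needed as well, and sometimes real-time translation in the style of simultaneous interpretation.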

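Whisper, however, is designed for complete utterances, not real-time fragments. Processing small fragments loses context, cuts words off mid-syllable, and produces poor transcriptions. WhisperLiveKit uses state-of-the-art simultaneous speech research for intelligent buffering and incremental processing.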
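To make the "cuts words off mid-syllable" problem concrete, here is a minimal sketch that counts which words straddle a boundary when audio is split into fixed-length chunks. The word timings and the function name are invented for illustration; nothing here comes from WhisperLiveKit itself.

```python
from typing import List, Tuple

# (text, start_seconds, end_seconds)
Word = Tuple[str, float, float]


def words_cut_by_chunks(words: List[Word], chunk_seconds: float) -> List[str]:
    """Return the words whose time span crosses a fixed chunk boundary."""
    cut = []
    for text, start, end in words:
        first_chunk = int(start // chunk_seconds)
        # Subtract a tiny epsilon so a word ending exactly on a boundary
        # is still counted as belonging to the earlier chunk.
        last_chunk = int((end - 1e-9) // chunk_seconds)
        if first_chunk != last_chunk:
            cut.append(text)
    return cut


# Hypothetical word timings, made up for this example.
example = [
    ("real", 0.2, 0.6),
    ("time", 0.7, 1.1),
    ("transcription", 1.2, 2.1),
    ("works", 2.2, 2.6),
]

print(words_cut_by_chunks(example, chunk_seconds=1.0))
```

With one-second chunks, two of the four words are split across chunks, so a model that sees each chunk in isolation never hears them whole. Longer chunks reduce the damage but add latency, which is the trade-off that streaming policies try to manage.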

WhisperLiveKit builds on several earlier research projects, including:

  • SimulStreaming (SOTA 2025) - Ultra-low latency transcription using AlignAtt policy
  • NLLB (distilled) (2024) - Translation to more than 100 languages
  • WhisperStreaming (SOTA 2023) - Low latency transcription using LocalAgreement policy
  • Streaming Sortformer (SOTA 2025) - Advanced real-time speaker diarization
  • Diart (SOTA 2021) - Real-time speaker diarization
  • Silero VAD (2024) - Enterprise-grade Voice Activity Detection
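As a rough illustration of the LocalAgreement idea mentioned in the list, the sketch below confirms a word only when two consecutive hypotheses agree on it (the n = 2 case). It is a simplification under stated assumptions: each hypothesis is taken to be the full word list for the growing audio buffer, and the class and method names are hypothetical. The real WhisperStreaming code also trims the audio buffer and works with timestamps, which is omitted here.

```python
from typing import List


def common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Longest shared prefix of two word lists."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


class LocalAgreement2:
    """Commit a word once two consecutive hypotheses agree on it."""

    def __init__(self) -> None:
        self.committed: List[str] = []  # words already shown as final
        self.previous: List[str] = []   # hypothesis from the last update

    def update(self, hypothesis: List[str]) -> List[str]:
        """Feed the latest hypothesis; return the newly confirmed words."""
        agreed = common_prefix(self.previous, hypothesis)
        self.previous = hypothesis
        # Only words beyond what was already committed are new.
        new_words = agreed[len(self.committed):]
        self.committed.extend(new_words)
        return new_words


policy = LocalAgreement2()
print(policy.update(["the", "quick", "brow"]))
print(policy.update(["the", "quick", "brown", "fox"]))
print(policy.update(["the", "quick", "brown", "fox", "jumps"]))
```

The unstable tail of each hypothesis ("brow" in the first update) is held back until a later pass confirms it, which is how this kind of policy trades a little latency for stable output.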

This is the project's architecture diagram; it supports multiple concurrent users.

Next, we install and deploy it, then run a simple hands-on evaluation.

Installation and Running

Environment Setup

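Use conda to create an isolated environment. My machine has an RTX 5090, which needs a matching torch build, so I cloned my existing whisper environment into a new conda environment instead of starting from scratch.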

(base) PS C:\Users\Jacob> conda env list

# conda environments:
#
base                 * C:\Users\Jacob\miniconda3
fireredtts             C:\Users\Jacob\miniconda3\envs\fireredtts
pytorch_nightly_env    C:\Users\Jacob\miniconda3\envs\pytorch_nightly_env
qwen_rtx5090           C:\Users\Jacob\miniconda3\envs\qwen_rtx5090
rtx50_comfyui          C:\Users\Jacob\miniconda3\envs\rtx50_comfyui
whisper                C:\Users\Jacob\miniconda3\envs\whisper

(base) PS C:\Users\Jacob> conda
usage: conda-script.py [-h] [-v] [--no-plugins] [-V] COMMAND ...

conda is a tool for managing and deploying applications, environments and packages.

options:
  -h, --help            Show this help message and exit.
  -v, --verbose         Can be used multiple times. Once for detailed output, twice for INFO logging, thrice for DEBUG
                        logging, four times for TRACE logging.
  --no-plugins          Disable all plugins that are not built into conda.
  -V, --version         Show the conda version number and exit.

commands:
  The following built-in and plugins subcommands are available.

  COMMAND
    activate            Activate a conda environment.
    clean               Remove unused packages and caches.
    commands            List all available conda subcommands (including those from plugins). Generally only used by
                        tab-completion.
    compare             Compare packages between conda environments.
    config              Modify configuration values in .condarc.
    content-trust       Signing and verification tools for Conda
    create              Create a new conda environment from a list of specified packages.
    deactivate          Deactivate the current active conda environment.
    doctor              Display a health report for your environment.
    env                 Create and manage conda environments.
    export              Export a given environment
    info                Display information about current conda install.
    init                Initialize conda for shell interaction.
    install             Install a list of packages into a specified conda environment.
    list                List installed packages in a conda environment.
    notices             Retrieve latest channel notifications.
    package             Create low-level conda packages. (EXPERIMENTAL)
    remove (uninstall)  Remove a list of packages from a specified conda environment.
    rename              Rename an existing environment.
    repoquery           Advanced search for repodata.
    run                 Run an executable in a conda environment.
    search              Search for packages and display associated information using the MatchSpec format.
    tos                 A subcommand for viewing, accepting, rejecting, and otherwise interacting with a channel's
                        Terms of Service (ToS). This plugin periodically checks for updated Terms of Service for the
                        active/selected channels. Channels with a Terms of Service will need to be accepted or
                        rejected prior to use. Conda will only allow package installation from channels without a
                        Terms of Service or with an accepted Terms of Service. Attempting to use a channel with a
                        rejected Terms of Service will result in an error.
    update (upgrade)    Update conda packages to the latest compatible version.
(base) PS C:\Users\Jacob> conda create -n whisperlivekit --clone whisper
3 channel Terms of Service accepted
Retrieving notices: done
Source:      C:\Users\Jacob\miniconda3\envs\whisper
Destination: C:\Users\Jacob\miniconda3\envs\whisperlivekit
Packages: 19
Files: 35845

Downloading and Extracting Packages:


## Package Plan ##

  environment location: C:\Users\Jacob\miniconda3\envs\whisperlivekit

  added / updated specs:
    - conda-forge/noarch::ca-certificates==2025.8.3=h4c7d964_0
    - conda-forge/win-64::ffmpeg==4.3.1=ha925a31_0
    - conda-forge/win-64::openssl==3.5.2=h725018a_0
    - defaults/noarch::pip==25.1=pyhc872135_2
    - defaults/noarch::tzdata==2025b=h04d1e81_0
    - defaults/win-64::bzip2==1.0.8=h2bbff1b_6
    - defaults/win-64::expat==2.7.1=h8ddb27b_0
    - defaults/win-64::libffi==3.4.4=hd77b12b_1
    - defaults/win-64::python==3.12.11=h716150d_0
    - defaults/win-64::setuptools==78.1.1=py312haa95532_0
    - defaults/win-64::sqlite==3.50.2=hda9a48d_1
    - defaults/win-64::tk==8.6.14=h5e9d12e_1
    - defaults/win-64::ucrt==10.0.22621.0=haa95532_0
    - defaults/win-64::vc14_runtime==14.44.35208=h4927774_10
    - defaults/win-64::vc==14.3=h2df5915_10
    - defaults/win-64::vs2015_runtime==14.44.35208=ha6b5a95_10
    - defaults/win-64::wheel==0.45.1=py312haa95532_0
    - defaults/win-64::xz==5.6.4=h4754444_1
    - defaults/win-64::zlib==1.2.13=h8cc25b3_1

done
#
# To activate this environment, use
#
#     $ conda activate whisperlivekit
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) PS C:\Users\Jacob> conda activate whisperlivekit
(whisperlivekit) PS C:\Users\Jacob> pip install whisperlivekit
Collecting whisperlivekit
  Downloading whisperlivekit-0.2.9-py3-none-any.whl.metadata (18 kB)
Collecting fastapi (from whisperlivekit)
  Downloading fastapi-0.117.1-py3-none-any.whl.metadata (28 kB)
Collecting librosa (from whisperlivekit)
  Downloading librosa-0.11.0-py3-none-any.whl.metadata (8.7 kB)
Requirement already satisfied: soundfile in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (0.13.1)
Collecting faster-whisper (from whisperlivekit)
  Downloading faster_whisper-1.2.0-py3-none-any.whl.metadata (16 kB)
Collecting uvicorn (from whisperlivekit)
  Downloading uvicorn-0.36.0-py3-none-any.whl.metadata (6.6 kB)
Collecting websockets (from whisperlivekit)
  Downloading websockets-15.0.1-cp312-cp312-win_amd64.whl.metadata (7.0 kB)
Requirement already satisfied: torchaudio>=2.0.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (2.8.0.dev20250810+cu128)
Requirement already satisfied: torch>=2.0.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (2.9.0.dev20250810+cu128)
Requirement already satisfied: tqdm in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (4.67.1)
Requirement already satisfied: tiktoken in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (0.11.0)
Requirement already satisfied: filelock in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.18.0)
Requirement already satisfied: typing-extensions>=4.10.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (4.14.1)
Requirement already satisfied: sympy>=1.13.3 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (1.14.0)
Requirement already satisfied: networkx>=2.5.1 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.5)
Requirement already satisfied: jinja2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.1.6)
Requirement already satisfied: fsspec>=0.8.5 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (2025.7.0)
Requirement already satisfied: setuptools in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (78.1.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from sympy>=1.13.3->torch>=2.0.0->whisperlivekit) (1.3.0)
Collecting starlette<0.49.0,>=0.40.0 (from fastapi->whisperlivekit)
  Downloading starlette-0.48.0-py3-none-any.whl.metadata (6.3 kB)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from fastapi->whisperlivekit) (2.11.7)
Requirement already satisfied: annotated-types>=0.6.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (0.7.0)
Requirement already satisfied: pydantic-core==2.33.2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (2.33.2)
Requirement already satisfied: typing-inspection>=0.4.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (0.4.1)
Collecting anyio<5,>=3.6.2 (from starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit)
  Downloading anyio-4.10.0-py3-none-any.whl.metadata (4.0 kB)
Requirement already satisfied: idna>=2.8 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from anyio<5,>=3.6.2->starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit) (3.10)
Collecting sniffio>=1.1 (from anyio<5,>=3.6.2->starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit)
  Downloading sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting ctranslate2<5,>=4.0 (from faster-whisper->whisperlivekit)
  Downloading ctranslate2-4.6.0-cp312-cp312-win_amd64.whl.metadata (10 kB)
Requirement already satisfied: huggingface-hub>=0.13 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (0.34.4)
Requirement already satisfied: tokenizers<1,>=0.13 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (0.21.4)
Collecting onnxruntime<2,>=1.14 (from faster-whisper->whisperlivekit)
  Downloading onnxruntime-1.22.1-cp312-cp312-win_amd64.whl.metadata (5.1 kB)
Requirement already satisfied: av>=11 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (15.0.0)
Requirement already satisfied: numpy in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from ctranslate2<5,>=4.0->faster-whisper->whisperlivekit) (2.2.6)
Requirement already satisfied: pyyaml<7,>=5.3 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from ctranslate2<5,>=4.0->faster-whisper->whisperlivekit) (6.0.2)
Collecting coloredlogs (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)
Collecting flatbuffers (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Downloading flatbuffers-25.2.10-py2.py3-none-any.whl.metadata (875 bytes)
Requirement already satisfied: packaging in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit) (25.0)
Collecting protobuf (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Downloading protobuf-6.32.1-cp310-abi3-win_amd64.whl.metadata (593 bytes)
Requirement already satisfied: requests in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2.32.4)
Requirement already satisfied: colorama in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from tqdm->whisperlivekit) (0.4.6)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Downloading humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)
Collecting pyreadline3 (from humanfriendly>=9.1->coloredlogs->onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Downloading pyreadline3-3.5.4-py3-none-any.whl.metadata (4.7 kB)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from jinja2->torch>=2.0.0->whisperlivekit) (3.0.2)
Collecting audioread>=2.1.9 (from librosa->whisperlivekit)
  Using cached audioread-3.0.1-py3-none-any.whl.metadata (8.4 kB)
Requirement already satisfied: numba>=0.51.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from librosa->whisperlivekit) (0.61.2)
Requirement already satisfied: scipy>=1.6.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from librosa->whisperlivekit) (1.16.1)
Collecting scikit-learn>=1.1.0 (from librosa->whisperlivekit)
  Using cached scikit_learn-1.7.2-cp312-cp312-win_amd64.whl.metadata (11 kB)
Collecting joblib>=1.0 (from librosa->whisperlivekit)
  Using cached joblib-1.5.2-py3-none-any.whl.metadata (5.6 kB)
Collecting decorator>=4.3.0 (from librosa->whisperlivekit)
  Using cached decorator-5.2.1-py3-none-any.whl.metadata (3.9 kB)
Collecting pooch>=1.1 (from librosa->whisperlivekit)
  Using cached pooch-1.8.2-py3-none-any.whl.metadata (10 kB)
Collecting soxr>=0.3.2 (from librosa->whisperlivekit)
  Using cached soxr-1.0.0-cp312-abi3-win_amd64.whl.metadata (5.6 kB)
Collecting lazy_loader>=0.1 (from librosa->whisperlivekit)
  Using cached lazy_loader-0.4-py3-none-any.whl.metadata (7.6 kB)
Collecting msgpack>=1.0 (from librosa->whisperlivekit)
  Using cached msgpack-1.1.1-cp312-cp312-win_amd64.whl.metadata (8.6 kB)
Requirement already satisfied: llvmlite<0.45,>=0.44.0dev0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from numba>=0.51.0->librosa->whisperlivekit) (0.44.0)
Collecting platformdirs>=2.5.0 (from pooch>=1.1->librosa->whisperlivekit)
  Using cached platformdirs-4.4.0-py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: charset_normalizer<4,>=2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (3.4.3)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2.5.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2025.8.3)
Collecting threadpoolctl>=3.1.0 (from scikit-learn>=1.1.0->librosa->whisperlivekit)
  Using cached threadpoolctl-3.6.0-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: cffi>=1.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from soundfile->whisperlivekit) (1.17.1)
Requirement already satisfied: pycparser in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from cffi>=1.0->soundfile->whisperlivekit) (2.22)
Requirement already satisfied: regex>=2022.1.18 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from tiktoken->whisperlivekit) (2025.7.34)
Collecting click>=7.0 (from uvicorn->whisperlivekit)
  Downloading click-8.3.0-py3-none-any.whl.metadata (2.6 kB)
Collecting h11>=0.8 (from uvicorn->whisperlivekit)
  Downloading h11-0.16.0-py3-none-any.whl.metadata (8.3 kB)
Downloading whisperlivekit-0.2.9-py3-none-any.whl (876 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 876.5/876.5 kB 163.8 kB/s eta 0:00:00
Downloading fastapi-0.117.1-py3-none-any.whl (95 kB)
Downloading starlette-0.48.0-py3-none-any.whl (73 kB)
Downloading anyio-4.10.0-py3-none-any.whl (107 kB)
Downloading sniffio-1.3.1-py3-none-any.whl (10 kB)
Downloading faster_whisper-1.2.0-py3-none-any.whl (1.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 124.6 kB/s eta 0:00:00
Downloading ctranslate2-4.6.0-cp312-cp312-win_amd64.whl (19.5 MB)
   ━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.2/19.5 MB 124.3 kB/s eta 0:01:55
ERROR: Operation cancelled by user
(whisperlivekit) PS C:\Users\Jacob> pip install whisperlivekit
Collecting whisperlivekit
  Using cached whisperlivekit-0.2.9-py3-none-any.whl.metadata (18 kB)
Collecting fastapi (from whisperlivekit)
  Using cached fastapi-0.117.1-py3-none-any.whl.metadata (28 kB)
Collecting librosa (from whisperlivekit)
  Using cached librosa-0.11.0-py3-none-any.whl.metadata (8.7 kB)
Requirement already satisfied: soundfile in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (0.13.1)
Collecting faster-whisper (from whisperlivekit)
  Using cached faster_whisper-1.2.0-py3-none-any.whl.metadata (16 kB)
Collecting uvicorn (from whisperlivekit)
  Using cached uvicorn-0.36.0-py3-none-any.whl.metadata (6.6 kB)
Collecting websockets (from whisperlivekit)
  Using cached websockets-15.0.1-cp312-cp312-win_amd64.whl.metadata (7.0 kB)
Requirement already satisfied: torchaudio>=2.0.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (2.8.0.dev20250810+cu128)
Requirement already satisfied: torch>=2.0.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (2.9.0.dev20250810+cu128)
Requirement already satisfied: tqdm in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (4.67.1)
Requirement already satisfied: tiktoken in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (0.11.0)
Requirement already satisfied: filelock in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.18.0)
Requirement already satisfied: typing-extensions>=4.10.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (4.14.1)
Requirement already satisfied: sympy>=1.13.3 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (1.14.0)
Requirement already satisfied: networkx>=2.5.1 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.5)
Requirement already satisfied: jinja2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.1.6)
Requirement already satisfied: fsspec>=0.8.5 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (2025.7.0)
Requirement already satisfied: setuptools in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (78.1.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from sympy>=1.13.3->torch>=2.0.0->whisperlivekit) (1.3.0)
Collecting starlette<0.49.0,>=0.40.0 (from fastapi->whisperlivekit)
  Using cached starlette-0.48.0-py3-none-any.whl.metadata (6.3 kB)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from fastapi->whisperlivekit) (2.11.7)
Requirement already satisfied: annotated-types>=0.6.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (0.7.0)
Requirement already satisfied: pydantic-core==2.33.2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (2.33.2)
Requirement already satisfied: typing-inspection>=0.4.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (0.4.1)
Collecting anyio<5,>=3.6.2 (from starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit)
  Using cached anyio-4.10.0-py3-none-any.whl.metadata (4.0 kB)
Requirement already satisfied: idna>=2.8 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from anyio<5,>=3.6.2->starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit) (3.10)
Collecting sniffio>=1.1 (from anyio<5,>=3.6.2->starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit)
  Using cached sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting ctranslate2<5,>=4.0 (from faster-whisper->whisperlivekit)
  Using cached ctranslate2-4.6.0-cp312-cp312-win_amd64.whl.metadata (10 kB)
Requirement already satisfied: huggingface-hub>=0.13 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (0.34.4)
Requirement already satisfied: tokenizers<1,>=0.13 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (0.21.4)
Collecting onnxruntime<2,>=1.14 (from faster-whisper->whisperlivekit)
  Using cached onnxruntime-1.22.1-cp312-cp312-win_amd64.whl.metadata (5.1 kB)
Requirement already satisfied: av>=11 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (15.0.0)
Requirement already satisfied: numpy in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from ctranslate2<5,>=4.0->faster-whisper->whisperlivekit) (2.2.6)
Requirement already satisfied: pyyaml<7,>=5.3 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from ctranslate2<5,>=4.0->faster-whisper->whisperlivekit) (6.0.2)
Collecting coloredlogs (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Using cached coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)
Collecting flatbuffers (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Using cached flatbuffers-25.2.10-py2.py3-none-any.whl.metadata (875 bytes)
Requirement already satisfied: packaging in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit) (25.0)
Collecting protobuf (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Using cached protobuf-6.32.1-cp310-abi3-win_amd64.whl.metadata (593 bytes)
Requirement already satisfied: requests in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2.32.4)
Requirement already satisfied: colorama in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from tqdm->whisperlivekit) (0.4.6)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Using cached humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)
Collecting pyreadline3 (from humanfriendly>=9.1->coloredlogs->onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
  Using cached pyreadline3-3.5.4-py3-none-any.whl.metadata (4.7 kB)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from jinja2->torch>=2.0.0->whisperlivekit) (3.0.2)
Collecting audioread>=2.1.9 (from librosa->whisperlivekit)
  Using cached audioread-3.0.1-py3-none-any.whl.metadata (8.4 kB)
Requirement already satisfied: numba>=0.51.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from librosa->whisperlivekit) (0.61.2)
Requirement already satisfied: scipy>=1.6.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from librosa->whisperlivekit) (1.16.1)
Collecting scikit-learn>=1.1.0 (from librosa->whisperlivekit)
  Using cached scikit_learn-1.7.2-cp312-cp312-win_amd64.whl.metadata (11 kB)
Collecting joblib>=1.0 (from librosa->whisperlivekit)
  Using cached joblib-1.5.2-py3-none-any.whl.metadata (5.6 kB)
Collecting decorator>=4.3.0 (from librosa->whisperlivekit)
  Using cached decorator-5.2.1-py3-none-any.whl.metadata (3.9 kB)
Collecting pooch>=1.1 (from librosa->whisperlivekit)
  Using cached pooch-1.8.2-py3-none-any.whl.metadata (10 kB)
Collecting soxr>=0.3.2 (from librosa->whisperlivekit)
  Using cached soxr-1.0.0-cp312-abi3-win_amd64.whl.metadata (5.6 kB)
Collecting lazy_loader>=0.1 (from librosa->whisperlivekit)
  Using cached lazy_loader-0.4-py3-none-any.whl.metadata (7.6 kB)
Collecting msgpack>=1.0 (from librosa->whisperlivekit)
  Using cached msgpack-1.1.1-cp312-cp312-win_amd64.whl.metadata (8.6 kB)
Requirement already satisfied: llvmlite<0.45,>=0.44.0dev0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from numba>=0.51.0->librosa->whisperlivekit) (0.44.0)
Collecting platformdirs>=2.5.0 (from pooch>=1.1->librosa->whisperlivekit)
  Using cached platformdirs-4.4.0-py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: charset_normalizer<4,>=2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (3.4.3)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2.5.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2025.8.3)
Collecting threadpoolctl>=3.1.0 (from scikit-learn>=1.1.0->librosa->whisperlivekit)
  Using cached threadpoolctl-3.6.0-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: cffi>=1.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from soundfile->whisperlivekit) (1.17.1)
Requirement already satisfied: pycparser in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from cffi>=1.0->soundfile->whisperlivekit) (2.22)
Requirement already satisfied: regex>=2022.1.18 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from tiktoken->whisperlivekit) (2025.7.34)
Collecting click>=7.0 (from uvicorn->whisperlivekit)
  Using cached click-8.3.0-py3-none-any.whl.metadata (2.6 kB)
Collecting h11>=0.8 (from uvicorn->whisperlivekit)
  Using cached h11-0.16.0-py3-none-any.whl.metadata (8.3 kB)
Using cached whisperlivekit-0.2.9-py3-none-any.whl (876 kB)
Using cached fastapi-0.117.1-py3-none-any.whl (95 kB)
Using cached starlette-0.48.0-py3-none-any.whl (73 kB)
Using cached anyio-4.10.0-py3-none-any.whl (107 kB)
Using cached sniffio-1.3.1-py3-none-any.whl (10 kB)
Using cached faster_whisper-1.2.0-py3-none-any.whl (1.1 MB)
Downloading ctranslate2-4.6.0-cp312-cp312-win_amd64.whl (19.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.5/19.5 MB 8.4 MB/s eta 0:00:00
Downloading onnxruntime-1.22.1-cp312-cp312-win_amd64.whl (12.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.7/12.7 MB 12.0 MB/s eta 0:00:00
Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Downloading flatbuffers-25.2.10-py2.py3-none-any.whl (30 kB)
Downloading librosa-0.11.0-py3-none-any.whl (260 kB)
Using cached audioread-3.0.1-py3-none-any.whl (23 kB)
Using cached decorator-5.2.1-py3-none-any.whl (9.2 kB)
Using cached joblib-1.5.2-py3-none-any.whl (308 kB)
Using cached lazy_loader-0.4-py3-none-any.whl (12 kB)
Using cached msgpack-1.1.1-cp312-cp312-win_amd64.whl (72 kB)
Using cached pooch-1.8.2-py3-none-any.whl (64 kB)
Using cached platformdirs-4.4.0-py3-none-any.whl (18 kB)
Using cached scikit_learn-1.7.2-cp312-cp312-win_amd64.whl (8.7 MB)
Using cached soxr-1.0.0-cp312-abi3-win_amd64.whl (172 kB)
Using cached threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Downloading protobuf-6.32.1-cp310-abi3-win_amd64.whl (435 kB)
Downloading pyreadline3-3.5.4-py3-none-any.whl (83 kB)
Downloading uvicorn-0.36.0-py3-none-any.whl (67 kB)
Downloading click-8.3.0-py3-none-any.whl (107 kB)
Downloading h11-0.16.0-py3-none-any.whl (37 kB)
Downloading websockets-15.0.1-cp312-cp312-win_amd64.whl (176 kB)
Installing collected packages: flatbuffers, websockets, threadpoolctl, soxr, sniffio, pyreadline3, protobuf, platformdirs, msgpack, lazy_loader, joblib, h11, decorator, ctranslate2, click, audioread, uvicorn, scikit-learn, pooch, humanfriendly, anyio, starlette, librosa, coloredlogs, onnxruntime, fastapi, faster-whisper, whisperlivekit
Successfully installed anyio-4.10.0 audioread-3.0.1 click-8.3.0 coloredlogs-15.0.1 ctranslate2-4.6.0 decorator-5.2.1 fastapi-0.117.1 faster-whisper-1.2.0 flatbuffers-25.2.10 h11-0.16.0 humanfriendly-10.0 joblib-1.5.2 lazy_loader-0.4 librosa-0.11.0 msgpack-1.1.1 onnxruntime-1.22.1 platformdirs-4.4.0 pooch-1.8.2 protobuf-6.32.1 pyreadline3-3.5.4 scikit-learn-1.7.2 sniffio-1.3.1 soxr-1.0.0 starlette-0.48.0 threadpoolctl-3.6.0 uvicorn-0.36.0 websockets-15.0.1 whisperlivekit-0.2.9

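The first pip run was cancelled by hand during the ctranslate2 download. The second run completed the installation with no errors.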
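Since the environment was cloned to keep a GPU-matched torch build, it may be worth confirming which versions ended up installed. The sketch below uses only the standard library's importlib.metadata; the helper name and the package list are my own choices, picked from the names that appear in the pip log above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Package names taken from the pip log; adjust the list as needed.
to_check = ["torch", "torchaudio", "whisperlivekit", "faster-whisper", "ctranslate2"]

for name, ver in installed_versions(to_check).items():
    print(f"{name}: {ver or 'not installed'}")
```

Run it inside the activated whisperlivekit environment. If torch still reports the cu128 nightly build seen in the log, the clone preserved the GPU-matched install.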

Running

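Now start the server, choosing the model size you want (the examples below all use the base model). The log below contains two runs of the command.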

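In the first run I passed the wrong language code: I wrote ch for Chinese, but it should be zh. We will test with English first, using the code en.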
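A typo like ch for zh is easy to make, so a small pre-flight check before launching the server may be worthwhile. The sketch below is not part of WhisperLiveKit. The code table is a hand-picked subset of two-letter codes that Whisper accepts, and the function name and the list of common mistakes are hypothetical.

```python
# Hand-picked subset of language codes accepted by Whisper (illustrative only).
KNOWN_LANGUAGE_CODES = {
    "en": "English",
    "zh": "Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
}

# Frequent slips mapped to the code that was probably intended (assumption).
COMMON_MISTAKES = {"ch": "zh", "cn": "zh", "jp": "ja", "kr": "ko"}


def check_language_code(code: str) -> str:
    """Return the normalized code, or raise ValueError with a hint."""
    normalized = code.strip().lower()
    if normalized in KNOWN_LANGUAGE_CODES:
        return normalized
    hint = COMMON_MISTAKES.get(normalized)
    if hint is not None:
        raise ValueError(f"Unknown language code {code!r}; did you mean {hint!r}?")
    raise ValueError(f"Unknown language code {code!r}")


print(check_language_code("en"))
```

Calling check_language_code("ch") raises a ValueError that suggests zh, which is cheaper than finding out after the model files have been downloaded.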

(whisperlivekit) PS C:\Users\Jacob> whisperlivekit-server --model base --language ch
INFO:     Started server process [7696]
INFO:     Waiting for application startup.
WARNING:whisperlivekit.basic_server:
==================================================
WhisperLiveKit 0.2.8 has introduced a new fast encoder feature using MLX Whisper or Faster Whisper for improved speed. Use --disable-fast-encoder to disable if you encounter issues.
==================================================

C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\torch\hub.py:330: UserWarning: You are about to download and run code from an untrusted repository. In a future release, this won't be allowed. To add the repository to your trusted list, change the command to {calling_fn}(..., trust_repo=False) and a command prompt will appear asking for an explicit confirmation of trust, or load(..., trust_repo=True), which will assume that the prompt is to be answered with 'yes'. You can also use load(..., trust_repo='check') which will only prompt for confirmation if the repo is not already trusted. This will eventually be the default behaviour
  warnings.warn(
Downloading: "https://github.com/snakers4/silero-vad/zipball/master" to C:\Users\Jacob/.cache\torch\hub\master.zip
WARNING:whisperlivekit.simul_whisper.backend:
SimulStreaming backend is dual-licensed:
• Non-Commercial Use: PolyForm Noncommercial License 1.0.0.
• Commercial Use: Check SimulStreaming README (github.com/ufal/SimulStreaming) for more details.

Simulstreaming will use Faster Whisper for the encoder.
config.json: 2.31kB [00:00, 11.4kB/s]
C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\huggingface_hub\file_download.py:143: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\Jacob\.cache\huggingface\hub\models--Systran--faster-whisper-base. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
vocabulary.txt: 460kB [00:00, 637kB/s]
tokenizer.json: 0.00B [00:00, ?B/s]   Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
WARNING:huggingface_hub.file_download:Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
tokenizer.json: 2.20MB [00:01, 1.75MB/s]
model.bin: 100%|████████████████████████████████████████████████████████████████████| 145M/145M [00:11<00:00, 12.9MB/s]
100%|███████████████████████████████████████| 139M/139M [00:10<00:00, 13.9MiB/s]
Downloading warmup file from https://github.com/ggerganov/whisper.cpp/raw/master/samples/jfk.wav
ERROR:    Traceback (most recent call last):
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\starlette\routing.py", line 694, in lifespan
    async with self.lifespan_context(app) as maybe_state:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\basic_server.py", line 32, in lifespan
    transcription_engine = TranscriptionEngine(
                           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\core.py", line 113, in __init__
    self.asr = SimulStreamingASR(
               ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\backend.py", line 312, in __init__
    self.models = [self.load_model() for i in range(self.preload_model_count)]
                   ^^^^^^^^^^^^^^^^^
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\backend.py", line 321, in load_model
    temp_model = PaddedAlignAttWhisper(
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\simul_whisper.py", line 80, in __init__
    self.create_tokenizer(cfg.language if cfg.language != "auto" else None)
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\simul_whisper.py", line 190, in create_tokenizer
    self.tokenizer = tokenizer.get_tokenizer(
                     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\whisper\tokenizer.py", line 380, in get_tokenizer
    raise ValueError(f"Unsupported language: {language}")
ValueError: Unsupported language: ch

ERROR:    Application startup failed. Exiting.

(whisperlivekit) PS C:\Users\Jacob> whisperlivekit-server --model base --language en
INFO:     Started server process [2764]
INFO:     Waiting for application startup.
WARNING:whisperlivekit.basic_server:
==================================================
WhisperLiveKit 0.2.8 has introduced a new fast encoder feature using MLX Whisper or Faster Whisper for improved speed. Use --disable-fast-encoder to disable if you encounter issues.
==================================================

Using cache found in C:\Users\Jacob/.cache\torch\hub\snakers4_silero-vad_master
WARNING:whisperlivekit.simul_whisper.backend:
SimulStreaming backend is dual-licensed:
• Non-Commercial Use: PolyForm Noncommercial License 1.0.0.
• Commercial Use: Check SimulStreaming README (github.com/ufal/SimulStreaming) for more details.

Simulstreaming will use Faster Whisper for the encoder.
C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\whisper\timing.py:42: UserWarning: Failed to launch Triton kernels, likely due to missing CUDA toolkit; falling back to a slower median kernel implementation...
  warnings.warn(
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
INFO:     ::1:50387 - "GET / HTTP/1.1" 200 OK
INFO:     ::1:50387 - "GET /favicon.ico HTTP/1.1" 404 Not Found
INFO:     ::1:51078 - "GET / HTTP/1.1" 200 OK
INFO:     ::1:51084 - "WebSocket /asr" [accepted]
INFO:whisperlivekit.basic_server:WebSocket connection opened.
INFO:     connection open
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.60s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.55s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.49s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.55s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.49s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.55s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=2.17s | + Silence of = 2.16s | last_end = 2.12 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=2.17s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.01s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.01s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=3.79s | + Silence of = 4.32s | last_end = 6.64 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=3.79s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.49s | + Silence of = 1.56s | last_end = 12.88 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.49s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=1.09s | + Silence of = 2.64s | last_end = 12.88 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=1.09s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s | + Silence of = 1.56s | last_end = 20.020000000000003 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s | + Silence of = 1.08s | last_end = 24.1 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |

The first run downloads and caches the model files, as the log above shows.

Based on the log, the program first downloaded the following components during its initial run:

  • Silero-VAD: A voice activity detection model from a GitHub repository.
  • Faster-Whisper base model (Systran/faster-whisper-base on Hugging Face): a faster implementation of the Whisper base model, used here for the encoder, with its associated files:
    • config.json
    • vocabulary.txt
    • tokenizer.json
    • model.bin
  • Warmup file: A sample audio file (jfk.wav) from the whisper.cpp repository, used to test and initialize the model.

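The downloaded files are stored in two places:

  • Hugging Face cache directory

Most model files, such as those of Faster-Whisper-Base (config.json, vocabulary.txt, tokenizer.json, model.bin), go to the default Hugging Face cache directory.

  • Path: C:\Users\Jacob\.cache\huggingface\hub

This cache directory normally holds models and datasets downloaded from the Hugging Face Hub.

  • Torch Hub cache directory

The Silero-VAD model is downloaded from GitHub and stored in the Torch Hub cache directory.

  • Path: C:\Users\Jacob\.cache\torch\hub\master.zip (on later runs the log reports the cached copy at C:\Users\Jacob/.cache\torch\hub\snakers4_silero-vad_master)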
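To see how much disk space these two caches take, something like the following can help. It is a minimal sketch that assumes the default cache locations listed above, relative to the user's home directory; the function names dir_size_bytes and cache_report are illustrative and not part of WhisperLiveKit.

```python
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Total size in bytes of regular files under root (0 if root is missing).

    Symlinks are skipped so that Hugging Face snapshot links are not
    counted twice on systems where the cache uses them.
    """
    if not root.exists():
        return 0
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


def cache_report(home: Path) -> dict:
    """Map each cache directory named in the text to its size in bytes."""
    caches = {
        # Assumed default locations, as seen in the log paths above.
        "huggingface": home / ".cache" / "huggingface" / "hub",
        "torch_hub": home / ".cache" / "torch" / "hub",
    }
    return {name: dir_size_bytes(path) for name, path in caches.items()}
```

Calling cache_report(Path.home()) returns a dictionary of byte counts, which can be divided by 1e6 to get megabytes.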

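Start the server with whisperlivekit-server --model base --language en, which uses the base model and transcribes English.

Open http://localhost:8000/ in a browser.

The page is a fairly simple front end that talks to the backend over a WebSocket (the /asr endpoint in the log). The transcript is split into segments, but with the default parameters speakers are not identified.

To test further, the model service needs to be configured.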

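Model configuration

To understand each command-line parameter, start with the README at https://github.com/QuentinFuxa/WhisperLiveKit.

For model size, many models are available. English-only (.en) variants exist only for the smaller models (tiny through medium); the models without the .en suffix are multilingual.

https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/available_models.md

The accepted values for the language parameter are listed in: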

https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/simul_whisper/whisper/tokenizer.py
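The failed start in the log above (ValueError: Unsupported language: ch) comes from this table: Whisper's code for Chinese is zh, not ch. Checking the value before launching the server avoids waiting for the model download only to hit the error. The sketch below uses a small hand-copied subset of the code table, so it is an illustration rather than the library's own lookup; normalize_language is a hypothetical helper name. The "auto" handling mirrors the cfg.language check visible in the traceback.

```python
from typing import Optional

# Small subset of Whisper's code -> name table, copied by hand for
# illustration. The full table is in the tokenizer.py file linked above.
LANGUAGES = {
    "en": "english",
    "zh": "chinese",
    "ja": "japanese",
    "fr": "french",
    "de": "german",
}
NAME_TO_CODE = {name: code for code, name in LANGUAGES.items()}


def normalize_language(value: Optional[str]) -> Optional[str]:
    """Return a language code, or None when the language should be auto-detected."""
    if value is None:
        return None
    key = value.strip().lower()
    if key == "auto":
        return None
    if key in LANGUAGES:
        return key
    if key in NAME_TO_CODE:
        return NAME_TO_CODE[key]
    raise ValueError(f"Unsupported language: {value}")
```

With this subset, normalize_language("zh") and normalize_language("Chinese") both give "zh", while normalize_language("ch") raises the same kind of error the server reported.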

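Summary

  • The project is not yet very mature, but it is under rapid development; see the release log at https://github.com/QuentinFuxa/WhisperLiveKit/releases
  • The front end is minimal and meant only as an example. You can build your own front end for your use case; the GitHub repository includes a front-end example.
  • Speaker diarization is off by default and has to be enabled (whisperlivekit-server --model base --language en --diarization). In a test with an English podcast featuring a male and a female speaker, the speakers were fairly easy to tell apart; with two female Chinese-speaking reporters, it was essentially unable to distinguish them.
  • Real-time transcription latency is roughly 1 s.
  • Accuracy: even some common words are occasionally misrecognized as similar-sounding words.
  • Chinese is output in Traditional characters by default. If no language is configured, the language (English or Chinese) is detected automatically and transcribed.
  • Resource usage on an RTX 5090: GPU utilization around 10%, GPU memory around 20 GB.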
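The latency estimate can be cross-checked against the server log, where each audio_processor line reports a lag value. The following is a minimal sketch that assumes the line format shown in the log earlier in this post (lag=0.60s); lag_values and summarize are illustrative names. Note that the largest lag values in that log appear on or right after lines that also report "Silence of", so they probably reflect pauses in speech rather than slow transcription.

```python
import re
from statistics import mean

# Matches the "lag=0.60s" field of the audio_processor log lines.
LAG_RE = re.compile(r"lag=(\d+(?:\.\d+)?)s")


def lag_values(log_text: str) -> list:
    """Extract every lag value (in seconds) from the log text, in order."""
    return [float(m.group(1)) for m in LAG_RE.finditer(log_text)]


def summarize(lags: list):
    """Return count, mean and max of the lag values, or None if there are none."""
    if not lags:
        return None
    return {"count": len(lags), "mean": mean(lags), "max": max(lags)}
```

Saving the server output to a file and passing its contents to lag_values gives a quick picture of how the lag evolves over a session.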

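The first run with the base model lasted more than 10 minutes and was fairly stable.

On the second and third runs, GPU memory grew from 9 GB to 20 GB and then to about 30 GB. After one to a few minutes, GPU utilization reached 100%, recognition results were dropped on a large scale, and the service stopped working correctly.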
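To check whether this memory growth is reproducible, GPU memory and utilization can be sampled while the server runs. This is a minimal sketch that assumes nvidia-smi is on the PATH; the helper names and the 1024 MiB growth threshold are illustrative assumptions, not values from WhisperLiveKit.

```python
import subprocess

# Ask nvidia-smi for used memory (MiB) and GPU utilization (%), one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line: str) -> tuple:
    """Parse a line such as '9216, 10' into (memory_used_mib, utilization_pct)."""
    mem, util = (field.strip() for field in line.split(","))
    return int(mem), int(util)


def sample_gpu() -> list:
    """Run nvidia-smi once and return one (memory, utilization) pair per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]


def is_growing(samples_mib: list, threshold_mib: int = 1024) -> bool:
    """True if memory use rose by more than threshold_mib from first to last sample."""
    return len(samples_mib) >= 2 and samples_mib[-1] - samples_mib[0] > threshold_mib
```

Calling sample_gpu() every few seconds during a session and feeding the memory readings to is_growing makes it easier to tell steady usage from the kind of climb described above.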
