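Project Overview
Project repository: https://github.com/QuentinFuxa/WhisperLiveKit
This post is a quick-start guide: it sets up the environment, explores what the model service can do, and runs a brief subjective evaluation.
- Keywords: transcription, speaker diarization, machine translation, voice activity detection (VAD)
- Goals: environment setup, quick start, quick subjective evaluation
- Difficulty: medium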
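WhisperLiveKit is a real-time transcription tool. So why not just use Whisper directly?
Start with the use cases: meetings, live streams, online education and similar scenarios need the transcript while the audio is still arriving (which means transcribing short windows of audio). They often also need to tell speakers apart when several people talk, and may need real-time translation in the style of simultaneous interpretation.
Whisper, however, is designed for complete utterances, not real-time fragments. Processing small chunks loses context, cuts words off mid-syllable, and produces poor transcriptions. WhisperLiveKit uses state-of-the-art simultaneous speech research for intelligent buffering and incremental processing.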
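To make the chunking problem concrete, here is a minimal sketch (not WhisperLiveKit code) of naive fixed-window splitting. The 16 kHz sample rate is what Whisper models expect; the clip length, window size, word timing, and function names are made-up values for illustration only.

```python
SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def ms_to_samples(ms: int) -> int:
    """Convert a duration in milliseconds to a sample count."""
    return ms * SAMPLE_RATE // 1000


def split_fixed_windows(num_samples: int, window_ms: int) -> list:
    """Return (start, end) sample indices of consecutive fixed-length windows."""
    window = ms_to_samples(window_ms)
    return [
        (start, min(start + window, num_samples))
        for start in range(0, num_samples, window)
    ]


def boundaries_inside(word_span: tuple, windows: list) -> list:
    """Return the window boundaries that fall strictly inside a word's span."""
    start, end = word_span
    return [w_end for _, w_end in windows if start < w_end < end]


# Hypothetical 3-second clip with one word spoken from 0.9 s to 1.3 s.
windows = split_fixed_windows(ms_to_samples(3000), window_ms=1000)
word = (ms_to_samples(900), ms_to_samples(1300))

print(windows)
print(boundaries_inside(word, windows))
```

A boundary reported by `boundaries_inside` means the word is split across two windows, so neither window contains the whole word. Streaming policies exist to avoid committing text around such cut points.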
WhisperLiveKit builds on several pieces of prior research, including:
- SimulStreaming (SOTA 2025) - Ultra-low latency transcription using AlignAtt policy
- NLLB (distilled) (2024) - Translation to more than 100 languages
- WhisperStreaming (SOTA 2023) - Low latency transcription using LocalAgreement policy
- Streaming Sortformer (SOTA 2025) - Advanced real-time speaker diarization
- Diart (SOTA 2021) - Real-time speaker diarization
- Silero VAD (2024) - Enterprise-grade Voice Activity Detection
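The LocalAgreement idea named in the list can be sketched in a few lines. This is a simplified illustration under my own assumptions, not the WhisperStreaming or WhisperLiveKit implementation: the audio buffer is re-transcribed as it grows, and a word is committed only once two consecutive hypotheses agree on it. The class and function names are hypothetical, and a real system would work on timestamped words and also trim the audio buffer.

```python
def common_prefix(a: list, b: list) -> list:
    """Return the longest prefix shared by two word lists."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


class LocalAgreement2:
    """Commit words only after two consecutive hypotheses agree on them."""

    def __init__(self) -> None:
        self.committed = []  # words already emitted to the user
        self.previous = []   # hypothesis from the previous update

    def update(self, hypothesis: list) -> list:
        """Feed the latest full hypothesis; return the newly committed words."""
        agreed = common_prefix(self.previous, hypothesis)
        new_words = agreed[len(self.committed):]
        self.committed.extend(new_words)
        self.previous = list(hypothesis)
        return new_words


policy = LocalAgreement2()
# Hypothetical hypotheses produced as more audio arrives.
for hyp in (["the", "cat"], ["the", "cat", "sat"], ["the", "cat", "sat", "on"]):
    print(hyp, "->", policy.update(hyp))
```

The last word of each hypothesis is held back until the next pass confirms it, which is where the latency of this policy comes from.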
The project's architecture diagram shows that it supports multiple concurrent users.

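Next, we install and deploy it and run a quick hands-on evaluation.
Installation and Running
Environment Setup
Use conda to create an isolated environment. My machine has an RTX 5090 and needs a matching torch build, so instead of starting from scratch I cloned my existing whisper environment into a new conda environment.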
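Before cloning, it can be worth confirming that the source environment's torch build actually sees the GPU. The following is a minimal sketch to run inside that environment; `describe_torch` is a hypothetical helper name, and it only uses standard `torch` and `torch.cuda` attributes.

```python
def describe_torch() -> dict:
    """Summarize the installed PyTorch build and the GPU it can see."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment
        return {"installed": False}

    info = {
        "installed": True,
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


print(describe_torch())
```

If `cuda_available` is false on a machine with a working GPU driver, the torch build most likely does not match the card, and cloning that environment would carry the problem over.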
(base) PS C:\Users\Jacob> conda env list
# conda environments:
#
base * C:\Users\Jacob\miniconda3
fireredtts C:\Users\Jacob\miniconda3\envs\fireredtts
pytorch_nightly_env C:\Users\Jacob\miniconda3\envs\pytorch_nightly_env
qwen_rtx5090 C:\Users\Jacob\miniconda3\envs\qwen_rtx5090
rtx50_comfyui C:\Users\Jacob\miniconda3\envs\rtx50_comfyui
whisper C:\Users\Jacob\miniconda3\envs\whisper
(base) PS C:\Users\Jacob> conda create -n whisperlivekit --clone whisper
3 channel Terms of Service accepted
Retrieving notices: done
Source: C:\Users\Jacob\miniconda3\envs\whisper
Destination: C:\Users\Jacob\miniconda3\envs\whisperlivekit
Packages: 19
Files: 35845
Downloading and Extracting Packages:
## Package Plan ##
environment location: C:\Users\Jacob\miniconda3\envs\whisperlivekit
added / updated specs:
- conda-forge/noarch::ca-certificates==2025.8.3=h4c7d964_0
- conda-forge/win-64::ffmpeg==4.3.1=ha925a31_0
- conda-forge/win-64::openssl==3.5.2=h725018a_0
- defaults/noarch::pip==25.1=pyhc872135_2
- defaults/noarch::tzdata==2025b=h04d1e81_0
- defaults/win-64::bzip2==1.0.8=h2bbff1b_6
- defaults/win-64::expat==2.7.1=h8ddb27b_0
- defaults/win-64::libffi==3.4.4=hd77b12b_1
- defaults/win-64::python==3.12.11=h716150d_0
- defaults/win-64::setuptools==78.1.1=py312haa95532_0
- defaults/win-64::sqlite==3.50.2=hda9a48d_1
- defaults/win-64::tk==8.6.14=h5e9d12e_1
- defaults/win-64::ucrt==10.0.22621.0=haa95532_0
- defaults/win-64::vc14_runtime==14.44.35208=h4927774_10
- defaults/win-64::vc==14.3=h2df5915_10
- defaults/win-64::vs2015_runtime==14.44.35208=ha6b5a95_10
- defaults/win-64::wheel==0.45.1=py312haa95532_0
- defaults/win-64::xz==5.6.4=h4754444_1
- defaults/win-64::zlib==1.2.13=h8cc25b3_1
done
#
# To activate this environment, use
#
# $ conda activate whisperlivekit
#
# To deactivate an active environment, use
#
# $ conda deactivate
(base) PS C:\Users\Jacob> conda activate whisperlivekit
(whisperlivekit) PS C:\Users\Jacob> pip install whisperlivekit
Collecting whisperlivekit
Downloading whisperlivekit-0.2.9-py3-none-any.whl.metadata (18 kB)
Collecting fastapi (from whisperlivekit)
Downloading fastapi-0.117.1-py3-none-any.whl.metadata (28 kB)
Collecting librosa (from whisperlivekit)
Downloading librosa-0.11.0-py3-none-any.whl.metadata (8.7 kB)
Requirement already satisfied: soundfile in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (0.13.1)
Collecting faster-whisper (from whisperlivekit)
Downloading faster_whisper-1.2.0-py3-none-any.whl.metadata (16 kB)
Collecting uvicorn (from whisperlivekit)
Downloading uvicorn-0.36.0-py3-none-any.whl.metadata (6.6 kB)
Collecting websockets (from whisperlivekit)
Downloading websockets-15.0.1-cp312-cp312-win_amd64.whl.metadata (7.0 kB)
Requirement already satisfied: torchaudio>=2.0.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (2.8.0.dev20250810+cu128)
Requirement already satisfied: torch>=2.0.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (2.9.0.dev20250810+cu128)
Requirement already satisfied: tqdm in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (4.67.1)
Requirement already satisfied: tiktoken in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (0.11.0)
Requirement already satisfied: filelock in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.18.0)
Requirement already satisfied: typing-extensions>=4.10.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (4.14.1)
Requirement already satisfied: sympy>=1.13.3 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (1.14.0)
Requirement already satisfied: networkx>=2.5.1 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.5)
Requirement already satisfied: jinja2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.1.6)
Requirement already satisfied: fsspec>=0.8.5 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (2025.7.0)
Requirement already satisfied: setuptools in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (78.1.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from sympy>=1.13.3->torch>=2.0.0->whisperlivekit) (1.3.0)
Collecting starlette<0.49.0,>=0.40.0 (from fastapi->whisperlivekit)
Downloading starlette-0.48.0-py3-none-any.whl.metadata (6.3 kB)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from fastapi->whisperlivekit) (2.11.7)
Requirement already satisfied: annotated-types>=0.6.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (0.7.0)
Requirement already satisfied: pydantic-core==2.33.2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (2.33.2)
Requirement already satisfied: typing-inspection>=0.4.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (0.4.1)
Collecting anyio<5,>=3.6.2 (from starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit)
Downloading anyio-4.10.0-py3-none-any.whl.metadata (4.0 kB)
Requirement already satisfied: idna>=2.8 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from anyio<5,>=3.6.2->starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit) (3.10)
Collecting sniffio>=1.1 (from anyio<5,>=3.6.2->starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit)
Downloading sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting ctranslate2<5,>=4.0 (from faster-whisper->whisperlivekit)
Downloading ctranslate2-4.6.0-cp312-cp312-win_amd64.whl.metadata (10 kB)
Requirement already satisfied: huggingface-hub>=0.13 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (0.34.4)
Requirement already satisfied: tokenizers<1,>=0.13 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (0.21.4)
Collecting onnxruntime<2,>=1.14 (from faster-whisper->whisperlivekit)
Downloading onnxruntime-1.22.1-cp312-cp312-win_amd64.whl.metadata (5.1 kB)
Requirement already satisfied: av>=11 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (15.0.0)
Requirement already satisfied: numpy in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from ctranslate2<5,>=4.0->faster-whisper->whisperlivekit) (2.2.6)
Requirement already satisfied: pyyaml<7,>=5.3 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from ctranslate2<5,>=4.0->faster-whisper->whisperlivekit) (6.0.2)
Collecting coloredlogs (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
Downloading coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)
Collecting flatbuffers (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
Downloading flatbuffers-25.2.10-py2.py3-none-any.whl.metadata (875 bytes)
Requirement already satisfied: packaging in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit) (25.0)
Collecting protobuf (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
Downloading protobuf-6.32.1-cp310-abi3-win_amd64.whl.metadata (593 bytes)
Requirement already satisfied: requests in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2.32.4)
Requirement already satisfied: colorama in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from tqdm->whisperlivekit) (0.4.6)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
Downloading humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)
Collecting pyreadline3 (from humanfriendly>=9.1->coloredlogs->onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
Downloading pyreadline3-3.5.4-py3-none-any.whl.metadata (4.7 kB)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from jinja2->torch>=2.0.0->whisperlivekit) (3.0.2)
Collecting audioread>=2.1.9 (from librosa->whisperlivekit)
Using cached audioread-3.0.1-py3-none-any.whl.metadata (8.4 kB)
Requirement already satisfied: numba>=0.51.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from librosa->whisperlivekit) (0.61.2)
Requirement already satisfied: scipy>=1.6.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from librosa->whisperlivekit) (1.16.1)
Collecting scikit-learn>=1.1.0 (from librosa->whisperlivekit)
Using cached scikit_learn-1.7.2-cp312-cp312-win_amd64.whl.metadata (11 kB)
Collecting joblib>=1.0 (from librosa->whisperlivekit)
Using cached joblib-1.5.2-py3-none-any.whl.metadata (5.6 kB)
Collecting decorator>=4.3.0 (from librosa->whisperlivekit)
Using cached decorator-5.2.1-py3-none-any.whl.metadata (3.9 kB)
Collecting pooch>=1.1 (from librosa->whisperlivekit)
Using cached pooch-1.8.2-py3-none-any.whl.metadata (10 kB)
Collecting soxr>=0.3.2 (from librosa->whisperlivekit)
Using cached soxr-1.0.0-cp312-abi3-win_amd64.whl.metadata (5.6 kB)
Collecting lazy_loader>=0.1 (from librosa->whisperlivekit)
Using cached lazy_loader-0.4-py3-none-any.whl.metadata (7.6 kB)
Collecting msgpack>=1.0 (from librosa->whisperlivekit)
Using cached msgpack-1.1.1-cp312-cp312-win_amd64.whl.metadata (8.6 kB)
Requirement already satisfied: llvmlite<0.45,>=0.44.0dev0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from numba>=0.51.0->librosa->whisperlivekit) (0.44.0)
Collecting platformdirs>=2.5.0 (from pooch>=1.1->librosa->whisperlivekit)
Using cached platformdirs-4.4.0-py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: charset_normalizer<4,>=2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (3.4.3)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2.5.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2025.8.3)
Collecting threadpoolctl>=3.1.0 (from scikit-learn>=1.1.0->librosa->whisperlivekit)
Using cached threadpoolctl-3.6.0-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: cffi>=1.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from soundfile->whisperlivekit) (1.17.1)
Requirement already satisfied: pycparser in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from cffi>=1.0->soundfile->whisperlivekit) (2.22)
Requirement already satisfied: regex>=2022.1.18 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from tiktoken->whisperlivekit) (2025.7.34)
Collecting click>=7.0 (from uvicorn->whisperlivekit)
Downloading click-8.3.0-py3-none-any.whl.metadata (2.6 kB)
Collecting h11>=0.8 (from uvicorn->whisperlivekit)
Downloading h11-0.16.0-py3-none-any.whl.metadata (8.3 kB)
Downloading whisperlivekit-0.2.9-py3-none-any.whl (876 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 876.5/876.5 kB 163.8 kB/s eta 0:00:00
Downloading fastapi-0.117.1-py3-none-any.whl (95 kB)
Downloading starlette-0.48.0-py3-none-any.whl (73 kB)
Downloading anyio-4.10.0-py3-none-any.whl (107 kB)
Downloading sniffio-1.3.1-py3-none-any.whl (10 kB)
Downloading faster_whisper-1.2.0-py3-none-any.whl (1.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 124.6 kB/s eta 0:00:00
Downloading ctranslate2-4.6.0-cp312-cp312-win_amd64.whl (19.5 MB)
━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.2/19.5 MB 124.3 kB/s eta 0:01:55
ERROR: Operation cancelled by user
(whisperlivekit) PS C:\Users\Jacob> pip install whisperlivekit
Collecting whisperlivekit
Using cached whisperlivekit-0.2.9-py3-none-any.whl.metadata (18 kB)
Collecting fastapi (from whisperlivekit)
Using cached fastapi-0.117.1-py3-none-any.whl.metadata (28 kB)
Collecting librosa (from whisperlivekit)
Using cached librosa-0.11.0-py3-none-any.whl.metadata (8.7 kB)
Requirement already satisfied: soundfile in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (0.13.1)
Collecting faster-whisper (from whisperlivekit)
Using cached faster_whisper-1.2.0-py3-none-any.whl.metadata (16 kB)
Collecting uvicorn (from whisperlivekit)
Using cached uvicorn-0.36.0-py3-none-any.whl.metadata (6.6 kB)
Collecting websockets (from whisperlivekit)
Using cached websockets-15.0.1-cp312-cp312-win_amd64.whl.metadata (7.0 kB)
Requirement already satisfied: torchaudio>=2.0.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (2.8.0.dev20250810+cu128)
Requirement already satisfied: torch>=2.0.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (2.9.0.dev20250810+cu128)
Requirement already satisfied: tqdm in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (4.67.1)
Requirement already satisfied: tiktoken in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from whisperlivekit) (0.11.0)
Requirement already satisfied: filelock in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.18.0)
Requirement already satisfied: typing-extensions>=4.10.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (4.14.1)
Requirement already satisfied: sympy>=1.13.3 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (1.14.0)
Requirement already satisfied: networkx>=2.5.1 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.5)
Requirement already satisfied: jinja2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (3.1.6)
Requirement already satisfied: fsspec>=0.8.5 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (2025.7.0)
Requirement already satisfied: setuptools in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from torch>=2.0.0->whisperlivekit) (78.1.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from sympy>=1.13.3->torch>=2.0.0->whisperlivekit) (1.3.0)
Collecting starlette<0.49.0,>=0.40.0 (from fastapi->whisperlivekit)
Using cached starlette-0.48.0-py3-none-any.whl.metadata (6.3 kB)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from fastapi->whisperlivekit) (2.11.7)
Requirement already satisfied: annotated-types>=0.6.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (0.7.0)
Requirement already satisfied: pydantic-core==2.33.2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (2.33.2)
Requirement already satisfied: typing-inspection>=0.4.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi->whisperlivekit) (0.4.1)
Collecting anyio<5,>=3.6.2 (from starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit)
Using cached anyio-4.10.0-py3-none-any.whl.metadata (4.0 kB)
Requirement already satisfied: idna>=2.8 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from anyio<5,>=3.6.2->starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit) (3.10)
Collecting sniffio>=1.1 (from anyio<5,>=3.6.2->starlette<0.49.0,>=0.40.0->fastapi->whisperlivekit)
Using cached sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting ctranslate2<5,>=4.0 (from faster-whisper->whisperlivekit)
Using cached ctranslate2-4.6.0-cp312-cp312-win_amd64.whl.metadata (10 kB)
Requirement already satisfied: huggingface-hub>=0.13 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (0.34.4)
Requirement already satisfied: tokenizers<1,>=0.13 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (0.21.4)
Collecting onnxruntime<2,>=1.14 (from faster-whisper->whisperlivekit)
Using cached onnxruntime-1.22.1-cp312-cp312-win_amd64.whl.metadata (5.1 kB)
Requirement already satisfied: av>=11 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from faster-whisper->whisperlivekit) (15.0.0)
Requirement already satisfied: numpy in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from ctranslate2<5,>=4.0->faster-whisper->whisperlivekit) (2.2.6)
Requirement already satisfied: pyyaml<7,>=5.3 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from ctranslate2<5,>=4.0->faster-whisper->whisperlivekit) (6.0.2)
Collecting coloredlogs (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
Using cached coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)
Collecting flatbuffers (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
Using cached flatbuffers-25.2.10-py2.py3-none-any.whl.metadata (875 bytes)
Requirement already satisfied: packaging in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit) (25.0)
Collecting protobuf (from onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
Using cached protobuf-6.32.1-cp310-abi3-win_amd64.whl.metadata (593 bytes)
Requirement already satisfied: requests in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2.32.4)
Requirement already satisfied: colorama in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from tqdm->whisperlivekit) (0.4.6)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
Using cached humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)
Collecting pyreadline3 (from humanfriendly>=9.1->coloredlogs->onnxruntime<2,>=1.14->faster-whisper->whisperlivekit)
Using cached pyreadline3-3.5.4-py3-none-any.whl.metadata (4.7 kB)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from jinja2->torch>=2.0.0->whisperlivekit) (3.0.2)
Collecting audioread>=2.1.9 (from librosa->whisperlivekit)
Using cached audioread-3.0.1-py3-none-any.whl.metadata (8.4 kB)
Requirement already satisfied: numba>=0.51.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from librosa->whisperlivekit) (0.61.2)
Requirement already satisfied: scipy>=1.6.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from librosa->whisperlivekit) (1.16.1)
Collecting scikit-learn>=1.1.0 (from librosa->whisperlivekit)
Using cached scikit_learn-1.7.2-cp312-cp312-win_amd64.whl.metadata (11 kB)
Collecting joblib>=1.0 (from librosa->whisperlivekit)
Using cached joblib-1.5.2-py3-none-any.whl.metadata (5.6 kB)
Collecting decorator>=4.3.0 (from librosa->whisperlivekit)
Using cached decorator-5.2.1-py3-none-any.whl.metadata (3.9 kB)
Collecting pooch>=1.1 (from librosa->whisperlivekit)
Using cached pooch-1.8.2-py3-none-any.whl.metadata (10 kB)
Collecting soxr>=0.3.2 (from librosa->whisperlivekit)
Using cached soxr-1.0.0-cp312-abi3-win_amd64.whl.metadata (5.6 kB)
Collecting lazy_loader>=0.1 (from librosa->whisperlivekit)
Using cached lazy_loader-0.4-py3-none-any.whl.metadata (7.6 kB)
Collecting msgpack>=1.0 (from librosa->whisperlivekit)
Using cached msgpack-1.1.1-cp312-cp312-win_amd64.whl.metadata (8.6 kB)
Requirement already satisfied: llvmlite<0.45,>=0.44.0dev0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from numba>=0.51.0->librosa->whisperlivekit) (0.44.0)
Collecting platformdirs>=2.5.0 (from pooch>=1.1->librosa->whisperlivekit)
Using cached platformdirs-4.4.0-py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: charset_normalizer<4,>=2 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (3.4.3)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2.5.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from requests->huggingface-hub>=0.13->faster-whisper->whisperlivekit) (2025.8.3)
Collecting threadpoolctl>=3.1.0 (from scikit-learn>=1.1.0->librosa->whisperlivekit)
Using cached threadpoolctl-3.6.0-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: cffi>=1.0 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from soundfile->whisperlivekit) (1.17.1)
Requirement already satisfied: pycparser in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from cffi>=1.0->soundfile->whisperlivekit) (2.22)
Requirement already satisfied: regex>=2022.1.18 in c:\users\jacob\miniconda3\envs\whisperlivekit\lib\site-packages (from tiktoken->whisperlivekit) (2025.7.34)
Collecting click>=7.0 (from uvicorn->whisperlivekit)
Using cached click-8.3.0-py3-none-any.whl.metadata (2.6 kB)
Collecting h11>=0.8 (from uvicorn->whisperlivekit)
Using cached h11-0.16.0-py3-none-any.whl.metadata (8.3 kB)
Using cached whisperlivekit-0.2.9-py3-none-any.whl (876 kB)
Using cached fastapi-0.117.1-py3-none-any.whl (95 kB)
Using cached starlette-0.48.0-py3-none-any.whl (73 kB)
Using cached anyio-4.10.0-py3-none-any.whl (107 kB)
Using cached sniffio-1.3.1-py3-none-any.whl (10 kB)
Using cached faster_whisper-1.2.0-py3-none-any.whl (1.1 MB)
Downloading ctranslate2-4.6.0-cp312-cp312-win_amd64.whl (19.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.5/19.5 MB 8.4 MB/s eta 0:00:00
Downloading onnxruntime-1.22.1-cp312-cp312-win_amd64.whl (12.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.7/12.7 MB 12.0 MB/s eta 0:00:00
Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Downloading flatbuffers-25.2.10-py2.py3-none-any.whl (30 kB)
Downloading librosa-0.11.0-py3-none-any.whl (260 kB)
Using cached audioread-3.0.1-py3-none-any.whl (23 kB)
Using cached decorator-5.2.1-py3-none-any.whl (9.2 kB)
Using cached joblib-1.5.2-py3-none-any.whl (308 kB)
Using cached lazy_loader-0.4-py3-none-any.whl (12 kB)
Using cached msgpack-1.1.1-cp312-cp312-win_amd64.whl (72 kB)
Using cached pooch-1.8.2-py3-none-any.whl (64 kB)
Using cached platformdirs-4.4.0-py3-none-any.whl (18 kB)
Using cached scikit_learn-1.7.2-cp312-cp312-win_amd64.whl (8.7 MB)
Using cached soxr-1.0.0-cp312-abi3-win_amd64.whl (172 kB)
Using cached threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Downloading protobuf-6.32.1-cp310-abi3-win_amd64.whl (435 kB)
Downloading pyreadline3-3.5.4-py3-none-any.whl (83 kB)
Downloading uvicorn-0.36.0-py3-none-any.whl (67 kB)
Downloading click-8.3.0-py3-none-any.whl (107 kB)
Downloading h11-0.16.0-py3-none-any.whl (37 kB)
Downloading websockets-15.0.1-cp312-cp312-win_amd64.whl (176 kB)
Installing collected packages: flatbuffers, websockets, threadpoolctl, soxr, sniffio, pyreadline3, protobuf, platformdirs, msgpack, lazy_loader, joblib, h11, decorator, ctranslate2, click, audioread, uvicorn, scikit-learn, pooch, humanfriendly, anyio, starlette, librosa, coloredlogs, onnxruntime, fastapi, faster-whisper, whisperlivekit
Successfully installed anyio-4.10.0 audioread-3.0.1 click-8.3.0 coloredlogs-15.0.1 ctranslate2-4.6.0 decorator-5.2.1 fastapi-0.117.1 faster-whisper-1.2.0 flatbuffers-25.2.10 h11-0.16.0 humanfriendly-10.0 joblib-1.5.2 lazy_loader-0.4 librosa-0.11.0 msgpack-1.1.1 onnxruntime-1.22.1 platformdirs-4.4.0 pooch-1.8.2 protobuf-6.32.1 pyreadline3-3.5.4 scikit-learn-1.7.2 sniffio-1.3.1 soxr-1.0.0 starlette-0.48.0 threadpoolctl-3.6.0 uvicorn-0.36.0 websockets-15.0.1 whisperlivekit-0.2.9
Everything installed without errors.
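As an optional sanity check, the installed versions can be read back with the standard library. This is a small sketch; the package list is my own selection from the pip log above, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_versions(packages: list) -> dict:
    """Map each package name to its installed version, or None if missing."""
    versions: dict = {}
    for name in packages:
        try:
            version: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = None
        versions[name] = version
    return versions


report = installed_versions(
    ["whisperlivekit", "faster-whisper", "ctranslate2", "torch"]
)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```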
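Running
Now start the server, choosing a model size that suits your hardware (every example below uses the base model). The log below contains two run commands.
The first one passes a wrong language code: I wrote ch for Chinese, but the correct code is zh. We test with English first, using the code en.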
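A mistyped language code only fails after the models have been downloaded and loaded, so it can help to check the code up front. The sketch below is a hypothetical pre-flight helper, not part of WhisperLiveKit: `KNOWN_CODES` is a deliberately small subset of Whisper's language codes, and `COMMON_MISTAKES` is my own illustrative mapping. The value `auto` is accepted because the traceback in the log shows it being handled as a special case.

```python
# A small, deliberately incomplete subset of Whisper's language codes.
KNOWN_CODES = {
    "en": "english",
    "zh": "chinese",
    "ja": "japanese",
    "ko": "korean",
    "fr": "french",
    "de": "german",
    "es": "spanish",
}

# Illustrative slips only (assumption): country-style codes typed by habit.
COMMON_MISTAKES = {"ch": "zh", "cn": "zh", "jp": "ja", "kr": "ko"}


def check_language(code: str) -> str:
    """Return the normalized code, or raise ValueError with a hint."""
    code = code.strip().lower()
    if code == "auto" or code in KNOWN_CODES:
        return code
    message = f"Unsupported language: {code}"
    hint = COMMON_MISTAKES.get(code)
    if hint is not None:
        message += f" (did you mean '{hint}'?)"
    raise ValueError(message)


for candidate in ("en", "ch"):
    try:
        print(candidate, "->", check_language(candidate))
    except ValueError as err:
        print(candidate, "->", err)
```

Because the subset is incomplete, a real check should use the full language table of the Whisper tokenizer that is actually installed.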
(whisperlivekit) PS C:\Users\Jacob> whisperlivekit-server --model base --language ch
INFO: Started server process [7696]
INFO: Waiting for application startup.
WARNING:whisperlivekit.basic_server:
==================================================
WhisperLiveKit 0.2.8 has introduced a new fast encoder feature using MLX Whisper or Faster Whisper for improved speed. Use --disable-fast-encoder to disable if you encounter issues.
==================================================
C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\torch\hub.py:330: UserWarning: You are about to download and run code from an untrusted repository. In a future release, this won't be allowed. To add the repository to your trusted list, change the command to {calling_fn}(..., trust_repo=False) and a command prompt will appear asking for an explicit confirmation of trust, or load(..., trust_repo=True), which will assume that the prompt is to be answered with 'yes'. You can also use load(..., trust_repo='check') which will only prompt for confirmation if the repo is not already trusted. This will eventually be the default behaviour
warnings.warn(
Downloading: "https://github.com/snakers4/silero-vad/zipball/master" to C:\Users\Jacob/.cache\torch\hub\master.zip
WARNING:whisperlivekit.simul_whisper.backend:
SimulStreaming backend is dual-licensed:
• Non-Commercial Use: PolyForm Noncommercial License 1.0.0.
• Commercial Use: Check SimulStreaming README (github.com/ufal/SimulStreaming) for more details.
Simulstreaming will use Faster Whisper for the encoder.
config.json: 2.31kB [00:00, 11.4kB/s]
C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\huggingface_hub\file_download.py:143: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\Jacob\.cache\huggingface\hub\models--Systran--faster-whisper-base. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
warnings.warn(message)
vocabulary.txt: 460kB [00:00, 637kB/s]
tokenizer.json: 0.00B [00:00, ?B/s] Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
WARNING:huggingface_hub.file_download:Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
tokenizer.json: 2.20MB [00:01, 1.75MB/s]
model.bin: 100%|████████████████████████████████████████████████████████████████████| 145M/145M [00:11<00:00, 12.9MB/s]
100%|███████████████████████████████████████| 139M/139M [00:10<00:00, 13.9MiB/s]
Downloading warmup file from https://github.com/ggerganov/whisper.cpp/raw/master/samples/jfk.wav
ERROR: Traceback (most recent call last):
File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\starlette\routing.py", line 694, in lifespan
async with self.lifespan_context(app) as maybe_state:
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\basic_server.py", line 32, in lifespan
transcription_engine = TranscriptionEngine(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\core.py", line 113, in __init__
self.asr = SimulStreamingASR(
^^^^^^^^^^^^^^^^^^
File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\backend.py", line 312, in __init__
self.models = [self.load_model() for i in range(self.preload_model_count)]
^^^^^^^^^^^^^^^^^
File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\backend.py", line 321, in load_model
temp_model = PaddedAlignAttWhisper(
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\simul_whisper.py", line 80, in __init__
self.create_tokenizer(cfg.language if cfg.language != "auto" else None)
File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\simul_whisper.py", line 190, in create_tokenizer
self.tokenizer = tokenizer.get_tokenizer(
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\whisper\tokenizer.py", line 380, in get_tokenizer
raise ValueError(f"Unsupported language: {language}")
ValueError: Unsupported language: ch
ERROR: Application startup failed. Exiting.
(whisperlivekit) PS C:\Users\Jacob> whisperlivekit-server --model base --language en
INFO: Started server process [2764]
INFO: Waiting for application startup.
WARNING:whisperlivekit.basic_server:
==================================================
WhisperLiveKit 0.2.8 has introduced a new fast encoder feature using MLX Whisper or Faster Whisper for improved speed. Use --disable-fast-encoder to disable if you encounter issues.
==================================================
Using cache found in C:\Users\Jacob/.cache\torch\hub\snakers4_silero-vad_master
WARNING:whisperlivekit.simul_whisper.backend:
SimulStreaming backend is dual-licensed:
• Non-Commercial Use: PolyForm Noncommercial License 1.0.0.
• Commercial Use: Check SimulStreaming README (github.com/ufal/SimulStreaming) for more details.
Simulstreaming will use Faster Whisper for the encoder.
C:\Users\Jacob\miniconda3\envs\whisperlivekit\Lib\site-packages\whisperlivekit\simul_whisper\whisper\timing.py:42: UserWarning: Failed to launch Triton kernels, likely due to missing CUDA toolkit; falling back to a slower median kernel implementation...
warnings.warn(
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
INFO: ::1:50387 - "GET / HTTP/1.1" 200 OK
INFO: ::1:50387 - "GET /favicon.ico HTTP/1.1" 404 Not Found
INFO: ::1:51078 - "GET / HTTP/1.1" 200 OK
INFO: ::1:51084 - "WebSocket /asr" [accepted]
INFO:whisperlivekit.basic_server:WebSocket connection opened.
INFO: connection open
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.60s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.55s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.49s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.55s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.49s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.55s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=2.17s | + Silence of = 2.16s | last_end = 2.12 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=2.17s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.01s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.01s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=3.79s | + Silence of = 4.32s | last_end = 6.64 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=3.79s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.49s | + Silence of = 1.56s | last_end = 12.88 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.49s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=1.09s | + Silence of = 2.64s | last_end = 12.88 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=1.09s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s | + Silence of = 1.56s | last_end = 20.020000000000003 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s | + Silence of = 1.08s | last_end = 24.1 |
INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
The first run downloads and caches the model files, as the log shows.
Based on the log, the program first downloaded the following components during its initial run:
- Silero-VAD: A voice activity detection model from a GitHub repository.
- Faster-Whisper-Base Model: A more efficient version of the Whisper model for transcription, including its associated files: config.json, vocabulary.txt, tokenizer.json, model.bin
- Warmup file: A sample audio file (jfk.wav) from the whisper.cpp repository, used to test and initialize the model.
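The downloaded files are stored mainly in two places:
- Hugging Face cache directory
Most model files, such as the Faster-Whisper-Base config.json, vocabulary.txt, tokenizer.json and model.bin, are downloaded to the default Hugging Face cache directory.
- Path: C:\Users\Jacob\.cache\huggingface\hub
This cache directory is normally used for models and datasets downloaded from the Hugging Face Hub.
- Torch Hub cache directory
The Silero-VAD model is downloaded from GitHub and stored in the Torch Hub cache directory.
- Path: C:\Users\Jacob\.cache\torch\hub\master.zip (the downloaded archive); on later runs the log reports the cached copy at C:\Users\Jacob\.cache\torch\hub\snakers4_silero-vad_master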
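To see how much disk space these caches take, the two directories can be scanned with a short script. This is a minimal sketch, assuming the default cache locations under the user's home directory as listed above; the function names are illustrative and not part of WhisperLiveKit.

```python
from pathlib import Path


def dir_size_bytes(root):
    """Sum the sizes of all regular files below root.

    Note: stat() follows symlinks, so on systems where the Hugging Face
    cache uses symlinks the same blob may be counted more than once.
    """
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def cache_report(home=None):
    """Return {cache name: size in bytes, or None if the directory is missing}."""
    home = Path(home) if home is not None else Path.home()
    roots = {
        "huggingface": home / ".cache" / "huggingface" / "hub",
        "torch": home / ".cache" / "torch" / "hub",
    }
    return {
        name: dir_size_bytes(root) if root.is_dir() else None
        for name, root in roots.items()
    }


def format_report(report):
    """Render the report as one human-readable line per cache."""
    lines = []
    for name, size in report.items():
        text = "not found" if size is None else f"{size / 1e6:.1f} MB"
        lines.append(f"{name}: {text}")
    return "\n".join(lines)
```

Calling print(format_report(cache_report())) prints one line per cache directory; a directory that does not exist yet is reported as "not found".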
Start the server with: whisperlivekit-server --model base --language en. This uses the base model and recognizes English (en).

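As the log shows, the page communicates with the backend over a WebSocket (the /asr endpoint). The front end is a fairly simple page. The recognized speech is split into segments, but speakers are not identified with the default parameters.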
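When writing a custom client, the WebSocket address follows from the server address. The sketch below derives it from the HTTP URL, assuming the /asr path seen in the server log; the helper name asr_ws_url is hypothetical, and the audio format the server expects on that socket is not covered here.

```python
from urllib.parse import urlsplit, urlunsplit


def asr_ws_url(http_url, path="/asr"):
    """Map an http(s) server URL to the matching ws(s) URL for the ASR socket."""
    parts = urlsplit(http_url)
    # https pages must use secure WebSockets; plain http maps to ws.
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, path, "", ""))


print(asr_ws_url("http://localhost:8000"))
```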
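To test further, we need to configure the model service.
Model configuration
To understand each command-line parameter, start with the introduction at https://github.com/QuentinFuxa/WhisperLiveKit.
For model size, many models are available; the small and mid-size Whisper models (tiny, base, small, medium) also come in English-only .en variants.
https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/available_models.md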

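The language parameter is described as follows. The failed start in the log above came from passing ch, which the tokenizer rejects (ValueError: Unsupported language: ch); Whisper's language code for Chinese is zh.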
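A wrong language code only surfaces after the models have loaded, so it can be worth checking the value before launching the server. The following is a minimal sketch, not WhisperLiveKit code: KNOWN_CODES is a small illustrative subset of Whisper's language codes, COMMON_MISTAKES is a hypothetical hint table, and "auto" is accepted because the traceback shows the backend treating it specially.

```python
# Illustrative subset of Whisper language codes (not the full list).
KNOWN_CODES = {
    "en": "english",
    "zh": "chinese",
    "de": "german",
    "fr": "french",
    "es": "spanish",
    "ja": "japanese",
}

# Hypothetical table of frequent slips and the code that was probably meant.
COMMON_MISTAKES = {"ch": "zh", "cn": "zh", "jp": "ja"}


def check_language(code):
    """Return the normalized code, or raise ValueError with a hint."""
    code = code.strip().lower()
    if code == "auto" or code in KNOWN_CODES:
        return code
    message = f"Unsupported language: {code}"
    hint = COMMON_MISTAKES.get(code)
    if hint is not None:
        message += f" (did you mean {hint!r}?)"
    raise ValueError(message)
```

With this check, check_language("ch") fails immediately and suggests zh, instead of failing during server startup.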

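Summary
- The project is not yet very mature, but it is under rapid development; see the release notes at https://github.com/QuentinFuxa/WhisperLiveKit/releases
- The front end is minimal and serves only as an example. You can build your own for your use case, and the GitHub repository includes front-end examples.
- Speaker identification is off by default and has to be enabled (whisperlivekit-server --model base --language en --diarization). In a quick test, the male and female speakers of an English podcast were fairly easy to tell apart, but two female reporters speaking Chinese were essentially not distinguished.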

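- Real-time transcription latency is about 1 s.
- Accuracy: even some common words are occasionally misrecognized (words with similar pronunciation).
- Chinese is output in Traditional characters by default. If no language is configured, English and Chinese are detected automatically and transcribed.
- Resource usage: about 10% GPU utilization on an RTX 5090, with about 20 GB of GPU memory in use.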
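The latency figure above is a subjective estimate. As a rough cross-check, the lag values printed by whisperlivekit.audio_processor can be extracted from a saved server log and summarized. This is a minimal sketch, assuming the log line format shown earlier (lag=<seconds>s) and assuming that lag reflects how far processing trails the incoming audio; the function names are illustrative.

```python
import re
from statistics import mean

# Matches e.g. "lag=0.60s"; does not match "internal_buffer=0.00s"
# or "Silence of = 2.16s".
LAG_RE = re.compile(r"lag=(\d+(?:\.\d+)?)s")


def lag_values(log_text):
    """Return every lag value (in seconds) found in the log text."""
    return [float(m.group(1)) for m in LAG_RE.finditer(log_text)]


def summarize(lags):
    """Return count, mean and max of the lag values, or None if empty."""
    if not lags:
        return None
    return {"count": len(lags), "mean": mean(lags), "max": max(lags)}
```

Passing the text of a saved log to lag_values and then to summarize gives the mean and worst-case lag for that session.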

The first run with the base model lasted more than 10 minutes and was fairly stable.

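On the second and third runs, GPU memory grew from 9 GB to 20 GB and then to about 30 GB. After one to a few minutes, GPU utilization reached 100%, recognition results started to be dropped on a large scale, and the service stopped working properly.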
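To pin down when the memory growth starts, GPU usage can be sampled in a second terminal while the server runs. This is a minimal sketch, assuming nvidia-smi is on PATH; the function names, the polling interval and the sample count are illustrative choices.

```python
import subprocess
import time

# nvidia-smi prints one CSV line per GPU: "<memory used in MiB>, <utilization in %>".
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_smi_line(line):
    """Parse one CSV line into (memory_used_mib, utilization_percent)."""
    mem, util = (field.strip() for field in line.split(","))
    return int(mem), int(util)


def sample():
    """Query nvidia-smi once and return one tuple per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_smi_line(line) for line in out.splitlines() if line.strip()]


def watch(interval_s=5.0, samples=12):
    """Print GPU memory and utilization every interval_s seconds."""
    for _ in range(samples):
        stamp = time.strftime("%H:%M:%S")
        for idx, (mem, util) in enumerate(sample()):
            print(f"{stamp} gpu{idx}: {mem} MiB used, {util}% utilization")
        time.sleep(interval_s)
```

Running watch() during a transcription session gives a timeline that can be lined up against the server log to see which events precede the jump in memory use.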