TL;DR: at first I couldn't figure out how to build this on Windows, so I couldn't install it. I eventually got it built and installed with help from DuMate.
Official site: github.com
Study reference: the CSDN blog post "Whisper.cpp 编译使用" (whisper c++)
High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model:
- Plain C/C++ implementation without dependencies
- Apple Silicon first-class citizen - optimized via ARM NEON, Accelerate framework, Metal and Core ML
- AVX intrinsics support for x86 architectures
- VSX intrinsics support for POWER architectures
- Mixed F16 / F32 precision
- Integer quantization support
- Zero memory allocations at runtime
- Vulkan support
- Support for CPU-only inference
- Efficient GPU support for NVIDIA
- OpenVINO Support
- Ascend NPU Support
- C-style API
Supported platforms:
- Mac OS (Intel and Arm)
- iOS
- Android
- Java
- Linux / FreeBSD
- WebAssembly
- Windows (MSVC and MinGW)
- Raspberry Pi
- Docker
So Windows is supported.
Hands-on
Download the source code:
git clone https://github.com/ggerganov/whisper.cpp
Enter the directory:
cd whisper.cpp
Download a model:
Download the model you want; once the download finishes, the files are stored under the models folder.
.\models\download-ggml-model.cmd small
Download the base.en model:
models\download-ggml-model.cmd base.en
Preparing to build
On Linux you can just run cmake, but on Windows it's more involved:
First install choco (note: this requires admin privileges):
@powershell -NoProfile -ExecutionPolicy Bypass -Command "iex ((new-object net.webclient).DownloadString('https://chocolatey.org/install.ps1'))" && SET PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin
Then choco can install all kinds of software, for example make:
choco install make
Build
cmake -B build
cmake --build build --config Release
Well, at that point I still hadn't found a way to build it on Windows...
Once the build succeeded, the following files were produced:
Directory of E:\github\whisper.cpp\build\bin\Release
2025/02/05 22:36 <DIR> .
2025/02/05 22:36 <DIR> ..
2025/02/05 22:36 16,896 bench.exe
2025/02/05 22:36 16,896 command.exe
2025/02/05 22:36 479,232 ggml-base.dll
2025/02/05 22:36 319,488 ggml-cpu.dll
2025/02/05 22:36 64,512 ggml.dll
2025/02/05 22:36 16,896 main.exe
2025/02/05 22:36 16,896 stream.exe
After rebuilding, the whisper-cli binary was there. Testing it:
E:\github\whisper.cpp>build\bin\Release\whisper-cli -f samples\jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_init_from_file_with_params_no_state: failed to open 'models/ggml-base.en.bin'
error: failed to initialize whisper context
Docker
Prerequisites
- Docker must be installed and running on your system.
- Create a folder to store big models & intermediate files (ex. /whisper/models)
Images
We have two Docker images available for this project:
- ghcr.io/ggerganov/whisper.cpp:main: This image includes the main executable file as well as curl and ffmpeg. (platforms: linux/amd64, linux/arm64)
- ghcr.io/ggerganov/whisper.cpp:main-cuda: Same as main but compiled with CUDA support. (platforms: linux/amd64)
Usage
# download model and persist it in a local folder
docker run -it --rm \
-v path/to/models:/models \
whisper.cpp:main "./models/download-ggml-model.sh base /models"
# transcribe an audio file
docker run -it --rm \
-v path/to/models:/models \
-v path/to/audios:/audios \
whisper.cpp:main "./main -m /models/ggml-base.bin -f /audios/jfk.wav"
# transcribe an audio file in samples folder
docker run -it --rm \
-v path/to/models:/models \
whisper.cpp:main "./main -m /models/ggml-base.bin -f ./samples/jfk.wav"
Trying the Windows build
My machine has an older CPU: an Intel Xeon E5-2643 v2, an Ivy Bridge processor. It supports AVX but not AVX2. The official prebuilt binaries presumably use AVX2 instructions, which is why they wouldn't run, so I had to build from source.
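A quick way to confirm what the CPU reports. This check reads /proc/cpuinfo, so it is Linux-only; on Windows a tool like Sysinternals Coreinfo shows the same flags:

```shell
# Report whether the CPU advertises AVX2 (Linux-only check via /proc/cpuinfo).
# On an Ivy Bridge part such as the Xeon E5-2643 v2 this prints "AVX2: no".
if grep -qw avx2 /proc/cpuinfo; then
    echo "AVX2: yes"
else
    echo "AVX2: no"
fi
```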
I didn't know how to build it at first; DuMate helped me get it done.
Create a bash script that runs the build:
cd whisper.cpp && "C:\Program Files\CMake\bin\cmake.exe" -B build -G "Visual Studio 17 2022" -A x64 -DGGML_AVX2=OFF
The build completed.
Testing Whisper
Note that the default model is English-only, so it only recognizes English speech:
C:\Users\Admin\.qianfan\workspace\4b0c1bb6fb7845d9b44d7b5bf76070e8>whisper.cpp\build\bin\Release\whisper-cli.exe -m whisper-install\models\ggml-base.en.bin -f E:\360Downloads\chinese.wav
Output (note that cmd.exe does not accept ./-prefixed paths, hence the failed first attempt below):
C:\Users\Admin\.qianfan\workspace\4b0c1bb6fb7845d9b44d7b5bf76070e8>./whisper.cpp/build/bin/Release/whisper-cli.exe -m ./whisper-install/models/ggml-base.en.bin -f ./whisper.cpp/samples/jfk.wav
'.' is not recognized as an internal or external command,
operable program or batch file.
C:\Users\Admin\.qianfan\workspace\4b0c1bb6fb7845d9b44d7b5bf76070e8>whisper.cpp\build\bin\Release\whisper-cli.exe -m whisper-install\models\ggml-base.en.bin -f E:\360Downloads\chinese.wav
whisper_init_from_file_with_params_no_state: loading model from 'whisper-install\models\ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 1
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_init_with_params_no_state: devices = 1
whisper_init_with_params_no_state: backends = 1
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_backend_init_gpu: device 0: CPU (type: 0)
whisper_backend_init_gpu: no GPU found
whisper_init_state: kv self size = 6.29 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
whisper_init_state: compute buffer (conv) = 16.28 MB
whisper_init_state: compute buffer (encode) = 23.09 MB
whisper_init_state: compute buffer (cross) = 4.66 MB
whisper_init_state: compute buffer (decode) = 96.37 MB
system_info: n_threads = 4 / 24 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | OPENMP = 1 | REPACK = 1 |
main: processing 'E:\360Downloads\chinese.wav' (180480 samples, 11.3 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:03.660] (speaking in foreign language)
whisper_print_timings: load time = 579.79 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 17.93 ms
whisper_print_timings: sample time = 88.84 ms / 53 runs ( 1.68 ms per run)
whisper_print_timings: encode time = 17883.79 ms / 1 runs ( 17883.79 ms per run)
whisper_print_timings: decode time = 212.33 ms / 5 runs ( 42.47 ms per run)
whisper_print_timings: batchd time = 1707.05 ms / 44 runs ( 38.80 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 20612.66 ms
The audio file above is Chinese, which this model doesn't recognize; with English audio it works:
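The base.en model is English-only. For Chinese audio, a multilingual model (ggml-base.bin rather than ggml-base.en.bin) plus whisper-cli's -l language flag should work. A sketch I haven't run myself, with paths matching the commands above:

```shell
models\download-ggml-model.cmd base
build\bin\Release\whisper-cli.exe -m models\ggml-base.bin -l zh -f E:\360Downloads\chinese.wav
```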
C:\Users\Admin\.qianfan\workspace\4b0c1bb6fb7845d9b44d7b5bf76070e8>whisper.cpp\build\bin\Release\whisper-cli.exe -m whisper-install\models\ggml-base.en.bin -f ./whisper.cpp/samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'whisper-install\models\ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 1
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_init_with_params_no_state: devices = 1
whisper_init_with_params_no_state: backends = 1
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_backend_init_gpu: device 0: CPU (type: 0)
whisper_backend_init_gpu: no GPU found
whisper_init_state: kv self size = 6.29 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
whisper_init_state: compute buffer (conv) = 16.28 MB
whisper_init_state: compute buffer (encode) = 23.09 MB
whisper_init_state: compute buffer (cross) = 4.66 MB
whisper_init_state: compute buffer (decode) = 96.37 MB
system_info: n_threads = 4 / 24 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | OPENMP = 1 | REPACK = 1 |
main: processing './whisper.cpp/samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
whisper_print_timings: load time = 574.14 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 25.82 ms
whisper_print_timings: sample time = 199.47 ms / 133 runs ( 1.50 ms per run)
whisper_print_timings: encode time = 17664.15 ms / 1 runs ( 17664.15 ms per run)
whisper_print_timings: decode time = 113.98 ms / 3 runs ( 37.99 ms per run)
whisper_print_timings: batchd time = 4752.83 ms / 126 runs ( 37.72 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 23411.94 ms
Summary
At first I failed to get it installed on Windows.
It was only much later, when AI assistant tools such as DuMate appeared, that I solved the problem of building and installing Whisper on Windows 10 with an old CPU.
Troubleshooting
Model download shows "BITS Transfer"
Downloading a model into the models folder with:
.\models\download-ggml-model.cmd small
The output was:
BITS Transfer (a file transfer using the Background Intelligent Transfer Service, BITS). [ ] Connecting
and at the end:
Failed to download ggml model small
Please try again later or download the original Whisper model files and convert them yourself.
So it's the usual problem of not being able to reach the Hugging Face servers. Use a mirror:
https://hf-mirror.com
Edit the .\models\download-ggml-model.cmd file and change https://huggingface.co to https://hf-mirror.com.
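The edit can also be scripted. A sketch with sed, run here on a stand-in file rather than the real script (on Windows, Git Bash provides sed; doing the replacement in a text editor works just as well):

```shell
# Stand-in for models/download-ggml-model.cmd containing the original host
printf 'set "src=https://huggingface.co"\n' > download-ggml-model.cmd
# Point every huggingface.co URL at the hf-mirror.com mirror, in place
sed -i 's|https://huggingface\.co|https://hf-mirror.com|g' download-ggml-model.cmd
cat download-ggml-model.cmd   # -> set "src=https://hf-mirror.com"
```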
Now the download works; it's just a bit slow...
Installing chocolatey reports an error
Error message:
@powershell -NoProfile -ExecutionPolicy Bypass -Command "iex ((new-object net.webclient).DownloadString('https://chocolatey.org/install.ps1'))" && SET PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin
WARNING: An existing Chocolatey installation was detected. Installation will not continue. This script will not overwrite existing installations.
If there is no Chocolatey installation at 'C:\ProgramData\chocolatey', delete the folder and attempt the installation again.
Please use choco upgrade chocolatey to handle upgrades of Chocolatey itself.
If the existing installation is not functional or a prior installation did not complete, follow these steps:
- Backup the files at the path listed above so you can restore your previous installation if needed.
- Remove the existing installation manually.
- Rerun this installation script.
- Reinstall any packages previously installed, if needed (refer to the lib folder in the backup).
Once installation is completed, the backup folder is no longer needed and can be deleted.
Following the hint, I deleted the C:\ProgramData\chocolatey directory and reinstalled. Problem solved.
Running the freshly built binary says to use whisper-cli.exe, but that file doesn't exist
E:\github\whisper.cpp>build\bin\Release\main.exe -f samples\jfk.wav
WARNING: The binary 'main.exe' is deprecated.
Please use 'whisper-cli.exe' instead.
See https://github.com/ggerganov/whisper.cpp/tree/master/examples/deprecation-warning/README.md for more information.
The build output doesn't contain that file:
Directory of E:\github\whisper.cpp\build\bin\Release
2025/02/05 22:36 <DIR> .
2025/02/05 22:36 <DIR> ..
2025/02/05 22:36 16,896 bench.exe
2025/02/05 22:36 16,896 command.exe
2025/02/05 22:36 479,232 ggml-base.dll
2025/02/05 22:36 319,488 ggml-cpu.dll
2025/02/05 22:36 64,512 ggml.dll
2025/02/05 22:36 16,896 main.exe
2025/02/05 22:36 16,896 stream.exe
The reason: building the whisper target had failed with:
E:\github\whisper.cpp\src\whisper.cpp(4851,25): error C3688: invalid literal suffix '鈾'; literal operator or literal operator template 'operator ""鈾' not found [E:\github\whisper.cpp\build\src\whisper.vcxproj]
E:\github\whisper.cpp\src\whisper.cpp(4851,39): error C3688: invalid literal suffix '鈾'; literal operator or literal operator template 'operator ""鈾' not found [E:\github\whisper.cpp\build\src\whisper.vcxproj]
E:\github\whisper.cpp\src\whisper.cpp(4851,53): error C3688: invalid literal suffix '鈾'; literal operator or literal operator template 'operator ""鈾' not found [E:\github\whisper.cpp\build\src\whisper.vcxproj]
E:\github\whisper.cpp\src\whisper.cpp(4852,1): error C3688: invalid literal suffix '鈾'; literal operator or literal operator template 'operator ""鈾' not found [E:\github\whisper.cpp\build\src\whisper.vcxproj]
Line 4851 reads:
"♪♪♪","♩", "♪", "♫", "♬", "♭", "♮", "♯"
Line 4852 reads:
};
(Presumably MSVC was decoding the UTF-8 bytes of ♪ with the system's GBK code page, which is where the bogus 鈾 suffix comes from.) I had no idea how to fix it; ERNIE Bot (文心一言) suggested:
- Save the file in a Unicode encoding:
  - Open whisper.cpp in a text editor that supports Unicode (Notepad++, Visual Studio Code, etc.).
  - Use "Save As" and choose UTF-8 (without BOM) or UTF-16, depending on what your compiler supports.
  - Save the file and close the editor.
- Adjust the project settings:
  - In your Visual Studio project, right-click the whisper.cpp file and choose "Properties".
  - Under "Configuration Properties" -> "General" -> "Character Set", choose "Use Unicode Character Set" (or "Use Multi-Byte Character Set", matching the project encoding).
  - If you chose UTF-8, make sure all related files use the same encoding and the compiler settings also support UTF-8.
I used Notepad to re-save the file as UTF-16LE, rebuilt, and the build went through!
This produced the whisper-cli binary in the build\bin\Release directory.
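The same re-encoding can be scripted with iconv (Git Bash ships it on Windows). Shown on a stand-in file rather than whisper.cpp itself: prepending the FF FE byte-order mark mimics what Notepad's "UTF-16 LE" save does, and MSVC detects that BOM. (An alternative I didn't try: build with MSVC's /utf-8 switch so sources are read as UTF-8 directly.)

```shell
# Stand-in UTF-8 source containing one ♪ (bytes E2 99 AA) plus a newline
printf '\xe2\x99\xaa\n' > demo.cpp
# Write the UTF-16LE byte-order mark, then append the re-encoded contents
printf '\xff\xfe' > demo-utf16.cpp
iconv -f UTF-8 -t UTF-16LE demo.cpp >> demo-utf16.cpp
wc -c < demo-utf16.cpp   # 2 BOM bytes + 2 characters x 2 bytes = 6
```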
Testing the whisper-cli binary fails:
E:\github\whisper.cpp>build\bin\Release\whisper-cli -f samples\jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_init_from_file_with_params_no_state: failed to open 'models/ggml-base.en.bin'
error: failed to initialize whisper context
Most likely the model hadn't finished downloading.
Re-download the model:
models\download-ggml-model.cmd base.en