TL;DR: at first I couldn't figure out how to build this on Windows, so I couldn't install it. I eventually got it built and installed with help from DuMate.
Official site: github.com
Study reference: the CSDN blog post "Whisper.cpp 编译使用" (whisper c++)
High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model:
- Plain C/C++ implementation without dependencies
- Apple Silicon first-class citizen - optimized via ARM NEON, Accelerate framework, Metal and Core ML
- AVX intrinsics support for x86 architectures
- VSX intrinsics support for POWER architectures
- Mixed F16 / F32 precision
- Integer quantization support
- Zero memory allocations at runtime
- Vulkan support
- Support for CPU-only inference
- Efficient GPU support for NVIDIA
- OpenVINO Support
- Ascend NPU Support
- C-style API
Supported platforms:
- Mac OS (Intel and Arm)
- iOS
- Android
- Java
- Linux / FreeBSD
- WebAssembly
- Windows (MSVC and MinGW)
- Raspberry Pi
- Docker
So Windows is supported.
Hands-on
Download the source code:
git clone https://github.com/ggerganov/whisper.cpp
Enter the directory:
cd whisper.cpp
Download a model:
Download the model you want; once the download finishes, the files are stored under the models folder.
.\models\download-ggml-model.cmd small
Download the base.en model:
models\download-ggml-model.cmd base.en
Preparing to build
On Linux you can just run cmake, but on Windows it's more involved:
First install choco (note: this requires admin privileges):
@powershell -NoProfile -ExecutionPolicy Bypass -Command "iex ((new-object net.webclient).DownloadString('https://chocolatey.org/install.ps1'))" && SET PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin
Then choco can install all kinds of software, for example make:
choco install make
Build
cmake -B build
cmake --build build --config Release
Well, at that point I still hadn't found a way to build it on Windows...
Once the build succeeded, the following files were produced:
Directory of E:\github\whisper.cpp\build\bin\Release
2025/02/05 22:36 <DIR> .
2025/02/05 22:36 <DIR> ..
2025/02/05 22:36 16,896 bench.exe
2025/02/05 22:36 16,896 command.exe
2025/02/05 22:36 479,232 ggml-base.dll
2025/02/05 22:36 319,488 ggml-cpu.dll
2025/02/05 22:36 64,512 ggml.dll
2025/02/05 22:36 16,896 main.exe
2025/02/05 22:36 16,896 stream.exe
After rebuilding, the whisper-cli binary was there. Testing it:
E:\github\whisper.cpp>build\bin\Release\whisper-cli -f samples\jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_init_from_file_with_params_no_state: failed to open 'models/ggml-base.en.bin'
error: failed to initialize whisper context
Docker
Prerequisites
- Docker must be installed and running on your system.
- Create a folder to store big models & intermediate files (ex. /whisper/models)
Images
We have two Docker images available for this project:
- ghcr.io/ggerganov/whisper.cpp:main: This image includes the main executable file as well as curl and ffmpeg. (platforms: linux/amd64, linux/arm64)
- ghcr.io/ggerganov/whisper.cpp:main-cuda: Same as main but compiled with CUDA support. (platforms: linux/amd64)
Usage
# download model and persist it in a local folder
docker run -it --rm \
-v path/to/models:/models \
whisper.cpp:main "./models/download-ggml-model.sh base /models"
# transcribe an audio file
docker run -it --rm \
-v path/to/models:/models \
-v path/to/audios:/audios \
whisper.cpp:main "./main -m /models/ggml-base.bin -f /audios/jfk.wav"
# transcribe an audio file in samples folder
docker run -it --rm \
-v path/to/models:/models \
whisper.cpp:main "./main -m /models/ggml-base.bin -f ./samples/jfk.wav"
Trying the Windows build
My machine has an older CPU: an Intel Xeon E5-2643 v2, an Ivy Bridge processor. It supports AVX but not AVX2. The official prebuilt binaries presumably use AVX2 instructions, which is why they wouldn't run, so I had to build from source.
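A quick way to confirm what the CPU reports. This check reads /proc/cpuinfo, so it is Linux-only; on Windows a tool like Sysinternals Coreinfo shows the same flags:

```shell
# Report whether the CPU advertises AVX2 (Linux-only check via /proc/cpuinfo).
# On an Ivy Bridge part such as the Xeon E5-2643 v2 this prints "AVX2: no".
if grep -qw avx2 /proc/cpuinfo; then
    echo "AVX2: yes"
else
    echo "AVX2: no"
fi
```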
I didn't know how to build it at first; DuMate helped me get it done.
Create a bash script that runs the build:
cd whisper.cpp && "C:\Program Files\CMake\bin\cmake.exe" -B build -G "Visual Studio 17 2022" -A x64 -DGGML_AVX2=OFF
The build completed.
Testing Whisper
Note that the default model is English-only, so it only recognizes English speech:
C:\Users\Admin\.qianfan\workspace\4b0c1bb6fb7845d9b44d7b5bf76070e8>whisper.cpp\build\bin\Release\whisper-cli.exe -m whisper-install\models\ggml-base.en.bin -f E:\360Downloads\chinese.wav
Output (note that cmd.exe does not accept ./-prefixed paths, hence the failed first attempt below):
C:\Users\Admin\.qianfan\workspace\4b0c1bb6fb7845d9b44d7b5bf76070e8>./whisper.cpp/build/bin/Release/whisper-cli.exe -m ./whisper-install/models/ggml-base.en.bin -f ./whisper.cpp/samples/jfk.wav
'.' is not recognized as an internal or external command,
operable program or batch file.
C:\Users\Admin\.qianfan\workspace\4b0c1bb6fb7845d9b44d7b5bf76070e8>whisper.cpp\build\bin\Release\whisper-cli.exe -m whisper-install\models\ggml-base.en.bin -f E:\360Downloads\chinese.wav
whisper_init_from_file_with_params_no_state: loading model from 'whisper-install\models\ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 1
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_init_with_params_no_state: devices = 1
whisper_init_with_params_no_state: backends = 1
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_backend_init_gpu: device 0: CPU (type: 0)
whisper_backend_init_gpu: no GPU found
whisper_init_state: kv self size = 6.29 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
whisper_init_state: compute buffer (conv) = 16.28 MB
whisper_init_state: compute buffer (encode) = 23.09 MB
whisper_init_state: compute buffer (cross) = 4.66 MB
whisper_init_state: compute buffer (decode) = 96.37 MB
system_info: n_threads = 4 / 24 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | OPENMP = 1 | REPACK = 1 |
main: processing 'E:\360Downloads\chinese.wav' (180480 samples, 11.3 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:03.660] (speaking in foreign language)
whisper_print_timings: load time = 579.79 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 17.93 ms
whisper_print_timings: sample time = 88.84 ms / 53 runs ( 1.68 ms per run)
whisper_print_timings: encode time = 17883.79 ms / 1 runs ( 17883.79 ms per run)
whisper_print_timings: decode time = 212.33 ms / 5 runs ( 42.47 ms per run)
whisper_print_timings: batchd time = 1707.05 ms / 44 runs ( 38.80 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 20612.66 ms
The audio file above is Chinese, which this model doesn't recognize; with English audio it works:
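The base.en model is English-only. For Chinese audio, a multilingual model (ggml-base.bin rather than ggml-base.en.bin) plus whisper-cli's -l language flag should work. A sketch I haven't run myself, with paths matching the commands above:

```shell
models\download-ggml-model.cmd base
build\bin\Release\whisper-cli.exe -m models\ggml-base.bin -l zh -f E:\360Downloads\chinese.wav
```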
C:\Users\Admin\.qianfan\workspace\4b0c1bb6fb7845d9b44d7b5bf76070e8>whisper.cpp\build\bin\Release\whisper-cli.exe -m whisper-install\models\ggml-base.en.bin -f ./whisper.cpp/samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'whisper-install\models\ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 1
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_init_with_params_no_state: devices = 1
whisper_init_with_params_no_state: backends = 1
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_backend_init_gpu: device 0: CPU (type: 0)
whisper_backend_init_gpu: no GPU found
whisper_init_state: kv self size = 6.29 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
whisper_init_state: compute buffer (conv) = 16.28 MB
whisper_init_state: compute buffer (encode) = 23.09 MB
whisper_init_state: compute buffer (cross) = 4.66 MB
whisper_init_state: compute buffer (decode) = 96.37 MB
system_info: n_threads = 4 / 24 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | OPENMP = 1 | REPACK = 1 |
main: processing './whisper.cpp/samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
whisper_print_timings: load time = 574.14 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 25.82 ms
whisper_print_timings: sample time = 199.47 ms / 133 runs ( 1.50 ms per run)
whisper_print_timings: encode time = 17664.15 ms / 1 runs ( 17664.15 ms per run)
whisper_print_timings: decode time = 113.98 ms / 3 runs ( 37.99 ms per run)
whisper_print_timings: batchd time = 4752.83 ms / 126 runs ( 37.72 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 23411.94 ms
Summary
At first I failed to get it installed on Windows.
It was only much later, when AI assistant tools such as DuMate appeared, that I solved the problem of building and installing Whisper on Windows 10 with an old CPU.
Troubleshooting
Model download shows "BITS Transfer"
Downloading a model into the models folder with:
.\models\download-ggml-model.cmd small
The output was:
BITS Transfer (a file transfer using the Background Intelligent Transfer Service, BITS). [ ] Connecting
and at the end:
Failed to download ggml model small
Please try again later or download the original Whisper model files and convert them yourself.
So it's the usual problem of not being able to reach the Hugging Face servers. Use a mirror:
https://hf-mirror.com
Edit the .\models\download-ggml-model.cmd file and change https://huggingface.co to https://hf-mirror.com.
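The edit can also be scripted. A sketch with sed, run here on a stand-in file rather than the real script (on Windows, Git Bash provides sed; doing the replacement in a text editor works just as well):

```shell
# Stand-in for models/download-ggml-model.cmd containing the original host
printf 'set "src=https://huggingface.co"\n' > download-ggml-model.cmd
# Point every huggingface.co URL at the hf-mirror.com mirror, in place
sed -i 's|https://huggingface\.co|https://hf-mirror.com|g' download-ggml-model.cmd
cat download-ggml-model.cmd   # -> set "src=https://hf-mirror.com"
```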
Now the download works; it's just a bit slow...
Installing chocolatey reports an error
Error message:
@powershell -NoProfile -ExecutionPolicy Bypass -Command "iex ((new-object net.webclient).DownloadString('https://chocolatey.org/install.ps1'))" && SET PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin
WARNING: An existing Chocolatey installation was detected. Installation will not continue. This script will not overwrite existing installations.
If there is no Chocolatey installation at 'C:\ProgramData\chocolatey', delete the folder and attempt the installation again.
Please use choco upgrade chocolatey to handle upgrades of Chocolatey itself.
If the existing installation is not functional or a prior installation did not complete, follow these steps:
- Backup the files at the path listed above so you can restore your previous installation if needed.
- Remove the existing installation manually.
- Rerun this installation script.
- Reinstall any packages previously installed, if needed (refer to the lib folder in the backup).
Once installation is completed, the backup folder is no longer needed and can be deleted.
Following the hint, I deleted the C:\ProgramData\chocolatey directory and reinstalled. Problem solved.
Running the freshly built binary says to use whisper-cli.exe, but that file doesn't exist
E:\github\whisper.cpp>build\bin\Release\main.exe -f samples\jfk.wav
WARNING: The binary 'main.exe' is deprecated.
Please use 'whisper-cli.exe' instead.
See https://github.com/ggerganov/whisper.cpp/tree/master/examples/deprecation-warning/README.md for more information.
The build output doesn't contain that file:
Directory of E:\github\whisper.cpp\build\bin\Release
2025/02/05 22:36 <DIR> .
2025/02/05 22:36 <DIR> ..
2025/02/05 22:36 16,896 bench.exe
2025/02/05 22:36 16,896 command.exe
2025/02/05 22:36 479,232 ggml-base.dll
2025/02/05 22:36 319,488 ggml-cpu.dll
2025/02/05 22:36 64,512 ggml.dll
2025/02/05 22:36 16,896 main.exe
2025/02/05 22:36 16,896 stream.exe
The reason: building the whisper target had failed with:
E:\github\whisper.cpp\src\whisper.cpp(4851,25): error C3688: invalid literal suffix '鈾'; literal operator or literal operator template 'operator ""鈾' not found [E:\github\whisper.cpp\build\src\whisper.vcxproj]
E:\github\whisper.cpp\src\whisper.cpp(4851,39): error C3688: invalid literal suffix '鈾'; literal operator or literal operator template 'operator ""鈾' not found [E:\github\whisper.cpp\build\src\whisper.vcxproj]
E:\github\whisper.cpp\src\whisper.cpp(4851,53): error C3688: invalid literal suffix '鈾'; literal operator or literal operator template 'operator ""鈾' not found [E:\github\whisper.cpp\build\src\whisper.vcxproj]
E:\github\whisper.cpp\src\whisper.cpp(4852,1): error C3688: invalid literal suffix '鈾'; literal operator or literal operator template 'operator ""鈾' not found [E:\github\whisper.cpp\build\src\whisper.vcxproj]
Line 4851 reads:
"♪♪♪","♩", "♪", "♫", "♬", "♭", "♮", "♯"
Line 4852 reads:
};
(Presumably MSVC was decoding the UTF-8 bytes of ♪ with the system's GBK code page, which is where the bogus 鈾 suffix comes from.) I had no idea how to fix it; ERNIE Bot (文心一言) suggested:
- Save the file in a Unicode encoding:
  - Open whisper.cpp in a text editor that supports Unicode (Notepad++, Visual Studio Code, etc.).
  - Use "Save As" and choose UTF-8 (without BOM) or UTF-16, depending on what your compiler supports.
  - Save the file and close the editor.
- Adjust the project settings:
  - In your Visual Studio project, right-click the whisper.cpp file and choose "Properties".
  - Under "Configuration Properties" -> "General" -> "Character Set", choose "Use Unicode Character Set" (or "Use Multi-Byte Character Set", matching the project encoding).
  - If you chose UTF-8, make sure all related files use the same encoding and the compiler settings also support UTF-8.
I used Notepad to re-save the file as UTF-16LE, rebuilt, and the build went through!
This produced the whisper-cli binary in the build\bin\Release directory.
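The same re-encoding can be scripted with iconv (Git Bash ships it on Windows). Shown on a stand-in file rather than whisper.cpp itself: prepending the FF FE byte-order mark mimics what Notepad's "UTF-16 LE" save does, and MSVC detects that BOM. (An alternative I didn't try: build with MSVC's /utf-8 switch so sources are read as UTF-8 directly.)

```shell
# Stand-in UTF-8 source containing one ♪ (bytes E2 99 AA) plus a newline
printf '\xe2\x99\xaa\n' > demo.cpp
# Write the UTF-16LE byte-order mark, then append the re-encoded contents
printf '\xff\xfe' > demo-utf16.cpp
iconv -f UTF-8 -t UTF-16LE demo.cpp >> demo-utf16.cpp
wc -c < demo-utf16.cpp   # 2 BOM bytes + 2 characters x 2 bytes = 6
```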
Testing the whisper-cli binary fails:
E:\github\whisper.cpp>build\bin\Release\whisper-cli -f samples\jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_init_from_file_with_params_no_state: failed to open 'models/ggml-base.en.bin'
error: failed to initialize whisper context
Most likely the model hadn't finished downloading.
Re-download the model:
models\download-ggml-model.cmd base.en