20240202 Using whisper.cpp on Windows 10

2024/2/2 14:15

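[Conclusion: on Windows 10, transcribing a 7-minute Chinese video with the medium model (ggml-medium.bin) took 83.7284 seconds, roughly 1.4 minutes. Far too slow!]
83.7284 s / 420 s ≈ 0.1994, i.e. processing took about 20% of the audio's duration.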
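The ratio above is the real-time factor (RTF): processing time divided by audio duration. A minimal Python sketch of the calculation, using the RunComplete figure from the medium-model log further down. The second duration, 421.6 s, is the end time of the last subtitle segment (00:07:01.600), which suggests the audio is slightly longer than the rounded 7 minutes; either way the RTF comes out at about 0.2.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; values below 1.0 mean faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# RunComplete from the medium-model run, against the rounded 7 minutes (420 s)
# and against the last subtitle end time (00:07:01.600 = 421.6 s).
for audio_seconds in (420.0, 421.6):
    rtf = real_time_factor(83.7284, audio_seconds)
    print(f"audio {audio_seconds:.1f} s: RTF {rtf:.4f}, {1 / rtf:.2f}x real time")
```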

Prerequisite: you need some technical means of reaching overseas sites such as GitHub and Hugging Face! ^_^

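1. First, you need an NVIDIA graphics card. Mine is a second-hand GTX 1080 bought on PDD (Pinduoduo) for ¥800 [and very likely an ex-mining card!].

2. Correctly install the latest NVIDIA driver (the 545 series), plus CUDA and cuDNN.

3. Install Torch (PyTorch).

4. Configure whisper.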

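The recognized subtitle file chs.srt comes out in Traditional Chinese; I'll need to find a way to get Simplified Chinese instead! Its first entries: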
1
00:00:00,000 --> 00:00:01,400
前段時間有個巨石恆虎

2
00:00:01,400 --> 00:00:03,000
某某是男人最好的醫妹

3
00:00:03,000 --> 00:00:04,800
這裡的某某可以替換為減肥

4
00:00:04,800 --> 00:00:07,800
長髮 西裝 考研 速唱 永潔無間等等等等
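Until a proper converter is in place, here is a minimal Python sketch of the idea: parse the SRT cues and map each Traditional character to its Simplified form, leaving indices and timestamps untouched. The SAMPLE_T2S table is a hypothetical stand-in that covers only the characters in the sample above; character-by-character mapping is a simplification, and a real conversion would use a complete dictionary such as the one shipped with OpenCC.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Illustrative table only: the Traditional -> Simplified pairs needed for the sample
# below. Real subtitles need a complete conversion dictionary such as OpenCC's.
SAMPLE_T2S = str.maketrans({
    "時": "时", "間": "间", "個": "个", "恆": "恒", "醫": "医",
    "這": "这", "裡": "里", "換": "换", "為": "为", "減": "减",
    "長": "长", "髮": "发", "裝": "装", "潔": "洁", "無": "无",
})

TIME_LINE = re.compile(r"^(\S+) --> (\S+)$")


@dataclass
class Cue:
    index: int
    start: str
    end: str
    text: str


def parse_srt(content: str) -> list[Cue]:
    """Split SRT text into cues; cue blocks are separated by blank lines."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue  # skip malformed blocks
        match = TIME_LINE.match(lines[1].strip())
        if match is None:
            continue
        cues.append(Cue(int(lines[0]), match.group(1), match.group(2), "\n".join(lines[2:])))
    return cues


def convert_cues(cues: list[Cue], table: dict[int, str] = SAMPLE_T2S) -> list[Cue]:
    """Map each character of the subtitle text; indices and timestamps stay unchanged."""
    return [Cue(c.index, c.start, c.end, c.text.translate(table)) for c in cues]


def to_srt(cues: list[Cue]) -> str:
    """Serialize cues back to SRT text."""
    return "\n\n".join(f"{c.index}\n{c.start} --> {c.end}\n{c.text}" for c in cues) + "\n"


sample = """1
00:00:00,000 --> 00:00:01,400
前段時間有個巨石恆虎

2
00:00:01,400 --> 00:00:03,000
某某是男人最好的醫妹
"""
print(to_srt(convert_cues(parse_srt(sample))))
```

When reading a real chs.srt, open it with encoding="utf-8-sig" so that a leading byte-order mark does not hide the first cue index (this assumes the file is UTF-8).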

https://github.com/Const-me/Whisper/releases

https://www.cnblogs.com/jike9527/p/17545484.html

Using whisper to batch-generate subtitles (whisper.cpp)

c:\>
c:\>git clone https://github.com/ggerganov/whisper.cpp
Cloning into 'whisper.cpp'...
remote: Enumerating objects: 6773, done.
remote: Counting objects: 100% (1995/1995), done.
remote: Compressing objects: 100% (275/275), done.
remote: Total 6773 (delta 1826), reused 1810 (delta 1714), pack-reused 4778
Receiving objects: 100% (6773/6773), 10.18 MiB | 6.55 MiB/s, done.
Resolving deltas: 100% (4368/4368), done.

c:\>cd whisper.cpp

c:\whisper.cpp>dir

Volume in drive C is WIN10

Volume Serial Number is 9273-D6A8

Directory of c:\whisper.cpp

2024/02/02 14:20 <DIR> .

2024/02/02 14:20 <DIR> ..

2024/02/02 14:20 <DIR> .devops

2024/02/02 14:20 <DIR> .github

2024/02/02 14:20 863 .gitignore

2024/02/02 14:20 99 .gitmodules

2024/02/02 14:20 <DIR> bindings

2024/02/02 14:20 <DIR> cmake

2024/02/02 14:20 19,729 CMakeLists.txt

2024/02/02 14:20 <DIR> coreml

2024/02/02 14:20 <DIR> examples

2024/02/02 14:20 <DIR> extra

2024/02/02 14:20 32,539 ggml-alloc.c

2024/02/02 14:20 4,149 ggml-alloc.h

2024/02/02 14:20 5,996 ggml-backend-impl.h

2024/02/02 14:20 69,048 ggml-backend.c

2024/02/02 14:20 11,932 ggml-backend.h

2024/02/02 14:20 451,408 ggml-cuda.cu

2024/02/02 14:20 2,156 ggml-cuda.h

2024/02/02 14:20 7,813 ggml-impl.h

2024/02/02 14:20 2,425 ggml-metal.h

2024/02/02 14:20 152,813 ggml-metal.m

2024/02/02 14:20 231,753 ggml-metal.metal

2024/02/02 14:20 87,989 ggml-opencl.cpp

2024/02/02 14:20 1,422 ggml-opencl.h

2024/02/02 14:20 411,673 ggml-quants.c

2024/02/02 14:20 13,983 ggml-quants.h

2024/02/02 14:20 696,627 ggml.c

2024/02/02 14:20 87,399 ggml.h

2024/02/02 14:20 <DIR> grammars

2024/02/02 14:20 1,093 LICENSE

2024/02/02 14:20 15,341 Makefile

2024/02/02 14:20 <DIR> models

2024/02/02 14:20 <DIR> openvino

2024/02/02 14:20 1,835 Package.swift

2024/02/02 14:20 39,942 README.md

2024/02/02 14:20 <DIR> samples

2024/02/02 14:20 <DIR> spm-headers

2024/02/02 14:20 <DIR> tests

2024/02/02 14:20 239,648 whisper.cpp

2024/02/02 14:20 30,873 whisper.h

26 File(s) 2,620,548 bytes

15 Dir(s) 128,119,971,840 bytes free

c:\whisper.cpp>

c:\whisper.cpp>

c:\whisper.cpp>

c:\whisper.cpp>cd models

c:\whisper.cpp\models>dir

Volume in drive C is WIN10

Volume Serial Number is 9273-D6A8

Directory of c:\whisper.cpp\models

2024/02/02 14:20 <DIR> .

2024/02/02 14:20 <DIR> ..

2024/02/02 14:20 7 .gitignore

2024/02/02 14:20 4,980 convert-h5-to-coreml.py

2024/02/02 14:20 7,584 convert-h5-to-ggml.py

2024/02/02 14:20 10,955 convert-pt-to-ggml.py

2024/02/02 14:20 12,761 convert-whisper-to-coreml.py

2024/02/02 14:20 1,799 convert-whisper-to-openvino.py

2024/02/02 14:20 2,272 download-coreml-model.sh

2024/02/02 14:20 1,440 download-ggml-model.cmd

2024/02/02 14:20 3,039 download-ggml-model.sh

2024/02/02 14:20 575,451 for-tests-ggml-base.bin

2024/02/02 14:20 586,836 for-tests-ggml-base.en.bin

2024/02/02 14:20 575,451 for-tests-ggml-large.bin

2024/02/02 14:20 575,451 for-tests-ggml-medium.bin

2024/02/02 14:20 586,836 for-tests-ggml-medium.en.bin

2024/02/02 14:20 575,451 for-tests-ggml-small.bin

2024/02/02 14:20 586,836 for-tests-ggml-small.en.bin

2024/02/02 14:20 575,451 for-tests-ggml-tiny.bin

2024/02/02 14:20 586,836 for-tests-ggml-tiny.en.bin

2024/02/02 14:20 1,506 generate-coreml-interface.sh

2024/02/02 14:20 1,355 generate-coreml-model.sh

2024/02/02 14:20 3,711 ggml_to_pt.py

2024/02/02 14:20 42 openvino-conversion-requirements.txt

2024/02/02 14:20 5,615 README.md

23 File(s) 5,281,665 bytes

2 Dir(s) 105,396,047,872 bytes free

c:\whisper.cpp\models>main.exe -f samples\jfk.wav

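'main.exe' is not recognized as an internal or external command,
operable program or batch file.

(A freshly cloned repository contains only source code, so there is no main.exe until you build it or download a prebuilt binary. The command was also run from the models subfolder, which has no samples\jfk.wav.)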

c:\whisper.cpp\models>dir

Volume in drive C is WIN10

Volume Serial Number is 9273-D6A8

Directory of c:\whisper.cpp\models

2024/02/02 14:23 <DIR> .

2024/02/02 14:23 <DIR> ..

2024/02/02 14:20 7 .gitignore

2024/02/02 14:20 4,980 convert-h5-to-coreml.py

2024/02/02 14:20 7,584 convert-h5-to-ggml.py

2024/02/02 14:20 10,955 convert-pt-to-ggml.py

2024/02/02 14:20 12,761 convert-whisper-to-coreml.py

2024/02/02 14:20 1,799 convert-whisper-to-openvino.py

2024/02/02 14:20 2,272 download-coreml-model.sh

2024/02/02 14:20 1,440 download-ggml-model.cmd

2024/02/02 14:20 3,039 download-ggml-model.sh

2024/02/02 14:20 575,451 for-tests-ggml-base.bin

2024/02/02 14:20 586,836 for-tests-ggml-base.en.bin

2024/02/02 14:20 575,451 for-tests-ggml-large.bin

2024/02/02 14:20 575,451 for-tests-ggml-medium.bin

2024/02/02 14:20 586,836 for-tests-ggml-medium.en.bin

2024/02/02 14:20 575,451 for-tests-ggml-small.bin

2024/02/02 14:20 586,836 for-tests-ggml-small.en.bin

2024/02/02 14:20 575,451 for-tests-ggml-tiny.bin

2024/02/02 14:20 586,836 for-tests-ggml-tiny.en.bin

2024/02/02 14:20 1,506 generate-coreml-interface.sh

2024/02/02 14:20 1,355 generate-coreml-model.sh

2024/02/02 13:23 37,922,638 ggml-base-encoder.mlmodelc.zip

2024/02/02 13:23 59,707,625 ggml-base-q5_1.bin

2024/02/02 13:24 147,951,465 ggml-base.bin

2024/02/02 13:24 37,950,917 ggml-base.en-encoder.mlmodelc.zip

2024/02/02 13:24 59,721,011 ggml-base.en-q5_1.bin

2024/02/02 13:24 147,964,211 ggml-base.en.bin

2024/02/02 13:30 1,177,529,527 ggml-large-v1-encoder.mlmodelc.zip

2024/02/02 13:35 3,094,623,691 ggml-large-v1.bin

2024/02/02 13:31 1,174,643,458 ggml-large-v2-encoder.mlmodelc.zip

2024/02/02 13:30 1,080,732,091 ggml-large-v2-q5_0.bin

2024/02/02 13:35 3,094,623,691 ggml-large-v2.bin

2024/02/02 13:31 1,175,711,232 ggml-large-v3-encoder.mlmodelc.zip

2024/02/02 13:32 1,081,140,203 ggml-large-v3-q5_0.bin

2024/02/02 13:35 3,095,033,483 ggml-large-v3.bin

2024/02/02 13:57 567,829,413 ggml-medium-encoder.mlmodelc.zip

2024/02/02 13:57 539,212,467 ggml-medium-q5_0.bin

2024/02/02 14:03 1,533,763,059 ggml-medium.bin

2024/02/02 13:59 566,993,085 ggml-medium.en-encoder.mlmodelc.zip

2024/02/02 13:59 539,225,533 ggml-medium.en-q5_0.bin

2024/02/02 14:04 1,533,774,781 ggml-medium.en.bin

2024/02/02 14:08 163,083,239 ggml-small-encoder.mlmodelc.zip

2024/02/02 14:07 190,085,487 ggml-small-q5_1.bin

2024/02/02 14:09 487,601,967 ggml-small.bin

2024/02/02 14:09 162,952,446 ggml-small.en-encoder.mlmodelc.zip

2024/02/02 14:09 190,098,681 ggml-small.en-q5_1.bin

2024/02/02 14:11 487,614,201 ggml-small.en.bin

2024/02/02 14:10 15,037,446 ggml-tiny-encoder.mlmodelc.zip

2024/02/02 14:10 32,152,673 ggml-tiny-q5_1.bin

2024/02/02 14:11 77,691,713 ggml-tiny.bin

2024/02/02 14:11 15,034,655 ggml-tiny.en-encoder.mlmodelc.zip

2024/02/02 14:11 32,166,155 ggml-tiny.en-q5_1.bin

2024/02/02 14:12 43,550,795 ggml-tiny.en-q8_0.bin

2024/02/02 14:12 77,704,715 ggml-tiny.en.bin

2024/02/02 14:20 3,711 ggml_to_pt.py

2024/02/02 13:23 1,477 gitattributes

2024/02/02 14:20 42 openvino-conversion-requirements.txt

2024/02/02 13:23 1,311 README.md

57 File(s) 22,726,106,592 bytes

2 Dir(s) 105,396,191,232 bytes free

c:\whisper.cpp\models>cd ..

c:\whisper.cpp>dir

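(At this point a prebuilt main.exe sits in c:\whisper.cpp, since the command below succeeds there. Its output, with "feature level 12.1" and a "Compute Shaders" profile, matches the Direct3D 11 based Windows port from the Const-me/Whisper releases page linked above rather than a CUDA build of ggerganov's code.)
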
c:\whisper.cpp>main.exe -f samples\jfk.wav
Using GPU "NVIDIA GeForce GTX 1080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 62.8 kb RAM
Loaded vocabulary, 51864 strings, 3050.6 kb RAM
Loaded 245 GPU tensors, 140.539 MB VRAM
Computed CPU base frequency: 2.29469 GHz
Loaded model from "models/ggml-base.en.bin" to VRAM
Created source reader from the file "samples\jfk.wav"

[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

CPU Tasks

LoadModel 577.635 milliseconds

RunComplete 422.9 milliseconds

Run 319.505 milliseconds

Callbacks 5.4751 milliseconds, 2 calls, 2.73755 milliseconds average

Spectrogram 52.7935 milliseconds, 3 calls, 17.5978 milliseconds average

Sample 7.6473 milliseconds, 27 calls, 283.233 microseconds average

Encode 188.011 milliseconds

Decode 125.975 milliseconds

DecodeStep 118.306 milliseconds, 27 calls, 4.38169 milliseconds average

GPU Tasks

LoadModel 249.459 milliseconds

Run 231.117 milliseconds

Encode 99.0044 milliseconds

EncodeLayer 77.7554 milliseconds, 6 calls, 12.9592 milliseconds average

Decode 132.112 milliseconds

DecodeStep 132.103 milliseconds, 27 calls, 4.89271 milliseconds average

DecodeLayer 87.4824 milliseconds, 162 calls, 540.015 microseconds average

Compute Shaders

mulMatTiled 63.4898 milliseconds, 60 calls, 1.05816 milliseconds average

mulMatByRowTiled 50.9198 milliseconds, 1959 calls, 25.9928 microseconds average

softMaxLong 27.5314 milliseconds, 27 calls, 1.01968 milliseconds average

norm 12.3785 milliseconds, 526 calls, 23.5333 microseconds average

addRepeatGelu 11.9749 milliseconds, 170 calls, 70.4406 microseconds average

fmaRepeat1 7.652 milliseconds, 526 calls, 14.5475 microseconds average

addRepeatEx 7.4319 milliseconds, 498 calls, 14.9235 microseconds average

softMaxFixed 6.913 milliseconds, 168 calls, 41.1488 microseconds average

copyConvert 5.397 milliseconds, 348 calls, 15.5086 microseconds average

convolutionMain 5.3903 milliseconds

convolutionMain2Fixed 5.2572 milliseconds

copyTranspose 4.6246 milliseconds, 336 calls, 13.7637 microseconds average

scaleInPlace 4.5107 milliseconds, 168 calls, 26.8494 microseconds average

addRepeatScale 3.7607 milliseconds, 324 calls, 11.6071 microseconds average

softMax 2.9733 milliseconds, 162 calls, 18.3537 microseconds average

addRepeat 1.8574 milliseconds, 180 calls, 10.3189 microseconds average

diagMaskInf 1.3711 milliseconds, 162 calls, 8.46358 microseconds average

convolutionPrep1 439.3 microseconds, 2 calls, 219.65 microseconds average

convolutionPrep2 229.4 microseconds, 2 calls, 114.7 microseconds average

addRows 191.5 microseconds, 27 calls, 7.09259 microseconds average

add 60.4 microseconds

mulMatByScalar 29.7 microseconds, 6 calls, 4.95 microseconds average

mulMatByRow 27.6 microseconds, 6 calls, 4.6 microseconds average

Memory Usage

Model 858.5 KB RAM, 140.539 MB VRAM

Context 1.19063 MB RAM, 186.732 MB VRAM

Total 2.02901 MB RAM, 327.271 MB VRAM

c:\whisper.cpp>main.exe -l zh -osrt -m models/ggml-medium.bin chs.wav
Using GPU "NVIDIA GeForce GTX 1080", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 62.8 kb RAM
Loaded vocabulary, 51865 strings, 3037.1 kb RAM
Loaded 947 GPU tensors, 1462.12 MB VRAM
Computed CPU base frequency: 2.29469 GHz
Loaded model from "models/ggml-medium.bin" to VRAM
Created source reader from the file "chs.wav"

[00:00:00.000 --> 00:00:01.400] ?????????????

[00:00:01.400 --> 00:00:03.000] ????????????

[00:00:03.000 --> 00:00:04.800] ?????????????????

[00:00:04.800 --> 00:00:07.800] ??? ?? ??? ?? ?????????

[00:00:07.800 --> 00:00:09.200] ???????????

[00:00:09.200 --> 00:00:12.000] ??????????????????????

[00:00:12.000 --> 00:00:13.400] ?????????

[00:00:13.400 --> 00:00:14.400] ???????

[00:00:14.400 --> 00:00:17.400] ?????????????????????????

[00:00:17.400 --> 00:00:20.000] ?????????????????????

[00:00:20.000 --> 00:00:21.600] ???????????????

[00:00:21.600 --> 00:00:22.800] ?????????

[00:00:22.800 --> 00:00:24.400] ?????????????

[00:00:24.400 --> 00:00:29.600] ?????????????????? ?????????????????????

[00:00:29.600 --> 00:00:32.400] ??????? ???????? ???

[00:00:32.400 --> 00:00:34.600] ??????????????????

[00:00:34.600 --> 00:00:36.200] ???????????

[00:00:36.200 --> 00:00:37.000] ???

[00:00:37.000 --> 00:00:38.000] ?????

[00:00:38.000 --> 00:00:39.400] ???????????

[00:00:39.400 --> 00:00:40.600] ????????

[00:00:40.600 --> 00:00:41.800] ????? ?????

[00:00:41.800 --> 00:00:44.000] ???????????????????

[00:00:44.000 --> 00:00:46.600] ?????????????????????????

[00:00:46.600 --> 00:00:49.600] ???????????????????????

[00:00:49.600 --> 00:00:52.000] ???????????????????

[00:00:52.000 --> 00:00:54.200] ???????????????????

[00:00:54.200 --> 00:00:56.000] ??????? ??????

[00:00:56.000 --> 00:00:58.000] ???????????????????

[00:00:58.000 --> 00:01:00.000] ??????????????

[00:01:00.000 --> 00:01:01.000] ????????

[00:01:01.000 --> 00:01:02.600] ???????????

[00:01:02.600 --> 00:01:04.800] ????????????? ????????

[00:01:04.800 --> 00:01:07.000] ??11 ??????????????????

[00:01:07.000 --> 00:01:10.000] ?????????????????? ????????

[00:01:10.000 --> 00:01:13.200] ???? ??????????????????296%

[00:01:13.200 --> 00:01:16.000] ?????????????????????

[00:01:16.000 --> 00:01:20.000] ??????11 ?????? ????????????7????????

[00:01:20.000 --> 00:01:21.000] ?????????

[00:01:21.000 --> 00:01:22.400] ???????????

[00:01:22.400 --> 00:01:24.200] ???? ????????

[00:01:24.200 --> 00:01:26.800] ???????????????????????

[00:01:26.800 --> 00:01:28.400] ???? ?????????

[00:01:28.400 --> 00:01:29.800] ??????????

[00:01:29.800 --> 00:01:31.800] ?????????????? ????

[00:01:31.800 --> 00:01:33.400] ??????????????

[00:01:33.400 --> 00:01:35.400] ???????????????

[00:01:35.400 --> 00:01:37.600] ??? ?????2198

[00:01:37.600 --> 00:01:40.600] ????????? ??????699

[00:01:40.600 --> 00:01:42.200] ?????? ???????

[00:01:42.200 --> 00:01:45.000] 400?????? ?????????300?

[00:01:45.000 --> 00:01:48.200] ??????? ????????200???????????

[00:01:48.200 --> 00:01:51.600] ????? ????????????Citywalk????

[00:01:51.600 --> 00:01:54.600] ?????? ???????1000????

[00:01:54.600 --> 00:01:58.200] ????????????????????????????

[00:01:58.200 --> 00:02:00.400] ?????????????????

[00:02:00.400 --> 00:02:02.200] ?????????????

[00:02:02.200 --> 00:02:05.000] ???????????????????????

[00:02:05.000 --> 00:02:07.400] ????????? ???????????

[00:02:07.400 --> 00:02:08.600] ????????

[00:02:08.600 --> 00:02:10.000] ??????????

[00:02:10.000 --> 00:02:13.400] ???????????????????????? ????1?1???

[00:02:13.400 --> 00:02:15.800] ??????????????? ?????

[00:02:15.800 --> 00:02:18.200] ?????????? ?????????

[00:02:18.200 --> 00:02:20.600] ???????????? ???????

[00:02:20.600 --> 00:02:22.400] ?????????? ???

[00:02:22.400 --> 00:02:26.400] ????????? ????? ???? ??????????

[00:02:26.400 --> 00:02:29.200] ???????? ???????????????????

[00:02:29.200 --> 00:02:30.800] ????????????

[00:02:30.800 --> 00:02:32.600] ???? ???????

[00:02:32.600 --> 00:02:35.400] ????????? ????????

[00:02:35.400 --> 00:02:38.600] ????????????? ???????????

[00:02:38.600 --> 00:02:41.000] ?????? ???????????

[00:02:41.000 --> 00:02:43.600] ?????????1000? ???????

[00:02:43.600 --> 00:02:46.400] 500???????? 200???????

[00:02:46.400 --> 00:02:48.400] ?99 ??????????

[00:02:48.400 --> 00:02:50.800] ???????????? ?????????

[00:02:50.800 --> 00:02:53.800] ???????GORTEX??????? ??3000??

[00:02:53.800 --> 00:02:56.200] ???????????????????????

[00:02:56.200 --> 00:03:00.000] ???????????GORTEX???????????4500

[00:03:00.000 --> 00:03:03.000] ?????GORTEX ?????????????

[00:03:03.000 --> 00:03:05.800] ????? ???????????????????

[00:03:05.800 --> 00:03:08.000] ???????? ????? ????

[00:03:08.000 --> 00:03:09.800] ?????????????????

[00:03:09.800 --> 00:03:11.800] ????????????????????

[00:03:11.800 --> 00:03:14.200] ???????? ????????????

[00:03:14.200 --> 00:03:17.000] ???????????? ????????

[00:03:17.000 --> 00:03:20.000] ??????????? ??????????

[00:03:20.000 --> 00:03:21.600] ????????????

[00:03:21.600 --> 00:03:23.200] ?????????????

[00:03:23.200 --> 00:03:26.000] ????????????????? ?????????????

[00:03:26.000 --> 00:03:29.000] ??????????? ????????? ?????????

[00:03:29.000 --> 00:03:31.800] ?????????? ??????????????

[00:03:31.800 --> 00:03:35.000] ??????? ????????????????????

[00:03:35.000 --> 00:03:36.800] ????????????

[00:03:36.800 --> 00:03:40.000] ???? ???????????? ???

[00:03:40.000 --> 00:03:42.600] ?????????? ???????????

[00:03:42.600 --> 00:03:46.000] ?????????? ????????????

[00:03:46.000 --> 00:03:49.200] ??????????????? ?????????????

[00:03:49.200 --> 00:03:52.200] ?????????? ??????????

[00:03:52.200 --> 00:03:55.000] ???????????????? ?????

[00:03:55.000 --> 00:03:58.000] ???????????? ?????????????

[00:03:58.000 --> 00:04:01.000] ?????????????????????? ?????

[00:04:01.000 --> 00:04:04.000] ??????????????? ??????

[00:04:04.000 --> 00:04:06.600] ??????? ???????????????

[00:04:06.600 --> 00:04:08.800] ???????????????

[00:04:08.800 --> 00:04:12.000] ?????????????????? ?????????

[00:04:12.000 --> 00:04:13.600] ??????????????

[00:04:13.600 --> 00:04:16.200] ??????????? ??????????

[00:04:16.200 --> 00:04:18.400] ???????? ???????

[00:04:18.400 --> 00:04:21.800] ?? ?????? ??????????????

[00:04:21.800 --> 00:04:25.800] ??????????????? ??????????????????

[00:04:25.800 --> 00:04:29.200] ???????? ????????????????????

[00:04:29.200 --> 00:04:30.800] ?????????????????

[00:04:30.800 --> 00:04:33.400] ?????????? ?????????

[00:04:33.400 --> 00:04:36.200] ??????? ????????????????

[00:04:36.200 --> 00:04:39.400] ???????? ???????????????

[00:04:39.400 --> 00:04:41.200] ??????????????

[00:04:41.200 --> 00:04:43.600] ?????????? ?????????

[00:04:43.600 --> 00:04:45.000] ??????????

[00:04:45.000 --> 00:04:47.600] ????????????????????

[00:04:47.600 --> 00:04:51.600] ????????????? ????????? ???????

[00:04:51.600 --> 00:04:53.200] ???????????

[00:04:53.200 --> 00:04:55.800] ??? ??????????????????????

[00:04:55.800 --> 00:04:57.400] ????????????????

[00:04:57.400 --> 00:04:59.800] ?????????????????????

[00:04:59.800 --> 00:05:03.000] ?????????????? ???????????

[00:05:03.000 --> 00:05:04.800] ?????????????????

[00:05:04.800 --> 00:05:07.200] ???????????? ??????????

[00:05:07.200 --> 00:05:09.400] ???? ??????????????

[00:05:09.400 --> 00:05:11.600] ??????????????????

[00:05:11.600 --> 00:05:14.800] ???????????????? ???????????

[00:05:14.800 --> 00:05:16.400] ???? ??????

[00:05:16.400 --> 00:05:18.800] ????? ??????????????

[00:05:18.800 --> 00:05:20.800] ???????????????

[00:05:20.800 --> 00:05:23.200] ????????? ????????????

[00:05:23.200 --> 00:05:25.600] ????????? ??????????????

[00:05:25.600 --> 00:05:29.800] ?????? ????????????????????881?

[00:05:29.800 --> 00:05:31.800] ??????? ??2000?

[00:05:31.800 --> 00:05:34.600] ?????? ??????????????????

[00:05:34.600 --> 00:05:38.400] ?????????8000????????? 2000???????

[00:05:38.600 --> 00:05:41.200] ????????? ????????????

[00:05:41.200 --> 00:05:43.600] ?????? ??? ????????

[00:05:43.600 --> 00:05:46.600] ??2000??8000????????????????

[00:05:46.600 --> 00:05:49.600] ??????????? ?2018?2021?

[00:05:49.600 --> 00:05:52.200] ?????4???????60%??

[00:05:52.200 --> 00:05:56.000] ??5??? ?????????????20??????60??

[00:05:56.000 --> 00:05:59.200] ?????????? ?????????????????

[00:05:59.200 --> 00:06:02.200] ???????????? ?????????????????

[00:06:02.200 --> 00:06:05.200] ?????????? ???????????????

[00:06:05.200 --> 00:06:09.600] ??? ????????? ????????????????????

[00:06:09.600 --> 00:06:11.400] ????????????

[00:06:11.400 --> 00:06:15.200] ???? ?????????? ????????????????

[00:06:15.200 --> 00:06:17.800] ???? ????????????????

[00:06:17.800 --> 00:06:20.600] ?350?????????????????

[00:06:20.600 --> 00:06:23.000] ??????? ??????????

[00:06:23.000 --> 00:06:25.000] ?????????????????

[00:06:25.000 --> 00:06:27.400] ??? ???????????OK

[00:06:27.400 --> 00:06:29.600] ?????????????????????

[00:06:29.600 --> 00:06:31.800] ???????????????????

[00:06:31.800 --> 00:06:36.600] ???????????????? ???????????????????????

[00:06:36.600 --> 00:06:38.800] ?????????????????

[00:06:38.800 --> 00:06:41.400] ???????????????????

[00:06:41.400 --> 00:06:44.200] ??????????????????????????

[00:06:44.200 --> 00:06:46.800] ????????????????????

[00:06:46.800 --> 00:06:48.800] ????????????????

[00:06:48.800 --> 00:06:51.200] ???????????????????

[00:06:51.200 --> 00:06:53.000] ????????????????

[00:06:53.000 --> 00:06:56.000] ?????????????????????????

[00:06:56.000 --> 00:07:01.600] ????????????IC????? ????? ??????

CPU Tasks

LoadModel 1.43866 seconds

RunComplete 83.7284 seconds

Run 83.6255 seconds

Callbacks 457.784 milliseconds, 187 calls, 2.44804 milliseconds average

Spectrogram 1.21106 seconds, 90 calls, 13.4562 milliseconds average

Sample 1.01043 seconds, 3535 calls, 285.836 microseconds average

Encode 15.2296 seconds, 17 calls, 895.858 milliseconds average

Decode 67.9228 seconds, 17 calls, 3.99546 seconds average

DecodeStep 66.9103 seconds, 3535 calls, 18.928 milliseconds average

GPU Tasks

LoadModel 1.03839 seconds

Run 83.4773 seconds

Encode 15.3219 seconds, 17 calls, 901.288 milliseconds average

EncodeLayer 13.0778 seconds, 408 calls, 32.0533 milliseconds average

Decode 68.1554 seconds, 17 calls, 4.00914 seconds average

DecodeStep 68.1535 seconds, 3535 calls, 19.2796 milliseconds average

DecodeLayer 61.7764 seconds, 84840 calls, 728.152 microseconds average

Compute Shaders

mulMatByRowTiled 38.8209 seconds, 1016702 calls, 38.1831 microseconds average

mulMatTiled 15.8527 seconds, 8993 calls, 1.76278 milliseconds average

fmaRepeat1 3.71454 seconds, 258888 calls, 14.348 microseconds average

addRepeatEx 3.43395 seconds, 255336 calls, 13.4487 microseconds average

normFixed 3.29705 seconds, 258888 calls, 12.7354 microseconds average

softMaxLong 2.62421 seconds, 3535 calls, 742.351 microseconds average

copyConvert 2.6175 seconds, 171312 calls, 15.2791 microseconds average

addRepeatScale 2.43674 seconds, 169680 calls, 14.3608 microseconds average

copyTranspose 2.43484 seconds, 170496 calls, 14.2809 microseconds average

softMaxFixed 1.78188 seconds, 85248 calls, 20.9023 microseconds average

addRepeatGelu 1.39165 seconds, 85282 calls, 16.3182 microseconds average

softMax 1.27396 seconds, 84840 calls, 15.0161 microseconds average

scaleInPlace 1.00817 seconds, 85248 calls, 11.8264 microseconds average

addRepeat 954.089 milliseconds, 86064 calls, 11.0858 microseconds average

diagMaskInf 652.093 milliseconds, 84840 calls, 7.68616 microseconds average

convolutionMain2Fixed 388.382 milliseconds, 17 calls, 22.846 milliseconds average

convolutionMain 163.663 milliseconds, 17 calls, 9.62722 milliseconds average

convolutionPrep1 24.0373 milliseconds, 34 calls, 706.979 microseconds average

addRows 21.3709 milliseconds, 3535 calls, 6.04552 microseconds average

convolutionPrep2 7.0976 milliseconds, 34 calls, 208.753 microseconds average

add 1.8821 milliseconds, 17 calls, 110.712 microseconds average

Memory Usage

Model 877.966 KB RAM, 1.42785 GB VRAM

Context 109.465 MB RAM, 785.219 MB VRAM

Total 110.322 MB RAM, 2.19467 GB VRAM

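c:\whisper.cpp>

(The runs of ? above are a console output encoding problem, not failed recognition: the same segments appear as proper Chinese text in chs.srt, shown at the start of this post, and ASCII text such as 296%, Citywalk and GORTEX came through intact.)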
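To see where the 83.7 seconds went, the CPU Tasks summary can be parsed. The Python sketch below is a hypothetical helper, not part of whisper.cpp or Const-me/Whisper, and it only understands lines shaped like "Name 15.2296 seconds, ...". On this run's figures, Decode accounts for about 81% of RunComplete and Encode for about 18%, so most of the time goes into the token-by-token decoding loop (3535 DecodeStep calls) rather than the encoder.

```python
from __future__ import annotations

import re

UNIT_TO_SECONDS = {"seconds": 1.0, "milliseconds": 1e-3, "microseconds": 1e-6}
TIMING_LINE = re.compile(r"^(\w+)\s+([\d.]+)\s+(seconds|milliseconds|microseconds)\b")
SECTION_HEADERS = {"CPU Tasks", "GPU Tasks", "Compute Shaders", "Memory Usage"}


def parse_section(log: str, section: str) -> dict[str, float]:
    """Return {task name: total time in seconds} for the lines under one section header."""
    times: dict[str, float] = {}
    inside = False
    for raw in log.splitlines():
        line = raw.strip()
        if line in SECTION_HEADERS:
            inside = line == section  # entering a new section ends the previous one
        elif inside and (match := TIMING_LINE.match(line)):
            times[match.group(1)] = float(match.group(2)) * UNIT_TO_SECONDS[match.group(3)]
    return times


# Excerpt copied from the medium-model run above.
LOG = """CPU Tasks
LoadModel 1.43866 seconds
RunComplete 83.7284 seconds
Run 83.6255 seconds
Callbacks 457.784 milliseconds, 187 calls, 2.44804 milliseconds average
Spectrogram 1.21106 seconds, 90 calls, 13.4562 milliseconds average
Sample 1.01043 seconds, 3535 calls, 285.836 microseconds average
Encode 15.2296 seconds, 17 calls, 895.858 milliseconds average
Decode 67.9228 seconds, 17 calls, 3.99546 seconds average
DecodeStep 66.9103 seconds, 3535 calls, 18.928 milliseconds average
GPU Tasks
LoadModel 1.03839 seconds
"""

cpu = parse_section(LOG, "CPU Tasks")
total = cpu["RunComplete"]
for name in ("Encode", "Decode"):
    print(f"{name}: {cpu[name]:.1f} s, {cpu[name] / total:.0%} of RunComplete")
```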

https://github.com/ggerganov/whisper.cpp/tree/master/models

https://github.com/ggerganov/whisper.cpp

ggerganov/whisper.cpp

https://blog.csdn.net/aiyolo/article/details/129674728?share_token=2c48b804-37f6-43a8-9159-08b28147ad67

Compiling and using whisper.cpp

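whisper.cpp is a project by the expert developer ggerganov that reimplements OpenAI's Whisper speech recognition model in C/C++. It is open source on GitHub and is lightweight, fast and practical. This article records how to use the model for local speech recognition on Windows.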

The whisper.cpp source is at ggerganov/whisper.cpp: Port of OpenAI's Whisper model in C/C++ (github.com). First, clone the project locally:

git clone https://github.com/ggerganov/whisper.cpp

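The whisper.cpp project offers several ready-made models. Download small or a larger model; with the smaller ones the recognition quality is practically unusable.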
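For the batch-subtitle use case from the cnblogs article linked earlier, one option is to loop over a folder and call main.exe once per file with the same flags used in this post (-l zh -osrt -m ... -f ...). The Python sketch below is hypothetical: the function names are illustrative, the *.wav pattern assumes the audio has already been extracted to WAV as with chs.wav, and it only reports which runs returned a non-zero exit code.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_command(exe: Path, model: Path, audio: Path, language: str = "zh") -> list[str]:
    """Same flags as the session above: -l <lang> -osrt -m <model> -f <audio>."""
    return [str(exe), "-l", language, "-osrt", "-m", str(model), "-f", str(audio)]


def transcribe_folder(folder: Path, exe: Path, model: Path, language: str = "zh") -> list[Path]:
    """Run main.exe once per .wav file in folder and return the files whose run failed."""
    for required in (exe, model):
        if not required.is_file():
            # Fail early instead of hitting "'main.exe' is not recognized" for every file.
            raise FileNotFoundError(f"{required} not found")
    exe, model = exe.resolve(), model.resolve()
    failed = []
    for audio in sorted(folder.glob("*.wav")):
        # Absolute paths, because the process runs from the executable's folder.
        command = build_command(exe, model, audio.resolve(), language)
        result = subprocess.run(command, cwd=exe.parent)
        if result.returncode != 0:
            failed.append(audio)
    return failed
```

For example: transcribe_folder(Path(r"D:\videos"), Path(r"C:\whisper.cpp\main.exe"), Path(r"C:\whisper.cpp\models\ggml-medium.bin")), where D:\videos is a placeholder folder. Where the .srt files land is decided by main.exe itself.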

https://huggingface.co/ggerganov/whisper.cpp

ggerganov/whisper.cpp

OpenAI's Whisper models converted to ggml format

Available models

Model      Disk    Mem      SHA
tiny       75 MB   ~390 MB  bd577a113a864445d4c299885e0cb97d4ba92b5f
tiny.en    75 MB   ~390 MB  c78c86eb1a8faa21b369bcd33207cc90d64ae9df
base       142 MB  ~500 MB  465707469ff3a37a2b9b8d8f89f2f99de7299dac
base.en    142 MB  ~500 MB  137c40403d78fd54d454da0f9bd998f78703390c
small      466 MB  ~1.0 GB  55356645c2b361a969dfd0ef2c5a50d530afd8d5
small.en   466 MB  ~1.0 GB  db8a495a91d927739e50b3fc1cc4c6b8f6c2d022
medium     1.5 GB  ~2.6 GB  fd9727b6e1217c2f614f9b698455c4ffd82463b4
medium.en  1.5 GB  ~2.6 GB  8c30f0e44ce9560643ebd10bbe50cd20eafd3723
large-v1   2.9 GB  ~4.7 GB  b1caaf735c4cc1429223d5a74f0f4d0b9b59a299
large-v2   2.9 GB  ~4.7 GB  0f4c8e34f21cf1a914c59d8b3ce882345ad349d6
large      2.9 GB  ~4.7 GB  ad82bf6a9043ceed055076d0fd39f5f186ff8062

note: large corresponds to the latest Large v3 model

For more information, visit:

https://github.com/ggerganov/whisper.cpp/tree/master/models

https://huggingface.co/ggerganov/whisper.cpp/tree/main
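Multi-gigabyte model downloads can be truncated or corrupted, so it is worth comparing each file against the SHA column in the table above; the values are 40 hex digits, i.e. SHA-1. A minimal Python sketch follows. The mapping from table rows to file names (ggml-tiny.bin and so on) is an assumption based on the names in the models folder listing earlier in this post.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA-1 values from the Hugging Face table above. The file names are an assumption,
# matching the names seen in the c:\whisper.cpp\models listing.
EXPECTED_SHA1 = {
    "ggml-tiny.bin": "bd577a113a864445d4c299885e0cb97d4ba92b5f",
    "ggml-base.bin": "465707469ff3a37a2b9b8d8f89f2f99de7299dac",
    "ggml-small.bin": "55356645c2b361a969dfd0ef2c5a50d530afd8d5",
    "ggml-medium.bin": "fd9727b6e1217c2f614f9b698455c4ffd82463b4",
    "ggml-large-v2.bin": "0f4c8e34f21cf1a914c59d8b3ce882345ad349d6",
}


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash in 1 MiB chunks so multi-GB model files never have to fit in memory."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_models(models_dir: Path) -> dict[str, bool]:
    """Return {file name: checksum matches} for each listed model present in models_dir."""
    return {
        name: sha1_of_file(models_dir / name) == expected
        for name, expected in EXPECTED_SHA1.items()
        if (models_dir / name).is_file()
    }
```

For example: print(verify_models(Path(r"C:\whisper.cpp\models"))). A False entry most likely means the download is incomplete or differs from the file the table describes, and it is worth downloading again.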

