离线语音识别 sherpa-ncnn 尝鲜体验

文章目录

Sherpa-NCNN 是一个基于 C++ 的轻量级神经网络推理框架,是 kaldi 下的一个子项目,它专门针对移动设备和嵌入式系统进行了优化。 Sherpa-NCNN 的目标是提供高性能、低延迟的推理能力,适用于移动设备和嵌入式系统,可以以满足实时应用需求。仓库地址: https://github.com/k2-fsa/sherpa-ncnn

主要功能:语音识别、流式语音识别。即边说话,边识别。不需要访问网络,不需要数据传输,完全本地识别。

识别效果:识别速度很快,效果比较好,但是只支持wav格式的音频,其他格式的需要转换后才能识别。

官方使用说明:https://k2-fsa.github.io/sherpa/ncnn/index.html

1、ubuntu 编译运行

依赖安装

shell 复制代码
$ sudo apt-get install -y git cmake

下载与编译

shell 复制代码
$ git clone https://github.com/k2-fsa/sherpa-ncnn
$ cd sherpa-ncnn
$ mkdir build && cd build
$ cmake -DCMAKE_BUILD_TYPE=Release ..
$ make -j8

编译脚本会自动下载编译需要的相关文件,编译完成后可执行文件在 build/bin 目录下。

shell 复制代码
$ cd bin
$ ll
total 36M
-rwxrwxr-x 1 cys cys 7.2M Dec 31 08:26 decode-file-c-api
-rwxrwxr-x 1 cys cys 7.1M Dec 31 08:26 generate-int8-scale-table
-rwxrwxr-x 1 cys cys 7.2M Dec 31 08:26 sherpa-ncnn
-rwxrwxr-x 1 cys cys 7.2M Dec 31 08:26 sherpa-ncnn-alsa
-rwxrwxr-x 1 cys cys 7.3M Dec 31 08:26 sherpa-ncnn-microphone

其中:

  • sherpa-ncnn:用于识别单个 wav 文件
  • sherpa-ncnn-microphone:用于识别麦克风的实时语音

模型下载

sherpa-ncnn 已经提供了一些预先训练好的模型可以下载,可在 https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/index.html 页面下查看。有专门的小模型可应用于 Raspberry Pi 4 之类的嵌入式板卡上,在 PC 上可以选择大一点的模型,做一些对比,选择了 csukuangfj/sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13 (Bilingual, Chinese + English) 这个模型在 PC 上运行。

shell 复制代码
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13.tar.bz2
$ tar xvf sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13.tar.bz2

运行

  1. 识别单个 wav 文件

模型文件自带了 5 个 wav 文件,在模型的 test_wavs 目录下。在 sherpa-ncnn 目录下执行以下命令:

shell 复制代码
$ for method in greedy_search modified_beam_search; do
  ./build/bin/sherpa-ncnn \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav \
    2 \
    $method
done

运行成功后会显示如下日志:

shell 复制代码
$ RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS ALWAYS什么意思啊
timestamps: 0.96 1.04 1.28 1.4 1.48 1.72 1.84 2.04 2.44 3.24 3.6 3.84 4.36 4.72 4.76 4.92 5.04 5.08 
Elapsed seconds: 0.726 s
Real time factor (RTF): 0.726 / 5.100 = 0.142
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS ALWAYS什么意思啊
timestamps: 0.96 1.04 1.28 1.4 1.48 1.72 1.84 2.04 2.44 3.2 3.52 3.84 4.36 4.72 4.76 4.92 5.04 5.08 
Elapsed seconds: 1.172 s
Real time factor (RTF): 1.172 / 5.100 = 0.230

与 1.wav 文件播放比较,识别的文字 100% 一致。波形文件长度为 5.2s 识别花了大约 0.726 s/1.172 s

shell 复制代码
$ for method in greedy_search modified_beam_search; do
  ./build/bin/sherpa-ncnn \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/0.wav \
    2 \
    $method
done

RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/0.wav
wav duration (s): 10.0531
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/0.wav
text: 昨天是 MONDAY TODAY IS LIBR THE DAY AFTER TOMORROW是星期三
timestamps: 0.64 1.04 1.6 2.12 2.2 2.44 4.2 4.4 5.12 5.56 5.8 6.16 6.84 7.12 7.44 8.04 8.16 8.24 8.28 9.08 9.4 9.64 9.88 
Elapsed seconds: 1.440 s
Real time factor (RTF): 1.440 / 10.053 = 0.143
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/0.wav
wav duration (s): 10.0531
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/0.wav
text: 昨天是 MONDAY TODAY IS礼拜二 THE DAY AFTER TOMORROW是星期三
timestamps: 0.64 1.04 1.6 2.12 2.2 2.44 4.2 4.4 4.96 5.56 5.8 6 6.84 7.12 7.44 8.04 8.16 8.24 8.28 9.08 9.4 9.64 9.88 
Elapsed seconds: 1.955 s
Real time factor (RTF): 1.955 / 10.053 = 0.194

波形文件长度为 10.05s 识别花了大约 1.440 s/1.955 s

  1. 识别麦克风实时语音

在 sherpa-ncnn 目录下执行以下命令:

shell 复制代码
$ ./build/bin/sherpa-ncnn-microphone \
  ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt \
  ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \
  2 \
  greedy_search

在日志中提示 started 之后,对着 PC 的麦克风说话,日志中会实时显示对应的文字:

shell 复制代码
$ ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
xcb_connection_has_error() returned true
xcb_connection_has_error() returned true
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
xcb_connection_has_error() returned true
xcb_connection_has_error() returned true
Num devices: 11
Use default device: 10
  Name: default
  Max input channels: 32
xcb_connection_has_error() returned true
Started
0:轻轻的我走了正如我轻轻的来
1:我轻轻的招手

效果还是相当不错的。

2、树莓派 4B 编译运行

确认树莓派 4B 环境

在树莓派终端上运行以下命令,确认当前操作系统是 32 位还是 64 位

shell 复制代码
$ getconf LONG_BIT 
32

当前安装的是32位操作系统。

可参考说明 https://k2-fsa.github.io/sherpa/ncnn/install/arm-embedded-linux.html 完成交叉编译

交叉编译

交叉编译器可以使我们在主机上编译出可以在嵌入式设备上运行的程序

下载 gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf.tar.xz 并安装

shell 复制代码
$ wget https://media.githubusercontent.com/media/Vicent1992/prebuilts/master/gcc/arm/gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf.tar.xz
$ tar xvf gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf.tar.xz -C /opt
$ export PATH=/opt/gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf/bin:$PATH
$ arm-linux-gnueabihf-gcc --version
arm-linux-gnueabihf-gcc (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

交叉编译

shell 复制代码
$ sudo apt-get install -y libtool
$ git clone https://github.com/k2-fsa/sherpa-ncnn
$ cd sherpa-ncnn
$ ./build-arm-linux-gnueabihf.sh

编译完成后可执行文件位于 build-arm-linux-gnueabihf/install/bin 目录,将这 2 个文件复制至树莓派 4b 内。

shell 复制代码
$ ll build-arm-linux-gnueabihf/install/bin
total 4.6M
-rwxr-xr-x 1 cys cys 2.3M Dec 31 10:40 sherpa-ncnn
-rwxr-xr-x 1 cys cys 2.3M Dec 31 10:40 sherpa-ncnn-alsa

模型下载与运行

  1. 下载
    在树莓派 4b 上下载模式,可在 https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/small-models.html 页面下载对应模型。
shell 复制代码
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23.tar.bz2
$ tar xvf sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23.tar.bz2
  1. 运行

在终端中运行以下命令:

shell 复制代码
$ for method in greedy_search modified_beam_search; do
  ./sherpa-ncnn \
    ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/tokens.txt \
    ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav \
    2 \
    $method
done

运行日志如下:

shell 复制代码
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav
wav duration (s): 5.6115
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav
text: 对我做了介绍那么我想说的是大家如果对我的研究感兴趣
timestamps: 0.32 0.64 0.76 0.96 1.08 1.16 1.96 2.04 2.24 2.36 2.56 2.68 2.8 3.36 3.52 3.64 3.72 3.84 3.92 4 4.08 4.24 4.48 4.56 4.72 
Elapsed seconds: 1.092 s
Real time factor (RTF): 1.092 / 5.611 = 0.195
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav
wav duration (s): 5.6115
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav
text: 对我做了介绍那么我想说的是大家如果对我的研究感兴趣
timestamps: 0.32 0.64 0.72 0.96 1.08 1.16 1.96 2.04 2.24 2.36 2.56 2.68 2.8 3.36 3.52 3.64 3.72 3.84 3.92 4 4.08 4.24 4.48 4.56 4.72 
Elapsed seconds: 1.596 s

模型对比测试

shell 复制代码
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-streaming-zipformer-20M-2023-02-17.tar.bz2
$ tar xvf sherpa-ncnn-streaming-zipformer-20M-2023-02-17.tar.bz2

运行:

shell 复制代码
$ for method in greedy_search modified_beam_search; do
  ./sherpa-ncnn \
    ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/tokens.txt \
    ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/test_wavs/0.wav \
    2 \
    $method
done

RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/test_wavs/0.wav
wav duration (s): 6.625
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/test_wavs/0.wav
text:  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BRAFFLELS
timestamps: 0.68 1 1.16 1.2 1.48 1.72 1.76 1.92 2 2.16 2.28 2.36 2.52 2.64 2.68 2.76 2.92 3.08 3.4 3.6 3.72 3.88 4.12 4.48 4.64 4.68 4.84 4.96 5.16 5.2 5.32 5.36 5.6 5.72 5.92 5.96 6.08 6.24 6.36 6.52 
Elapsed seconds: 1.892 s
Real time factor (RTF): 1.892 / 6.625 = 0.286
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/test_wavs/0.wav
wav duration (s): 6.625
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/test_wavs/0.wav
text:  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BRAFFLELS
timestamps: 0.68 1 1.16 1.2 1.48 1.72 1.8 1.92 2 2.16 2.28 2.36 2.52 2.64 2.68 2.76 2.92 3.08 3.4 3.6 3.72 3.88 4.12 4.48 4.64 4.68 4.84 4.96 5.16 5.2 5.32 5.36 5.6 5.72 5.92 5.96 6.08 6.24 6.36 6.52 
Elapsed seconds: 2.051 s
Real time factor (RTF): 2.051 / 6.625 = 0.310
shell 复制代码
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-conv-emformer-transducer-small-2023-01-09.tar.bz2
$ tar xvf sherpa-ncnn-conv-emformer-transducer-small-2023-01-09.tar.bz2

运行

shell 复制代码
$ ./sherpa-ncnn \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/tokens.txt \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/test_wavs/1089-134686-0001.wav
  
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/tokens.txt", encoder num_threads=4, decoder num_threads=4, joiner num_threads=4), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/test_wavs/1089-134686-0001.wav
wav duration (s): 6.625
Started!
Done!
Recognition result for ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/test_wavs/1089-134686-0001.wav
text:  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BRAFFLES
timestamps: 0.32 0.92 1 1.12 1.4 1.56 1.6 1.76 1.96 2.08 2.2 2.24 2.4 2.52 2.56 2.64 2.8 3 3.24 3.48 3.6 3.72 3.92 4.36 4.48 4.52 4.76 4.84 5 5.04 5.16 5.2 5.44 5.6 5.8 5.84 5.96 6.04 6.24 
Elapsed seconds: 0.951 s
Real time factor (RTF): 0.951 / 6.625 = 0.144

使用 int8 量化运行:

shell 复制代码
$ /sherpa-ncnn \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/tokens.txt \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.int8.param \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.int8.bin \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.int8.param \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.int8.bin \
  ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/test_wavs/1089-134686-0001.wav
shell 复制代码
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-lstm-transducer-small-2023-02-13.tar.bz2
$ tar xvf sherpa-ncnn-lstm-transducer-small-2023-02-13.tar.bz2

运行:

shell 复制代码
./sherpa-ncnn \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/tokens.txt \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/test_wavs/0.wav

RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-lstm-transducer-small-2023-02-13/tokens.txt", encoder num_threads=4, decoder num_threads=4, joiner num_threads=4), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-lstm-transducer-small-2023-02-13/test_wavs/0.wav
wav duration (s): 16.73
Started!
Done!
Recognition result for ./sherpa-ncnn-lstm-transducer-small-2023-02-13/test_wavs/0.wav
text:  IF UP它也是一个它这里是一个介词那么它后面加的是什么呢刚才老师说的它加了是什么呢对吧它也是加 I N G对吧它为什么也就 I N G呢
timestamps: 4.04 4.36 4.64 4.84 4.96 5.08 5.16 5.72 5.84 5.96 6.2 6.32 6.4 6.6 6.68 6.84 6.92 7 7.08 7.2 7.36 7.44 7.52 7.6 7.64 7.8 7.96 8.04 8.16 8.28 8.4 8.48 8.64 8.76 8.84 8.96 9.04 9.08 9.12 14.2 14.24 14.36 14.44 14.56 14.72 14.92 14.96 15.04 15.2 15.28 15.6 15.72 15.84 15.88 16 16.16 16.36 16.4 16.44 16.52 
Elapsed seconds: 3.441 s
Real time factor (RTF): 3.441 / 16.730 = 0.206
shell 复制代码
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16.tar.bz2
$ tar xvf sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16.tar.bz2

运行:

shell 复制代码
$ for method in greedy_search modified_beam_search; do
  ./sherpa-ncnn \
    ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/tokens.txt \
    ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/1.wav \
    2 \
    $method
done

RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS什么意思啊
timestamps: 0.96 1.08 1.28 1.4 1.52 1.76 1.84 2.08 2.56 3.04 3.6 3.84 4.68 4.76 4.92 5.04 5.08 
Elapsed seconds: 1.638 s
Real time factor (RTF): 1.638 / 5.100 = 0.321
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS什么意思啊
timestamps: 0.96 1.08 1.28 1.4 1.52 1.76 1.84 2.08 2.56 3.04 3.6 3.84 4.68 4.76 4.92 5.04 5.08 
Elapsed seconds: 2.411 s
Real time factor (RTF): 2.411 / 5.100 = 0.473

树莓派 4B 运行大模型

  1. 下载模型
shell 复制代码
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13.tar.bz2
$ tar xvf sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13.tar.bz2
  1. 运行
shell 复制代码
$ for method in greedy_search modified_beam_search; do
  ./sherpa-ncnn \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav \
    2 \
    $method
done

RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS ALWAYS什么意思啊
timestamps: 0.96 1.04 1.28 1.4 1.48 1.72 1.84 2.04 2.44 3.24 3.6 3.84 4.36 4.72 4.76 4.92 5.04 5.08 
Elapsed seconds: 4.076 s
Real time factor (RTF): 4.076 / 5.100 = 0.799
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS ALWAYS什么意思啊
timestamps: 0.96 1.04 1.28 1.4 1.48 1.72 1.84 2.04 2.44 3.2 3.52 3.84 4.36 4.72 4.76 4.92 5.04 5.08 
Elapsed seconds: 4.935 s
Real time factor (RTF): 4.935 / 5.100 = 0.968
相关推荐
一只IT攻城狮7 小时前
华为云语音交互SIS的使用案例(文字转语音-详细教程)
java·后端·华为云·音频·语音识别
hunteritself3 天前
OpenAI直播发布第11天:ChatGPT桌面客户端升级,就这?
人工智能·gpt·chatgpt·语音识别·claude
学习中的程序媛~5 天前
whisper实时语音转文字
python·whisper·语音识别
曦云沐5 天前
深入解析:选择最适合你的Whisper语音识别模型
人工智能·whisper·语音识别
黑贝是条狗5 天前
STT语音识别转文字工具 - 离线运行的本地语音识别服务
人工智能·语音识别
xingqing87y6 天前
怎么给视频加上背景音乐和文字?适合新手
人工智能·语音识别
haha_qasim6 天前
音频声音太小怎么调大?调大音频声音的几种方法
人工智能·音视频·语音识别
hunteritself9 天前
OpenAI直播发布第6天:ChatGPT视频聊天、屏幕共享和圣诞语音包!
人工智能·gpt·chatgpt·语音识别·claude
ai_xiaogui10 天前
【AIStarter】告别复杂转换 - MinerU整合包实现PDF到Markdown的无缝转变
人工智能·ai作画·pdf·语音识别·ai写作·ai软件
2301_8210269510 天前
3款在线抠图工具!效率嘎嘎高
图像处理·人工智能·语音识别