文章目录
Sherpa-NCNN 是一个基于 C++ 的轻量级神经网络推理框架,是 kaldi 下的一个子项目,它专门针对移动设备和嵌入式系统进行了优化。 Sherpa-NCNN 的目标是提供高性能、低延迟的推理能力,适用于移动设备和嵌入式系统,可以以满足实时应用需求。仓库地址: https://github.com/k2-fsa/sherpa-ncnn
主要功能:语音识别、流式语音识别。即边说话,边识别。不需要访问网络,不需要数据传输,完全本地识别。
识别效果:识别速度很快,效果比较好,但是只支持wav格式的音频,其他格式的需要转换后才能识别。
官方使用说明:https://k2-fsa.github.io/sherpa/ncnn/index.html
1、ubuntu 编译运行
依赖安装
shell
$ sudo apt-get install -y git cmake
下载与编译
shell
$ git clone https://github.com/k2-fsa/sherpa-ncnn
$ cd sherpa-ncnn
$ mkdir build && cd build
$ cmake -DCMAKE_BUILD_TYPE=Release ..
$ make -j8
编译脚本会自动下载编译需要的相关文件,编译完成后可执行文件在 build/bin 目录下。
shell
$ cd bin
$ ll
total 36M
-rwxrwxr-x 1 cys cys 7.2M Dec 31 08:26 decode-file-c-api
-rwxrwxr-x 1 cys cys 7.1M Dec 31 08:26 generate-int8-scale-table
-rwxrwxr-x 1 cys cys 7.2M Dec 31 08:26 sherpa-ncnn
-rwxrwxr-x 1 cys cys 7.2M Dec 31 08:26 sherpa-ncnn-alsa
-rwxrwxr-x 1 cys cys 7.3M Dec 31 08:26 sherpa-ncnn-microphone
其中:
- sherpa-ncnn:用于识别单个 wav 文件
- sherpa-ncnn-microphone:用于识别麦克风的实时语音
模型下载
sherpa-ncnn 已经提供了一些预先训练好的模型可以下载,可在 https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/index.html 页面下查看。有专门的小模型可应用于 Raspberry Pi 4 之类的嵌入式板卡上,在 PC 上可以选择大一点的模型,做一些对比,选择了 csukuangfj/sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13 (Bilingual, Chinese + English) 这个模型在 PC 上运行。
shell
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13.tar.bz2
$ tar xvf sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13.tar.bz2
运行
- 识别单个 wav 文件
模型文件自带了 5 个 wav 文件,在模型的 test_wavs 目录下。在 sherpa-ncnn 目录下执行以下命令:
shell
$ for method in greedy_search modified_beam_search; do
./build/bin/sherpa-ncnn \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav \
2 \
$method
done
运行成功后会显示如下日志:
shell
$ RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS ALWAYS什么意思啊
timestamps: 0.96 1.04 1.28 1.4 1.48 1.72 1.84 2.04 2.44 3.24 3.6 3.84 4.36 4.72 4.76 4.92 5.04 5.08
Elapsed seconds: 0.726 s
Real time factor (RTF): 0.726 / 5.100 = 0.142
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS ALWAYS什么意思啊
timestamps: 0.96 1.04 1.28 1.4 1.48 1.72 1.84 2.04 2.44 3.2 3.52 3.84 4.36 4.72 4.76 4.92 5.04 5.08
Elapsed seconds: 1.172 s
Real time factor (RTF): 1.172 / 5.100 = 0.230
与 1.wav 文件播放比较,识别的文字 100% 一致。波形文件长度为 5.2s 识别花了大约 0.726 s/1.172 s
shell
$ for method in greedy_search modified_beam_search; do
./build/bin/sherpa-ncnn \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/0.wav \
2 \
$method
done
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/0.wav
wav duration (s): 10.0531
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/0.wav
text: 昨天是 MONDAY TODAY IS LIBR THE DAY AFTER TOMORROW是星期三
timestamps: 0.64 1.04 1.6 2.12 2.2 2.44 4.2 4.4 5.12 5.56 5.8 6.16 6.84 7.12 7.44 8.04 8.16 8.24 8.28 9.08 9.4 9.64 9.88
Elapsed seconds: 1.440 s
Real time factor (RTF): 1.440 / 10.053 = 0.143
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/0.wav
wav duration (s): 10.0531
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/0.wav
text: 昨天是 MONDAY TODAY IS礼拜二 THE DAY AFTER TOMORROW是星期三
timestamps: 0.64 1.04 1.6 2.12 2.2 2.44 4.2 4.4 4.96 5.56 5.8 6 6.84 7.12 7.44 8.04 8.16 8.24 8.28 9.08 9.4 9.64 9.88
Elapsed seconds: 1.955 s
Real time factor (RTF): 1.955 / 10.053 = 0.194
波形文件长度为 10.05s 识别花了大约 1.440 s/1.955 s
- 识别麦克风实时语音
在 sherpa-ncnn 目录下执行以下命令:
shell
$ ./build/bin/sherpa-ncnn-microphone \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \
2 \
greedy_search
在日志中提示 started 之后,对着 PC 的麦克风说话,日志中会实时显示对应的文字:
shell
$ ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
xcb_connection_has_error() returned true
xcb_connection_has_error() returned true
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
xcb_connection_has_error() returned true
xcb_connection_has_error() returned true
Num devices: 11
Use default device: 10
Name: default
Max input channels: 32
xcb_connection_has_error() returned true
Started
0:轻轻的我走了正如我轻轻的来
1:我轻轻的招手
效果还是相当不错的。
2、树莓派 4B 编译运行
确认树莓派 4B 环境
在树莓派终端上运行以下命令,确认当前操作系统是 32 位还是 64 位
shell
$ getconf LONG_BIT
32
当前安装的是32位操作系统。
可参考说明 https://k2-fsa.github.io/sherpa/ncnn/install/arm-embedded-linux.html 完成交叉编译
交叉编译
交叉编译器可以使我们在主机上编译出可以在嵌入式设备上运行的程序
下载 gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf.tar.xz 并安装
shell
$ wget https://media.githubusercontent.com/media/Vicent1992/prebuilts/master/gcc/arm/gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf.tar.xz
$ tar xvf gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf.tar.xz -C /opt
$ export PATH=/opt/gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf/bin:$PATH
$ arm-linux-gnueabihf-gcc --version
arm-linux-gnueabihf-gcc (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
交叉编译
shell
$ sudo apt-get install -y libtool
$ git clone https://github.com/k2-fsa/sherpa-ncnn
$ cd sherpa-ncnn
$ ./build-arm-linux-gnueabihf.sh
编译完成后可执行文件位于 build-arm-linux-gnueabihf/install/bin 目录,将这 2 个文件复制至树莓派 4b 内。
shell
$ ll build-arm-linux-gnueabihf/install/bin
total 4.6M
-rwxr-xr-x 1 cys cys 2.3M Dec 31 10:40 sherpa-ncnn
-rwxr-xr-x 1 cys cys 2.3M Dec 31 10:40 sherpa-ncnn-alsa
模型下载与运行
- 下载
在树莓派 4b 上下载模式,可在 https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/small-models.html 页面下载对应模型。
shell
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23.tar.bz2
$ tar xvf sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23.tar.bz2
- 运行
在终端中运行以下命令:
shell
$ for method in greedy_search modified_beam_search; do
./sherpa-ncnn \
./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/tokens.txt \
./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav \
2 \
$method
done
运行日志如下:
shell
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav
wav duration (s): 5.6115
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav
text: 对我做了介绍那么我想说的是大家如果对我的研究感兴趣
timestamps: 0.32 0.64 0.76 0.96 1.08 1.16 1.96 2.04 2.24 2.36 2.56 2.68 2.8 3.36 3.52 3.64 3.72 3.84 3.92 4 4.08 4.24 4.48 4.56 4.72
Elapsed seconds: 1.092 s
Real time factor (RTF): 1.092 / 5.611 = 0.195
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav
wav duration (s): 5.6115
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav
text: 对我做了介绍那么我想说的是大家如果对我的研究感兴趣
timestamps: 0.32 0.64 0.72 0.96 1.08 1.16 1.96 2.04 2.24 2.36 2.56 2.68 2.8 3.36 3.52 3.64 3.72 3.84 3.92 4 4.08 4.24 4.48 4.56 4.72
Elapsed seconds: 1.596 s
模型对比测试
shell
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-streaming-zipformer-20M-2023-02-17.tar.bz2
$ tar xvf sherpa-ncnn-streaming-zipformer-20M-2023-02-17.tar.bz2
运行:
shell
$ for method in greedy_search modified_beam_search; do
./sherpa-ncnn \
./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/tokens.txt \
./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/test_wavs/0.wav \
2 \
$method
done
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/test_wavs/0.wav
wav duration (s): 6.625
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/test_wavs/0.wav
text: AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BRAFFLELS
timestamps: 0.68 1 1.16 1.2 1.48 1.72 1.76 1.92 2 2.16 2.28 2.36 2.52 2.64 2.68 2.76 2.92 3.08 3.4 3.6 3.72 3.88 4.12 4.48 4.64 4.68 4.84 4.96 5.16 5.2 5.32 5.36 5.6 5.72 5.92 5.96 6.08 6.24 6.36 6.52
Elapsed seconds: 1.892 s
Real time factor (RTF): 1.892 / 6.625 = 0.286
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/test_wavs/0.wav
wav duration (s): 6.625
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/test_wavs/0.wav
text: AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BRAFFLELS
timestamps: 0.68 1 1.16 1.2 1.48 1.72 1.8 1.92 2 2.16 2.28 2.36 2.52 2.64 2.68 2.76 2.92 3.08 3.4 3.6 3.72 3.88 4.12 4.48 4.64 4.68 4.84 4.96 5.16 5.2 5.32 5.36 5.6 5.72 5.92 5.96 6.08 6.24 6.36 6.52
Elapsed seconds: 2.051 s
Real time factor (RTF): 2.051 / 6.625 = 0.310
shell
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-conv-emformer-transducer-small-2023-01-09.tar.bz2
$ tar xvf sherpa-ncnn-conv-emformer-transducer-small-2023-01-09.tar.bz2
运行
shell
$ ./sherpa-ncnn \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/tokens.txt \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/test_wavs/1089-134686-0001.wav
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/tokens.txt", encoder num_threads=4, decoder num_threads=4, joiner num_threads=4), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/test_wavs/1089-134686-0001.wav
wav duration (s): 6.625
Started!
Done!
Recognition result for ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/test_wavs/1089-134686-0001.wav
text: AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BRAFFLES
timestamps: 0.32 0.92 1 1.12 1.4 1.56 1.6 1.76 1.96 2.08 2.2 2.24 2.4 2.52 2.56 2.64 2.8 3 3.24 3.48 3.6 3.72 3.92 4.36 4.48 4.52 4.76 4.84 5 5.04 5.16 5.2 5.44 5.6 5.8 5.84 5.96 6.04 6.24
Elapsed seconds: 0.951 s
Real time factor (RTF): 0.951 / 6.625 = 0.144
使用 int8 量化运行:
shell
$ /sherpa-ncnn \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/tokens.txt \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.int8.param \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.int8.bin \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.int8.param \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.int8.bin \
./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/test_wavs/1089-134686-0001.wav
shell
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-lstm-transducer-small-2023-02-13.tar.bz2
$ tar xvf sherpa-ncnn-lstm-transducer-small-2023-02-13.tar.bz2
运行:
shell
./sherpa-ncnn \
./sherpa-ncnn-lstm-transducer-small-2023-02-13/tokens.txt \
./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-lstm-transducer-small-2023-02-13/test_wavs/0.wav
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-lstm-transducer-small-2023-02-13/tokens.txt", encoder num_threads=4, decoder num_threads=4, joiner num_threads=4), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-lstm-transducer-small-2023-02-13/test_wavs/0.wav
wav duration (s): 16.73
Started!
Done!
Recognition result for ./sherpa-ncnn-lstm-transducer-small-2023-02-13/test_wavs/0.wav
text: IF UP它也是一个它这里是一个介词那么它后面加的是什么呢刚才老师说的它加了是什么呢对吧它也是加 I N G对吧它为什么也就 I N G呢
timestamps: 4.04 4.36 4.64 4.84 4.96 5.08 5.16 5.72 5.84 5.96 6.2 6.32 6.4 6.6 6.68 6.84 6.92 7 7.08 7.2 7.36 7.44 7.52 7.6 7.64 7.8 7.96 8.04 8.16 8.28 8.4 8.48 8.64 8.76 8.84 8.96 9.04 9.08 9.12 14.2 14.24 14.36 14.44 14.56 14.72 14.92 14.96 15.04 15.2 15.28 15.6 15.72 15.84 15.88 16 16.16 16.36 16.4 16.44 16.52
Elapsed seconds: 3.441 s
Real time factor (RTF): 3.441 / 16.730 = 0.206
shell
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16.tar.bz2
$ tar xvf sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16.tar.bz2
运行:
shell
$ for method in greedy_search modified_beam_search; do
./sherpa-ncnn \
./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/tokens.txt \
./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/1.wav \
2 \
$method
done
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS什么意思啊
timestamps: 0.96 1.08 1.28 1.4 1.52 1.76 1.84 2.08 2.56 3.04 3.6 3.84 4.68 4.76 4.92 5.04 5.08
Elapsed seconds: 1.638 s
Real time factor (RTF): 1.638 / 5.100 = 0.321
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS什么意思啊
timestamps: 0.96 1.08 1.28 1.4 1.52 1.76 1.84 2.08 2.56 3.04 3.6 3.84 4.68 4.76 4.92 5.04 5.08
Elapsed seconds: 2.411 s
Real time factor (RTF): 2.411 / 5.100 = 0.473
树莓派 4B 运行大模型
- 下载模型
shell
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13.tar.bz2
$ tar xvf sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13.tar.bz2
- 运行
shell
$ for method in greedy_search modified_beam_search; do
./sherpa-ncnn \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \
./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav \
2 \
$method
done
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS ALWAYS什么意思啊
timestamps: 0.96 1.04 1.28 1.4 1.48 1.72 1.84 2.04 2.44 3.24 3.6 3.84 4.36 4.72 4.76 4.92 5.04 5.08
Elapsed seconds: 4.076 s
Real time factor (RTF): 4.076 / 5.100 = 0.799
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS ALWAYS什么意思啊
timestamps: 0.96 1.04 1.28 1.4 1.48 1.72 1.84 2.04 2.44 3.2 3.52 3.84 4.36 4.72 4.76 4.92 5.04 5.08
Elapsed seconds: 4.935 s
Real time factor (RTF): 4.935 / 5.100 = 0.968