Using ffmpeg to convert CET-4 English listening MP3 files to WAV for model recognition

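The listening audio can be downloaded from https://www.wehuster.com/cet4; the file used here is https://www.wehuster.com/static/cet4/cet4_2025_06_1.mp3

The LFM2.5-Audio model cannot use this MP3 file directly.

So ffmpeg is used to convert it.

A WAV file converted without any extra parameters caused the model to exit abnormally.

So I inspected a WAV file generated by the LFM2.5-Audio model itself to see what format it uses.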

C:\d>ffmpeg\bin\ffprobe \d\models\audio\OUTPUT.WAV
ffprobe version 8.0.1-essentials_build-www.gyan.dev Copyright (c) 2007-2025 the FFmpeg developers

Input #0, wav, from '\d\models\audio\OUTPUT.WAV':
  Duration: 00:00:09.74, bitrate: 768 kb/s
  Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 24000 Hz, 1 channels, flt, 768 kb/s
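
The 768 kb/s figure reported by ffprobe follows directly from the other fields, since uncompressed PCM has a bit rate of sample rate × channels × bits per sample. The short sketch below reproduces that arithmetic; the function names are illustrative and not part of ffmpeg.

```python
def pcm_bitrate_bps(sample_rate_hz: int, channels: int, bits_per_sample: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * channels * bits_per_sample


def pcm_data_bytes(sample_rate_hz: int, channels: int,
                   bits_per_sample: int, seconds: float) -> int:
    """Size of the raw sample data (without the WAV header), in bytes."""
    return int(pcm_bitrate_bps(sample_rate_hz, channels, bits_per_sample) * seconds / 8)


# pcm_f32le, 24000 Hz, 1 channel: 32 bits per sample
print(pcm_bitrate_bps(24000, 1, 32))  # 768000, i.e. 768 kb/s
```

The same arithmetic gives 384 kb/s for 16-bit samples at 24000 Hz mono, and 5625 KiB of sample data for a 2-minute clip in that format.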

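Then convert with the same sample rate and channel count, cutting a 2-minute clip that starts at 00:00:50, and check whether the converted file's information matches that of the reference file.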

C:\d>ffmpeg\bin\ffmpeg -i cet4_2025_06_1.mp3 -ar 24000 -ac 1 -ss 00:00:50 -t 00:02:00.0 cet40.wav

  Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, mono, s16, 384 kb/s (default)
    Metadata:
      encoder         : Lavc62.11.100 pcm_s16le
      creation_time   : 2025-07-06T09:52:08.000000Z
      handler_name    : aac@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-1ubuntu0.1
      vendor_id       : [0][0][0][0]
[out#0/wav @ 00000241a84a2cc0] video:0KiB audio:5625KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.001354%
size=    5625KiB time=00:02:00.00 bitrate= 384.0kbits/s speed= 721x elapsed=0:00:00.16
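
The conversion command above can also be assembled in a script, which helps when several recordings need the same treatment. The following is a minimal sketch: `build_ffmpeg_args` is a hypothetical helper, and the optional `codec` argument (for example `pcm_f32le`, to mirror the reference file's sample format) is an assumption, since the source only matches the sample rate and channel count.

```python
import subprocess


def build_ffmpeg_args(src, dst, sample_rate=24000, channels=1,
                      start="00:00:50", duration="00:02:00.0",
                      codec=None, ffmpeg="ffmpeg"):
    """Build the ffmpeg argument list used for the conversion.

    codec=None keeps ffmpeg's default WAV encoder (pcm_s16le);
    passing "pcm_f32le" requests 32-bit float samples instead.
    """
    args = [ffmpeg, "-i", src,
            "-ar", str(sample_rate),   # output sample rate in Hz
            "-ac", str(channels),      # output channel count
            "-ss", start,              # start offset of the clip
            "-t", duration]            # length of the clip
    if codec is not None:
        args += ["-c:a", codec]        # explicit audio encoder
    args.append(dst)
    return args


args = build_ffmpeg_args("cet4_2025_06_1.mp3", "cet40.wav")
print(" ".join(args))
# To actually run the conversion (requires ffmpeg on PATH):
# subprocess.run(args, check=True)
```

Building the argument list as a Python list avoids shell quoting problems with paths that contain spaces.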

C:\d>ffmpeg\bin\ffprobe cet40.wav

  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s16, 384 kb/s
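
The ffprobe comparison can also be done without ffmpeg by reading the `fmt ` chunk of the WAV header. The sketch below is one way to do that with the Python standard library; `read_wav_format` is a hypothetical helper, and it assumes a plain RIFF/WAVE file (it does not decode the extensible header variant). The format tag is the same value ffprobe prints: 0x0001 for integer PCM and 0x0003 for IEEE float.

```python
import struct


def read_wav_format(path):
    """Return the basic fields of the 'fmt ' chunk of a RIFF/WAVE file."""
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                data = f.read(chunk_size)
                tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                    "<HHIIHH", data[:16])
                return {"format_tag": tag, "channels": channels,
                        "sample_rate": rate, "bits_per_sample": bits}
            # Skip other chunks; chunk bodies are padded to an even length.
            f.seek(chunk_size + (chunk_size & 1), 1)


def same_rate_and_channels(a, b):
    """Compare only the two fields matched in the conversion."""
    return (a["sample_rate"], a["channels"]) == (b["sample_rate"], b["channels"])
```

Going by the ffprobe output, calling `read_wav_format` on OUTPUT.WAV and on cet40.wav should report the same sample rate and channel count, with format tag 3 and 32 bits for the former and format tag 1 and 16 bits for the latter.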

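The sample rate and channel count match the reference file, although the sample format is pcm_s16le rather than pcm_f32le. This file is then used for ASR: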

llama-audio/llama-liquid-audio-cli -m LFM2.5-Audio-1.5B-Q4_0.gguf -mm mmproj-LFM2.5-Audio-1.5B-Q4_0.gguf -mv vocoder-LFM2.5-Audio-1.5B-Q4_0.gguf --tts-speaker-file tokenizer-LFM2.5-Audio-1.5B-Q4_0.gguf -sys "Perform ASR." --audio /par/cet40.wav
=== GENERATED TEXT ===

Knew it tasted good, but butter made it better. His passion for produce got him a position as South Dakota's official corn ambassador. To Reek's profession of love for his favorite vegetable earned him the name the Corn Kid. This was after his online interview attracted millions of views and was made into a song. South Dakota is one of the top corn producers in America. The corn provides nourishment across the globe. To Reek, and his family, were invited to attend the honorary ceremony at the state's corn palace. Officials wanted to highlight South Dakota's two largest industries: tourism and agriculture. To Reek couldn't believe his eyes when he saw the palace made of corn. Questions one and two are based on the news report you have just heard. Question one: How did To Reek make corn taste better? Question two: Why were To Reek and his family invited to South Dakota? News report two: Two arrests were announced Thursday. The arrests were in connection with a string of mail thefts.

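Compared with the original transcript available online (https://wenku.baidu.com/view/b89a86a3f90a79563c1ec5da50e2524de518d020.html?*wkts*=1773551675978), the recognition is quite good apart from the personal name, which the model renders as "To Reek".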
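
To put a number on "quite good", one common measure is the word error rate (WER): the word-level edit distance between the reference transcript and the recognized text, divided by the number of reference words. The sketch below is a minimal implementation under simple assumptions (lowercasing and stripping punctuation); the example strings are made up for illustration and are not the actual transcripts.

```python
import re


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    prev = list(range(len(hyp) + 1))  # distances for an empty reference
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


# Illustrative strings only.
reference = "The corn provides nourishment across the globe."
hypothesis = "The corn provide nourishment across globe."
print(f"WER = {word_error_rate(reference, hypothesis):.3f}")
```

A misrecognized name counts as one or more word errors each time it occurs, so a name repeated throughout a passage can dominate the score even when everything else is transcribed correctly.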
