Using ffmpeg to convert CET-4 listening MP3 files to WAV for model recognition

Download the listening audio from https://www.wehuster.com/cet4, for example https://www.wehuster.com/static/cet4/cet4_2025_06_1.mp3

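The LFM2.5-Audio model cannot use this MP3 file directly.

So ffmpeg is used to convert it to WAV.

A WAV file converted without any extra parameters caused the model to exit abnormally.

To find out which format the model expects, inspect a WAV file generated by the LFM2.5-Audio model itself with ffprobe.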

C:\d>ffmpeg\bin\ffprobe \d\models\audio\OUTPUT.WAV
ffprobe version 8.0.1-essentials_build-www.gyan.dev Copyright (c) 2007-2025 the FFmpeg developers

Input #0, wav, from '\d\models\audio\OUTPUT.WAV':
  Duration: 00:00:09.74, bitrate: 768 kb/s
  Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 24000 Hz, 1 channels, flt, 768 kb/s
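
The fields ffprobe reports here (format tag 0x0003, 24000 Hz, 1 channel, 768 kb/s) are stored in the `fmt ` chunk of the WAV header. Python's standard `wave` module rejects non-PCM format tags such as 3 (IEEE float), so the following is a minimal sketch that reads the chunk by hand with `struct`. The function names `read_wav_format` and `make_wav_header` are made up for this illustration, and the sketch assumes a plain (non-extensible) `fmt ` chunk like the one shown above.

```python
import struct

def read_wav_format(data: bytes) -> dict:
    """Parse the 'fmt ' chunk from the leading bytes of a RIFF/WAVE file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            tag, channels, rate, byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            return {
                "format_tag": tag,          # 1 = integer PCM, 3 = IEEE float
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
                "bit_rate": byte_rate * 8,  # bits per second
            }
        # Skip to the next chunk; chunks are padded to an even length.
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no 'fmt ' chunk found")

def make_wav_header(tag: int, channels: int, rate: int, bits: int) -> bytes:
    """Build a minimal RIFF header with only a 'fmt ' chunk (demo and tests only)."""
    block_align = channels * bits // 8
    fmt = struct.pack("<HHIIHH", tag, channels, rate,
                      rate * block_align, block_align, bits)
    body = b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
    return b"RIFF" + struct.pack("<I", len(body)) + body

if __name__ == "__main__":
    # Same parameters as the model's OUTPUT.WAV: float32, mono, 24000 Hz.
    print(read_wav_format(make_wav_header(3, 1, 24000, 32)))
```

For a real file, passing the first few kilobytes (for example `open(path, "rb").read(4096)`) is usually enough, assuming the `fmt ` chunk sits near the start of the file. The bit rate follows from 24000 samples/s × 32 bits × 1 channel = 768 kb/s, which agrees with the ffprobe line above.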

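Then convert the MP3 with the same sample rate (24000 Hz) and channel count (mono), extracting 2 minutes starting at 00:00:50, and run ffprobe on the result to check whether its parameters match the model-generated file.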

C:\d>ffmpeg\bin\ffmpeg -i cet4_2025_06_1.mp3 -ar 24000 -ac 1 -ss 00:00:50 -t 00:02:00.0 cet40.wav

  Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, mono, s16, 384 kb/s (default)
    Metadata:
      encoder         : Lavc62.11.100 pcm_s16le
      creation_time   : 2025-07-06T09:52:08.000000Z
      handler_name    : aac@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-1ubuntu0.1
      vendor_id       : [0][0][0][0]
[out#0/wav @ 00000241a84a2cc0] video:0KiB audio:5625KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.001354%
size=    5625KiB time=00:02:00.00 bitrate= 384.0kbits/s speed= 721x elapsed=0:00:00.16

C:\d>ffmpeg\bin\ffprobe cet40.wav

  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s16, 384 kb/s
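
The converted file matches the model's WAV in sample rate and channel count, but its sample format is pcm_s16le rather than pcm_f32le. The sketch below builds the same ffmpeg command from Python and optionally adds `-c:a pcm_f32le` to request float samples as well. Whether the model needs that is not established here (the s16 file is the one actually used in this post), so treat the codec option as an untested assumption. The helper names and the output name `cet40_f32.wav` are hypothetical.

```python
import os
import shutil
import subprocess

def build_convert_cmd(src, dst, rate=24000, channels=1,
                      start="00:00:50", duration="00:02:00.0",
                      codec=None, ffmpeg="ffmpeg"):
    """Return the ffmpeg argument list for resampling and trimming an audio file."""
    cmd = [ffmpeg, "-i", src,
           "-ar", str(rate),      # output sample rate in Hz
           "-ac", str(channels),  # output channel count
           "-ss", start,          # start offset of the clip
           "-t", duration]        # clip length
    if codec:
        cmd += ["-c:a", codec]    # e.g. "pcm_f32le" for 32-bit float samples
    cmd.append(dst)
    return cmd

def expected_pcm_kib(seconds, rate, channels, bytes_per_sample):
    """Size of the raw PCM payload in KiB, as a sanity check on the output."""
    return seconds * rate * channels * bytes_per_sample / 1024

if __name__ == "__main__":
    src = "cet4_2025_06_1.mp3"
    cmd = build_convert_cmd(src, "cet40_f32.wav", codec="pcm_f32le")
    print(" ".join(cmd))
    # 120 s of 16-bit mono at 24000 Hz; compare with "audio:5625KiB" in the log.
    print(expected_pcm_kib(120, 24000, 1, 2))
    # Only run the conversion when ffmpeg and the input file are both present.
    if shutil.which("ffmpeg") and os.path.exists(src):
        subprocess.run(cmd, check=True)
```

With the default arguments the function reproduces the command used above. The size check gives 120 × 24000 × 2 / 1024 = 5625 KiB, the figure ffmpeg reported for the s16 output.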

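The sample format is pcm_s16le rather than the pcm_f32le of the model-generated file, but the sample rate and channel count match, and the model accepts the file. Use it for ASR: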

llama-audio/llama-liquid-audio-cli -m LFM2.5-Audio-1.5B-Q4_0.gguf -mm mmproj-LFM2.5-Audio-1.5B-Q4_0.gguf -mv vocoder-LFM2.5-Audio-1.5B-Q4_0.gguf --tts-speaker-file tokenizer-LFM2.5-Audio-1.5B-Q4_0.gguf -sys "Perform ASR." --audio /par/cet40.wav
=== GENERATED TEXT ===

Knew it tasted good, but butter made it better. His passion for produce got him a position as South Dakota's official corn ambassador. To Reek's profession of love for his favorite vegetable earned him the name the Corn Kid. This was after his online interview attracted millions of views and was made into a song. South Dakota is one of the top corn producers in America. The corn provides nourishment across the globe. To Reek, and his family, were invited to attend the honorary ceremony at the state's corn palace. Officials wanted to highlight South Dakota's two largest industries: tourism and agriculture. To Reek couldn't believe his eyes when he saw the palace made of corn. Questions one and two are based on the news report you have just heard. Question one: How did To Reek make corn taste better? Question two: Why were To Reek and his family invited to South Dakota? News report two: Two arrests were announced Thursday. The arrests were in connection with a string of mail thefts.

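Compared with the original transcript found online at https://wenku.baidu.com/view/b89a86a3f90a79563c1ec5da50e2524de518d020.html?*wkts*=1773551675978, the recognition is quite good, apart from the personal name, which the model rendered as "To Reek".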
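
A comparison like this can be put into numbers with a word error rate (WER): the word-level edit distance between the reference and the recognized text, divided by the number of reference words. The following is a minimal sketch, not a standard scoring tool. The reference sentence is a stand-in with the placeholder `NAME` where the personal name would be, because the official transcript is not reproduced here.

```python
import re

def tokens(text: str) -> list:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance over word lists (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # word missing from hyp
                           cur[j - 1] + 1,           # extra word in hyp
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate relative to the number of reference words."""
    ref, hyp = tokens(reference), tokens(hypothesis)
    if not ref:
        raise ValueError("reference is empty")
    return word_edit_distance(ref, hyp) / len(ref)

if __name__ == "__main__":
    # Placeholder reference: NAME stands for the personal name in the transcript.
    reference = "NAME couldn't believe his eyes when he saw the palace made of corn."
    hypothesis = "To Reek couldn't believe his eyes when he saw the palace made of corn."
    print(f"WER = {wer(reference, hypothesis):.3f}")
```

In this stand-in example the only errors come from the name being split into two wrong words, which counts as one substitution plus one insertion against 13 reference words.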
