Converting CET-4 listening MP3 files to WAV with ffmpeg for model recognition

The listening audio can be downloaded from https://www.wehuster.com/cet4, for example https://www.wehuster.com/static/cet4/cet4_2025_06_1.mp3

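The LFM2.5-Audio model cannot use this MP3 file directly.

So I converted it with the ffmpeg tool.

A WAV file converted without any extra parameters made the model exit abnormally.

So I inspected a WAV file generated by the LFM2.5-Audio model itself to see which format it uses.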

C:\d>ffmpeg\bin\ffprobe \d\models\audio\OUTPUT.WAV
ffprobe version 8.0.1-essentials_build-www.gyan.dev Copyright (c) 2007-2025 the FFmpeg developers

Input #0, wav, from '\d\models\audio\OUTPUT.WAV':
  Duration: 00:00:09.74, bitrate: 768 kb/s
  Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 24000 Hz, 1 channels, flt, 768 kb/s
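
As a cross-check on the ffprobe output, the same fields can be read straight from the WAV header. The sketch below is only an illustration and not part of the original workflow: `read_wav_format` is a hypothetical helper name, and it assumes an ordinary RIFF/WAVE file that carries a standard `fmt ` chunk.

```python
import struct

# WAVE format tags that appear in this post: 0x0001 and 0x0003.
FORMAT_NAMES = {1: "integer PCM", 3: "IEEE float PCM"}


def read_wav_format(data: bytes) -> dict:
    """Parse the 'fmt ' chunk from the raw bytes of a RIFF/WAVE file."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # first chunk starts right after the 12-byte RIFF header
    while pos + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            # format tag, channels, sample rate, byte rate, block align, bits
            tag, channels, rate, byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            return {
                "format_tag": tag,
                "format_name": FORMAT_NAMES.get(tag, "other"),
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
                "bit_rate_kbps": byte_rate * 8 / 1000,
            }
        # Skip to the next chunk; chunks are padded to an even length.
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no 'fmt ' chunk found")
```

Calling `read_wav_format(open(path, "rb").read())` on the model-generated file should report format tag 3, 24000 Hz, 1 channel and 32 bits per sample if the header agrees with ffprobe. The bit rate follows from those fields: 24000 samples/s × 32 bits × 1 channel = 768 kb/s.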

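Then I converted the MP3 with the same sample rate and channel count, keeping only a 2-minute clip that starts at 00:00:50, and checked whether the converted file's info matched that of the model-generated WAV.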

C:\d>ffmpeg\bin\ffmpeg -i cet4_2025_06_1.mp3 -ar 24000 -ac 1 -ss 00:00:50 -t 00:02:00.0 cet40.wav

  Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, mono, s16, 384 kb/s (default)
    Metadata:
      encoder         : Lavc62.11.100 pcm_s16le
      creation_time   : 2025-07-06T09:52:08.000000Z
      handler_name    : aac@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-1ubuntu0.1
      vendor_id       : [0][0][0][0]
[out#0/wav @ 00000241a84a2cc0] video:0KiB audio:5625KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.001354%
size=    5625KiB time=00:02:00.00 bitrate= 384.0kbits/s speed= 721x elapsed=0:00:00.16

C:\d>ffmpeg\bin\ffprobe cet40.wav

  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s16, 384 kb/s
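
The conversion step can be scripted so that the parameters stay in one place. This is a minimal sketch under stated assumptions: `build_ffmpeg_args`, `pcm_bitrate_kbps` and `pcm_size_kib` are hypothetical helper names, and `ffmpeg` is assumed to be on the PATH. The optional `codec` argument adds ffmpeg's `-c:a` option; passing `pcm_f32le` would also match the sample format of the model-generated file, although the 16-bit file above was enough in this post.

```python
def build_ffmpeg_args(src, dst, rate=24000, channels=1,
                      start="00:00:50", duration="00:02:00.0", codec=None):
    """Build the ffmpeg argument list used in this post."""
    args = ["ffmpeg", "-i", src,
            "-ar", str(rate),       # output sample rate in Hz
            "-ac", str(channels),   # output channel count
            "-ss", start,           # where the clip starts
            "-t", duration]         # how long the clip is
    if codec is not None:
        args += ["-c:a", codec]     # e.g. "pcm_f32le" for 32-bit float PCM
    args.append(dst)
    return args


def pcm_bitrate_kbps(rate, channels, bits):
    """Bit rate of uncompressed PCM in kb/s."""
    return rate * channels * bits / 1000


def pcm_size_kib(rate, channels, bits, seconds):
    """Size of the raw PCM audio data in KiB."""
    return rate * channels * (bits // 8) * seconds / 1024
```

The list can be handed to `subprocess.run(args, check=True)`. The two arithmetic helpers reproduce the numbers in the log: 24000 Hz × 1 channel × 16 bits gives 384 kb/s, and 120 seconds of that is 5625 KiB.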

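The sample rate and channel count now match the model-generated file; only the sample format differs (pcm_s16le instead of pcm_f32le). I then used this file for ASR.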

llama-audio/llama-liquid-audio-cli -m LFM2.5-Audio-1.5B-Q4_0.gguf -mm mmproj-LFM2.5-Audio-1.5B-Q4_0.gguf -mv vocoder-LFM2.5-Audio-1.5B-Q4_0.gguf --tts-speaker-file tokenizer-LFM2.5-Audio-1.5B-Q4_0.gguf -sys "Perform ASR." --audio /par/cet40.wav
=== GENERATED TEXT ===

Knew it tasted good, but butter made it better. His passion for produce got him a position as South Dakota's official corn ambassador. To Reek's profession of love for his favorite vegetable earned him the name the Corn Kid. This was after his online interview attracted millions of views and was made into a song. South Dakota is one of the top corn producers in America. The corn provides nourishment across the globe. To Reek, and his family, were invited to attend the honorary ceremony at the state's corn palace. Officials wanted to highlight South Dakota's two largest industries: tourism and agriculture. To Reek couldn't believe his eyes when he saw the palace made of corn. Questions one and two are based on the news report you have just heard. Question one: How did To Reek make corn taste better? Question two: Why were To Reek and his family invited to South Dakota? News report two: Two arrests were announced Thursday. The arrests were in connection with a string of mail thefts.

Compared with the original transcript available online at https://wenku.baidu.com/view/b89a86a3f90a79563c1ec5da50e2524de518d020.html?*wkts*=1773551675978 , the recognition is quite good apart from the person's name.
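
To put a number on "quite good", one common measure for ASR is the word error rate: the word-level edit distance between the reference transcript and the recognized text, divided by the number of reference words. The sketch below is an illustration only. `word_error_rate` is a hypothetical helper name, the tokenization (lowercase, letters, digits and apostrophes) is a simplifying assumption, and I did not compute this figure for the clip above.

```python
import re


def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = words(reference), words(hypothesis)
    prev = list(range(len(hyp) + 1))  # distances for an empty reference
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # word missing in hypothesis
                           cur[j - 1] + 1,            # extra word in hypothesis
                           prev[j - 1] + (r != h)))   # match or substitution
        prev = cur
    return prev[-1] / max(len(ref), 1)
```

Pass the original transcript as `reference` and the text under `=== GENERATED TEXT ===` as `hypothesis`. A misrecognized name counts as one or more substitutions each time it occurs.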
