Converting CET-4 listening MP3 files to WAV with ffmpeg for model recognition

The listening audio can be downloaded from https://www.wehuster.com/cet4, for example https://www.wehuster.com/static/cet4/cet4_2025_06_1.mp3

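The LFM2.5-Audio model cannot use this MP3 file directly.

So I converted it with ffmpeg.

A WAV file converted without any extra parameters made the model exit abnormally.

To find out which format the model expects, I inspected a WAV file generated by the LFM2.5-Audio model itself.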

C:\d>ffmpeg\bin\ffprobe \d\models\audio\OUTPUT.WAV
ffprobe version 8.0.1-essentials_build-www.gyan.dev Copyright (c) 2007-2025 the FFmpeg developers

Input #0, wav, from '\d\models\audio\OUTPUT.WAV':
  Duration: 00:00:09.74, bitrate: 768 kb/s
  Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 24000 Hz, 1 channels, flt, 768 kb/s
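
The 768 kb/s figure in the ffprobe output follows directly from the other stream fields, since uncompressed PCM has a bit rate of sample rate × bits per sample × channels. A minimal Python sketch of that arithmetic (the function name `pcm_bitrate_kbps` is my own, not part of ffmpeg):

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bit rate of uncompressed PCM audio in kb/s (1 kb = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# pcm_f32le stores each sample as a 32-bit float; the file is 24000 Hz, 1 channel.
print(pcm_bitrate_kbps(24000, 32, 1))  # 768.0
```

With the same sample rate and channel count, a 16-bit format gives half that value, so the bit rate is a quick hint at the sample width of a WAV file.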

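Then I converted the MP3 with the same sample rate and channel count, cutting out a 2-minute clip, and checked whether the converted file's information matched that of the model-generated file.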

C:\d>ffmpeg\bin\ffmpeg -i cet4_2025_06_1.mp3 -ar 24000 -ac 1 -ss 00:00:50 -t 00:02:00.0 cet40.wav

  Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, mono, s16, 384 kb/s (default)
    Metadata:
      encoder         : Lavc62.11.100 pcm_s16le
      creation_time   : 2025-07-06T09:52:08.000000Z
      handler_name    : aac@GPAC0.5.2-DEV-revVersion: 0.5.2-426-gc5ad4e4+dfsg5-1ubuntu0.1
      vendor_id       : [0][0][0][0]
[out#0/wav @ 00000241a84a2cc0] video:0KiB audio:5625KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.001354%
size=    5625KiB time=00:02:00.00 bitrate= 384.0kbits/s speed= 721x elapsed=0:00:00.16

C:\d>ffmpeg\bin\ffprobe cet40.wav

  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s16, 384 kb/s
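
The conversion above sets the sample rate and channel count but leaves ffmpeg's default WAV codec, pcm_s16le. If the sample format should also match the model-generated file, ffmpeg's `-c:a pcm_f32le` option selects 32-bit float output. The following is a minimal Python sketch that assembles such a command; the helper name `build_ffmpeg_args` and the output file name `cet40_f32.wav` are assumptions for illustration, and I have not checked whether the model behaves any differently with float input.

```python
def build_ffmpeg_args(src, dst, sample_rate=24000, channels=1,
                      codec="pcm_f32le", start=None, duration=None):
    """Build an ffmpeg argument list that converts `src` into a WAV at `dst`."""
    args = ["ffmpeg", "-i", src,
            "-ar", str(sample_rate),  # output sample rate in Hz
            "-ac", str(channels),     # output channel count
            "-c:a", codec]            # output audio codec (sample format)
    if start is not None:
        args += ["-ss", start]        # position at which the clip starts
    if duration is not None:
        args += ["-t", duration]      # length of the clip
    args.append(dst)
    return args


# Hypothetical output name; same clip boundaries as the command above.
args = build_ffmpeg_args("cet4_2025_06_1.mp3", "cet40_f32.wav",
                         start="00:00:50", duration="00:02:00.0")
print(" ".join(args))
```

The list can be passed to `subprocess.run(args, check=True)` when `ffmpeg` is on the PATH. Running ffprobe on the result should then report pcm_f32le, like the reference file.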

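I then used this file for ASR. The model accepted it even though its sample format is pcm_s16le rather than the pcm_f32le of the model-generated file.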

llama-audio/llama-liquid-audio-cli -m LFM2.5-Audio-1.5B-Q4_0.gguf -mm mmproj-LFM2.5-Audio-1.5B-Q4_0.gguf -mv vocoder-LFM2.5-Audio-1.5B-Q4_0.gguf --tts-speaker-file tokenizer-LFM2.5-Audio-1.5B-Q4_0.gguf -sys "Perform ASR." --audio /par/cet40.wav
=== GENERATED TEXT ===

Knew it tasted good, but butter made it better. His passion for produce got him a position as South Dakota's official corn ambassador. To Reek's profession of love for his favorite vegetable earned him the name the Corn Kid. This was after his online interview attracted millions of views and was made into a song. South Dakota is one of the top corn producers in America. The corn provides nourishment across the globe. To Reek, and his family, were invited to attend the honorary ceremony at the state's corn palace. Officials wanted to highlight South Dakota's two largest industries: tourism and agriculture. To Reek couldn't believe his eyes when he saw the palace made of corn. Questions one and two are based on the news report you have just heard. Question one: How did To Reek make corn taste better? Question two: Why were To Reek and his family invited to South Dakota? News report two: Two arrests were announced Thursday. The arrests were in connection with a string of mail thefts.

Compared with the original transcript found online, https://wenku.baidu.com/view/b89a86a3f90a79563c1ec5da50e2524de518d020.html?*wkts*=1773551675978 , the recognition is quite good apart from the personal names.
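
The comparison with the reference transcript was done by eye. One common way to put a number on it is the word error rate (WER): the word-level edit distance between reference and recognized text, divided by the number of reference words. A minimal Python sketch follows; the function names and the two toy strings are made up for illustration and are not taken from the actual transcript.

```python
import re


def words(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference text contains no words")
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # reference word deleted
                           cur[j - 1] + 1,            # extra word inserted
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)


# Toy example: one wrong word out of five.
print(word_error_rate("the corn kid loves corn", "the corn kit loves corn"))  # 0.2
```

Misrecognized names count as substitutions here, so a transcript that is otherwise accurate still shows a nonzero WER.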
