Kimi 7B 语音转文字

1. 模型准备

js 复制代码
from modelscope import snapshot_download
model_dir = snapshot_download('moonshotai/Kimi-Audio-7B-Instruct',
                              cache_dir="./models")

2. 安装和初步推理

clone仓库需要clone整个子模块

js 复制代码
git clone https://github.com/MoonshotAI/Kimi-Audio.git
git submodule update --init --recursive

或者
git clone --recursive https://github.com/MoonshotAI/Kimi-Audio.git

安装完依赖后使用python infer.py测试

python 复制代码
import soundfile as sf
# Assuming the KimiAudio class is available after installation
from kimia_infer.api.kimia import KimiAudio
import torch # Ensure torch is imported if needed for device placement


model_path = "/root/xx/models/moonshotai/Kimi-Audio-7B-Instruct" # IMPORTANT: Update this path if loading locally
model = KimiAudio(model_path=model_path, 
                  load_detokenizer=True)
model.to(device)
print("load model from locally!")
# --- 2. Define Sampling Parameters ---
sampling_params = {
    "audio_temperature": 0.8,
    "audio_top_k": 10,
    "text_temperature": 0.0,
    "text_top_k": 5,
    "audio_repetition_penalty": 1.0,
    "audio_repetition_window_size": 64,
    "text_repetition_penalty": 1.0,
    "text_repetition_window_size": 16,
}

# --- 3. Example 1: Audio-to-Text (ASR) ---
# TODO: Provide actual example audio files or URLs accessible to users
# E.g., download sample files first or use URLs
# wget https://path/to/your/asr_example.wav -O asr_example.wav
# wget https://path/to/your/qa_example.wav -O qa_example.wav
asr_audio_path = "test_audios/asr_example.wav" # IMPORTANT: Make sure this file exists
qa_audio_path = "test_audios/qa_example.wav" # IMPORTANT: Make sure this file exists

messages_asr = [
    {"role": "user", "message_type": "text", "content": "Please transcribe the following audio:"},
    {"role": "user", "message_type": "audio", "content": asr_audio_path}
]

# Generate only text output
# Note: Ensure the model object and generate method accept device placement if needed
_, text_output = model.generate(messages_asr, **sampling_params, output_type="text")
print(">>> ASR Output Text: ", text_output)
# Expected output: "这并不是告别,这是一个篇章的结束,也是新篇章的开始。" (Example)

# --- 4. Example 2: Audio-to-Audio/Text Conversation ---
messages_conversation = [
    {"role": "user", "message_type": "audio", "content": qa_audio_path}
]

# Generate both audio and text output
wav_output, text_output = model.generate(messages_conversation, **sampling_params, output_type="both")

# Save the generated audio
output_audio_path = "output_audio.wav"
# Ensure wav_output is on CPU and flattened before saving
sf.write(output_audio_path, wav_output.detach().cpu().view(-1).numpy(), 24000) # Assuming 24kHz output
print(f">>> Conversational Output Audio saved to: {output_audio_path}")
print(">>> Conversational Output Text: ", text_output)
# Expected output: "A." (Example)

print("Kimi-Audio inference examples complete.")
相关推荐
三维重建-光栅投影1 小时前
VS中将cuda项目编译为DLL并调用
算法
课堂剪切板3 小时前
ch03 部分题目思路
算法
山登绝顶我为峰 3(^v^)34 小时前
如何录制带备注的演示文稿(LaTex Beamer + Pympress)
c++·线性代数·算法·计算机·密码学·音视频·latex
Two_brushes.5 小时前
【算法】宽度优先遍历BFS
算法·leetcode·哈希算法·宽度优先
森焱森7 小时前
水下航行器外形分类详解
c语言·单片机·算法·架构·无人机
QuantumStack9 小时前
【C++ 真题】P1104 生日
开发语言·c++·算法
写个博客10 小时前
暑假算法日记第一天
算法
绿皮的猪猪侠10 小时前
算法笔记上机训练实战指南刷题
笔记·算法·pta·上机·浙大
hie9889411 小时前
MATLAB锂离子电池伪二维(P2D)模型实现
人工智能·算法·matlab
杰克尼11 小时前
BM5 合并k个已排序的链表
数据结构·算法·链表