Kimi 7B 语音转文字

1. 模型准备

js 复制代码
from modelscope import snapshot_download
model_dir = snapshot_download('moonshotai/Kimi-Audio-7B-Instruct',
                              cache_dir="./models")

2. 安装和初步推理

clone仓库需要clone整个子模块

js 复制代码
git clone https://github.com/MoonshotAI/Kimi-Audio.git
git submodule update --init --recursive

或者
git clone --recursive https://github.com/MoonshotAI/Kimi-Audio.git

安装完依赖后使用python infer.py测试

python 复制代码
import soundfile as sf
# Assuming the KimiAudio class is available after installation
from kimia_infer.api.kimia import KimiAudio
import torch # Ensure torch is imported if needed for device placement


model_path = "/root/xx/models/moonshotai/Kimi-Audio-7B-Instruct" # IMPORTANT: Update this path if loading locally
model = KimiAudio(model_path=model_path, 
                  load_detokenizer=True)
model.to(device)
print("load model from locally!")
# --- 2. Define Sampling Parameters ---
sampling_params = {
    "audio_temperature": 0.8,
    "audio_top_k": 10,
    "text_temperature": 0.0,
    "text_top_k": 5,
    "audio_repetition_penalty": 1.0,
    "audio_repetition_window_size": 64,
    "text_repetition_penalty": 1.0,
    "text_repetition_window_size": 16,
}

# --- 3. Example 1: Audio-to-Text (ASR) ---
# TODO: Provide actual example audio files or URLs accessible to users
# E.g., download sample files first or use URLs
# wget https://path/to/your/asr_example.wav -O asr_example.wav
# wget https://path/to/your/qa_example.wav -O qa_example.wav
asr_audio_path = "test_audios/asr_example.wav" # IMPORTANT: Make sure this file exists
qa_audio_path = "test_audios/qa_example.wav" # IMPORTANT: Make sure this file exists

messages_asr = [
    {"role": "user", "message_type": "text", "content": "Please transcribe the following audio:"},
    {"role": "user", "message_type": "audio", "content": asr_audio_path}
]

# Generate only text output
# Note: Ensure the model object and generate method accept device placement if needed
_, text_output = model.generate(messages_asr, **sampling_params, output_type="text")
print(">>> ASR Output Text: ", text_output)
# Expected output: "这并不是告别,这是一个篇章的结束,也是新篇章的开始。" (Example)

# --- 4. Example 2: Audio-to-Audio/Text Conversation ---
messages_conversation = [
    {"role": "user", "message_type": "audio", "content": qa_audio_path}
]

# Generate both audio and text output
wav_output, text_output = model.generate(messages_conversation, **sampling_params, output_type="both")

# Save the generated audio
output_audio_path = "output_audio.wav"
# Ensure wav_output is on CPU and flattened before saving
sf.write(output_audio_path, wav_output.detach().cpu().view(-1).numpy(), 24000) # Assuming 24kHz output
print(f">>> Conversational Output Audio saved to: {output_audio_path}")
print(">>> Conversational Output Text: ", text_output)
# Expected output: "A." (Example)

print("Kimi-Audio inference examples complete.")
相关推荐
远瞻。3 小时前
【论文阅读】人脸修复(face restoration ) 不同先验代表算法整理2
论文阅读·算法
先做个垃圾出来………6 小时前
哈夫曼树(Huffman Tree)
数据结构·算法
phoenix@Capricornus7 小时前
反向传播算法——矩阵形式递推公式——ReLU传递函数
算法·机器学习·矩阵
Inverse1628 小时前
C语言_动态内存管理
c语言·数据结构·算法
数据与人工智能律师8 小时前
虚拟主播肖像权保护,数字时代的法律博弈
大数据·网络·人工智能·算法·区块链
wuqingshun3141598 小时前
蓝桥杯 16. 外卖店优先级
c++·算法·职场和发展·蓝桥杯·深度优先
YouQian7729 小时前
2025春训第十九场
算法
CodeJourney.9 小时前
基于MATLAB的生物量数据拟合模型研究
人工智能·爬虫·算法·matlab·信息可视化
Epiphany.5569 小时前
素数筛(欧拉筛算法)
c++·算法·图论
爱吃涮毛肚的肥肥(暂时吃不了版)9 小时前
项目班——0510——JSON网络封装
c++·算法·json