Kimi-Audio 7B: Speech-to-Text

1. Preparing the model

python
from modelscope import snapshot_download
model_dir = snapshot_download('moonshotai/Kimi-Audio-7B-Instruct',
                              cache_dir="./models")
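The directory returned by snapshot_download is the path the loader needs later on. As a minimal sketch, the helper below rebuilds that path and checks that it actually contains files before the model is loaded. The function name local_model_dir is hypothetical, and the <cache_dir>/<org>/<name> layout is an assumption inferred from the model path used further down in this post; your ModelScope version may lay files out differently.

```python
from pathlib import Path


def local_model_dir(cache_dir: str, model_id: str) -> Path:
    """Return the directory where the downloaded snapshot is expected to live.

    Assumption: files land under <cache_dir>/<org>/<name>, which matches the
    model path used when loading the model later in this post.
    """
    path = Path(cache_dir).joinpath(*model_id.split("/"))
    # Fail early if the download never happened or left an empty directory.
    if not path.is_dir() or not any(path.iterdir()):
        raise FileNotFoundError(f"No downloaded model found at {path}")
    return path.resolve()
```

Calling local_model_dir("./models", "moonshotai/Kimi-Audio-7B-Instruct") gives an absolute path that can be passed as model_path, or raises FileNotFoundError if the download has not completed.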

2. Installation and first inference

When cloning the repository, you also need to fetch all of its submodules.

bash
git clone https://github.com/MoonshotAI/Kimi-Audio.git
cd Kimi-Audio
git submodule update --init --recursive

Or, in a single step:
git clone --recursive https://github.com/MoonshotAI/Kimi-Audio.git
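A clone made without the submodules leaves their directories empty, which usually only shows up later as an import error. The sketch below is one way to check a checkout: it reads the path entries from .gitmodules and reports any submodule directory that is missing or empty. The function name uninitialized_submodules is hypothetical, and the check only looks at the top-level .gitmodules, not at nested submodules.

```python
import re
from pathlib import Path


def uninitialized_submodules(repo_root: str) -> list[str]:
    """List submodule paths from .gitmodules whose directories are missing or empty."""
    root = Path(repo_root)
    gitmodules = root / ".gitmodules"
    if not gitmodules.is_file():
        return []  # the repository declares no submodules
    # Each submodule section contains a line of the form "path = some/dir".
    paths = re.findall(
        r"^\s*path\s*=\s*(.+?)\s*$", gitmodules.read_text(), flags=re.MULTILINE
    )
    missing = []
    for rel in paths:
        target = root / rel
        # An initialized submodule contains at least a .git entry.
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(rel)
    return missing
```

If uninitialized_submodules("Kimi-Audio") returns a non-empty list, run git submodule update --init --recursive inside the repository.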

After installing the dependencies, run python infer.py to test the setup.

python
import soundfile as sf
# Assuming the KimiAudio class is available after installation
from kimia_infer.api.kimia import KimiAudio
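import torch  # Used to pick the device below

# --- 1. Load Model ---
model_path = "/root/xx/models/moonshotai/Kimi-Audio-7B-Instruct"  # IMPORTANT: Update this path if loading locally
# Use CUDA when available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = KimiAudio(model_path=model_path,
                  load_detokenizer=True)
# Some versions of KimiAudio may handle device placement themselves and not expose .to()
if hasattr(model, "to"):
    model.to(device)
print("Model loaded from local path.")
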
# --- 2. Define Sampling Parameters ---
sampling_params = {
    "audio_temperature": 0.8,
    "audio_top_k": 10,
    "text_temperature": 0.0,
    "text_top_k": 5,
    "audio_repetition_penalty": 1.0,
    "audio_repetition_window_size": 64,
    "text_repetition_penalty": 1.0,
    "text_repetition_window_size": 16,
}

# --- 3. Example 1: Audio-to-Text (ASR) ---
# TODO: Provide actual example audio files or URLs accessible to users
# E.g., download sample files first or use URLs
# wget https://path/to/your/asr_example.wav -O asr_example.wav
# wget https://path/to/your/qa_example.wav -O qa_example.wav
asr_audio_path = "test_audios/asr_example.wav" # IMPORTANT: Make sure this file exists
qa_audio_path = "test_audios/qa_example.wav" # IMPORTANT: Make sure this file exists

messages_asr = [
    {"role": "user", "message_type": "text", "content": "Please transcribe the following audio:"},
    {"role": "user", "message_type": "audio", "content": asr_audio_path}
]

# Generate only text output
# Note: Ensure the model object and generate method accept device placement if needed
_, text_output = model.generate(messages_asr, **sampling_params, output_type="text")
print(">>> ASR Output Text: ", text_output)
# Expected output: "这并不是告别,这是一个篇章的结束,也是新篇章的开始。" (Example)

# --- 4. Example 2: Audio-to-Audio/Text Conversation ---
messages_conversation = [
    {"role": "user", "message_type": "audio", "content": qa_audio_path}
]

# Generate both audio and text output
wav_output, text_output = model.generate(messages_conversation, **sampling_params, output_type="both")

# Save the generated audio
output_audio_path = "output_audio.wav"
# Ensure wav_output is on CPU and flattened before saving
sf.write(output_audio_path, wav_output.detach().cpu().view(-1).numpy(), 24000) # Assuming 24kHz output
print(f">>> Conversational Output Audio saved to: {output_audio_path}")
print(">>> Conversational Output Text: ", text_output)
# Expected output: "A." (Example)

print("Kimi-Audio inference examples complete.")
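For transcribing more than one file, the ASR example above can be wrapped in a small loop. The following is a sketch only: build_asr_messages and transcribe_files are hypothetical helper names, and the code assumes the model object exposes generate(messages, **sampling_params, output_type="text") returning an (audio, text) pair, as in the example above.

```python
from pathlib import Path


def build_asr_messages(audio_path, prompt="Please transcribe the following audio:"):
    """Build the two-message ASR request used in the example above."""
    return [
        {"role": "user", "message_type": "text", "content": prompt},
        {"role": "user", "message_type": "audio", "content": audio_path},
    ]


def transcribe_files(model, audio_paths, sampling_params):
    """Transcribe each file and return a dict mapping path -> transcript."""
    results = {}
    for path in audio_paths:
        # Check the file up front so a typo fails with a clear message.
        if not Path(path).is_file():
            raise FileNotFoundError(f"Audio file not found: {path}")
        _, text = model.generate(
            build_asr_messages(str(path)), **sampling_params, output_type="text"
        )
        results[str(path)] = text
    return results
```

With the model and sampling_params defined as above, transcribe_files(model, ["test_audios/asr_example.wav"], sampling_params) returns the transcripts keyed by file path.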