Kimi 7B 语音转文字

1. 模型准备

js 复制代码
from modelscope import snapshot_download
model_dir = snapshot_download('moonshotai/Kimi-Audio-7B-Instruct',
                              cache_dir="./models")

2. 安装和初步推理

clone仓库需要clone整个子模块

js 复制代码
git clone https://github.com/MoonshotAI/Kimi-Audio.git
git submodule update --init --recursive

或者
git clone --recursive https://github.com/MoonshotAI/Kimi-Audio.git

安装完依赖后使用python infer.py测试

python 复制代码
import soundfile as sf
# Assuming the KimiAudio class is available after installation
from kimia_infer.api.kimia import KimiAudio
import torch # Ensure torch is imported if needed for device placement


model_path = "/root/xx/models/moonshotai/Kimi-Audio-7B-Instruct" # IMPORTANT: Update this path if loading locally
model = KimiAudio(model_path=model_path, 
                  load_detokenizer=True)
model.to(device)
print("load model from locally!")
# --- 2. Define Sampling Parameters ---
sampling_params = {
    "audio_temperature": 0.8,
    "audio_top_k": 10,
    "text_temperature": 0.0,
    "text_top_k": 5,
    "audio_repetition_penalty": 1.0,
    "audio_repetition_window_size": 64,
    "text_repetition_penalty": 1.0,
    "text_repetition_window_size": 16,
}

# --- 3. Example 1: Audio-to-Text (ASR) ---
# TODO: Provide actual example audio files or URLs accessible to users
# E.g., download sample files first or use URLs
# wget https://path/to/your/asr_example.wav -O asr_example.wav
# wget https://path/to/your/qa_example.wav -O qa_example.wav
asr_audio_path = "test_audios/asr_example.wav" # IMPORTANT: Make sure this file exists
qa_audio_path = "test_audios/qa_example.wav" # IMPORTANT: Make sure this file exists

messages_asr = [
    {"role": "user", "message_type": "text", "content": "Please transcribe the following audio:"},
    {"role": "user", "message_type": "audio", "content": asr_audio_path}
]

# Generate only text output
# Note: Ensure the model object and generate method accept device placement if needed
_, text_output = model.generate(messages_asr, **sampling_params, output_type="text")
print(">>> ASR Output Text: ", text_output)
# Expected output: "这并不是告别,这是一个篇章的结束,也是新篇章的开始。" (Example)

# --- 4. Example 2: Audio-to-Audio/Text Conversation ---
messages_conversation = [
    {"role": "user", "message_type": "audio", "content": qa_audio_path}
]

# Generate both audio and text output
wav_output, text_output = model.generate(messages_conversation, **sampling_params, output_type="both")

# Save the generated audio
output_audio_path = "output_audio.wav"
# Ensure wav_output is on CPU and flattened before saving
sf.write(output_audio_path, wav_output.detach().cpu().view(-1).numpy(), 24000) # Assuming 24kHz output
print(f">>> Conversational Output Audio saved to: {output_audio_path}")
print(">>> Conversational Output Text: ", text_output)
# Expected output: "A." (Example)

print("Kimi-Audio inference examples complete.")
相关推荐
CoovallyAIHub5 分钟前
方案 | 动车底部零部件检测实时流水线检测算法改进
深度学习·算法·计算机视觉
CoovallyAIHub7 分钟前
方案 | 光伏清洁机器人系统详细技术实施方案
深度学习·算法·计算机视觉
lxmyzzs11 分钟前
【图像算法 - 14】精准识别路面墙体裂缝:基于YOLO12与OpenCV的实例分割智能检测实战(附完整代码)
人工智能·opencv·算法·计算机视觉·裂缝检测·yolo12
洋曼巴-young12 分钟前
240. 搜索二维矩阵 II
数据结构·算法·矩阵
楼田莉子1 小时前
C++算法题目分享:二叉搜索树相关的习题
数据结构·c++·学习·算法·leetcode·面试
pusue_the_sun2 小时前
数据结构——栈和队列oj练习
c语言·数据结构·算法··队列
大锦终2 小时前
【算法】模拟专题
c++·算法
Xの哲學3 小时前
Perf使用详解
linux·网络·网络协议·算法·架构
想不明白的过度思考者3 小时前
数据结构(排序篇)——七大排序算法奇幻之旅:从扑克牌到百亿数据的魔法整理术
数据结构·算法·排序算法
小七rrrrr3 小时前
动态规划法 - 53. 最大子数组和
java·算法·动态规划