Kimi 7B 语音转文字

1. 模型准备

js 复制代码
from modelscope import snapshot_download
model_dir = snapshot_download('moonshotai/Kimi-Audio-7B-Instruct',
                              cache_dir="./models")

2. 安装和初步推理

clone仓库需要clone整个子模块

js 复制代码
git clone https://github.com/MoonshotAI/Kimi-Audio.git
git submodule update --init --recursive

或者
git clone --recursive https://github.com/MoonshotAI/Kimi-Audio.git

安装完依赖后使用python infer.py测试

python 复制代码
import soundfile as sf
# Assuming the KimiAudio class is available after installation
from kimia_infer.api.kimia import KimiAudio
import torch # Ensure torch is imported if needed for device placement


model_path = "/root/xx/models/moonshotai/Kimi-Audio-7B-Instruct" # IMPORTANT: Update this path if loading locally
model = KimiAudio(model_path=model_path, 
                  load_detokenizer=True)
model.to(device)
print("load model from locally!")
# --- 2. Define Sampling Parameters ---
sampling_params = {
    "audio_temperature": 0.8,
    "audio_top_k": 10,
    "text_temperature": 0.0,
    "text_top_k": 5,
    "audio_repetition_penalty": 1.0,
    "audio_repetition_window_size": 64,
    "text_repetition_penalty": 1.0,
    "text_repetition_window_size": 16,
}

# --- 3. Example 1: Audio-to-Text (ASR) ---
# TODO: Provide actual example audio files or URLs accessible to users
# E.g., download sample files first or use URLs
# wget https://path/to/your/asr_example.wav -O asr_example.wav
# wget https://path/to/your/qa_example.wav -O qa_example.wav
asr_audio_path = "test_audios/asr_example.wav" # IMPORTANT: Make sure this file exists
qa_audio_path = "test_audios/qa_example.wav" # IMPORTANT: Make sure this file exists

messages_asr = [
    {"role": "user", "message_type": "text", "content": "Please transcribe the following audio:"},
    {"role": "user", "message_type": "audio", "content": asr_audio_path}
]

# Generate only text output
# Note: Ensure the model object and generate method accept device placement if needed
_, text_output = model.generate(messages_asr, **sampling_params, output_type="text")
print(">>> ASR Output Text: ", text_output)
# Expected output: "这并不是告别,这是一个篇章的结束,也是新篇章的开始。" (Example)

# --- 4. Example 2: Audio-to-Audio/Text Conversation ---
messages_conversation = [
    {"role": "user", "message_type": "audio", "content": qa_audio_path}
]

# Generate both audio and text output
wav_output, text_output = model.generate(messages_conversation, **sampling_params, output_type="both")

# Save the generated audio
output_audio_path = "output_audio.wav"
# Ensure wav_output is on CPU and flattened before saving
sf.write(output_audio_path, wav_output.detach().cpu().view(-1).numpy(), 24000) # Assuming 24kHz output
print(f">>> Conversational Output Audio saved to: {output_audio_path}")
print(">>> Conversational Output Text: ", text_output)
# Expected output: "A." (Example)

print("Kimi-Audio inference examples complete.")
相关推荐
知识搬运工人几秒前
对比 DeepSeek(MLA)、Qwen 和 Llama 系列大模型在 Attention 架构/算法层面的核心设计及理解它们的本质区别。
算法
修行者Java41 分钟前
JVM 垃圾回收算法的详细介绍
jvm·算法
AndrewHZ44 分钟前
【图像处理基石】什么是光流法?
图像处理·算法·计算机视觉·目标跟踪·cv·光流法·行为识别
mjhcsp2 小时前
C++ 三分查找:在单调与凸函数中高效定位极值的算法
开发语言·c++·算法
立志成为大牛的小牛2 小时前
数据结构——四十二、二叉排序树(王道408)
数据结构·笔记·程序人生·考研·算法
Funny_AI_LAB4 小时前
李飞飞联合杨立昆发表最新论文:超感知AI模型从视频中“看懂”并“预见”三维世界
人工智能·算法·语言模型·音视频
RTC老炮7 小时前
webrtc降噪-PriorSignalModelEstimator类源码分析与算法原理
算法·webrtc
草莓火锅9 小时前
用c++使输入的数字各个位上数字反转得到一个新数
开发语言·c++·算法
散峰而望9 小时前
C/C++输入输出初级(一) (算法竞赛)
c语言·开发语言·c++·算法·github
Kuo-Teng9 小时前
LeetCode 160: Intersection of Two Linked Lists
java·算法·leetcode·职场和发展