faster_whisper语音识别

faster_whisper语音识别

检测可用设备:list_available_devices()函数

我这边usb摄像头带麦克风的,所以 DEV_index = 8

1 使用 pyaudio 打开音频设备

2 从音频设备读取数据,传递给 faster_whisper 识别

按键 r 录制 s 停止 q退出

test.py

python 复制代码
# from faster_whisper import WhisperModel

# model = WhisperModel("large-v3")

# audio_path= "mlk.flac"

# segments, info = model.transcribe(audio_path)

# for segment in segments:
#         print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))



from faster_whisper import WhisperModel
import numpy as np
import keyboard
import pynput
from pynput.keyboard import Controller, Listener,Key,KeyCode
import time
import pyaudio
import wave

def list_available_devices():
    print("Available input devices:")
    p = pyaudio.PyAudio()
    for i in range(p.get_device_count()):
        device_info = p.get_device_info_by_index(i)
        if device_info['maxInputChannels'] > 0:  # Check if it's an input device
            print(f"{i}: {device_info['name']}")
    p.terminate()


# List available devices
list_available_devices()


# Available input devices:
# 5: USB Audio: #1 (hw:2,1)
# 6: USB Audio: #2 (hw:2,2)
# 8: aoni webcam A20: USB Audio (hw:3,0)
# 9: pulse
# 10: default

# Replace with the device index you identified by run list_available_devices()
DEV_index = 8  # Replace with your actual device index


class VoiceRecorder:
    def __init__(self, channels=1, rate=16000, format=pyaudio.paInt16):
        self.p = pyaudio.PyAudio()
        self.model = WhisperModel("large-v3")
        self.CHANNELS = channels
        self.RATE = rate
        self.FORMAT = format

    def record(self, seconds=5):
        """
        记录指定秒数的音频。
        """
        CHUNK = 1024
        
        try:
            stream = self.p.open(format=self.FORMAT,
                                channels=self.CHANNELS,
                                rate=self.RATE,
                                input=True,
                                input_device_index=DEV_index,
                                frames_per_buffer=CHUNK)
            print("开始录音...")
            frames = []
            
            for i in range(0, int(self.RATE / CHUNK * seconds)):
                data = stream.read(CHUNK)
                frames.append(data)
                
            print("录音结束.")
        except Exception as e:
            print(f"录音时发生错误:{e}")
            return None
        finally:
            stream.stop_stream()
            stream.close()
        
        return b''.join(frames)

    def transcribe_audio(self, audio_data):
        """
        将音频数据转换为文本。
        """
        try:
            audio_np = np.frombuffer(audio_data, dtype=np.int16)
            if self.CHANNELS > 1:
                audio_np = audio_np.reshape((-1, self.CHANNELS)).mean(axis=1)
            audio_normalized = np.float32(audio_np) / 32768.0
            
            segments, _ = self.model.transcribe(audio_normalized, language='zh', beam_size=5)
            return [segment.text for segment in segments]
        except Exception as e:
            print(f"转录音频时发生错误:{e}")
            return None

    def close(self):
        """
        关闭PyAudio。
        """
        self.p.terminate()

def main():
    global recorder
    global listener
    recorder = VoiceRecorder()

    listener = Listener(
        on_press=on_press
    )
    listener.start()
    listener.join()


def on_press(key:KeyCode):
    print(type(key))
    if key.char == 'r':
        print("开始录音...")
        audio_data = recorder.record()
        if audio_data is not None:
            transcripts = recorder.transcribe_audio(audio_data)
            for text in transcripts:
                print(text)
            print("录音结束.")
    elif key.char == 's':
        print("停止录音.")
    elif key.char == 'q':
        print("退出程序.")
        listener.stop()
        recorder.close()

if __name__ == "__main__":
    main()
相关推荐
未来之窗软件服务4 天前
AI人工智能(二十三)错误示范ASR 语音识别C#—东方仙盟练气期
人工智能·c#·语音识别·仙盟创梦ide·东方仙盟
山河君4 天前
四麦克风声源定位实战:基于 GCC-PHAT + 最小二乘法实现 DOA
算法·音视频·语音识别·信号处理·最小二乘法·tdoa
colicode5 天前
安卓Android语音验证码接口API示例代码:Kotlin/Java版App验证开发
android·java·前端·前端框架·kotlin·语音识别
EasyDSS5 天前
从“听见”到“理解”:EasyDSS视频会议系统智能字幕、语音转写技术的深度剖析
音视频·语音识别·语音转写·ai摘要·点播技术·流媒体直播·智能字幕
开开心心就好7 天前
文字转语音无字数限,对接微软接口比付费爽
java·linux·开发语言·人工智能·pdf·语音识别
OBS插件网7 天前
OBS弹幕助手使用教程:OBS语音读弹幕语音播报插件下载安装教程
人工智能·语音识别
阿林来了8 天前
Flutter三方库适配OpenHarmony【flutter_speech】— 持续语音识别与长录音
flutter·语音识别·harmonyos
开开心心就好8 天前
免费音频转文字工具,绿色版离线多模型可用
人工智能·windows·计算机视觉·计算机外设·ocr·excel·语音识别
OBS插件网12 天前
OBS直播如何给人脸加口罩特效?OBS口罩特效插件下载安装教程
人工智能·数码相机·语音识别·产品经理
阿林来了13 天前
Flutter三方库适配OpenHarmony【flutter_speech】— 语音识别停止与取消
flutter·语音识别·harmonyos