Build a Simple CLI-Based Voice Assistant with PyAudio, Speech Recognition, pyttsx3 and SerpApi

Dmitriy Zub ☀️

1. Introduction

As the title suggests, this is a demo project: a very basic voice assistant script that answers your questions in the terminal based on Google search results.

You can find the full code in the GitHub repository: dimitryzub/serpapi-demo-projects/speech-recognition/cli-based/

Follow-up blog posts will cover:

  • A web-based solution using Flask, some HTML, CSS and JavaScript.
  • An Android- and Windows-based solution using Flutter and Dart.

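2. What we will build in this blog post

2.1 Environment preparation

First, let's make sure we are working in a separate environment and that the libraries the project needs are installed correctly. Probably the hardest part is installing pyaudio; if you run into trouble, the following article may help:

[Solved] Fix PyAudio pip installation errors on Win 32/64-bit operating systems

2.2 Virtual environment and library installation

Before installing the libraries, we need to create and activate a new environment for this project: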

# if you're on Linux based systems
$ python -m venv env && source env/bin/activate
$ (env) <path>

# if you're on Windows and using Bash terminal
$ python -m venv env && source env/Scripts/activate
$ (env) <path>

# if you're on Windows and using CMD
python -m venv env && .\env\Scripts\activate
$ (env) <path>

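Explanation: python -m venv env tells Python to run the module (-m) venv and create a folder named env. && chains the two commands, so the activation runs only if the environment was created successfully. source <venv_name>/bin/activate activates the environment, after which pip installs libraries into that environment only.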
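
To confirm that the activation worked, one option is to ask the interpreter itself. The sketch below is not part of the original project, and the helper name in_virtualenv is an assumption made for illustration. It relies on the fact that inside a venv sys.prefix points at the environment folder while sys.base_prefix still points at the base Python installation.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix is the environment folder and sys.base_prefix
    # is the base installation; outside one, the two values are equal.
    return sys.prefix != sys.base_prefix


if __name__ == '__main__':
    print('virtual environment active:', in_virtualenv())
```

Running this with the environment activated and again after deactivating it should show the two different results.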

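Now install all the required libraries (python-dotenv provides the dotenv module that the script imports):

pip install rich pyttsx3 SpeechRecognition google-search-results python-dotenv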

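Now on to pyaudio. Keep in mind that pyaudio may throw an error during installation, and you may need to do some additional research to resolve it.

If you are on Linux, a few development dependencies have to be installed before pyaudio can be used: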

$ sudo apt-get install -y libasound-dev portaudio19-dev
$ pip install pyaudio

If you are on Windows, it is simpler (tested with CMD and Git Bash):

pip install pyaudio
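
Once everything is installed, a quick check that the imports will resolve can save a confusing traceback later. The following is a minimal sketch, not part of the original project: the helper name missing_modules is hypothetical, and the list holds the import names (which differ from some of the pip package names) of the libraries used in this post.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    # Import names (not pip names) of the libraries used in this post.
    required = ['rich', 'pyttsx3', 'speech_recognition', 'serpapi', 'dotenv', 'pyaudio']
    print('missing:', missing_modules(required) or 'nothing')
```

An empty result only means the packages can be located; pyaudio may still fail at runtime if no audio device is available.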

3. Full code

import os
import speech_recognition
import pyttsx3
from serpapi import GoogleSearch
from rich.console import Console
from dotenv import load_dotenv

load_dotenv('.env')
console = Console()

def main():
    console.rule('[bold yellow]SerpApi Voice Assistant Demo Project')
    recognizer = speech_recognition.Recognizer()

    while True:
        with console.status(status='Listening...', spinner='point') as progress_bar:
            try:
                with speech_recognition.Microphone() as mic:
                    recognizer.adjust_for_ambient_noise(mic, duration=0.1)
                    audio = recognizer.listen(mic)
                    text = recognizer.recognize_google(audio_data=audio).lower()
                    console.print(f'[bold]Recognized text[/bold]: {text}')

                    progress_bar.update(status='Looking for answers...', spinner='line')
                    params = {
                        'api_key': os.getenv('API_KEY'),
                        'device': 'desktop',
                        'engine': 'google',
                        'q': text,
                        'google_domain': 'google.com',
                        'gl': 'us',
                        'hl': 'en'
                    }
                    search = GoogleSearch(params)
                    results = search.get_dict()

                    try:
                        if 'answer_box' in results:
                            try:
                                primary_answer = results['answer_box']['answer']
                            except KeyError:
                                # some answer boxes expose 'result' instead of 'answer'
                                primary_answer = results['answer_box']['result']
                            console.print(f'[bold]The answer is[/bold]: {primary_answer}')
                        elif 'knowledge_graph' in results:
                            secondary_answer = results['knowledge_graph']['description']
                            console.print(f'[bold]The answer is[/bold]: {secondary_answer}')
                        else:
                            # neither block is present -> handled by `except KeyError` below
                            raise KeyError('no answer_box or knowledge_graph in results')

                        progress_bar.stop() # if the answer was found -> stop progress bar.
                        user_prompt_to_continue_if_answer_is_success = input('Would you like to search for something again? (y/n) ')

                        if user_prompt_to_continue_if_answer_is_success == 'y':
                            recognizer = speech_recognition.Recognizer()
                            continue # run speech recognition again until the user answers 'n'
                        else:
                            console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                            break
                    except KeyError:
                        progress_bar.stop()
                        error_user_prompt = input("Sorry, couldn't find the answer. Would you like to rephrase it? (y/n) ")

                        if error_user_prompt == 'y':
                            recognizer = speech_recognition.Recognizer()
                            continue # run speech recognition again until the user answers 'n'
                        else:
                            console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                            break
            except speech_recognition.UnknownValueError:
                progress_bar.stop()
                user_prompt_to_continue = input("Sorry, I didn't quite understand you. Could you say it again? (y/n) ")

                if user_prompt_to_continue == 'y':
                    recognizer = speech_recognition.Recognizer()
                    continue # run speech recognition again until the user answers 'n'
                else:
                    progress_bar.stop()
                    console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                    break

if __name__ == '__main__':
    main()

4. Code explanation

Import the libraries:

import os
import speech_recognition
import pyttsx3
from serpapi import GoogleSearch
from rich.console import Console
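from dotenv import load_dotenv

  • rich: a Python library for beautiful formatting in the terminal.
  • pyttsx3: a text-to-speech converter for Python that works offline.
  • SpeechRecognition: a Python library for converting speech to text.
  • google-search-results: SerpApi's Python API wrapper, which parses data from 15+ search engines.
  • os: reads the secret environment variable. In this case, it is the SerpApi API key.
  • dotenv: loads environment variables (the SerpApi API key) from the .env file. The .env file can be renamed to anything (for example .napoleon); the leading . (dot) follows the usual naming convention for environment variable files.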
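
The script imports pyttsx3 but never calls it, so the answer is only printed. A minimal sketch of how the answer could also be read aloud follows; the helper name speak and its engine parameter are assumptions made for illustration, not part of the original project. It uses pyttsx3's init, say and runAndWait calls.

```python
def speak(text, engine=None):
    """Read `text` aloud with pyttsx3; `engine` can be injected for testing."""
    if engine is None:
        # Imported lazily so the helper can be defined without audio drivers.
        import pyttsx3
        engine = pyttsx3.init()
    engine.say(text)       # queue the utterance
    engine.runAndWait()    # block until the queue has been spoken
```

In the script, a call such as speak(str(primary_answer)) could be placed right after the answer is printed.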

Create a rich Console(). It will be used to prettify the terminal output (animations and so on):

console = Console()

Define the main function, where everything happens:

def main():
    console.rule('[bold yellow]SerpApi Voice Assistant Demo Project')
    recognizer = speech_recognition.Recognizer()

At the beginning of the function, we create a speech_recognition.Recognizer(), and console.rule produces the following output:

───────────────────────────────────── SerpApi Voice Assistant Demo Project ─────────────────────────────────────

The next step is to create a while loop that keeps listening to the microphone input in order to recognize speech:

while True:
    with console.status(status='Listening...', spinner='point') as progress_bar:
        try:
            with speech_recognition.Microphone() as mic:
                recognizer.adjust_for_ambient_noise(mic, duration=0.1)
                audio = recognizer.listen(mic)

                text = recognizer.recognize_google(audio_data=audio).lower()
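                console.print(f'[bold]Recognized text[/bold]: {text}')

  • console.status: a rich progress bar, used for decoration only.
  • speech_recognition.Microphone(): starts picking up input from the microphone.
  • recognizer.adjust_for_ambient_noise: calibrates the energy threshold to the ambient energy level.
  • recognizer.listen: listens for the actual user speech.
  • recognizer.recognize_google: performs speech recognition using the Google Speech Recognition API. lower() lowercases the recognized text.
  • console.print: rich's print, which allows text styling such as bold, italic and so on.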

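spinner='point' selects the spinner animation shown while listening (run python -m rich.spinner to see the list of available spinners).

After that, we need to initialize the SerpApi search parameters for the search: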

progress_bar.update(status='Looking for answers...', spinner='line') 
params = {
    'api_key': os.getenv('API_KEY'),  # serpapi api key   
    'device': 'desktop',              # device used for the search
    'engine': 'google',               # serpapi parsing engine: https://serpapi.com/status
    'q': text,                        # search query 
    'google_domain': 'google.com',    # google domain:          https://serpapi.com/google-domains
    'gl': 'us',                       # country of the search:  https://serpapi.com/google-countries
    'hl': 'en'                        # language of the search: https://serpapi.com/google-languages
    # other parameters such as locations: https://serpapi.com/locations-api
}
search = GoogleSearch(params)         # where data extraction happens on the SerpApi backend
results = search.get_dict()           # JSON -> Python dict

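progress_bar.update updates progress_bar with a new status (the text printed in the console), and spinner='line' switches to a different spinner animation.

After that, the data is extracted from Google Search using SerpApi's Google Search Engine API.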
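
Because os.getenv returns None when the variable is missing, a request can be sent without a key if the .env file was not loaded. One way to guard against that is sketched here; the helper name build_search_params and its api_key argument are assumptions made for illustration, while the parameter values are the ones used in this post.

```python
import os


def build_search_params(query, api_key=None):
    """Return the SerpApi parameter dict used in this post for `query`."""
    api_key = api_key or os.getenv('API_KEY')
    if not api_key:
        # Fail early with a clear message instead of sending a keyless request.
        raise RuntimeError('API_KEY is not set; add it to your .env file')
    return {
        'api_key': api_key,
        'device': 'desktop',
        'engine': 'google',
        'q': query,
        'google_domain': 'google.com',
        'gl': 'us',
        'hl': 'en',
    }
```

The returned dict can be passed to GoogleSearch exactly like the inline params dict in the script.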

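The next part of the code looks for an answer in the results (first in answer_box, then in knowledge_graph), prints it, and asks whether to search again. If no answer is found, the resulting KeyError is caught and the user is asked whether to rephrase the question: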

try:
    if 'answer_box' in results:
        try:
            primary_answer = results['answer_box']['answer']
        except KeyError:
            # some answer boxes expose 'result' instead of 'answer'
            primary_answer = results['answer_box']['result']
        console.print(f'[bold]The answer is[/bold]: {primary_answer}')
    elif 'knowledge_graph' in results:
        secondary_answer = results['knowledge_graph']['description']
        console.print(f'[bold]The answer is[/bold]: {secondary_answer}')
    else:
        # neither block is present -> handled by `except KeyError` below
        raise KeyError('no answer_box or knowledge_graph in results')

    progress_bar.stop()  # if the answer was found -> stop progress bar

    user_prompt_to_continue_if_answer_is_success = input('Would you like to search for something again? (y/n) ')

    if user_prompt_to_continue_if_answer_is_success == 'y':
        recognizer = speech_recognition.Recognizer()
        continue         # run speech recognition again until the user answers 'n'
    else:
        console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
        break

except KeyError:
    progress_bar.stop()  # if the answer wasn't found -> stop progress bar
    error_user_prompt = input("Sorry, couldn't find the answer. Would you like to rephrase it? (y/n) ")

    if error_user_prompt == 'y':
        recognizer = speech_recognition.Recognizer()
        continue         # run speech recognition again until the user answers 'n'
    else:
        console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
        break
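
The lookup itself can be separated from the printing and prompting, which makes it easy to try out on plain dictionaries. The sketch below is an illustration rather than the project's code: the function name extract_answer is hypothetical, and unlike the script it also falls back to the knowledge graph when an answer box is present but has none of the expected fields.

```python
def extract_answer(results):
    """Return the first answer found in a SerpApi-style results dict, or None."""
    answer_box = results.get('answer_box', {})
    # Answer boxes come in several shapes; check the fields the script reads.
    for key in ('answer', 'result', 'list'):
        if key in answer_box:
            return answer_box[key]
    # Fall back to the knowledge graph description, if there is one.
    return results.get('knowledge_graph', {}).get('description')
```

Returning None instead of raising KeyError lets the caller decide how to report a missing answer.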

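The final step is to handle the error raised when the speech could not be recognized (for example, when the microphone picked up nothing intelligible):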

# while True:
#     with console.status(status='Listening...', spinner='point') as progress_bar:
#         try:
            # speech recognition code
            # data extraction code
        except speech_recognition.UnknownValueError:
            progress_bar.stop()         # if the speech wasn't understood -> stop progress bar
            user_prompt_to_continue = input("Sorry, I didn't quite understand you. Could you say it again? (y/n) ")

            if user_prompt_to_continue == 'y':
                recognizer = speech_recognition.Recognizer()
                continue               # run speech recognition again until the user answers 'n'
            else:
                progress_bar.stop()    # if the user wants to quit -> stop progress bar
                console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                break
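
The script asks a y/n question in three places, each with its own variable and comparison. A small helper could hold that logic in one spot; the sketch below is an assumption about how that might look (the name ask_yes_no and the input_fn parameter are hypothetical), and it is slightly more lenient than the script, which accepts only an exact 'y'.

```python
def ask_yes_no(prompt, input_fn=input):
    """Ask a y/n question and return True only for an answer starting with 'y'."""
    reply = input_fn(prompt).strip().lower()
    return reply.startswith('y')
```

With it, each branch reduces to a pattern such as: if ask_yes_no('Would you like to search again? (y/n) '): continue, otherwise print the closing rule and break.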

console.rule() will produce the following output:

───────────────────── Thank you for checking SerpApi Voice Assistant Demo Project ─────────────────────

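Finally, add the if __name__ == '__main__' idiom, which keeps the code from running unintentionally when the file is imported as a module, and call main, which runs the whole script: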

if __name__ == '__main__':
    main()

5. Links
