Stop Mythologizing AI: One Everyday Problem Stumped Every Model

How It Started

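While trying out the Pollinations API I ran into a problem. The official documentation gives a usage example for Text-to-Speech, but glosses over Speech-to-Text with a single sentence, "speech-to-text is also available through the openai-audio model", followed by a link to the OpenAI Audio Guide. With no idea where to start, I pasted the relevant documentation links into several AI models and asked each for an example, using the following prompt: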


https://github.com/pollinations/pollinations/blob/master/APIDOCS.md
https://platform.openai.com/docs/guides/speech-to-text
===
This is the Pollinations API documentation. Tell me how to use the Pollinations API for Speech-to-Text and provide sample code.

To my surprise, this stumped a whole lineup of AI models 🤣

How the Models Performed

DeepSeek

First, I gave the prompt to the official DeepSeek.

The example it returned was completely unusable ❌

ChatGPT


import requests


def transcribe_audio(file_path):
    # Pollinations' OpenAI-compatible endpoint, used for audio processing (including speech-to-text)
    url = "https://text.pollinations.ai/openai"

    # Prepare the multipart/form-data request
    files = {
        "file": open(file_path, "rb")
    }
    data = {
        "model": "openai-audio"  # use the openai-audio model
    }

    response = requests.post(url, data=data, files=files)

    # Check whether the request succeeded
    if response.status_code == 200:
        # The response is usually JSON containing the transcribed text
        return response.json()
    else:
        raise Exception(f"Request failed: {response.status_code}, {response.text}")


if __name__ == "__main__":
    audio_file = "./Welcome to Pollinations.mp3"  # replace with the path to your audio file
    try:
        result = transcribe_audio(audio_file)
        print("Transcription result:", result)
    except Exception as e:
        print("Error:", e)

ChatGPT produced code that looked reasonable. When I ran it, the request itself went through, but no speech recognition took place ❌

Kimi


import requests
import json


def transcribe_audio(file_path):
    # Open the audio file
    with open(file_path, "rb") as audio_file:
        # Build the messages parameter
        messages = [
            {"role": "user", "content": "Transcribe this audio"}
        ]

        # Send a POST request to the Pollinations API
        url = "https://text.pollinations.ai/"
        response = requests.post(
            url,
            data={
                "model": "openai-audio",
                "messages": json.dumps(messages)  # use json.dumps to ensure correct formatting
            },
            files={"file": audio_file}
        )

    # Check whether the response succeeded
    if response.status_code == 200:
        transcription = response.json()
        print("Transcription:", transcription.get("text"))
    else:
        print("Failed to transcribe audio. Status code:", response.status_code)
        print("Response:", response.text)


# Example call
transcribe_audio("./Welcome to Pollinations.mp3")

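Kimi's code failed at runtime because the parameters were in the wrong format. Even after several rounds of follow-up, it was still handing back incorrect code ❌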

Mistral


import requests
def speech_to_text(audio_file_path):
    url = "https://text.pollinations.ai/openai"
    headers = {
        "Content-Type": "application/json"
    }
    data = {
        "model": "openai-audio",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Transcribe the following audio:"
                    },
                    {
                        "type": "audio_url",
                        "audio_url": {
                            "url": audio_file_path
                        }
                    }
                ]
            }
        ]
    }
    response = requests.post(url, headers=headers, json=data)
    return response.json()
# Example usage
audio_url = "https://www.runoob.com/try/demo_source/horse.mp3"
transcription = speech_to_text(audio_url)
print(transcription)

The problem here is how the audio is passed: the request needs the audio as base64-encoded data, not as a URL ❌
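The repair to Mistral's request body can be sketched as follows: replace the `audio_url` part with an `input_audio` part that carries the file contents as base64. This is a minimal sketch, not a definitive implementation. The helper names `to_input_audio_part` and `build_payload` are hypothetical, and the `input_audio` shape is assumed to be the OpenAI-style content part that the `openai-audio` model accepts.

```python
import base64


def to_input_audio_part(audio_bytes: bytes, audio_format: str = "mp3") -> dict:
    """Wrap raw audio bytes as an OpenAI-style `input_audio` content part (hypothetical helper)."""
    return {
        "type": "input_audio",
        "input_audio": {
            # The audio travels inline as base64 text, not as a URL
            "data": base64.b64encode(audio_bytes).decode("utf-8"),
            "format": audio_format,
        },
    }


def build_payload(audio_bytes: bytes, prompt: str = "Transcribe the following audio:") -> dict:
    """Build a chat request body with one text part and one audio part (hypothetical helper)."""
    return {
        "model": "openai-audio",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    to_input_audio_part(audio_bytes),
                ],
            }
        ],
    }
```

The resulting dictionary would be sent with `requests.post(url, json=payload)`, in the same way Mistral's code sends its `data` dictionary.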

Claude 3.7 Sonnet

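Finally, let's see how Claude 3.7 Sonnet, the big name in coding, performed.

Claude's code was already wrong at the API part, so there was no point in reading the rest ❌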

Working Code

Finally, here is the complete, runnable code.


import base64

import requests


def transcribe_audio(audio_file_path):
    # Read the audio file and base64-encode it; the audio is sent inline, not as a URL
    with open(audio_file_path, "rb") as audio_file:
        audio_base64 = base64.b64encode(audio_file.read()).decode("utf-8")

    # Send an OpenAI-style chat request with a text part and an input_audio part
    response = requests.post('https://text.pollinations.ai/openai', json={
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What's in this audio?"},
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": audio_base64,
                            "format": "mp3"  # must match the audio file's actual format
                        }
                    }
                ]
            }
        ],
        "model": "openai-audio"
    })
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    return response.json()


# Example usage
result = transcribe_audio("./Welcome to Pollinations.mp3")
print(result['choices'][0]['message']['content'])
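
The example above indexes straight into `choices[0]`, which raises an unhelpful `KeyError` if the service returns something else. A slightly more defensive way to read the result might look like the sketch below. `extract_text` is a hypothetical helper, and the assumption that a failed call comes back as JSON without a `choices` list is mine, not something the Pollinations documentation states.

```python
def extract_text(response_json: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    choices = response_json.get("choices") or []
    if not choices:
        # Assumed failure shape: a JSON body with no usable `choices` list
        raise ValueError(f"no choices in response: {response_json!r}")
    content = choices[0].get("message", {}).get("content")
    if content is None:
        raise ValueError("first choice has no message content")
    return content
```

It would be called on the parsed response body, for example `print(extract_text(response.json()))`.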

Summary

I tried five AI tools on this everyday programming problem, and the results left a lot to be desired:

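DeepSeek and Claude 3.7 Sonnet, the two with the most hype, produced completely unusable code; even the API setup was wrong.
ChatGPT and Kimi got the API endpoint and the model right, but got much of the parameter passing wrong.
The surprise was Mistral: roughly 70% of its code was usable. Apart from passing the audio in the wrong form, everything else was accurate and usable.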

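So in day-to-day work we should keep a realistic view of AI. It is powerful and its knowledge is broad, but it is not omnipotent, and it still has room to improve on many problems.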

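A Friendly Note

See the original post: Stop Mythologizing AI: One Everyday Problem Stumped Every Model

This article is syndicated from the WeChat official account "程序员小溪". This copy is only a mirror; for timely updates, please follow the official account, where I post what I learn from time to time.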