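Background
Our OpenAI library had not been updated since we first added it in November 2023 and was still at 0.27.0. Recently the project needed the Agently framework, which depends on a newer OpenAI library, so the OpenAI library had to be upgraded.
The upgraded OpenAI library version is: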
The error
After the upgrade, the first run failed with 404 - {'error': {'code': '404', 'message': 'Resource not found'}}. The stack trace was:
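Error analysis 1
The error is raised while OpenAIEmbedding computes vector embeddings. We use an OpenAI service deployed on Azure, and this error usually means that one of the following is wrong: Azure Endpoint, Deployment, API Version, API Key, or Model Version. Stepping through with a breakpoint shows that OpenAIEmbedding requests this URL: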
https://openaiquanshi.openai.azure.com//deployments/meeting-embedding/embeddings?api-version=2023-07-01-preview
Sending a request straight to this URL with curl does return 404:
~ % curl -i -X POST -H 'Content-Length:100' 'https://openaiquanshi.openai.azure.com//deployments/meeting-embedding/embeddings?api-version=2023-07-01-preview'
HTTP/2 404
content-length: 56
content-type: application/json
apim-request-id: 61063a5d-bb28-4299-a267-a8ddb8de870c
strict-transport-security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
date: Wed, 17 Jul 2024 01:42:52 GMT
{"error":{"code":"404","message": "Resource not found"}}%
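For comparison, the Azure OpenAI REST reference documents embeddings URLs in the form `https://{resource}.openai.azure.com/openai/deployments/{deployment}/embeddings?api-version={version}`. The sketch below composes that shape from its parts so it can be held up against the URL captured in the debugger. `build_embeddings_url` is a hypothetical helper written for this post, not a function from openai or LangChain.

```python
from urllib.parse import quote, urlencode


def build_embeddings_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Compose an Azure OpenAI embeddings URL from its parts (hypothetical helper)."""
    base = endpoint.rstrip("/")  # avoid a double slash when joining
    path = f"/openai/deployments/{quote(deployment, safe='')}/embeddings"
    query = urlencode({"api-version": api_version})
    return f"{base}{path}?{query}"


expected = build_embeddings_url(
    "https://openaiquanshi.openai.azure.com",
    "meeting-embedding",
    "2023-07-01-preview",
)
# URL observed in the debugger, copied from the text above.
observed = (
    "https://openaiquanshi.openai.azure.com//deployments/"
    "meeting-embedding/embeddings?api-version=2023-07-01-preview"
)
print(expected)
print(expected == observed)
```

The two strings differ only in the segment right after the host, which is a useful place to start looking.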
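**Real cause:** The URL path expected by the OpenAI client changed between versions. With v1, an extra /openai segment has to follow the host to distinguish Azure from OpenAI (for example https://xxx.openai.azure.com/openai), and the library code normally makes this change itself. However, it only appends the segment when openai_api_base does not already contain the string /openai. The code below shows the check (it matches the Azure wrapper classes in LangChain rather than the openai package itself):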
python
if is_openai_v1():
    # For backwards compatibility. Before openai v1, no distinction was made
    # between azure_endpoint and base_url (openai_api_base).
    openai_api_base = values["openai_api_base"]
    if openai_api_base and values["validate_base_url"]:
        if "/openai" not in openai_api_base:
            values["openai_api_base"] += "/openai"
            warnings.warn(
                "As of openai>=1.0.0, Azure endpoints should be specified via "
                f"the `azure_endpoint` param not `openai_api_base` "
                f"(or alias `base_url`). Updating `openai_api_base` from "
                f"{openai_api_base} to {values['openai_api_base']}."
            )
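There is a flaw here. The subdomain in an Azure openai_api_base is chosen by the customer. If that name begins with openai (for example https://openaixxx.openai.azure.com), the URL already contains the substring /openai (the second slash of https:// followed by openai), so the branch that appends the segment never runs. As a result, a v1+ openai library sends its requests without /openai after the host, which produces 404 Resource not found.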
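The false positive is easy to reproduce in isolation. The sketch below compares the same substring test used in the library snippet with a stricter, path-only test. `naive_has_openai` and `path_has_openai` are hypothetical names used only for illustration, and the path-based check is one possible alternative, not something the library does.

```python
from urllib.parse import urlparse


def naive_has_openai(api_base: str) -> bool:
    # Same substring test as the library snippet: it also matches the host part.
    return "/openai" in api_base


def path_has_openai(api_base: str) -> bool:
    # Hypothetical stricter test: look only at path segments, never at the host.
    segments = urlparse(api_base).path.split("/")
    return "openai" in segments


for url in (
    "https://openaixxx.openai.azure.com",   # subdomain starts with "openai"
    "https://xxx.openai.azure.com",         # ordinary subdomain
    "https://xxx.openai.azure.com/openai",  # suffix already present
):
    print(url, naive_has_openai(url), path_has_openai(url))
```

For the first URL the substring test reports that the suffix is present while the path-only test reports that it is not, which is exactly the case that leads to the 404.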
**Fix:** Following the approach above, write our own is_openai_v1 function so that we check the openai library version and correct the URL up front.
python
import os


def is_openai_v1() -> bool:
    """Return True when the installed openai package is 1.0.0 or newer."""
    from importlib.metadata import version
    from packaging.version import Version, parse

    _version = parse(version("openai"))
    return _version >= Version("1.0.0")


api_base = os.environ["OPENAI_API_BASE"]
if is_openai_v1():
    # Plain string join; os.path.join would insert "\" on Windows.
    api_base = api_base.rstrip("/") + "/openai"
print(api_base)
The printed api_base is:
python
https://openaiquanshi.openai.azure.com/openai/deployments/meeting-embedding
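Appending the suffix by hand is easy to get subtly wrong, for example with a trailing slash or with a base that already carries /openai. A minimal sketch of an idempotent variant follows. `ensure_openai_path` is a hypothetical helper name, not part of openai or LangChain, and it assumes that the segment belongs first in the URL path.

```python
from urllib.parse import urlparse, urlunparse


def ensure_openai_path(api_base: str) -> str:
    """Return api_base with 'openai' as its first path segment (hypothetical helper)."""
    parts = urlparse(api_base)
    # Drop empty segments so "//" and trailing "/" do not survive.
    segments = [s for s in parts.path.split("/") if s]
    if not segments or segments[0] != "openai":
        segments.insert(0, "openai")
    return urlunparse(parts._replace(path="/" + "/".join(segments)))


print(ensure_openai_path("https://openaiquanshi.openai.azure.com"))
print(ensure_openai_path("https://openaiquanshi.openai.azure.com/openai/"))
```

Because the decision is based on the parsed path and not on the whole string, a subdomain that starts with openai does not affect it, and calling the function twice returns the same result.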
Error analysis 2
With api_base now correct, the project still returned 404 when run, and the following warning appeared:
python
=============================== warnings summary ===============================
/Users/a200007/work/ucgit/ucserver/summaryserver/venv/lib/python3.10/site-packages/langchain/embeddings/openai.py:319: UserWarning: If you have openai>=1.0.0 installed and are using Azure, please use the `AzureOpenAIEmbeddings` class.
warnings.warn(
As the warning says, when using Azure with an openai library at 1.0.0 or later, we should use AzureOpenAIEmbeddings instead of OpenAIEmbeddings.
python
import os

from langchain.embeddings.azure_openai import AzureOpenAIEmbeddings

api_base = os.environ["OPENAI_API_BASE"]
if is_openai_v1():
    api_base = api_base.rstrip("/") + "/openai"

# embedding_deployment and embedding_max_inputs are defined elsewhere in the project.
embedding_function = AzureOpenAIEmbeddings(
    deployment=embedding_deployment,
    chunk_size=embedding_max_inputs,
    openai_api_base=api_base,
)
Test with this embedding_function:
python
tensor = embedding_function.embed_query("今天你去哪儿了?")  # "Where did you go today?"
print(tensor)
# > [0.0018476681179545632, -0.003048808135045326, 0.013323151742329006, -0.012344790841118551, -0.027643358824239652, 0.015491746640257776, -0.02547476299498829, -0.0032466613330973887, -0.03310223727411042, -0.017236592553203055, 0.012407106567011756, -0.005380982628351025, 0.00047866445067163386, ......(1535)]
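The warning text in the library snippet quoted earlier says that Azure endpoints should be passed via the `azure_endpoint` param and not `openai_api_base`. A sketch of what that could look like is below. It is an untested alternative under stated assumptions: `azure_embedding_kwargs` and `make_embeddings` are hypothetical helpers, the keyword names `azure_endpoint`, `azure_deployment` and `openai_api_version` are those I understand AzureOpenAIEmbeddings to accept in LangChain releases that support openai>=1.0.0, and whether this avoids the 404 in this environment was not verified.

```python
import os


def azure_embedding_kwargs(endpoint: str, deployment: str, api_version: str) -> dict:
    """Build keyword arguments for AzureOpenAIEmbeddings (hypothetical helper)."""
    return {
        "azure_endpoint": endpoint.rstrip("/"),  # bare resource URL, no /openai suffix
        "azure_deployment": deployment,
        "openai_api_version": api_version,
    }


def make_embeddings(**kwargs):
    # Imported lazily so the helper above works without langchain installed.
    from langchain.embeddings.azure_openai import AzureOpenAIEmbeddings

    return AzureOpenAIEmbeddings(**kwargs)


kwargs = azure_embedding_kwargs(
    os.environ.get("AZURE_OPENAI_ENDPOINT", "https://openaiquanshi.openai.azure.com"),
    "meeting-embedding",
    "2023-07-01-preview",
)
print(kwargs)
```

Calling `make_embeddings(**kwargs)` would then construct the embedding object, with the API key supplied through the environment or an extra keyword argument. With this style the client is expected to add the /openai segment itself, so the base URL no longer needs to be patched by hand.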
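Error analysis 3
Running further produced another 404, this time from a different place:
The earlier error came from embeddings; this one comes from chat_models in langchain. Guessing that it was the same URL compatibility problem, I added the same openai version check and URL handling when constructing the llm instance: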
python
api_base = os.environ["OPENAI_API_BASE"]
if is_openai_v1():
    # Plain string join; os.path.join would insert "\" on Windows.
    api_base = api_base.rstrip("/") + "/openai"
llm = AzureChatOpenAI(
    temperature=0,
    max_tokens=4096,
    ......
    openai_api_base=api_base
)
With this change, everything finally ran correctly.