Azure OpenAI citations with message correlation

题意:"Azure OpenAI 引用与消息关联"

问题背景:

I am trying out Azure OpenAI with my own data. The data is uploaded to Azure Blob Storage and indexed for use with Azure AI search

"我正在尝试使用自己的数据进行 Azure OpenAI。数据已上传到 Azure Blob 存储并为 Azure AI 搜索进行了索引。"

I do a call to the endpoint in the form of POST {endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version={api-version}, as referenced here.

"我对端点进行了一次调用,形式为 `POST {endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version={api-version}`,就像这里所引用的。"

However, in the response I cannot figure out how the choices[0]['message']['context']['citations'] field correspond to the choices[0]['message']['content'].

"然而,在响应中,我无法弄清楚 `choices[0]['message']['context']['citations']` 字段是如何与 `choices[0]['message']['content']` 对应的。"

For example, I can have a content as something like:

"例如,我的 `content` 可能是这样的:"

cs 复制代码
I have a pear [doc1][doc2]. I have an apple [doc1][doc3].

However, in my citations it looks like:

"然而,在我的 `citations` 中,它看起来像这样:"

cs 复制代码
citations[0].filepath == 'file1.pdf'
citations[1].filepath == 'file2.pdf'
citations[2].filepath == 'file1.pdf'
citations[3].filepath == 'file3.pdf'
citations[4].filepath == 'file4.pdf'

In summary, my question is whether if there is some sort of mapping from doc as shown in the message to the citations.filepath.

"总而言之,我的问题是是否存在一种映射,将消息中显示的 `doc` 与 `citations.filepath` 关联起来。"

问题解决:

Actually, it is not about the length of the citations; it is about how many times the file is referred.

"实际上,这与 `citations` 的长度无关,而是与文件被引用的次数有关。"

If you observe clearly, you can see 'file1.pdf' is referred twice, so mappings will be based on the first appearance and reuse of docs like below:

"如果你仔细观察,可以看到 `file1.pdf` 被引用了两次,因此映射将基于第一次出现和重用文档,如下所示:"

  • doc1 -> citations[0] (file1.pdf).
  • doc2 -> citations[1] (file2.pdf).
  • Reuse of doc1 -> Refers back to the first document (citations[2], file1.pdf).
  • doc3 -> citations[3] (file3.pdf).

Use the code below to get mappings and use it in the content.

"使用下面的代码来获取映射,并在内容中使用它。"

cs 复制代码
import re

def map_citations(content, citations):
    
    pattern = re.compile(r'\[doc(\d+)\]')
    segments = pattern.split(content)
    
    doc_numbers = []
    for segment in segments:
        if segment.isdigit():
            doc_numbers.append(int(segment))
    
    

    doc_to_file_map = {}
    for i, doc_num in enumerate(doc_numbers):
        doc_to_file_map[f'doc{doc_num}'] = citations[i]['filepath']

    print(doc_to_file_map)
    
    def replace_placeholder(match):
        doc_num = match.group(1)
        return f"[{doc_to_file_map[f'doc{doc_num}']}]"
    
    mapped_content = pattern.sub(replace_placeholder, content)
    
    return mapped_content

content = "I have a pear [doc1][doc2]. I have an apple [doc1][doc3]."
citations = [
    {'filepath': 'file1.pdf'},
    {'filepath': 'file2.pdf'},
    {'filepath': 'file1.pdf'},
    {'filepath': 'file3.pdf'},
    {'filepath': 'file4.pdf'}
]

mapped_content = map_citations(content, citations)
print(mapped_content)
相关推荐
xcLeigh24 分钟前
计算机视觉图像处理基础系列:滤波、边缘检测与形态学操作
图像处理·人工智能·计算机视觉·ai
weixin_457885821 小时前
虎跃办公AI赋能的实时协同开发范式与神经符号系统突破
人工智能·搜索引擎·ai·deepseek
終不似少年遊*3 小时前
操作系统、虚拟化技术与云原生及云原生AI简述
docker·ai·云原生·容器·华为云·云计算·k8s
带你去吃小豆花4 小时前
在亚马逊云科技上使用n8n快速构建个人AI NEWS助理
人工智能·科技·ai·云原生·aws
顽疲14 小时前
从零用java实现 小红书 springboot vue uniapp (11)集成AI聊天机器人
java·vue.js·spring boot·ai
XINVRY-FPGA17 小时前
Xilinx FPGA XCVC1902-2MSEVSVA2197 Versal AI Core系列芯片的详细介绍
人工智能·嵌入式硬件·5g·ai·fpga开发·云计算·fpga
csdn_aspnet18 小时前
使用 .NET 9 和 Azure 构建云原生应用程序:有什么新功能?
microsoft·云原生·azure
Elastic 中国社区官方博客20 小时前
Elasticsearch:使用机器学习生成筛选器和分类标签
大数据·人工智能·elasticsearch·机器学习·搜索引擎·ai·分类
阿杜杜不是阿木木20 小时前
使用ollama部署本地大模型(没有GPU也可以),实现IDEA和VS Code的git commit自动生成
linux·git·vscode·ai·intellij-idea·ollama
getyefang2 天前
uniapp如何接入星火大模型
ai·uni-app