Azure OpenAI citations with message correlation

题意:"Azure OpenAI 引用与消息关联"

问题背景:

I am trying out Azure OpenAI with my own data. The data is uploaded to Azure Blob Storage and indexed for use with Azure AI search

"我正在尝试使用自己的数据进行 Azure OpenAI。数据已上传到 Azure Blob 存储并为 Azure AI 搜索进行了索引。"

I do a call to the endpoint in the form of POST {endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version={api-version}, as referenced here.

"我对端点进行了一次调用,形式为 `POST {endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version={api-version}`,就像这里所引用的。"

However, in the response I cannot figure out how the choices[0]['message']['context']['citations'] field correspond to the choices[0]['message']['content'].

"然而,在响应中,我无法弄清楚 `choices[0]['message']['context']['citations']` 字段是如何与 `choices[0]['message']['content']` 对应的。"

For example, I can have a content as something like:

"例如,我的 `content` 可能是这样的:"

cs 复制代码
I have a pear [doc1][doc2]. I have an apple [doc1][doc3].

However, in my citations it looks like:

"然而,在我的 `citations` 中,它看起来像这样:"

cs 复制代码
citations[0].filepath == 'file1.pdf'
citations[1].filepath == 'file2.pdf'
citations[2].filepath == 'file1.pdf'
citations[3].filepath == 'file3.pdf'
citations[4].filepath == 'file4.pdf'

In summary, my question is whether if there is some sort of mapping from doc as shown in the message to the citations.filepath.

"总而言之,我的问题是是否存在一种映射,将消息中显示的 `doc` 与 `citations.filepath` 关联起来。"

问题解决:

Actually, it is not about the length of the citations; it is about how many times the file is referred.

"实际上,这与 `citations` 的长度无关,而是与文件被引用的次数有关。"

If you observe clearly, you can see 'file1.pdf' is referred twice, so mappings will be based on the first appearance and reuse of docs like below:

"如果你仔细观察,可以看到 `file1.pdf` 被引用了两次,因此映射将基于第一次出现和重用文档,如下所示:"

  • doc1 -> citations[0] (file1.pdf).
  • doc2 -> citations[1] (file2.pdf).
  • Reuse of doc1 -> Refers back to the first document (citations[2], file1.pdf).
  • doc3 -> citations[3] (file3.pdf).

Use the code below to get mappings and use it in the content.

"使用下面的代码来获取映射,并在内容中使用它。"

cs 复制代码
import re

def map_citations(content, citations):
    
    pattern = re.compile(r'\[doc(\d+)\]')
    segments = pattern.split(content)
    
    doc_numbers = []
    for segment in segments:
        if segment.isdigit():
            doc_numbers.append(int(segment))
    
    

    doc_to_file_map = {}
    for i, doc_num in enumerate(doc_numbers):
        doc_to_file_map[f'doc{doc_num}'] = citations[i]['filepath']

    print(doc_to_file_map)
    
    def replace_placeholder(match):
        doc_num = match.group(1)
        return f"[{doc_to_file_map[f'doc{doc_num}']}]"
    
    mapped_content = pattern.sub(replace_placeholder, content)
    
    return mapped_content

content = "I have a pear [doc1][doc2]. I have an apple [doc1][doc3]."
citations = [
    {'filepath': 'file1.pdf'},
    {'filepath': 'file2.pdf'},
    {'filepath': 'file1.pdf'},
    {'filepath': 'file3.pdf'},
    {'filepath': 'file4.pdf'}
]

mapped_content = map_citations(content, citations)
print(mapped_content)
相关推荐
机器之心2 小时前
谁说Scaling Law到头了?新研究:每一步的微小提升会带来指数级增长
人工智能·openai
coder_pig5 小时前
🤔 试试 OpenAI 的最强编程模型 "GPT-5-Codex"?
chatgpt·openai·claude
新智元8 小时前
收手吧 GPT-5-Codex,外面全是 AI 编程智能体!
人工智能·openai
安思派Anspire10 小时前
创建完整的评估生命周期以构建高(一)
aigc·openai·agent
机器之心10 小时前
刚刚,OpenAI发布GPT-5-Codex:可独立工作超7小时,还能审查、重构大型项目
人工智能·openai
九章云极AladdinEdu19 小时前
超参数自动化调优指南:Optuna vs. Ray Tune 对比评测
运维·人工智能·深度学习·ai·自动化·gpu算力
CoderJia程序员甲1 天前
GitHub 热榜项目 - 日榜(2025-09-13)
ai·开源·大模型·github·ai教程
新智元1 天前
起猛了!这个国家任命 AI 为「部长」:全球首个,手握实权,招标 100% 透明
人工智能·openai
新智元1 天前
马斯克深夜挥刀,Grok 幕后员工 1/3 失业!谷歌 AI 靠人肉堆起,血汗工厂曝光
人工智能·openai
机器之心1 天前
用光学生成图像,几乎0耗电,浙大校友一作研究登Nature
人工智能·openai