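【LangChain】Summarization

LangChain study notes


Summarization

A summarization chain can be used to summarize multiple documents. One approach is to pass in multiple smaller documents, after they have been split into chunks, and operate over them with a MapReduceDocumentsChain. You can also choose to make the summarization chain a StuffDocumentsChain or a RefineDocumentsChain instead.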
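The main practical difference between the three strategies is how many LLM calls they make. Below is a rough back-of-the-envelope sketch for n chunks. It assumes the map_reduce combine step fits into a single call (the real chain may need several reduce rounds when the partial summaries are long), and `llm_calls` is a hypothetical helper, not a LangChain function.

```python
def llm_calls(strategy: str, n_docs: int) -> int:
    """Rough count of LLM calls a summarization strategy makes for n_docs chunks."""
    if n_docs < 1:
        raise ValueError("need at least one document")
    if strategy == "stuff":
        # Every document goes into one prompt: a single call.
        return 1
    if strategy == "map_reduce":
        # One call per document (map) plus one combine call (reduce),
        # assuming the partial summaries fit into a single reduce prompt.
        return n_docs + 1
    if strategy == "refine":
        # One call for the first document, then one refine call per remaining document.
        return n_docs
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("stuff", "map_reduce", "refine"):
    print(name, llm_calls(name, 3))  # stuff 1, map_reduce 4, refine 3
```

Call count is not the whole story. Stuff is the cheapest, but everything has to fit into the model's context window. The map calls of map_reduce are independent of each other and can run in parallel. The refine calls must run one after another, because each one needs the previous summary.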

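Prepare the data

First, we prepare the data. In this example we create multiple documents from one long document, but the documents can be obtained in any way (the point of this notebook is to show what to do once you have the documents).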

python
from langchain import OpenAI, PromptTemplate, LLMChain
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.docstore.document import Document
# LLM (temperature=0 for more deterministic output)
llm = OpenAI(temperature=0)
# Initialize the text splitter (default settings)
text_splitter = CharacterTextSplitter()
# Load the long text and split it into chunks
with open("../../state_of_the_union.txt") as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)

# Turn the first three chunks into Document objects
docs = [Document(page_content=t) for t in texts[:3]]
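CharacterTextSplitter splits text on a separator (as far as we know, a blank line "\n\n" by default). It then merges the pieces back into chunks no longer than chunk_size characters.

The sketch below imitates that greedy merge in plain Python. It ignores chunk overlap and is only an approximation of the real implementation; `split_into_chunks` is a hypothetical name. It also shows where a warning like "Created a chunk of size …, which is longer than the specified …" comes from. The warning appears in the MapReduceChain example further down, and it is triggered when a single piece is already longer than chunk_size and has to be kept whole.

```python
def split_into_chunks(text: str, separator: str = "\n\n", chunk_size: int = 100) -> list[str]:
    """Split on `separator`, then greedily merge pieces into chunks of at most chunk_size characters.

    Simplified: no chunk overlap. A single piece longer than chunk_size is kept whole.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for piece in pieces:
        # Length added by this piece, counting the separator that joins it to the previous one.
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            # The current chunk is full: close it and start a new one with this piece.
            chunks.append(separator.join(current))
            current, current_len = [], 0
            added = len(piece)
        current.append(piece)
        current_len += added
        if len(piece) > chunk_size:
            print(f"Created a chunk of size {len(piece)}, which is longer than the specified {chunk_size}")
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "aaaa\n\nbbbb\n\ncccccccccccc"
print(split_into_chunks(sample, chunk_size=10))
# Prints the warning for the 12-character piece, then ['aaaa\n\nbbbb', 'cccccccccccc']
```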

Quick start

If you just want to get started as quickly as possible, this is the recommended way:

python
from langchain.chains.summarize import load_summarize_chain
# Note: this is load_summarize_chain
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(docs)

Result:

python
'问界M5是赛力斯与华为合力打造的高端品牌AITO的首款车,问界M5共推出两款车型,后驱标准版预售价格25万元,四驱性能版28万元。'

If you want more control over, and insight into, what is happening, see the sections below.

stuff Chain

This section shows the result of summarizing with the stuff chain.

python
chain = load_summarize_chain(llm, chain_type="stuff")
chain.run(docs)

Result:

python
    ' 问界M5是赛力斯与华为合力打造的高端品牌AITO的首款车,问界M5共推出两款车型,后驱标准版预售价格25万元,四驱性能版28万元。'
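Conceptually, the stuff strategy "stuffs" every document into one prompt and calls the model once. The sketch below is an offline illustration of that idea under a few assumptions:

- A plain callable stands in for the LLM.
- The documents are joined with a blank line. We believe "\n\n" is StuffDocumentsChain's default document separator, but treat that as an assumption.
- `stuff_summarize` and `fake_llm` are hypothetical names.

```python
from typing import Callable

# Same wording as the article's English prompt, minus the "IN ITALIAN" instruction.
STUFF_TEMPLATE = """Write a concise summary of the following:

{text}

CONCISE SUMMARY:"""


def stuff_summarize(texts: list[str], llm: Callable[[str], str], separator: str = "\n\n") -> str:
    """Put every document into one prompt and call the model exactly once."""
    prompt = STUFF_TEMPLATE.format(text=separator.join(texts))
    return llm(prompt)


# Stand-in for a real model so the sketch runs offline: it records each prompt it receives.
prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    prompts.append(prompt)
    return f"summary of a {len(prompt)}-character prompt"


print(stuff_summarize(["doc one", "doc two", "doc three"], fake_llm))
print(len(prompts))  # 1: the stuff strategy makes a single call
```

This is also why stuff only works while all the documents together fit into the model's context window.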

Custom Prompts

You can also use your own prompts with this chain. In this example, we will respond in Italian.

python
prompt_template = """Write a concise summary of the following:

{text}

CONCISE SUMMARY IN ITALIAN:"""
# The prompt above asks for the summary in Italian
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
# summarize_chain
chain = load_summarize_chain(llm, chain_type="stuff", prompt=PROMPT)
chain.run(docs)

Result: not printed here.
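PromptTemplate expects `input_variables` to match the `{...}` placeholders in the template. A mismatch here is a common source of errors. The sketch below lists the placeholders of the Italian prompt above and compares them with the declared variables. It uses the standard library's `string.Formatter`, and it only imitates the idea of such a check. `placeholders` and `check_template` are hypothetical helpers, not part of LangChain.

```python
from string import Formatter


def placeholders(template: str) -> set[str]:
    """Return the names of all {field} placeholders in an f-string-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def check_template(template: str, input_variables: list[str]) -> None:
    """Raise ValueError if the template's placeholders differ from the declared variables."""
    found = placeholders(template)
    declared = set(input_variables)
    if found != declared:
        raise ValueError(f"template uses {sorted(found)}, but input_variables is {sorted(declared)}")


prompt_template = """Write a concise summary of the following:

{text}

CONCISE SUMMARY IN ITALIAN:"""

check_template(prompt_template, ["text"])  # passes silently
print(prompt_template.format(text="Some document text."))
```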

map_reduce Chain

This section shows the result of summarizing with the map_reduce chain.

python
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(docs)

Result:

python
    ' 问界M5是赛力斯与华为合力打造的高端品牌AITO的首款车,问界M5共推出两款车型,后驱标准版预售价格25万元,四驱性能版28万元。'

Intermediate Steps

If we want to inspect them, we can also return the intermediate steps of the map_reduce chain. This is done with the return_intermediate_steps parameter.

python
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="map_reduce", return_intermediate_steps=True)

chain({"input_documents": docs}, return_only_outputs=True)

Result:

python
    {'map_steps': [" In response to Russia's aggression in Ukraine, the United States has united with other freedom-loving nations to impose economic sanctions and hold Putin accountable. The U.S. Department of Justice is also assembling a task force to go after the crimes of Russian oligarchs and seize their ill-gotten gains.",
      ' The United States and its European allies are taking action to punish Russia for its invasion of Ukraine, including seizing assets, closing off airspace, and providing economic and military assistance to Ukraine. The US is also mobilizing forces to protect NATO countries and has released 30 million barrels of oil from its Strategic Petroleum Reserve to help blunt gas prices. The world is uniting in support of Ukraine and democracy, and the US stands with its Ukrainian-American citizens.',
      " President Biden and Vice President Harris ran for office with a new economic vision for America, and have since passed the American Rescue Plan and the Bipartisan Infrastructure Law to help struggling families and rebuild America's infrastructure. This includes creating jobs, modernizing roads, airports, ports, and waterways, replacing lead pipes, providing affordable high-speed internet, and investing in American products to support American jobs."],
     'output_text': " In response to Russia's aggression in Ukraine, the United States and its allies have imposed economic sanctions and are taking other measures to hold Putin accountable. The US is also providing economic and military assistance to Ukraine, protecting NATO countries, and passing legislation to help struggling families and rebuild America's infrastructure. The world is uniting in support of Ukraine and democracy, and the US stands with its Ukrainian-American citizens."}
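Conceptually, map_reduce works in two phases. First it summarizes each document on its own (map). Then it summarizes the joined partial summaries (reduce). With intermediate steps enabled, the per-document summaries are returned alongside the final text.

The sketch below reproduces that flow offline with a stub LLM. `map_reduce_summarize` and `stub_llm` are hypothetical, and the dict keys are copied from the output shown above. We also assume a single reduce call, whereas the real chain may collapse long partial summaries in several rounds.

```python
from typing import Callable

# For simplicity, the same template is used for the map and the combine step.
MAP_TEMPLATE = "Write a concise summary of the following:\n\n{text}\n\nCONCISE SUMMARY:"
COMBINE_TEMPLATE = MAP_TEMPLATE


def map_reduce_summarize(texts: list[str], llm: Callable[[str], str],
                         return_intermediate_steps: bool = False) -> dict:
    # Map: one independent call per document (these could run in parallel).
    map_steps = [llm(MAP_TEMPLATE.format(text=t)) for t in texts]
    # Reduce: join the partial summaries and summarize them once more.
    output_text = llm(COMBINE_TEMPLATE.format(text="\n\n".join(map_steps)))
    result = {"output_text": output_text}
    if return_intermediate_steps:
        # Key name copied from the output shown above.
        result["map_steps"] = map_steps
    return result


calls: list[str] = []


def stub_llm(prompt: str) -> str:
    """Offline stand-in for a model: records the prompt and returns a numbered 'summary'."""
    calls.append(prompt)
    return f"summary #{len(calls)}"


result = map_reduce_summarize(["doc A", "doc B", "doc C"], stub_llm, return_intermediate_steps=True)
print(result)
# {'output_text': 'summary #4', 'map_steps': ['summary #1', 'summary #2', 'summary #3']}
```

With three input documents there are three map steps, which matches the three entries in the output above.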

Custom Prompts

You can also use your own prompt with this chain. In this example, we will respond in Italian.

python
# This prompt asks for the summary in Italian
prompt_template = """Write a concise summary of the following:

{text}

CONCISE SUMMARY IN ITALIAN:"""
# Create the prompt template
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="map_reduce", return_intermediate_steps=True, map_prompt=PROMPT, combine_prompt=PROMPT)
chain({"input_documents": docs}, return_only_outputs=True)

Custom MapReduceChain

Multi-input prompt

You can also use prompts with multiple inputs. In this example, we use a MapReduce chain to answer a specific question about our code.

python
from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
# First prompt (map step)
map_template_string = """Given the following python code information, generate a description that explains what the code does and also mention the time complexity.
Code:
{code}

Return the description in the following format:
name of the function: description of the function
"""

# Second prompt (reduce step)
reduce_template_string = """Given the following python function names and descriptions, answer the following question
{code_description}
Question: {question}
Answer:
"""
# Template for the first prompt
MAP_PROMPT = PromptTemplate(input_variables=["code"], template=map_template_string)
# Template for the second prompt
REDUCE_PROMPT = PromptTemplate(input_variables=["code_description", "question"], template=reduce_template_string)
# LLM
llm = OpenAI()
# map chain
map_llm_chain = LLMChain(llm=llm, prompt=MAP_PROMPT)
# reduce chain
reduce_llm_chain = LLMChain(llm=llm, prompt=REDUCE_PROMPT)

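# Reduce step: stuff all map outputs into the {code_description} variable of REDUCE_PROMPT
generative_result_reduce_chain = StuffDocumentsChain(
    llm_chain=reduce_llm_chain,
    document_variable_name="code_description",
)

# Map step: run MAP_PROMPT on each chunk (passed in as {code}), then hand the results to the reduce chain
combine_documents = MapReduceDocumentsChain(
    llm_chain=map_llm_chain,
    combine_document_chain=generative_result_reduce_chain,
    document_variable_name="code",
)

# Split the input text on "\n##\n" (one function per chunk) before running map-reduce
map_reduce = MapReduceChain(
    combine_documents_chain=combine_documents,
    text_splitter=CharacterTextSplitter(separator="\n##\n", chunk_size=100, chunk_overlap=0),
)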
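The chain above wires two kinds of input into the prompts. The per-chunk input `{code}` is filled once for every chunk in the map step. The shared input `{question}` only matters in the reduce step, where the joined map outputs fill `{code_description}`.

The offline sketch below shows that data flow with a stub LLM. Its assumptions:

- The templates are shortened, hypothetical stand-ins for MAP_PROMPT and REDUCE_PROMPT.
- The map outputs are joined with a blank line, which we believe is the stuff chain's default separator.
- `multi_input_map_reduce` is not a LangChain function.

```python
from typing import Callable

# Shortened, hypothetical versions of MAP_PROMPT and REDUCE_PROMPT above.
MAP_TEMPLATE = "Describe this python code and its time complexity:\n{code}"
REDUCE_TEMPLATE = "{code_description}\nQuestion: {question}\nAnswer:"


def multi_input_map_reduce(input_text: str, question: str, llm: Callable[[str], str],
                           separator: str = "\n##\n") -> str:
    # Split: one chunk per function, using the same separator as the article.
    chunks = [c for c in input_text.split(separator) if c.strip()]
    # Map: only the per-chunk variable {code} is filled in.
    descriptions = [llm(MAP_TEMPLATE.format(code=c)) for c in chunks]
    # Reduce: the joined descriptions fill {code_description}; {question} is passed straight through.
    prompt = REDUCE_TEMPLATE.format(code_description="\n\n".join(descriptions), question=question)
    return llm(prompt)


prompts: list[str] = []


def stub_llm(prompt: str) -> str:
    """Offline stand-in for a model: records the prompt and returns a numbered reply."""
    prompts.append(prompt)
    return f"reply #{len(prompts)}"


code = "def a():\n    pass\n##\ndef b():\n    pass"
print(multi_input_map_reduce(code, "Which function is faster?", stub_llm))  # reply #3
print(prompts[-1])
```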

The code snippet is:

python
code = """
def bubblesort(list):
   for iter_num in range(len(list)-1,0,-1):
      for idx in range(iter_num):
         if list[idx]>list[idx+1]:
            temp = list[idx]
            list[idx] = list[idx+1]
            list[idx+1] = temp
    return list
##
def insertion_sort(InputList):
   for i in range(1, len(InputList)):
      j = i-1
      nxt_element = InputList[i]
   while (InputList[j] > nxt_element) and (j >= 0):
      InputList[j+1] = InputList[j]
      j=j-1
   InputList[j+1] = nxt_element
   return InputList
##
def shellSort(input_list):
   gap = len(input_list) // 2
   while gap > 0:
      for i in range(gap, len(input_list)):
         temp = input_list[i]
         j = i
   while j >= gap and input_list[j - gap] > temp:
      input_list[j] = input_list[j - gap]
      j = j-gap
      input_list[j] = temp
   gap = gap//2
   return input_list

"""
python
# Which function has a better time complexity?
map_reduce.run(input_text=code, question="Which function has a better time complexity?")

Result:

python
    Created a chunk of size 247, which is longer than the specified 100
    Created a chunk of size 267, which is longer than the specified 100

    'shellSort has a better time complexity than both bubblesort and insertion_sort, as it has a time complexity of O(n^2), while the other two have a time complexity of O(n^2).'

refine Chain

This section shows the result of summarizing with the refine chain.

python
chain = load_summarize_chain(llm, chain_type="refine")

chain.run(docs)

Result:

python
问界M5是赛力斯与华为合力打造的高端品牌AITO的首款车,问界M5共推出两款车型,后驱标准版预售价格25万元,四驱性能版28万元。
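Unlike map_reduce, refine is sequential. It summarizes the first document, then asks the model to update that summary with each following document in turn.

A minimal offline sketch, assuming a stub LLM. The templates are shortened, hypothetical English versions modeled on the refine prompt in the custom-prompt example below, and `refine_summarize` is not a LangChain function. The `intermediate_steps` key mirrors the output of that Italian example.

```python
from typing import Callable

# Shortened, hypothetical versions of the question and refine prompts.
QUESTION_TEMPLATE = "Write a concise summary of the following:\n\n{text}\n\nCONCISE SUMMARY:"
REFINE_TEMPLATE = (
    "Your job is to produce a final summary.\n"
    "Existing summary: {existing_answer}\n"
    "Refine it (only if needed) with the context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
)


def refine_summarize(texts: list[str], llm: Callable[[str], str],
                     return_intermediate_steps: bool = False) -> dict:
    # First document: an ordinary summary.
    answer = llm(QUESTION_TEMPLATE.format(text=texts[0]))
    steps = [answer]
    # Every later document refines the running summary, strictly in order.
    for text in texts[1:]:
        answer = llm(REFINE_TEMPLATE.format(existing_answer=answer, text=text))
        steps.append(answer)
    result = {"output_text": answer}
    if return_intermediate_steps:
        result["intermediate_steps"] = steps
    return result


calls: list[str] = []


def stub_llm(prompt: str) -> str:
    """Offline stand-in for a model: records the prompt and returns a numbered version."""
    calls.append(prompt)
    return f"v{len(calls)}"


print(refine_summarize(["doc A", "doc B", "doc C"], stub_llm, return_intermediate_steps=True))
# {'output_text': 'v3', 'intermediate_steps': ['v1', 'v2', 'v3']}
```

The last intermediate step always equals the final output. This is consistent with the Italian example, where the third intermediate step and output_text are identical.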

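Intermediate Steps

If we want to inspect them, we can also return the intermediate steps of the refine chain. This is done with the return_intermediate_steps parameter.

python
# Note the return_intermediate_steps argument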
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="refine", return_intermediate_steps=True)

chain({"input_documents": docs}, return_only_outputs=True)
# Result
'问界M5是赛力斯与华为合力打造的高端品牌AITO的首款车,问界M5共推出两款车型,后驱标准版预售价格25万元,四驱性能版28万元。'

Custom Prompts

You can also use your own prompts with this chain. In this example, we will respond in Italian.

python
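prompt_template = """Write a concise summary of the following:

{text}

CONCISE SUMMARY IN ITALIAN:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
refine_template = (
    "Your job is to produce a final summary.\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary in Italian. "
    "If the context isn't useful, return the original summary."
)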
refine_prompt = PromptTemplate(
    input_variables=["existing_answer", "text"],
    template=refine_template,
)
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="refine", return_intermediate_steps=True, question_prompt=PROMPT, refine_prompt=refine_prompt)
chain({"input_documents": docs}, return_only_outputs=True)

Result:

python
    {'intermediate_steps': ["\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia e bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi.",
      "\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia, bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale e chiudendo lo spazio aereo americano a tutti i voli russi. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi. Stiamo fornendo più di un miliardo di dollari in assistenza diretta all'Ucraina e fornendo assistenza militare,",
      "\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia, bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale e chiudendo lo spazio aereo americano a tutti i voli russi. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi. Stiamo fornendo più di un miliardo di dollari in assistenza diretta all'Ucraina e fornendo assistenza militare."],
     'output_text': "\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia, bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale e chiudendo lo spazio aereo americano a tutti i voli russi. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi. Stiamo fornendo più di un miliardo di dollari in assistenza diretta all'Ucraina e fornendo assistenza militare."}

Reference:

https://python.langchain.com/docs/modules/chains/popular/summarize
