A good blog on LLM fundamentals:
https://lilianweng.github.io/posts/2023-06-23-agent/
LangChain's prompt hub:
https://smith.langchain.com/hub
1. Q&A
- Q&A
import os

from langchain import hub
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

os.environ["OPENAI_API_KEY"] = "your OpenAI key"  # put the OpenAI key into an environment variable
llm = ChatOpenAI(model="gpt-4-0125-preview")

# docs: a list of Documents produced by a loader (e.g. WebBaseLoader), elided here
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)  # adjacent chunks overlap by 200 characters (you can also split on \n or on sentence boundaries)
splits = text_splitter.split_documents(docs)  # split the documents into chunks
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())  # embed the chunks with OpenAI embeddings and store them in a Chroma vector store
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")  # pull a ready-made prompt from the hub

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)  # concatenate the retrieved docs into one context string

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)  # components are chained together with the pipe operator "|"
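Once assembled, the chain is invoked directly with the question string (a usage sketch; the question is only an example):
rag_chain.invoke("What is Task Decomposition?")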
- Chat history
Use LangChain's pre-built components to connect the two sub-chains (query rewriting and retrieval Q&A) together; a minimal sketch follows below.
The purpose of query rewriting:
Q1: "What is task decomposition?"
A: "..."
Q2: "What steps does it consist of?"
With query rewriting, Q2 can be rewritten as "What steps does task decomposition consist of?", which can then be used as the query to retrieve useful docs.
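A minimal sketch of the query-rewriting sub-chain, assuming the llm and retriever built in the Q&A example above; create_history_aware_retriever wraps exactly this "rewrite the follow-up question using the chat history, then retrieve" logic (the prompt wording here is my own):
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# prompt that asks the LLM to rewrite the latest question into a standalone query
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Given the chat history and the latest user question, "
                   "rewrite the question so it can be understood without the history. "
                   "Do NOT answer it."),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

# returns a retriever that first rewrites the query, then retrieves with it
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)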
- Serving
Integration with FastAPI:
from fastapi import FastAPI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langserve import add_routes

# 1. Components (example definitions; any prompt/model/parser works)
prompt_template = ChatPromptTemplate.from_template("Tell me a short joke about {topic}")
model = ChatOpenAI()
parser = StrOutputParser()

# 2. Create chain
chain = prompt_template | model | parser

# 3. App definition
app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="A simple API server using LangChain's Runnable interfaces",
)

# 4. Adding chain route
add_routes(
    app,
    chain,
    path="/chain",
)

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="localhost", port=8000)
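Once the server is running, a client can call the chain through langserve's RemoteRunnable (a usage sketch; the input key must match the prompt_template defined above):
from langserve import RemoteRunnable

remote_chain = RemoteRunnable("http://localhost:8000/chain/")
remote_chain.invoke({"topic": "cats"})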
- Streaming
LangChain supports streaming output.
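Every chain built from Runnables exposes stream() (and async astream()); a minimal sketch, assuming the rag_chain from the Q&A section above:
for chunk in rag_chain.stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)  # tokens are printed as they arrive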
- Wiki
LangChain provides ready-made retrieval over Wikipedia, already wrapped in a class.
from langchain_community.retrievers import WikipediaRetriever
wiki = WikipediaRetriever(top_k_results=6, doc_content_chars_max=2000)
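The retriever is used like any other (a usage sketch; the query string is only an example):
docs = wiki.invoke("large language model")  # returns a list of Documents built from Wikipedia articles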
- Citation
Have the model indicate which snippets of the docs in the context its answer comes from.
In the prompt, you can ask the model to return, along with the answer, the id and the text snippet of the cited doc.
For example (note "VERBATIM", i.e. word for word):
system = """You're a helpful AI assistant. Given a user question and some Wikipedia article snippets, \
answer the user question and provide citations. If none of the articles answer the question, just say you don't know.
Remember, you must return both an answer and citations. A citation consists of a VERBATIM quote that \
justifies the answer and the ID of the quote article. Return a citation for every quote across all articles \
that justify the answer. Use the following format for your final output:
<cited_answer>
<answer></answer>
<citations>
<citation><source_id></source_id><quote></quote></citation>
<citation><source_id></source_id><quote></quote></citation>
...
</citations>
</cited_answer>
Here are the Wikipedia articles:{context}"""
prompt_3 = ChatPromptTemplate.from_messages(
[("system", system), ("human", "{question}")]
)
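A sketch of wiring this prompt into a chain; format_docs_with_id is a hypothetical helper that numbers each snippet so the model has a source_id to cite (llm and wiki are from the sections above):
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs_with_id(docs):
    # give every snippet an explicit Source ID the model can cite
    return "\n\n".join(
        f"Source ID: {i}\nArticle Snippet: {doc.page_content}"
        for i, doc in enumerate(docs)
    )

cite_chain = (
    {"context": wiki | format_docs_with_id, "question": RunnablePassthrough()}
    | prompt_3
    | llm
    | StrOutputParser()
)
cite_chain.invoke("How fast are cheetahs?")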
My guess is that the so-called citation "tool" works on the same principle: the tool wraps the prompt-modification approach above.
You can also call the LLM once to get the answer, then call it again to get the citations; the downside is two LLM calls.
2. Structured output
- LangChain supports describing the output with Pydantic models:
from typing import Optional
from langchain_core.pydantic_v1 import BaseModel, Field

class Person(BaseModel):
    """Information about a person."""

    # ^ Doc-string for the entity Person.
    # This doc-string is sent to the LLM as the description of the schema Person,
    # and it can help to improve extraction results.
    # Note that:
    # 1. Each field is an `optional` -- this allows the model to decline to extract it!
    # 2. Each field has a `description` -- this description is used by the LLM.
    # Having a good description can help improve extraction results.
    name: Optional[str] = Field(default=None, description="The name of the person")
    hair_color: Optional[str] = Field(
        default=None, description="The color of the person's hair if known"
    )
    height_in_meters: Optional[str] = Field(
        default=None, description="Height measured in meters"
    )
Fields marked "Optional" are output when the information is present and omitted when it is not.
# prompt here is a ChatPromptTemplate with a {text} placeholder that asks the model to extract the fields of Person
runnable = prompt | llm.with_structured_output(schema=Person)
text = "Alan Smith is 6 feet tall and has blond hair."
runnable.invoke({"text": text})
Note: the llm must be a model that supports Pydantic schemas and with_structured_output.
Output:
Person(name='Alan Smith', hair_color='blond', height_in_meters='1.8288')
- Tips for improving results
- Set the model temperature to 0. (Take only the most likely output.)
- Improve the prompt. The prompt should be precise and to the point.
- Document the schema: make sure the schema is documented to provide more information to the LLM.
- Provide reference examples! Diverse examples can help, including examples where nothing should be extracted. (Few-shot examples!)
- If you have a lot of examples, use a retriever to retrieve the most relevant examples. (More few-shot examples help; cover both positive and negative cases, where a negative case is one in which the required fields cannot be extracted.)
- Benchmark with the best available LLM/Chat Model (e.g., gpt-4, claude-3, etc.) -- check with the model provider which one is the latest and greatest! (Some models frequently get even the structured format wrong... ideally use a model specifically trained for structured output.)
- If the schema is very large, try breaking it into multiple smaller schemas, run separate extractions and merge the results. (A single output schema must not be too complex; otherwise split it.)
- Make sure that the schema allows the model to REJECT extracting information. If it doesn't, the model will be forced to make up information! (Tell the model it may refuse to output; it should not make things up when unsure.)
- Add verification/correction steps (ask an LLM to correct or verify the results of the extraction). (Use an LLM, code, or a human to verify correctness; if the json load fails or required fields are missing, the output format is wrong -- see the sketch after this list.)
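For the last point, a minimal verification sketch (assuming the Person schema from above; raw_json stands for whatever JSON text the model returned when not using with_structured_output):
import json

def validate_extraction(raw_json: str):
    # a failed json load or a field of the wrong type means the output format is wrong
    try:
        return Person(**json.loads(raw_json)), None
    except Exception as err:  # JSONDecodeError or pydantic ValidationError
        return None, str(err)  # feed the error back to the LLM for a correction pass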
3. Chatbot
- There are base models and chat models; be sure to use a chat model that has been specifically trained for chat!
- Query rewriting is needed in two places:
With knowledge-base retrieval, rewrite the query to resolve pronouns (e.g. replace the "that" in "What is that?") before retrieving.
With memory, the query also needs to be rewritten.
- Why trim memory: A. the LLM's context length is limited; B. removing irrelevant content from the chat history keeps the LLM focused on the useful information.
Prompt used to summarize memory:
Distill the above chat messages into a single summary message. Include as many specific details as you can.
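A minimal sketch of that summarization step, assuming the llm from above and chat_history as a list of LangChain messages:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

summarize_prompt = ChatPromptTemplate.from_messages(
    [
        MessagesPlaceholder("chat_history"),
        ("human", "Distill the above chat messages into a single summary message. "
                  "Include as many specific details as you can."),
    ]
)

summary = (summarize_prompt | llm).invoke({"chat_history": chat_history})
chat_history = [summary]  # replace the full history with the single summary message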
- For knowledge retrieval, remember to add a "don't make things up when you don't know" instruction:
If the context doesn't contain any relevant information to the question, don't make something up and just say "I don't know"
4. Tools & Agents
- Agent: the LLM decides, at each step, which tool to call (or whether to output the final result directly) and with what arguments; at every step, the previous steps' outputs and the tool outputs are fed back in as history. (A sketch follows the custom-tool example below.)
- LangChain supports custom tools:
from langchain_core.tools import tool
@tool
def multiply(first_int: int, second_int: int) -> int:
"""Multiply two integers together."""
return first_int * second_int
The docstring is important: it tells the LLM what the tool does.
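A sketch of the agent loop described above, using the multiply tool just defined (the prompt wording is my own; llm must support tool calling):
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

agent_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        ("human", "{input}"),
        MessagesPlaceholder("agent_scratchpad"),  # holds the tool calls/outputs of previous steps
    ]
)

agent = create_tool_calling_agent(llm, [multiply], agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=[multiply], verbose=True)
agent_executor.invoke({"input": "What is 13 multiplied by 7?"})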
- Strategies when a tool call fails:
A. Use a fallback LLM (a stronger model such as GPT-4): when the first model fails, retry with the fallback.
B. Put the tool's arguments and the error message into the prompt and call again (LLMs follow such instructions well):
"The last tool call raised an exception. Try calling the tool again with corrected arguments. Do not repeat mistakes."
5. Query & analysis
- If the knowledge base contains structured/semi-structured data, you can feed the query to the LLM to produce a more suitable search query, e.g. {"query": "XXX", "publish_year": "2024"}.
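A sketch of producing such a structured search query via structured output; Search is a hypothetical schema whose field names depend on your knowledge base (llm is the model from above):
from typing import Optional
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field

class Search(BaseModel):
    """Search over a knowledge base of articles."""

    query: str = Field(description="Similarity search query applied to the article content")
    publish_year: Optional[int] = Field(default=None, description="Year the article was published")

query_analyzer = (
    ChatPromptTemplate.from_messages(
        [
            ("system", "Convert the user question into a database search."),
            ("human", "{question}"),
        ]
    )
    | llm.with_structured_output(Search)
)
query_analyzer.invoke({"question": "articles about RAG published in 2024"})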
- A complex query is fed to the LLM to produce several simple queries; each simple query is run against the knowledge base, and the results are combined to answer the complex query.
How do you help the LLM understand how to decompose into simple queries? Answer: give it a few few-shot examples.
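A sketch of the decomposition step, again via structured output; SubQuestions is a hypothetical schema, and few-shot examples can be inserted into the prompt as extra ("human", ...)/("ai", ...) message pairs:
from typing import List
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field

class SubQuestions(BaseModel):
    """Decompose the user question into simpler standalone sub-questions."""

    sub_questions: List[str] = Field(description="Sub-questions that together answer the original question")

decompose_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Break the user question into the smallest set of standalone sub-questions."),
        # few-shot example pairs would go here
        ("human", "{question}"),
    ]
)
decomposer = decompose_prompt | llm.with_structured_output(SubQuestions)
decomposer.invoke({"question": "Compare how LangChain and LlamaIndex handle RAG"})  # example question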
- "If a search is needed, output the search query; if not, output the answer directly."
- Multiple knowledge bases: use the knowledge-base field in the generated query to search only the corresponding knowledge base; there is no need to query all of them.