LangChain v0.3 Study Notes (Advanced)

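Tools
- Creating tools
  - Creating tools from functions
    - The @tool decorator
    - StructuredTool
  - Creating tools from Runnables
  - Subclassing BaseTool
  - How to create async tools
  - Handling tool errors
  - Returning artifacts from tool execution
- Using built-in tools and toolkits
  - Customizing default tools
  - How to use built-in toolkits
- Calling tools with chat models
  - Defining tool schemas
    - Python functions
    - LangChain tools
    - Pydantic models
    - TypedDict (requires `langchain-core >= 0.2.25`)
  - Tool calls
  - Parsing
- Passing tool outputs to chat models
- Passing runtime values to tools
  - Hiding arguments from the model
  - Injecting arguments at runtime
- Adding human-in-the-loop to tools
  - Adding human approval
- Handling tool errors
  - Try/except tool calls
  - Fallbacks
  - Retrying with exceptions
- Forcing the model to call a tool
- Disabling parallel tool calls
- Accessing RunnableConfig from a tool
- Streaming events from a tool
- Returning artifacts from tools (separating tool artifacts from model-facing output)
  - Defining a tool that distinguishes its outputs
  - Invoking the tool with a ToolCall
  - Using it with a model
  - Creating a class-based tool with `BaseTool`
- Converting Runnables to tools
  - Basic usage
  - Integrating tools into an agent
- Adding ad-hoc tool calling to LLMs and chat models
  - Adding an output parser
  - Invoking the tool
  - Returning tool inputs
  - ✅ Bound tools vs. ad-hoc tools: the key differences
- Passing runtime secrets to Runnables
Chat models & LLMs
- Calling tools with chat models
- Returning structured data from a model
  - The .with_structured_output() method
    - Pydantic
    - TypedDict or JSON Schema
    - Choosing between multiple schemas
    - Streaming
  - Few-shot prompting
  - Specifying the structured-output method (advanced)
  - Raw outputs (advanced)
  - Prompting and parsing model output directly
    - Using PydanticOutputParser
    - Custom parsing
- Caching chat model responses
- Getting log probabilities
- Creating a custom chat model class
- Streaming chat model responses
  - 🔁 Synchronous streaming example
  - ⚡ Asynchronous streaming example
  - 📡 Listening to astream events
- Tracking token usage in chat models
  - Streaming
  - Using callbacks
- Handling rate limits
- Few-shot prompting with tool calling
- Binding model-specific tools
- Running models locally
- Initializing any model in one line
Multimodality
Callbacks
- Passing callbacks at runtime
- Attaching callbacks to a Runnable
- Passing callbacks into a constructor
- Creating custom callback handlers
- Using callbacks in async environments
- Dispatching custom callback events
  - Observing events with `astream_events()`
  - Async callback handler example
  - Sync callback handler example
Indexing
- Using the LangChain indexing API
  - Deletion modes
  - Quickstart
    - None deletion mode
    - "incremental" deletion mode
    - "full" deletion mode
  - Source
  - Using the indexing API with loaders
Serialization: saving and loading LangChain objects
- Saving objects
- Loading objects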

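The intermediate part of these LangChain v0.3 study notes introduced the concept of Runnable chains and covered a number of additional components. This advanced part covers the higher-level components. LangGraph's influence is easy to spot in them, and at some point the penny drops: so that is what a given LangGraph feature is built on.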

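Tools

Tools are utility functions or programs that a language model can call to perform a specific task or to access an external API. Their inputs are usually generated by the model, and their outputs are returned to the model so it can continue reasoning.

Typical scenarios for using tools include:

- Letting the model access a database
- Letting the model call functions or services (such as a weather API or a search engine)
- Controlling local code execution

A tool consists of the following elements:

Element	Description
`name`	Tool name (the model refers to the tool by this name)
`description`	Tool description (guides the model on when to use it)
`schema`	Input definition in JSON Schema format
`function`	The function that performs the actual work; may be sync or async

Once bound to a model, the name, description, and schema are passed to the model as context to guide its tool calls.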

tools = [...]  # define a list of tools
llm_with_tools = llm.bind_tools(tools)

ai_msg = llm_with_tools.invoke("do xyz...")

# AIMessage(tool_calls=[ToolCall(...)])

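The AIMessage generated by the model may contain a tool_calls field describing the tools the model wants to call and their arguments.

Once the selected tool has been invoked, the result can be passed back to the model so it can complete the task it is working on. There are generally two ways to invoke a tool and pass back the response:

Invoking the tool with arguments only (returns the raw output)

When you invoke a tool with arguments only, you get the raw tool output (usually a string). This typically looks like: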

tool_call = ai_msg.tool_calls[0]
tool_output = tool.invoke(tool_call["args"])

tool_message = ToolMessage(
    content=tool_output,
    tool_call_id=tool_call["id"],
    name=tool_call["name"]
)

⚠️ Note:

content is what gets returned to the model.

If you do not want to expose the raw output to the model, keep it in ToolMessage.artifact and put a transformed version in content:

response_for_llm = transform(tool_output)

tool_message = ToolMessage(
    content=response_for_llm,
    tool_call_id=tool_call["id"],
    name=tool_call["name"],
    artifact=tool_output
)

Invoking the tool directly with a ToolCall

tool_call = ai_msg.tool_calls[0]
tool_message = tool.invoke(tool_call)

# tool_message -> ToolMessage(
#     content="tool result...",
#     tool_call_id=...,
#     name="tool_name"
# )

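✅ If you want the returned ToolMessage to include an artifact, the tool must be declared with response_format="content_and_artifact" and its function must return a tuple (content, artifact).

For more details, see the section on returning artifacts from tool execution.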
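To make the difference between the two invocation styles concrete, here is a minimal plain-Python sketch. It does not use LangChain: `SimpleTool` and its `invoke` method are hypothetical stand-ins that only mimic the behaviour described above. A bare argument dict yields the raw output, while a dict shaped like a ToolCall yields a message-like dict carrying the `tool_call_id`.

```python
from typing import Any, Callable, Dict


class SimpleTool:
    """Hypothetical stand-in for a tool; not the LangChain implementation."""

    def __init__(self, name: str, func: Callable[..., Any]) -> None:
        self.name = name
        self.func = func

    def invoke(self, payload: Dict[str, Any]) -> Any:
        # A payload tagged with type == "tool_call" is treated as a ToolCall.
        if payload.get("type") == "tool_call":
            output = self.func(**payload["args"])
            # Wrap the result so it can be matched to the originating call.
            return {
                "content": str(output),
                "name": self.name,
                "tool_call_id": payload["id"],
            }
        # Otherwise the payload is the argument dict itself: return raw output.
        return self.func(**payload)


def multiply(a: int, b: int) -> int:
    return a * b


multiply_tool = SimpleTool("multiply", multiply)

raw_output = multiply_tool.invoke({"a": 2, "b": 3})
tool_message = multiply_tool.invoke(
    {"name": "multiply", "args": {"a": 2, "b": 3}, "id": "abc123", "type": "tool_call"}
)
print(raw_output)
print(tool_message)
```

The point of the second form is the `tool_call_id`: it lets the model pair each result with the call that requested it.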

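💡 Best practices

✅ Models that support tool calling natively (such as OpenAI's GPT-4 Turbo) perform better.
✅ Give tools clear, concise, semantically meaningful names and descriptions.
✅ Define an accurate JSON Schema so the model can understand and use the tool.
✅ Prefer narrowly scoped, single-purpose tools; they are easier for the model to use than complex multi-purpose ones.
✅ Combine tools with prompt engineering so the model understands when to use them.

📌 Example: overview of the tool-call structure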

{
  "tool_calls": [
    {
      "name": "get_weather",
      "args": {
        "location": "San Francisco"
      },
      "id": "abc123"
    }
  ]
}

The ToolMessage returned to the model:

{
  "tool_call_id": "abc123",
  "name": "get_weather",
  "content": "Current temperature is 22℃, sunny"
}

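Creating tools

When using a LangChain agent, you must provide it with a set of tools it can call. A tool is more than a plain function: it also carries structured metadata that helps the agent use it well.

🧩 Key components of a Tool

Attribute	Type	Description
`name`	`str`	Tool name. Must be unique within the set of tools given to the agent, which refers to the tool by this name.
`description`	`str`	Short description of what the tool does; used by the model/agent as prompt context.
`args_schema`	`pydantic.BaseModel`	Optional but recommended. Defines the argument structure (with validation and examples); required when using callback handlers.
`return_direct`	`bool`	Only relevant for agents. If `True`, the agent stops after calling this tool and returns the result directly to the user.

LangChain supports building tools in the following ways:

✅ Python functions

Use the simple @tool decorator or the StructuredTool.from_function class method; this covers most scenarios.

⚙️ The LangChain Runnable interface

If you have already built a component as a Runnable, you can convert it into a tool.

🧱 Subclassing BaseTool

This is the most flexible approach and gives the most control (async logic, custom validation, complex outputs), at the cost of more code and design effort.

Creating tools from functions is sufficient for most use cases and can be done with the simple @tool decorator. If you need more configuration, such as specifying both sync and async implementations, use the StructuredTool.from_function class method instead.

Models perform better when tools have well-chosen names, descriptions, and JSON schemas.

Creating tools from functions

The @tool decorator

The @tool decorator is the simplest way to define a custom tool. By default it uses the function name as the tool name, which can be overridden by passing a string as the first argument. The decorator also uses the function's docstring as the tool's description, so a docstring must be provided.

from langchain_core.tools import tool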


@tool
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b


# Let's inspect some of the attributes associated with the tool.
print(multiply.name)
print(multiply.description)
print(multiply.args)

Output:

multiply
Multiply two numbers.
{'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}

Or create an async implementation, like this:

from langchain_core.tools import tool


@tool
async def amultiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

Note that @tool supports parsing of annotations, nested schemas, and other features:

from typing import Annotated, List


@tool
def multiply_by_max(
    a: Annotated[int, "scale factor"],
    b: Annotated[List[int], "list of ints over which to take maximum"],
) -> int:
    """Multiply a by the maximum of b."""
    return a * max(b)


multiply_by_max.args_schema.schema()

Output:

{'description': 'Multiply a by the maximum of b.',
 'properties': {'a': {'description': 'scale factor',
   'title': 'A',
   'type': 'integer'},
  'b': {'description': 'list of ints over which to take maximum',
   'items': {'type': 'integer'},
   'title': 'B',
   'type': 'array'}},
 'required': ['a', 'b'],
 'title': 'multiply_by_maxSchema',
 'type': 'object'}

You can also customize the tool name and JSON arguments by passing them to the tool decorator.

from pydantic import BaseModel, Field


class CalculatorInput(BaseModel):
    a: int = Field(description="first number")
    b: int = Field(description="second number")


@tool("multiplication-tool", args_schema=CalculatorInput, return_direct=True)
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b


# Let's inspect some of the attributes associated with the tool.
print(multiply.name)
print(multiply.description)
print(multiply.args)
print(multiply.return_direct)

Output:

multiplication-tool
Multiply two numbers.
{'a': {'description': 'first number', 'title': 'A', 'type': 'integer'}, 'b': {'description': 'second number', 'title': 'B', 'type': 'integer'}}
True

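@tool can optionally parse Google-style docstrings and associate docstring components (such as argument descriptions) with the relevant parts of the tool schema. To enable this behavior, specify parse_docstring: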

@tool(parse_docstring=True)
def foo(bar: str, baz: int) -> str:
    """The foo.

    Args:
        bar: The bar.
        baz: The baz.
    """
    return bar


foo.args_schema.schema()

Output:

{'description': 'The foo.',
 'properties': {'bar': {'description': 'The bar.',
   'title': 'Bar',
   'type': 'string'},
  'baz': {'description': 'The baz.', 'title': 'Baz', 'type': 'integer'}},
 'required': ['bar', 'baz'],
 'title': 'fooSchema',
 'type': 'object'}

By default, @tool(parse_docstring=True) raises a ValueError if the docstring cannot be parsed correctly.
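To get a feel for what parsing a Google-style docstring involves, here is a small standard-library sketch. It is not LangChain's parser: `parse_google_args` is a hypothetical helper, and the rule that an undocumented-in-signature argument triggers a ValueError is an assumption chosen to mirror the strict behaviour described above.

```python
import inspect
from typing import Callable, Dict


def parse_google_args(func: Callable) -> Dict[str, str]:
    """Return {arg_name: description} from a Google-style ``Args:`` block.

    Illustrative only; raises ValueError if the docstring documents an
    argument that is missing from the function signature.
    """
    doc = inspect.getdoc(func) or ""
    params = inspect.signature(func).parameters
    descriptions: Dict[str, str] = {}
    in_args = False
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
            continue
        if in_args:
            if not stripped:
                break  # a blank line ends the Args block
            name, sep, text = stripped.partition(":")
            if not sep:
                continue  # continuation lines are ignored in this sketch
            if name not in params:
                raise ValueError(f"Documented argument {name!r} is not in the signature")
            descriptions[name] = text.strip()
    return descriptions


def foo(bar: str, baz: int) -> str:
    """The foo.

    Args:
        bar: The bar.
        baz: The baz.
    """
    return bar


def broken(bar: str) -> str:
    """The broken one.

    Args:
        qux: Not a real parameter.
    """
    return bar


print(parse_google_args(foo))
try:
    parse_google_args(broken)
except ValueError as exc:
    print(f"ValueError: {exc}")
```

Failing loudly on a mismatched docstring is a reasonable default: a silently wrong argument description would end up in the schema the model sees.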

StructuredTool

The StructuredTool.from_function class method offers more configurability than the @tool decorator without requiring much additional code.

from langchain_core.tools import StructuredTool


def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b


async def amultiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b


calculator = StructuredTool.from_function(func=multiply, coroutine=amultiply)

print(calculator.invoke({"a": 2, "b": 3}))
print(await calculator.ainvoke({"a": 2, "b": 5}))

Output:

6
10

To configure it:

class CalculatorInput(BaseModel):
    a: int = Field(description="first number")
    b: int = Field(description="second number")


def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b


calculator = StructuredTool.from_function(
    func=multiply,
    name="Calculator",
    description="multiply numbers",
    args_schema=CalculatorInput,
    return_direct=True,
    # coroutine= ... <- you can specify an async method if desired as well
)

print(calculator.invoke({"a": 2, "b": 3}))
print(calculator.name)
print(calculator.description)
print(calculator.args)

Output:

6
Calculator
multiply numbers
{'a': {'description': 'first number', 'title': 'A', 'type': 'integer'}, 'b': {'description': 'second number', 'title': 'B', 'type': 'integer'}}

Creating tools from Runnables

LangChain Runnables that accept string or dict input can be converted to tools with the as_tool method, which lets you specify a name, a description, and additional schema information for the arguments.

from langchain_core.language_models import GenericFakeChatModel
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [("human", "Hello. Please respond in the style of {answer_style}.")]
)

# Placeholder LLM
llm = GenericFakeChatModel(messages=iter(["hello matey"]))

chain = prompt | llm | StrOutputParser()

as_tool = chain.as_tool(
    name="Style responder", description="Description of when to use tool."
)
as_tool.args

Output (the call also emits a LangChainBetaWarning: this API is in beta and may change in the future):

{'answer_style': {'title': 'Answer Style', 'type': 'string'}}

Subclassing BaseTool

You can define a custom tool by subclassing BaseTool. This gives you maximal control over the tool definition but requires more code.

from typing import Optional, Type

from langchain_core.callbacks import (
    AsyncCallbackManagerForToolRun,
    CallbackManagerForToolRun,
)
from langchain_core.tools import BaseTool
from pydantic import BaseModel, Field


class CalculatorInput(BaseModel):
    a: int = Field(description="first number")
    b: int = Field(description="second number")


# Note: It's important that every field has type hints. BaseTool is a
# Pydantic class and not having type hints can lead to unexpected behavior.
class CustomCalculatorTool(BaseTool):
    name: str = "Calculator"
    description: str = "useful for when you need to answer questions about math"
    args_schema: Type[BaseModel] = CalculatorInput
    return_direct: bool = True

    def _run(
        self, a: int, b: int, run_manager: Optional[CallbackManagerForToolRun] = None
    ) -> int:
        """Use the tool."""
        return a * b

    async def _arun(
        self,
        a: int,
        b: int,
        run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
    ) -> int:
        """Use the tool asynchronously."""
        # If the calculation is cheap, you can just delegate to the sync implementation
        # as shown below.
        # If the sync calculation is expensive, you should delete the entire _arun method.
        # LangChain will automatically provide a better implementation that will
        # kick off the task in a thread to make sure it doesn't block other async code.
        sync_manager = run_manager.get_sync() if run_manager else None
        return self._run(a, b, run_manager=sync_manager)


multiply = CustomCalculatorTool()
print(multiply.name)
print(multiply.description)
print(multiply.args)
print(multiply.return_direct)

print(multiply.invoke({"a": 2, "b": 3}))
print(await multiply.ainvoke({"a": 2, "b": 3}))

Output:

Calculator
useful for when you need to answer questions about math
{'a': {'description': 'first number', 'title': 'A', 'type': 'integer'}, 'b': {'description': 'second number', 'title': 'B', 'type': 'integer'}}
True
6
6

How to create async tools

All Runnables expose invoke and ainvoke methods (as well as others such as batch, abatch, and astream).

So even if you only provide a sync implementation of a tool, you can still use the ainvoke interface, but there are some important things to know:

- LangChain provides a default async implementation that assumes the function is expensive to compute, so it delegates execution to another thread.
- If you are working in an async codebase, create async tools rather than sync tools to avoid the small overhead of that thread.
- If you need both sync and async implementations, use StructuredTool.from_function or subclass BaseTool.
- If you implement both and the sync code is fast to run, override the default LangChain async implementation and simply call the sync code.
- You cannot and should not use the sync invoke with an async tool.

from langchain_core.tools import StructuredTool


def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b


calculator = StructuredTool.from_function(func=multiply)

print(calculator.invoke({"a": 2, "b": 3}))
print(
    await calculator.ainvoke({"a": 2, "b": 5})
)  # Uses the default LangChain async implementation, which incurs a small overhead

Output:

6
10

With both a sync and an async implementation provided:

from langchain_core.tools import StructuredTool


def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b


async def amultiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b


calculator = StructuredTool.from_function(func=multiply, coroutine=amultiply)

print(calculator.invoke({"a": 2, "b": 3}))
print(
    await calculator.ainvoke({"a": 2, "b": 5})
)  # Uses the provided amultiply without additional overhead

Output:

6
10

When only an async definition is provided, you cannot and should not use .invoke.

@tool
async def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b


try:
    multiply.invoke({"a": 2, "b": 3})
except NotImplementedError:
    print("Raised not implemented error. You should not be doing this.")

Output:

Raised not implemented error. You should not be doing this.
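The sync/async behaviour described in this section can be approximated with the standard library alone. The sketch below is an illustration, not LangChain's code: `call_async` is a hypothetical helper that awaits a native coroutine when one is supplied and otherwise runs the sync function in the default thread-pool executor, which is where the small overhead comes from.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, Optional


def multiply(a: int, b: int) -> int:
    return a * b


async def amultiply(a: int, b: int) -> int:
    return a * b


async def call_async(
    func: Callable[..., Any],
    coroutine: Optional[Callable[..., Awaitable[Any]]] = None,
    **kwargs: Any,
) -> Any:
    """Await the native coroutine if given; otherwise run func in a worker thread."""
    if coroutine is not None:
        return await coroutine(**kwargs)
    loop = asyncio.get_running_loop()
    # run_in_executor(None, ...) uses the loop's default thread pool, so a slow
    # sync function does not block other tasks on the event loop.
    return await loop.run_in_executor(None, functools.partial(func, **kwargs))


async def main() -> tuple:
    via_thread = await call_async(multiply, a=2, b=5)
    native = await call_async(multiply, coroutine=amultiply, a=2, b=5)
    return via_thread, native


results = asyncio.run(main())
print(results)
```

Both paths return the same value; the only difference is whether a thread hop is involved.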

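Handling tool errors

If you are using tools with agents, you will likely need an error-handling strategy so the agent can recover from errors and continue execution.

A simple strategy is to raise a ToolException from inside the tool and specify an error handler using handle_tool_error.

When an error handler is specified, the exception is caught and the handler decides which output to return from the tool.

You can set handle_tool_error to True, a string value, or a function. If it is a function, it should take a ToolException as its parameter and return a value.

Note that raising a ToolException alone is not enough. You must first set the tool's handle_tool_error, because its default value is False; otherwise the exception simply propagates.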

from langchain_core.tools import ToolException


def get_weather(city: str) -> int:
    """Get weather for the given city."""
    raise ToolException(f"Error: There is no city by the name of {city}.")

Here is an example with handle_tool_error=True, which returns the exception message as the tool output:

get_weather_tool = StructuredTool.from_function(
    func=get_weather,
    handle_tool_error=True,
)

get_weather_tool.invoke({"city": "foobar"})
# 'Error: There is no city by the name of foobar.'

We can set handle_tool_error to a string that will always be returned.

get_weather_tool = StructuredTool.from_function(
    func=get_weather,
    handle_tool_error="There is no such city, but it's probably above 0K there!",
)

get_weather_tool.invoke({"city": "foobar"})
# "There is no such city, but it's probably above 0K there!"

Handling the error with a function:

def _handle_error(error: ToolException) -> str:
    return f"The following errors occurred during tool execution: `{error.args[0]}`"


get_weather_tool = StructuredTool.from_function(
    func=get_weather,
    handle_tool_error=_handle_error,
)

get_weather_tool.invoke({"city": "foobar"})
# 'The following errors occurred during tool execution: `Error: There is no city by the name of foobar.`'
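The three forms of handle_tool_error shown above boil down to a small dispatch. The following plain-Python sketch mimics that logic; `ToolError` and `run_with_handler` are hypothetical names for illustration and are not part of LangChain.

```python
from typing import Any, Callable, Union


class ToolError(Exception):
    """Hypothetical stand-in for ToolException."""


def run_with_handler(
    func: Callable[..., Any],
    handle_tool_error: Union[bool, str, Callable[[ToolError], str]] = False,
    **kwargs: Any,
) -> Any:
    try:
        return func(**kwargs)
    except ToolError as error:
        if handle_tool_error is False:
            raise  # default: the exception propagates to the caller
        if handle_tool_error is True:
            return str(error)  # use the exception message as the output
        if isinstance(handle_tool_error, str):
            return handle_tool_error  # always return the fixed string
        return handle_tool_error(error)  # delegate to the handler function


def get_weather(city: str) -> int:
    raise ToolError(f"Error: There is no city by the name of {city}.")


print(run_with_handler(get_weather, handle_tool_error=True, city="foobar"))
print(run_with_handler(get_weather, handle_tool_error="No such city.", city="foobar"))
print(run_with_handler(get_weather, handle_tool_error=lambda e: f"Handled: {e}", city="foobar"))
```

Returning a string instead of raising is what lets an agent read the error as an ordinary tool result and try something else.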

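Returning artifacts from tool execution

Sometimes a tool produces artifacts that we want downstream components in our chain or agent to be able to access, but that we do not want to expose to the model itself. For example, if a tool returns a custom object such as a Document, we may want to pass some view or metadata about it to the model without passing the raw output. At the same time, we may want to access the full output elsewhere, for example in downstream tools.

The Tool and ToolMessage interfaces make it possible to distinguish the part of the tool output meant for the model (ToolMessage.content) from the part meant for use outside the model (ToolMessage.artifact).

If we want our tool to distinguish between message content and other artifacts, we need to specify response_format="content_and_artifact" when defining the tool and make sure it returns a tuple (content, artifact):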

import random
from typing import List, Tuple
from langchain_core.tools import tool


@tool(response_format="content_and_artifact")
def generate_random_ints(min: int, max: int, size: int) -> Tuple[str, List[int]]:
    """Generate size random ints in the range [min, max]."""
    array = [random.randint(min, max) for _ in range(size)]
    content = f"Successfully generated array of {size} random ints in [{min}, {max}]."
    return content, array

If we invoke our tool directly with the tool arguments, we get back only the content part of the output:

generate_random_ints.invoke({"min": 0, "max": 9, "size": 10})
# 'Successfully generated array of 10 random ints in [0, 9].'

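If we invoke our tool with a ToolCall (like the ones generated by tool-calling models), we get back a ToolMessage that contains both the content and the artifact generated by the tool: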

generate_random_ints.invoke(
    {
        "name": "generate_random_ints",
        "args": {"min": 0, "max": 9, "size": 10},
        "id": "123",  # required
        "type": "tool_call",  # required
    }
)
# ToolMessage(
#   content='Successfully generated array of 10 random ints in [0, 9].', 
#   name='generate_random_ints', 
#   tool_call_id='123', 
#   artifact=[4, 8, 2, 4, 1, 0, 9, 5, 8, 1]
# )

We can do the same when subclassing BaseTool:

from typing import List, Tuple
import random
from langchain_core.tools import BaseTool


class GenerateRandomFloats(BaseTool):
    name: str = "generate_random_floats"
    description: str = "Generate size random floats in the range [min, max]."
    response_format: str = "content_and_artifact"

    ndigits: int = 2

    def _run(self, min: float, max: float, size: int) -> Tuple[str, List[float]]:
        range_ = max - min
        array = [
            round(min + (range_ * random.random()), ndigits=self.ndigits)
            for _ in range(size)
        ]
        content = f"Generated {size} floats in [{min}, {max}], rounded to {self.ndigits} decimals."
        return content, array

    # Optionally define an equivalent async method
    # async def _arun(self, min: float, max: float, size: int) -> Tuple[str, List[float]]:
    #     ...


rand_gen = GenerateRandomFloats(ndigits=4)

rand_gen.invoke(
    {
        "name": "generate_random_floats",
        "args": {"min": 0.1, "max": 3.3333, "size": 3},
        "id": "123",
        "type": "tool_call",
    }
)
# ToolMessage(
#   content='Generated 3 floats in [0.1, 3.3333], rounded to 4 decimals.', 
#   name='generate_random_floats', 
#   tool_call_id='123', 
#   artifact=[1.5566, 0.5134, 2.7914]
# )

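A note on terminology: "artifact" is used here in its general technical sense of a product, result, or piece of attached data.

In this context, an artifact is:

- the raw, complete data or object produced by running the tool (a document, an array, a model result, and so on);
- data that is usually not exposed to the model directly but that other parts of the system need to use or pass along;
- distinct from the content, which is all the model sees and is typically a text summary, hint, or view meant for the model.

In other words, think of it as "attached output" or "auxiliary data":

- Content (content): the text message the model sees.
- Artifact (artifact): the full data structure for internal use by the system.

You can think of it as:

The tool's "hidden" or "extra" output

It is not text shown to the model but the raw data or rich information that other parts of your program can use.

Why is this mechanism needed?

- The model only needs concise text, such as "successfully generated 10 random numbers", not the large array itself; this keeps the prompt short.
- Other modules of the program may need the full data, for example to:
  - pass the generated array to a downstream processor;
  - store it in a database or log;
  - make it available to other tools in the agent.
- Passing complex objects to the model as text is avoided, which saves tokens and keeps the model from misreading or being overloaded with information.

As a simple scenario, suppose your tool produces:

content: "Generated 10 random numbers"
artifact: [4, 8, 2, 4, 1, 0, 9, 5, 8, 1]

The model receives the former and uses it only to decide what to do next; the rest of the program takes the latter for analysis, storage, or further computation.

Recall the two requirements from above:

- response_format="content_and_artifact" tells LangChain that your tool returns two things: text for the model (content) and additional data (artifact).
- The tool function must return a tuple (content, artifact): the first element is the content text (a string) and the second is any object you want to pass along as the artifact (a list, dict, custom object, and so on).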

from langchain_core.tools import tool
from typing import List, Tuple
import random

@tool(response_format="content_and_artifact")
def generate_random_numbers(min: int, max: int, count: int) -> Tuple[str, List[int]]:
    """Generate count random integers in the range [min, max]."""
    # Generate the list of random numbers
    numbers = [random.randint(min, max) for _ in range(count)]
    # Short text description for the model
    content = f"Generated {count} random integers between {min} and {max}."
    # Return the text together with the artifact list
    return content, numbers


# Invoke the tool with plain arguments
result = generate_random_numbers.invoke({"min": 1, "max": 10, "count": 5})

print(result)
# Output: "Generated 5 random integers between 1 and 10."

# Invoking with a full ToolCall (as a model would produce) returns a ToolMessage
# that contains both content and artifact
tool_call = {
    "name": "generate_random_numbers",
    "args": {"min": 1, "max": 10, "count": 5},
    "id": "call123",
    "type": "tool_call"
}

tool_message = generate_random_numbers.invoke(tool_call)

print(tool_message.content)
# "Generated 5 random integers between 1 and 10."

print(tool_message.artifact)
# e.g. [3, 9, 1, 6, 2]  # the actual list of numbers; usable by the program but not seen by the model

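A short call such as .invoke({"min": 1, "max": 10, "count": 5}) returns just the content, as a string suitable for the model.
A full call that includes "type": "tool_call" returns a ToolMessage object containing:
- content: the text description for the model
- artifact: your raw data (such as the list of numbers just generated)

Why is it designed this way?

Because the model only cares about the content text, while other parts of the program may need the artifact for further computation, storage, validation, and so on.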

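Using built-in tools and toolkits

LangChain has a large collection of third-party tools.

When using third-party tools, make sure you understand how the tool works and what permissions it has. Read its documentation and check whether it requires anything from you from a security standpoint.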

!pip install -qU langchain-community wikipedia

from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

api_wrapper = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=100)
tool = WikipediaQueryRun(api_wrapper=api_wrapper)

print(tool.invoke({"query": "langchain"}))

Output:

Page: LangChain
Summary: LangChain is a framework designed to simplify the creation of applications

The tool has the following defaults:

print(f"Name: {tool.name}")
print(f"Description: {tool.description}")
print(f"args schema: {tool.args}")
print(f"returns directly?: {tool.return_direct}")

Output:

Name: wikipedia
Description: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.
args schema: {'query': {'description': 'query to look up on wikipedia', 'title': 'Query', 'type': 'string'}}
returns directly?: False

Customizing default tools

We can also modify the built-in name, description, and JSON schema of the arguments.

When defining the JSON schema of the arguments, the inputs must stay the same as the function's, so do not change them. You can, however, easily define a custom description for each input.

from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from pydantic import BaseModel, Field


class WikiInputs(BaseModel):
    """Inputs to the wikipedia tool."""

    query: str = Field(
        description="query to look up in Wikipedia, should be 3 or less words"
    )


tool = WikipediaQueryRun(
    name="wiki-tool",
    description="look up things in wikipedia",
    args_schema=WikiInputs,
    api_wrapper=api_wrapper,
    return_direct=True,
)

print(tool.run("langchain"))

Output:

Page: LangChain
Summary: LangChain is a framework designed to simplify the creation of applications


print(f"Name: {tool.name}")
print(f"Description: {tool.description}")
print(f"args schema: {tool.args}")
print(f"returns directly?: {tool.return_direct}")

Output:

Name: wiki-tool
Description: look up things in wikipedia
args schema: {'query': {'description': 'query to look up in Wikipedia, should be 3 or less words', 'title': 'Query', 'type': 'string'}}
returns directly?: True

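How to use built-in toolkits

Toolkits are collections of tools designed to be used together for specific tasks. They have convenient loading methods.

All toolkits expose a get_tools method that returns a list of tools.

You would typically use them like this:

# Initialize a toolkit
toolkit = ExampleToolkit(...)

# Get list of tools
tools = toolkit.get_tools()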

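Calling tools with chat models

Tool calling allows a chat model to respond to a given prompt by "calling a tool".

Keep in mind that although the name "tool calling" implies that the model directly performs some action, this is not the case. The model only generates the arguments to a tool; actually running the tool (or not) is up to the user.

Tool calling is a general technique for generating structured output from a model, and you can use it even when you do not intend to invoke any tool. One example use case is extraction from unstructured text.

LangChain implements standard interfaces for defining tools, passing them to LLMs, and representing tool calls. This guide covers how to bind tools to an LLM and then invoke the LLM to generate these arguments.

Defining tool schemas

For a model to be able to call tools, we need to pass in tool schemas that describe what each tool does and what its arguments are. Chat models that support tool calling implement a .bind_tools() method for passing tool schemas to the model.

A tool schema can take the form of:

- A Python function with type hints and a docstring
- A Pydantic model
- A TypedDict class
- A LangChain Tool object

Subsequent invocations of the model will pass these tool schemas in along with the prompt.

Python functions

A Python function can serve as a tool schema: the function name, type hints, and docstring are all passed to the model as part of the tool schema: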

def add(a: int, b: int) -> int:
    """Add two integers.

    Args:
        a: First integer
        b: Second integer
    """
    return a + b

def multiply(a: int, b: int) -> int:
    """Multiply two integers.

    Args:
        a: First integer
        b: Second integer
    """
    return a * b

Tip: well-defined function signatures and docstrings are part of prompt engineering and matter a great deal for model performance.
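To see why the signature and docstring matter, it helps to look at how a schema could be derived from them. The sketch below uses only the standard library; `function_to_schema` and its type mapping are hypothetical and much cruder than what a real converter does, but the resulting shape (name, description, typed parameters) is the kind of information a model receives.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints

# Minimal mapping from Python types to JSON Schema type names (an assumption
# of this sketch; real converters handle far more cases).
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def function_to_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build a tool-schema-like dict from a function's name, hints, and docstring."""
    hints = get_type_hints(func)
    hints.pop("return", None)  # the return type is not an input parameter
    doc = inspect.getdoc(func) or ""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        # Use the first docstring line as the description.
        "description": doc.splitlines()[0] if doc else "",
        "parameters": {
            "type": "object",
            "properties": {
                name: {"type": _JSON_TYPES.get(tp, "string")}
                for name, tp in hints.items()
            },
            # Parameters without a default value are required.
            "required": [
                name
                for name, param in signature.parameters.items()
                if param.default is inspect.Parameter.empty
            ],
        },
    }


def add(a: int, b: int) -> int:
    """Add two integers.

    Args:
        a: First integer
        b: Second integer
    """
    return a + b


schema = function_to_schema(add)
print(schema)
```

A function without type hints or a docstring would yield an almost empty schema here, which is exactly the situation the tip warns against.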

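LangChain tools

LangChain provides the @tool decorator, which gives more control over the tool definition (name, description, argument documentation, and so on).

Pydantic models

You can use Pydantic to define a tool schema that has no function attached.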

from pydantic import BaseModel, Field

class add(BaseModel):
    """Add two integers."""
    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")

class multiply(BaseModel):
    """Multiply two integers."""
    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")

⚠️ Note: all fields are required unless a default value is provided.

TypedDict (requires `langchain-core >= 0.2.25`)

You can also define tool argument types with TypedDict, using Annotated to provide descriptions.

from typing_extensions import Annotated, TypedDict

class add(TypedDict):
    """Add two integers."""
    a: Annotated[int, ..., "First integer"]
    b: Annotated[int, ..., "Second integer"]

class multiply(TypedDict):
    """Multiply two integers."""
    a: Annotated[int, ..., "First integer"]
    b: Annotated[int, ..., "Second integer"]

tools = [add, multiply]

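To actually bind these schemas to a chat model, we use the .bind_tools() method. It converts the add and multiply schemas into the format the model expects. The tool schemas are then passed in on every invocation of the model.

!pip install -qU langchain-openai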

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

llm_with_tools = llm.bind_tools(tools)

query = "What is 3 * 12?"

llm_with_tools.invoke(query)

Output:

AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_iXj4DiW1p7WLjTAQMRO0jxMs', 'function': {'arguments': '{"a":3,"b":12}', 'name': 'multiply'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 80, 'total_tokens': 97}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_483d39d857', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-0b620986-3f62-4df7-9ba3-4595089f9ad4-0', tool_calls=[{'name': 'multiply', 'args': {'a': 3, 'b': 12}, 'id': 'call_iXj4DiW1p7WLjTAQMRO0jxMs', 'type': 'tool_call'}], usage_metadata={'input_tokens': 80, 'output_tokens': 17, 'total_tokens': 97})

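As we can see, our LLM generated arguments to a tool. See the documentation for bind_tools() for all the ways to customize how the LLM selects tools, as well as the guide on forcing the LLM to call a tool rather than letting it decide.

Tool calls

When tool calls are included in an LLM response, they are attached to the corresponding message or message chunk as a list of tool call objects in the .tool_calls attribute.

✅ A chat model can call multiple tools at once.

A ToolCall is a typed dict that contains the following fields:

- name: the tool name
- args: the argument values (a dict)
- id: the call identifier (optional)
- type: always 'tool_call'

If there are no tool calls, the .tool_calls attribute defaults to an empty list.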

query = "What is 3 * 12? Also, what is 11 + 49?"

llm_with_tools.invoke(query).tool_calls

Result:

[
  {
    'name': 'multiply',
    'args': {'a': 3, 'b': 12},
    'id': 'call_1fyhJAbJHuKQe6n0PacubGsL',
    'type': 'tool_call'
  },
  {
    'name': 'add',
    'args': {'a': 11, 'b': 49},
    'id': 'call_fc2jVkKzwuPWyU7kS9qn1hyG',
    'type': 'tool_call'
  }
]

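Occasionally, an LLM provider may output malformed tool calls, for example:

- arguments that are not valid JSON
- missing or misspelled fields

In that case, the calls that failed to parse are collected in the .invalid_tool_calls attribute as InvalidToolCall instances, with the following fields:

- name: the tool name
- args: the raw argument string (unparsed)
- id: the call identifier (optional)
- error: the error message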
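The split between valid and invalid tool calls can be illustrated without any provider. In this standard-library sketch, `split_tool_calls` is a hypothetical helper and the raw input format (a `name`, an `arguments` JSON string, and an `id`) is an assumption modelled on the provider payloads shown earlier in these notes; it is not LangChain's parsing code.

```python
import json
from typing import Any, Dict, List, Tuple


def split_tool_calls(
    raw_calls: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate raw calls whose arguments parse as a JSON object from those that do not."""
    valid: List[Dict[str, Any]] = []
    invalid: List[Dict[str, Any]] = []
    for raw in raw_calls:
        try:
            args = json.loads(raw["arguments"])
            if not isinstance(args, dict):
                raise ValueError("arguments must decode to a JSON object")
            valid.append(
                {"name": raw["name"], "args": args, "id": raw.get("id"), "type": "tool_call"}
            )
        except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
            # Keep the unparsed argument string together with the error message.
            invalid.append(
                {"name": raw["name"], "args": raw["arguments"], "id": raw.get("id"), "error": str(exc)}
            )
    return valid, invalid


raw_calls = [
    {"name": "multiply", "arguments": '{"a": 3, "b": 12}', "id": "call_1"},
    {"name": "add", "arguments": '{"a": 11, "b": ', "id": "call_2"},  # truncated JSON
]
tool_calls, invalid_tool_calls = split_tool_calls(raw_calls)
print(tool_calls)
print(invalid_tool_calls)
```

Keeping the raw string and the error side by side makes it possible to show the model what went wrong and ask it to retry.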

Parsing

To convert the contents of .tool_calls into structured objects (such as Pydantic models), you can use PydanticToolsParser:

from langchain_core.output_parsers import PydanticToolsParser
from pydantic import BaseModel, Field


class add(BaseModel):
    """Add two integers."""
    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")


class multiply(BaseModel):
    """Multiply two integers."""
    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")


# Build the parsing chain
chain = llm_with_tools | PydanticToolsParser(tools=[add, multiply])

chain.invoke(query)

The output is automatically converted into structured objects:

[
  multiply(a=3, b=12),
  add(a=11, b=49)
]

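This lets you consume the model's output as structured tool-call objects instead of manually parsing JSON argument strings or dicts, which improves robustness and maintainability.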
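The core idea of this parsing step, mapping each tool call's name to a class and instantiating it with the arguments, can be sketched with standard-library dataclasses. This is a dependency-free analogue, not PydanticToolsParser itself: `parse_tool_calls` is a hypothetical helper, and unlike Pydantic models, dataclasses do not validate field types.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class add:
    a: int
    b: int


@dataclass
class multiply:
    a: int
    b: int


def parse_tool_calls(tool_calls: List[Dict[str, Any]], tools: List[type]) -> List[Any]:
    """Turn tool-call dicts into instances of the class named by each call."""
    by_name = {cls.__name__: cls for cls in tools}
    # A KeyError here means the model named a tool that was never registered.
    return [by_name[call["name"]](**call["args"]) for call in tool_calls]


parsed = parse_tool_calls(
    [
        {"name": "multiply", "args": {"a": 3, "b": 12}, "id": "call_1", "type": "tool_call"},
        {"name": "add", "args": {"a": 11, "b": 49}, "id": "call_2", "type": "tool_call"},
    ],
    tools=[add, multiply],
)
print(parsed)
```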

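Passing tool outputs to chat models

Some models are capable of tool calling: generating arguments that conform to a specific user-provided schema. This section shows how to use those tool calls to actually invoke a function and pass the results back to the model.

First, let's define our tools and model: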

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

from langchain_core.tools import tool


@tool
def add(a: int, b: int) -> int:
    """Adds a and b."""
    return a + b


@tool
def multiply(a: int, b: int) -> int:
    """Multiplies a and b."""
    return a * b


tools = [add, multiply]

llm_with_tools = llm.bind_tools(tools)

Now, let's have the model call a tool. We add the result to a list of messages that we treat as the conversation history:

from langchain_core.messages import HumanMessage

query = "What is 3 * 12? Also, what is 11 + 49?"

messages = [HumanMessage(query)]

ai_msg = llm_with_tools.invoke(messages)

print(ai_msg.tool_calls)

messages.append(ai_msg)

[{'name': 'multiply', 'args': {'a': 3, 'b': 12}, 'id': 'call_GPGPE943GORirhIAYnWv00rK', 'type': 'tool_call'}, {'name': 'add', 'args': {'a': 11, 'b': 49}, 'id': 'call_dm8o64ZrY3WFZHAvCh1bEJ6i', 'type': 'tool_call'}]

Next, invoke the tool functions with the arguments the model filled in. Conveniently, invoking a LangChain Tool with a ToolCall automatically returns a ToolMessage that can be fed back to the model:

for tool_call in ai_msg.tool_calls:
    selected_tool = {"add": add, "multiply": multiply}[tool_call["name"].lower()]
    tool_msg = selected_tool.invoke(tool_call)
    messages.append(tool_msg)

messages

[HumanMessage(content='What is 3 * 12? Also, what is 11 + 49?'),
 AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_loT2pliJwJe3p7nkgXYF48A1', 'function': {'arguments': '{"a": 3, "b": 12}', 'name': 'multiply'}, 'type': 'function'}, {'id': 'call_bG9tYZCXOeYDZf3W46TceoV4', 'function': {'arguments': '{"a": 11, "b": 49}', 'name': 'add'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 50, 'prompt_tokens': 87, 'total_tokens': 137}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_661538dc1f', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-e3db3c46-bf9e-478e-abc1-dc9a264f4afe-0', tool_calls=[{'name': 'multiply', 'args': {'a': 3, 'b': 12}, 'id': 'call_loT2pliJwJe3p7nkgXYF48A1', 'type': 'tool_call'}, {'name': 'add', 'args': {'a': 11, 'b': 49}, 'id': 'call_bG9tYZCXOeYDZf3W46TceoV4', 'type': 'tool_call'}], usage_metadata={'input_tokens': 87, 'output_tokens': 50, 'total_tokens': 137}),
 ToolMessage(content='36', name='multiply', tool_call_id='call_loT2pliJwJe3p7nkgXYF48A1'),
 ToolMessage(content='60', name='add', tool_call_id='call_bG9tYZCXOeYDZf3W46TceoV4')]

Finally, invoke the model with the tool results. The model uses this information to generate a final answer to the original query:

llm_with_tools.invoke(messages)

AIMessage(content='The result of \\(3 \\times 12\\) is 36, and the result of \\(11 + 49\\) is 60.', response_metadata={'token_usage': {'completion_tokens': 31, 'prompt_tokens': 153, 'total_tokens': 184}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_661538dc1f', 'finish_reason': 'stop', 'logprobs': None}, id='run-87d1ef0a-1223-4bb3-9310-7b591789323d-0', usage_metadata={'input_tokens': 153, 'output_tokens': 31, 'total_tokens': 184})

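Note that each ToolMessage must include a tool_call_id that matches an id in the original tool calls generated by the model. This lets the model match tool responses to tool calls.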
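The id-matching requirement can be checked mechanically before the history is sent back to the model. The following sketch uses plain dicts as stand-ins for the message objects (the `role` key and the `unanswered_tool_calls` helper are assumptions made for illustration, not LangChain API):

```python
from typing import Any, Dict, List, Set


def unanswered_tool_calls(messages: List[Dict[str, Any]]) -> Set[str]:
    """Return ids of tool calls that have no tool message with a matching tool_call_id."""
    requested: Set[str] = set()
    answered: Set[str] = set()
    for message in messages:
        if message["role"] == "ai":
            requested.update(call["id"] for call in message.get("tool_calls", []))
        elif message["role"] == "tool":
            answered.add(message["tool_call_id"])
    return requested - answered


history = [
    {"role": "human", "content": "What is 3 * 12? Also, what is 11 + 49?"},
    {
        "role": "ai",
        "content": "",
        "tool_calls": [
            {"name": "multiply", "args": {"a": 3, "b": 12}, "id": "call_a"},
            {"name": "add", "args": {"a": 11, "b": 49}, "id": "call_b"},
        ],
    },
    {"role": "tool", "content": "36", "tool_call_id": "call_a"},
]

# Only one of the two tool calls has been answered so far.
missing = unanswered_tool_calls(history)
print(missing)

# Answering the second call leaves nothing outstanding.
history.append({"role": "tool", "content": "60", "tool_call_id": "call_b"})
print(unanswered_tool_calls(history))
```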

Tool-calling agents, such as those in LangGraph, use this basic flow to answer queries and solve tasks.

Passing runtime values to tools

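You may need to bind values to a tool that are only known at runtime. For example, the tool logic may require the ID of the user who made the request.

Most of the time, such values should not be controlled by the LLM. Allowing the LLM to control the user ID could, in fact, create a security risk.

Instead, the LLM should control only the tool parameters that are meant for it, while other parameters (such as the user ID) should be fixed by the application logic.

First, set up the chat model that the tools will be bound to: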

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

Hiding arguments from the model

We can use the InjectedToolArg annotation to mark certain tool parameters, such as user_id, as injected at runtime, meaning the model should not generate them:

from typing import List

from langchain_core.tools import InjectedToolArg, tool
from typing_extensions import Annotated

user_to_pets = {}


@tool(parse_docstring=True)
def update_favorite_pets(
    pets: List[str], user_id: Annotated[str, InjectedToolArg]
) -> None:
    """Add the list of favorite pets.

    Args:
        pets: List of favorite pets to set.
        user_id: User's ID.
    """
    user_to_pets[user_id] = pets


@tool(parse_docstring=True)
def delete_favorite_pets(user_id: Annotated[str, InjectedToolArg]) -> None:
    """Delete the list of favorite pets.

    Args:
        user_id: User's ID.
    """
    if user_id in user_to_pets:
        del user_to_pets[user_id]


@tool(parse_docstring=True)
def list_favorite_pets(user_id: Annotated[str, InjectedToolArg]) -> List[str]:
    """List favorite pets if any.

    Args:
        user_id: User's ID.
    """
    return user_to_pets.get(user_id, [])

If we look at the input schemas of these tools, user_id is still listed:

update_favorite_pets.get_input_schema().schema()

{'description': 'Add the list of favorite pets.',
 'properties': {'pets': {'description': 'List of favorite pets to set.',
   'items': {'type': 'string'},
   'title': 'Pets',
   'type': 'array'},
  'user_id': {'description': "User's ID.",
   'title': 'User Id',
   'type': 'string'}},
 'required': ['pets', 'user_id'],
 'title': 'update_favorite_petsSchema',
 'type': 'object'}

But if we look at the tool call schema, which is what gets passed to the model for tool calling, user_id has been removed:

update_favorite_pets.tool_call_schema.schema()

{'description': 'Add the list of favorite pets.',
 'properties': {'pets': {'description': 'List of favorite pets to set.',
   'items': {'type': 'string'},
   'title': 'Pets',
   'type': 'array'}},
 'required': ['pets'],
 'title': 'update_favorite_pets',
 'type': 'object'}
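The difference between the two schemas comes down to filtering parameters by an annotation marker. As a rough illustration of that mechanism (not LangChain's actual implementation), the sketch below uses a hypothetical `Injected` marker class and a hypothetical `model_visible_params` helper built only on the standard `typing` module:

```python
from typing import Annotated, Callable, List, get_args, get_origin, get_type_hints


class Injected:
    """Hypothetical marker: the parameter is supplied by the application, not the model."""


def update_favorite_pets(pets: List[str], user_id: Annotated[str, Injected]) -> None:
    """Stand-in for the tool function; the body is irrelevant here."""


def model_visible_params(fn: Callable) -> List[str]:
    """List the parameter names that would be exposed to the model."""
    hints = get_type_hints(fn, include_extras=True)
    hints.pop("return", None)
    visible = []
    for name, hint in hints.items():
        # Annotated[T, marker, ...] keeps its metadata after the first type argument.
        if get_origin(hint) is Annotated and Injected in get_args(hint)[1:]:
            continue
        visible.append(name)
    return visible


print(model_visible_params(update_favorite_pets))
```

Because the marker lives in the type annotation, the function signature stays the single source of truth for which arguments the model may fill in.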

So when we invoke the tool ourselves, we need to pass in user_id:

user_id = "123"
update_favorite_pets.invoke({"pets": ["lizard", "dog"], "user_id": user_id})
print(user_to_pets)
print(list_favorite_pets.invoke({"user_id": user_id}))

{'123': ['lizard', 'dog']}
['lizard', 'dog']

But when the model calls the tool, no user_id argument is generated:

tools = [
    update_favorite_pets,
    delete_favorite_pets,
    list_favorite_pets,
]
llm_with_tools = llm.bind_tools(tools)
ai_msg = llm_with_tools.invoke("my favorite animals are cats and parrots")
ai_msg.tool_calls

[{'name': 'update_favorite_pets',
  'args': {'pets': ['cats', 'parrots']},
  'id': 'call_pZ6XVREGh1L0BBSsiGIf1xVm',
  'type': 'tool_call'}]

Injecting arguments at runtime

To actually execute our tools using the tool calls generated by the model, we need to inject user_id ourselves:

from copy import deepcopy

from langchain_core.runnables import chain


@chain
def inject_user_id(ai_msg):
    tool_calls = []
    for tool_call in ai_msg.tool_calls:
        tool_call_copy = deepcopy(tool_call)
        tool_call_copy["args"]["user_id"] = user_id
        tool_calls.append(tool_call_copy)
    return tool_calls


inject_user_id.invoke(ai_msg)

[{'name': 'update_favorite_pets',
  'args': {'pets': ['cats', 'parrots'], 'user_id': '123'},
  'id': 'call_pZ6XVREGh1L0BBSsiGIf1xVm',
  'type': 'tool_call'}]

Now we can chain together the model, the injection code, and the actual tools to create a tool-executing chain:

tool_map = {tool.name: tool for tool in tools}


@chain
def tool_router(tool_call):
    return tool_map[tool_call["name"]]


chain = llm_with_tools | inject_user_id | tool_router.map()
chain.invoke("my favorite animals are cats and parrots")

[ToolMessage(content='null', name='update_favorite_pets', tool_call_id='call_oYCD0THSedHTbwNAY3NW6uUj')]

Looking at the user_to_pets dict, we can see that it has been updated to include cats and parrots:

user_to_pets

{'123': ['cats', 'parrots']}

Adding human-in-the-loop to tools

We recommend using langgraph to support this capability.

Let's create a few simple (dummy) tools and a tool-calling chain:

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

from typing import Dict, List

from langchain_core.messages import AIMessage
from langchain_core.runnables import Runnable, RunnablePassthrough
from langchain_core.tools import tool


@tool
def count_emails(last_n_days: int) -> int:
    """Dummy function to count number of e-mails. Returns 2 * last_n_days."""
    return last_n_days * 2


@tool
def send_email(message: str, recipient: str) -> str:
    """Dummy function for sending an e-mail."""
    return f"Successfully sent email to {recipient}."


tools = [count_emails, send_email]
llm_with_tools = llm.bind_tools(tools)


def call_tools(msg: AIMessage) -> List[Dict]:
    """Simple sequential tool calling helper."""
    tool_map = {tool.name: tool for tool in tools}
    tool_calls = msg.tool_calls.copy()
    for tool_call in tool_calls:
        tool_call["output"] = tool_map[tool_call["name"]].invoke(tool_call["args"])
    return tool_calls


chain = llm_with_tools | call_tools
chain.invoke("how many emails did i get in the last 5 days?")

[{'name': 'count_emails',
  'args': {'last_n_days': 5},
  'id': 'toolu_01QYZdJ4yPiqsdeENWHqioFW',
  'output': 10}]

Adding human approval

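Let's add a step to the chain that asks a person to approve or reject the tool call request.

On rejection, the step raises an exception, which stops the rest of the chain from executing.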

import json


class NotApproved(Exception):
    """Custom exception."""


def human_approval(msg: AIMessage) -> AIMessage:
    """Responsible for passing through its input or raising an exception.

    Args:
        msg: output from the chat model

    Returns:
        msg: original output from the msg
    """
    tool_strs = "\n\n".join(
        json.dumps(tool_call, indent=2) for tool_call in msg.tool_calls
    )
    input_msg = (
        f"Do you approve of the following tool invocations\n\n{tool_strs}\n\n"
        "Anything except 'Y'/'Yes' (case-insensitive) will be treated as a no.\n >>>"
    )
    resp = input(input_msg)
    if resp.lower() not in ("yes", "y"):
        raise NotApproved(f"Tool invocations not approved:\n\n{tool_strs}")
    return msg

chain = llm_with_tools | human_approval | call_tools
chain.invoke("how many emails did i get in the last 5 days?")

Do you approve of the following tool invocations

{
  "name": "count_emails",
  "args": {
    "last_n_days": 5
  },
  "id": "toolu_01WbD8XeMoQaRFtsZezfsHor"
}

Anything except 'Y'/'Yes' (case-insensitive) will be treated as a no.
 >>> yes

[{'name': 'count_emails',
  'args': {'last_n_days': 5},
  'id': 'toolu_01WbD8XeMoQaRFtsZezfsHor',
  'output': 10}]

try:
    chain.invoke("Send sally@gmail.com an email saying 'What's up homie'")
except NotApproved as e:
    print()
    print(e)

Do you approve of the following tool invocations

{
  "name": "send_email",
  "args": {
    "recipient": "sally@gmail.com",
    "message": "What's up homie"
  },
  "id": "toolu_014XccHFzBiVcc9GV1harV9U"
}

Anything except 'Y'/'Yes' (case-insensitive) will be treated as a no.
 >>> no

Tool invocations not approved:

{
  "name": "send_email",
  "args": {
    "recipient": "sally@gmail.com",
    "message": "What's up homie"
  },
  "id": "toolu_014XccHFzBiVcc9GV1harV9U"
}
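Because the approval step above reads from standard input, it is awkward to exercise in automated tests. One possible variation, sketched here under the assumption that the prompt function can be passed in as a parameter (the `approval_gate` name and its `ask` argument are hypothetical), lets a scripted reviewer replace the interactive prompt:

```python
import json
from typing import Any, Callable, Dict, List


class NotApproved(Exception):
    """Raised when the reviewer rejects the proposed tool calls."""


def approval_gate(
    tool_calls: List[Dict[str, Any]], ask: Callable[[str], str] = input
) -> List[Dict[str, Any]]:
    """Return tool_calls unchanged if the reviewer answers y/yes, else raise."""
    rendered = "\n\n".join(json.dumps(call, indent=2) for call in tool_calls)
    answer = ask(f"Approve these tool calls?\n\n{rendered}\n >>> ")
    if answer.strip().lower() not in ("y", "yes"):
        raise NotApproved(f"Tool calls not approved:\n\n{rendered}")
    return tool_calls


calls = [{"name": "count_emails", "args": {"last_n_days": 5}, "id": "call_1"}]

# Scripted reviewers stand in for a person typing at the prompt.
approved = approval_gate(calls, ask=lambda _prompt: "Yes")

try:
    approval_gate(calls, ask=lambda _prompt: "no")
    rejected = False
except NotApproved:
    rejected = True

print(approved, rejected)
```

With the default `ask=input`, the gate behaves interactively, as in the chain shown earlier.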

Handling tool errors

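Calling tools with an LLM is generally more reliable than pure prompting, but it isn't perfect. The model may try to call a tool that doesn't exist, or fail to return arguments that match the requested schema. Strategies such as keeping schemas simple, reducing the number of tools passed at once, and using good names and descriptions can help mitigate this risk, but they aren't foolproof.

Suppose we have the following (dummy) tool and tool-calling chain. We make the tool intentionally convoluted to try to trip up the model.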

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# Define tool
from langchain_core.tools import tool


@tool
def complex_tool(int_arg: int, float_arg: float, dict_arg: dict) -> float:
    """Do something complex with a complex tool."""
    return int_arg * float_arg


llm_with_tools = llm.bind_tools(
    [complex_tool],
)

# Define chain
chain = llm_with_tools | (lambda msg: msg.tool_calls[0]["args"]) | complex_tool

When we invoke this chain, even with a fairly explicit input, the model fails to call the tool correctly (it forgets the dict_arg argument).

chain.invoke(
    "use complex tool. the args are 5, 2.1, empty dictionary. don't forget dict_arg"
)

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[5], line 1
----> 1 chain.invoke(
      2     "use complex tool. the args are 5, 2.1, empty dictionary. don't forget dict_arg"
      3 )
File ~/langchain/.venv/lib/python3.11/site-packages/langchain_core/runnables/base.py:2998, in RunnableSequence.invoke(self, input, config, **kwargs)
   2996             input = context.run(step.invoke, input, config, **kwargs)
   2997         else:
-> 2998             input = context.run(step.invoke, input, config)
   2999 # finish the root run
   3000 except BaseException as e:
File ~/langchain/.venv/lib/python3.11/site-packages/langchain_core/tools/base.py:456, in BaseTool.invoke(self, input, config, **kwargs)
    449 def invoke(
    450     self,
    451     input: Union[str, Dict, ToolCall],
    452     config: Optional[RunnableConfig] = None,
    453     **kwargs: Any,
    454 ) -> Any:
    455     tool_input, kwargs = _prep_run_args(input, config, **kwargs)
--> 456     return self.run(tool_input, **kwargs)
File ~/langchain/.venv/lib/python3.11/site-packages/langchain_core/tools/base.py:659, in BaseTool.run(self, tool_input, verbose, start_color, color, callbacks, tags, metadata, run_name, run_id, config, tool_call_id, **kwargs)
    657 if error_to_raise:
    658     run_manager.on_tool_error(error_to_raise)
--> 659     raise error_to_raise
    660 output = _format_output(content, artifact, tool_call_id, self.name, status)
    661 run_manager.on_tool_end(output, color=color, name=self.name, **kwargs)
File ~/langchain/.venv/lib/python3.11/site-packages/langchain_core/tools/base.py:622, in BaseTool.run(self, tool_input, verbose, start_color, color, callbacks, tags, metadata, run_name, run_id, config, tool_call_id, **kwargs)
    620 context = copy_context()
    621 context.run(_set_config_context, child_config)
--> 622 tool_args, tool_kwargs = self._to_args_and_kwargs(tool_input)
    623 if signature(self._run).parameters.get("run_manager"):
    624     tool_kwargs["run_manager"] = run_manager
File ~/langchain/.venv/lib/python3.11/site-packages/langchain_core/tools/base.py:545, in BaseTool._to_args_and_kwargs(self, tool_input)
    544 def _to_args_and_kwargs(self, tool_input: Union[str, Dict]) -> Tuple[Tuple, Dict]:
--> 545     tool_input = self._parse_input(tool_input)
    546     # For backwards compatibility, if run_input is a string,
    547     # pass as a positional argument.
    548     if isinstance(tool_input, str):
File ~/langchain/.venv/lib/python3.11/site-packages/langchain_core/tools/base.py:487, in BaseTool._parse_input(self, tool_input)
    485 if input_args is not None:
    486     if issubclass(input_args, BaseModel):
--> 487         result = input_args.model_validate(tool_input)
    488         result_dict = result.model_dump()
    489     elif issubclass(input_args, BaseModelV1):
File ~/langchain/.venv/lib/python3.11/site-packages/pydantic/main.py:568, in BaseModel.model_validate(cls, obj, strict, from_attributes, context)
    566 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    567 __tracebackhide__ = True
--> 568 return cls.__pydantic_validator__.validate_python(
    569     obj, strict=strict, from_attributes=from_attributes, context=context
    570 )
ValidationError: 1 validation error for complex_toolSchema
dict_arg
  Field required [type=missing, input_value={'int_arg': 5, 'float_arg': 2.1}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing

Try/except tool call

The simplest way to handle errors gracefully is to wrap the tool-calling step in a try/except and return a helpful message on error:

from typing import Any

from langchain_core.runnables import Runnable, RunnableConfig


def try_except_tool(tool_args: dict, config: RunnableConfig) -> Any:
    try:
        return complex_tool.invoke(tool_args, config=config)
    except Exception as e:
        return f"Calling tool with arguments:\n\n{tool_args}\n\nraised the following error:\n\n{type(e)}: {e}"


chain = llm_with_tools | (lambda msg: msg.tool_calls[0]["args"]) | try_except_tool

print(
    chain.invoke(
        "use complex tool. the args are 5, 2.1, empty dictionary. don't forget dict_arg"
    )
)

Calling tool with arguments:

{'int_arg': 5, 'float_arg': 2.1}

raised the following error:

<class 'pydantic_core._pydantic_core.ValidationError'>: 1 validation error for complex_toolSchema
dict_arg
  Field required [type=missing, input_value={'int_arg': 5, 'float_arg': 2.1}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing

Fallbacks

In the event of a tool-calling error, we can also try falling back to a better model. Here we fall back to an identical chain that uses gpt-4-1106-preview instead of gpt-4o-mini.

chain = llm_with_tools | (lambda msg: msg.tool_calls[0]["args"]) | complex_tool

better_model = ChatOpenAI(model="gpt-4-1106-preview", temperature=0).bind_tools(
    [complex_tool], tool_choice="complex_tool"
)

better_chain = better_model | (lambda msg: msg.tool_calls[0]["args"]) | complex_tool

chain_with_fallback = chain.with_fallbacks([better_chain])

chain_with_fallback.invoke(
    "use complex tool. the args are 5, 2.1, empty dictionary. don't forget dict_arg"
)

10.5
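The control flow behind a fallback can be pictured with a few lines of plain Python. This is a simplified stand-in, not the `with_fallbacks` implementation: `invoke_with_fallbacks`, `weak_chain`, and `strong_chain` are hypothetical names, and the two "chains" are ordinary functions rather than model calls.

```python
from typing import Any, Callable, Optional, Sequence

Chain = Callable[[str], Any]


def invoke_with_fallbacks(primary: Chain, fallbacks: Sequence[Chain], value: str) -> Any:
    """Try each chain in order; return the first success, else re-raise the first error."""
    first_error: Optional[Exception] = None
    for candidate in (primary, *fallbacks):
        try:
            return candidate(value)
        except Exception as error:
            if first_error is None:
                first_error = error
    assert first_error is not None  # at least the primary chain was tried
    raise first_error


def weak_chain(prompt: str) -> float:
    """Stand-in for the chain whose model forgets dict_arg."""
    raise ValueError("dict_arg: Field required")


def strong_chain(prompt: str) -> float:
    """Stand-in for the chain backed by the better model."""
    return 5 * 2.1


result = invoke_with_fallbacks(weak_chain, [strong_chain], "use complex tool")
print(result)
```

The caller only sees an exception when every candidate fails, which is why a fallback to a stronger model can hide an occasional schema mistake from the weaker one.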

Retry with exception

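Taking this one step further, we can automatically re-run the chain with the exception passed in, so that the model may be able to correct its behavior: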

from langchain_core.messages import AIMessage, HumanMessage, ToolCall, ToolMessage
from langchain_core.prompts import ChatPromptTemplate


class CustomToolException(Exception):
    """Custom LangChain tool exception."""

    def __init__(self, tool_call: ToolCall, exception: Exception) -> None:
        super().__init__()
        self.tool_call = tool_call
        self.exception = exception


def tool_custom_exception(msg: AIMessage, config: RunnableConfig) -> Any:
    try:
        return complex_tool.invoke(msg.tool_calls[0]["args"], config=config)
    except Exception as e:
        raise CustomToolException(msg.tool_calls[0], e)


def exception_to_messages(inputs: dict) -> dict:
    exception = inputs.pop("exception")

    # Add historical messages to the original input, so the model knows that it made a mistake with the last tool call.
    messages = [
        AIMessage(content="", tool_calls=[exception.tool_call]),
        ToolMessage(
            tool_call_id=exception.tool_call["id"], content=str(exception.exception)
        ),
        HumanMessage(
            content="The last tool call raised an exception. Try calling the tool again with corrected arguments. Do not repeat mistakes."
        ),
    ]
    inputs["last_output"] = messages
    return inputs


# We add a last_output MessagesPlaceholder to our prompt which if not passed in doesn't
# affect the prompt at all, but gives us the option to insert an arbitrary list of Messages
# into the prompt if needed. We'll use this on retries to insert the error message.
prompt = ChatPromptTemplate.from_messages(
    [("human", "{input}"), ("placeholder", "{last_output}")]
)
chain = prompt | llm_with_tools | tool_custom_exception

# If the initial chain call fails, we rerun it with the exception passed in as a message.
self_correcting_chain = chain.with_fallbacks(
    [exception_to_messages | chain], exception_key="exception"
)

self_correcting_chain.invoke(
    {
        "input": "use complex tool. the args are 5, 2.1, empty dictionary. don't forget dict_arg"
    }
)

10.5
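Stripped of the LangChain plumbing, the self-correcting pattern is a loop that feeds the error text back into the next attempt. The sketch below assumes a hypothetical `fake_model` that forgets `dict_arg` until it is shown an error; it illustrates the loop only and makes no claim about how a real model responds to feedback.

```python
from typing import Any, Dict, List, Optional

attempt_log: List[Dict[str, Any]] = []


def fake_model(request: str, feedback: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical model stand-in: omits dict_arg unless it has seen an error message."""
    args: Dict[str, Any] = {"int_arg": 5, "float_arg": 2.1}
    if feedback is not None:
        args["dict_arg"] = {}
    attempt_log.append(args)
    return args


def complex_tool(int_arg: int, float_arg: float, dict_arg: dict) -> float:
    """Plain-function version of the tool; a missing argument raises TypeError."""
    return int_arg * float_arg


def run_with_self_correction(request: str, max_attempts: int = 2) -> float:
    """Call the tool, passing the previous error back to the model on each retry."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        args = fake_model(request, feedback)
        try:
            return complex_tool(**args)
        except TypeError as error:
            feedback = str(error)
    raise RuntimeError(f"Tool call still failing after {max_attempts} attempts: {feedback}")


result = run_with_self_correction("use complex tool with 5, 2.1 and an empty dict")
print(result, len(attempt_log))
```

Capping the number of attempts matters in practice: without it, a model that never corrects itself would loop indefinitely.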

Forcing the model to call a tool

To force the LLM to select a specific tool, we can use the tool_choice parameter. First, define the tools (llm is the chat model set up in the previous sections):

from langchain_core.tools import tool


@tool
def add(a: int, b: int) -> int:
    """Adds a and b."""
    return a + b


@tool
def multiply(a: int, b: int) -> int:
    """Multiplies a and b."""
    return a * b


tools = [add, multiply]

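For example, we can force the model to call the multiply tool with the following code. The value of tool_choice must match the tool's name, which is "multiply" for the function defined above:

llm_forced_to_multiply = llm.bind_tools(tools, tool_choice="multiply")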
llm_forced_to_multiply.invoke("what is 2 + 4")

AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_9cViskmLvPnHjXk9tbVla5HA', 'function': {'arguments': '{"a":2,"b":4}', 'name': 'Multiply'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 103, 'total_tokens': 112}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-095b827e-2bdd-43bb-8897-c843f4504883-0', tool_calls=[{'name': 'Multiply', 'args': {'a': 2, 'b': 4}, 'id': 'call_9cViskmLvPnHjXk9tbVla5HA'}], usage_metadata={'input_tokens': 103, 'output_tokens': 9, 'total_tokens': 112})

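Even though the prompt doesn't call for multiplication, the model still calls the tool!

We can also force the model to select at least one of the tools by passing the "any" keyword (or "required", which is OpenAI-specific) to the tool_choice parameter.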

llm_forced_to_use_tool = llm.bind_tools(tools, tool_choice="any")
llm_forced_to_use_tool.invoke("What day is today?")

AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_mCSiJntCwHJUBfaHZVUB2D8W', 'function': {'arguments': '{"a":1,"b":2}', 'name': 'Add'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 94, 'total_tokens': 109}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-28f75260-9900-4bed-8cd3-f1579abb65e5-0', tool_calls=[{'name': 'Add', 'args': {'a': 1, 'b': 2}, 'id': 'call_mCSiJntCwHJUBfaHZVUB2D8W'}], usage_metadata={'input_tokens': 94, 'output_tokens': 15, 'total_tokens': 109})

Disabling parallel tool calling

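OpenAI tool calling performs tool calls in parallel by default. That means that if we ask a question like "What is the weather in Tokyo, New York, and Chicago?" and we have a tool for getting the weather, the model will call that tool three times in a single response. We can restrict it to a single tool call by using the parallel_tool_calls parameter.

First, set up the tools and model: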

from langchain_core.tools import tool


@tool
def add(a: int, b: int) -> int:
    """Adds a and b."""
    return a + b


@tool
def multiply(a: int, b: int) -> int:
    """Multiplies a and b."""
    return a * b


tools = [add, multiply]

import os
from getpass import getpass

from langchain_openai import ChatOpenAI

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass()

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

Now a quick example of disabling parallel tool calling:

llm_with_tools = llm.bind_tools(tools, parallel_tool_calls=False)
llm_with_tools.invoke("Please call the first tool two times").tool_calls

[{'name': 'add',
  'args': {'a': 2, 'b': 2},
  'id': 'call_Hh4JOTCDM85Sm9Pr84VKrWu5'}]

As we can see, even though we explicitly told the model to call the tool twice, disabling parallel tool calls restricted it to a single call.

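What "parallel tool calling" actually means

It does not mean that the model spawns threads or coroutines to execute tool calls in parallel; the model itself cannot execute code.

What it means is that the model may return multiple tool calls in a single response, and your program can then execute those calls in parallel.

For example, suppose you invoke the model with this input:

"What's the sum of 2 and 3, and the product of 4 and 5?"

With the default setting (parallel_tool_calls=True), the model may return: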

[
  {"name": "add", "args": {"a": 2, "b": 3}, "id": "..."},
  {"name": "multiply", "args": {"a": 4, "b": 5}, "id": "..."}
]

This means the model planned two tool calls in one response, and your own program can choose to execute them in parallel (for example, with asyncio.gather).
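A minimal sketch of that last step, assuming the tools are async functions and the planned calls arrive as plain dicts shaped like the JSON above (the `TOOLS` registry and `run_tool_calls` helper are hypothetical names, and the ids are placeholders):

```python
import asyncio
from typing import Any, Dict, List


async def add(a: int, b: int) -> int:
    await asyncio.sleep(0)  # stands in for real I/O such as an HTTP request
    return a + b


async def multiply(a: int, b: int) -> int:
    await asyncio.sleep(0)
    return a * b


TOOLS = {"add": add, "multiply": multiply}


async def run_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Execute all planned tool calls concurrently and pair each output with its call id."""
    coroutines = [TOOLS[call["name"]](**call["args"]) for call in tool_calls]
    # gather returns results in the same order as the awaitables it was given.
    outputs = await asyncio.gather(*coroutines)
    return [
        {"tool_call_id": call["id"], "output": output}
        for call, output in zip(tool_calls, outputs)
    ]


planned_calls = [
    {"name": "add", "args": {"a": 2, "b": 3}, "id": "call_1"},
    {"name": "multiply", "args": {"a": 4, "b": 5}, "id": "call_2"},
]
results = asyncio.run(run_tool_calls(planned_calls))
print(results)
```

Keeping the `tool_call_id` next to each output makes it straightforward to build the matching tool messages afterwards.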

Accessing RunnableConfig from a tool

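Tools are runnables, and at the interface level you can treat them like any other runnable: you can call invoke(), batch(), and stream() as usual. When writing a custom tool, however, you may want it to invoke other runnables such as chat models or retrievers, and you may want to access internal events from those runnables or configure them with additional properties. To trace and configure these sub-calls correctly, you need to manually access and pass along the tool's current RunnableConfig object. This section shows some examples of how to do that.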

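To access the active config object from your custom tool, add a parameter of type RunnableConfig to the tool's signature. When you invoke the tool, LangChain inspects the signature, looks for a parameter typed as RunnableConfig, and, if one exists, populates it with the correct value.

Note: the actual name of the parameter doesn't matter, only the typing.

To illustrate this, define a custom tool that takes two parameters, one typed as a string and the other typed as RunnableConfig: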

from langchain_core.runnables import RunnableConfig
from langchain_core.tools import tool


@tool
async def reverse_tool(text: str, special_config_param: RunnableConfig) -> str:
    """A test tool that combines input text with a configurable parameter."""
    return (text + special_config_param["configurable"]["additional_field"])[::-1]

Then, if we invoke the tool with a config containing a configurable field, we can see that additional_field is passed through correctly:

await reverse_tool.ainvoke(
    {"text": "abc"}, config={"configurable": {"additional_field": "123"}}
)

'321cba'
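The "only the type matters" rule can be approximated with the standard `inspect` module. The sketch below is an illustration of signature-based injection in general, not LangChain's code: `Config` is a hypothetical stand-in for RunnableConfig, and `find_config_param` and `call_with_config` are made-up helper names.

```python
import inspect
from typing import Any, Callable, Dict, Optional


class Config(dict):
    """Hypothetical stand-in for RunnableConfig."""


def find_config_param(fn: Callable) -> Optional[str]:
    """Return the name of the first parameter annotated as Config, if any."""
    for name, param in inspect.signature(fn).parameters.items():
        if param.annotation is Config:
            return name
    return None


def call_with_config(fn: Callable, args: Dict[str, Any], config: Config) -> Any:
    """Call fn with args, injecting config into whichever parameter is typed Config."""
    kwargs = dict(args)
    config_name = find_config_param(fn)
    if config_name is not None:
        kwargs[config_name] = config
    return fn(**kwargs)


def reverse_tool(text: str, special_config_param: Config) -> str:
    """Same logic as the tool above, written as a plain function."""
    return (text + special_config_param["configurable"]["additional_field"])[::-1]


result = call_with_config(
    reverse_tool,
    {"text": "abc"},
    Config(configurable={"additional_field": "123"}),
)
print(result)
```

Renaming `special_config_param` to anything else leaves the behavior unchanged, since the lookup goes through the annotation rather than the name.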

Streaming events from a tool

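If you have tools that call chat models, retrievers, or other runnables, you may want to access internal events from those runnables or configure them with additional properties. This section shows how to manually pass parameters properly so that you can do this using the astream_events() method.

Say you have a custom tool that calls a chain which condenses its input by prompting a chat model to return only 10 words, then reverses the output. First, define it in a naive way: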

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool


@tool
async def special_summarization_tool(long_text: str) -> str:
    """A tool that summarizes input text using advanced techniques."""
    prompt = ChatPromptTemplate.from_template(
        "You are an expert writer. Summarize the following text in 10 words or less:\n\n{long_text}"
    )

    def reverse(x: str):
        return x[::-1]

    chain = prompt | model | StrOutputParser() | reverse
    summary = await chain.ainvoke({"long_text": long_text})
    return summary

Invoking the tool directly works just fine:

LONG_TEXT = """
NARRATOR:
(Black screen with text; The sound of buzzing bees can be heard)
According to all known laws of aviation, there is no way a bee should be able to fly. Its wings are too small to get its fat little body off the ground. The bee, of course, flies anyway because bees don't care what humans think is impossible.
BARRY BENSON:
(Barry is picking out a shirt)
Yellow, black. Yellow, black. Yellow, black. Yellow, black. Ooh, black and yellow! Let's shake it up a little.
JANET BENSON:
Barry! Breakfast is ready!
BARRY:
Coming! Hang on a second.
"""

await special_summarization_tool.ainvoke({"long_text": LONG_TEXT})

'.yad noitaudarg rof tiftuo sesoohc yrraB ;scisyhp seifed eeB'

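But if you want to access the raw output from the chat model rather than the full tool output, you might try the astream_events() method and look for an on_chat_model_end event. Here is what happens: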

stream = special_summarization_tool.astream_events(
    {"long_text": LONG_TEXT}, version="v2"
)

async for event in stream:
    if event["event"] == "on_chat_model_end":
        # Never triggers in python<=3.10!
        print(event)

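On Python 3.10 and earlier, no on_chat_model_end event is printed. This is because the example above does not pass the tool's config object into the internal chain. To fix this, redefine your tool to take a special parameter typed as RunnableConfig (see the section on accessing RunnableConfig from a tool for more details). You also need to pass that parameter through when executing the internal chain: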

from langchain_core.runnables import RunnableConfig


@tool
async def special_summarization_tool_with_config(
    long_text: str, config: RunnableConfig
) -> str:
    """A tool that summarizes input text using advanced techniques."""
    prompt = ChatPromptTemplate.from_template(
        "You are an expert writer. Summarize the following text in 10 words or less:\n\n{long_text}"
    )

    def reverse(x: str):
        return x[::-1]

    chain = prompt | model | StrOutputParser() | reverse
    # Pass the "config" object as an argument to any executed runnables
    summary = await chain.ainvoke({"long_text": long_text}, config=config)
    return summary

Now try the same astream_events() call with your new tool:

stream = special_summarization_tool_with_config.astream_events(
    {"long_text": LONG_TEXT}, version="v2"
)

async for event in stream:
    if event["event"] == "on_chat_model_end":
        print(event)

{'event': 'on_chat_model_end', 'data': {'output': AIMessage(content='Bee defies physics; Barry chooses outfit for graduation day.', response_metadata={'stop_reason': 'end_turn', 'stop_sequence': None}, id='run-d23abc80-0dce-4f74-9d7b-fb98ca4f2a9e', usage_metadata={'input_tokens': 182, 'output_tokens': 16, 'total_tokens': 198}), 'input': {'messages': [[HumanMessage(content="You are an expert writer. Summarize the following text in 10 words or less:\n\n\nNARRATOR:\n(Black screen with text; The sound of buzzing bees can be heard)\nAccording to all known laws of aviation, there is no way a bee should be able to fly. Its wings are too small to get its fat little body off the ground. The bee, of course, flies anyway because bees don't care what humans think is impossible.\nBARRY BENSON:\n(Barry is picking out a shirt)\nYellow, black. Yellow, black. Yellow, black. Yellow, black. Ooh, black and yellow! Let's shake it up a little.\nJANET BENSON:\nBarry! Breakfast is ready!\nBARRY:\nComing! Hang on a second.\n")]]}}, 'run_id': 'd23abc80-0dce-4f74-9d7b-fb98ca4f2a9e', 'name': 'ChatAnthropic', 'tags': ['seq:step:2'], 'metadata': {'ls_provider': 'anthropic', 'ls_model_name': 'claude-3-5-sonnet-20240620', 'ls_model_type': 'chat', 'ls_temperature': 0.0, 'ls_max_tokens': 1024}, 'parent_ids': ['f25c41fe-8972-4893-bc40-cecf3922c1fa']}

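Great! This time an event was emitted.

For streaming, astream_events() automatically invokes the inner runnables of the chain with streaming enabled, so to stream tokens as the chat model generates them you only need to filter for on_chat_model_stream events; nothing else has to change: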

stream = special_summarization_tool_with_config.astream_events(
    {"long_text": LONG_TEXT}, version="v2"
)

async for event in stream:
    if event["event"] == "on_chat_model_stream":
        print(event)


{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content='', id='run-f24ab147-0b82-4e63-810a-b12bd8d1fb42', usage_metadata={'input_tokens': 182, 'output_tokens': 0, 'total_tokens': 182})}, 'run_id': 'f24ab147-0b82-4e63-810a-b12bd8d1fb42', 'name': 'ChatAnthropic', 'tags': ['seq:step:2'], 'metadata': {'ls_provider': 'anthropic', 'ls_model_name': 'claude-3-5-sonnet-20240620', 'ls_model_type': 'chat', 'ls_temperature': 0.0, 'ls_max_tokens': 1024}, 'parent_ids': ['385f3612-417c-4a70-aae0-cce3a5ba6fb6']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content='Bee', id='run-f24ab147-0b82-4e63-810a-b12bd8d1fb42')}, 'run_id': 'f24ab147-0b82-4e63-810a-b12bd8d1fb42', 'name': 'ChatAnthropic', 'tags': ['seq:step:2'], 'metadata': {'ls_provider': 'anthropic', 'ls_model_name': 'claude-3-5-sonnet-20240620', 'ls_model_type': 'chat', 'ls_temperature': 0.0, 'ls_max_tokens': 1024}, 'parent_ids': ['385f3612-417c-4a70-aae0-cce3a5ba6fb6']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content=' def', id='run-f24ab147-0b82-4e63-810a-b12bd8d1fb42')}, 'run_id': 'f24ab147-0b82-4e63-810a-b12bd8d1fb42', 'name': 'ChatAnthropic', 'tags': ['seq:step:2'], 'metadata': {'ls_provider': 'anthropic', 'ls_model_name': 'claude-3-5-sonnet-20240620', 'ls_model_type': 'chat', 'ls_temperature': 0.0, 'ls_max_tokens': 1024}, 'parent_ids': ['385f3612-417c-4a70-aae0-cce3a5ba6fb6']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content='ies physics', id='run-f24ab147-0b82-4e63-810a-b12bd8d1fb42')}, 'run_id': 'f24ab147-0b82-4e63-810a-b12bd8d1fb42', 'name': 'ChatAnthropic', 'tags': ['seq:step:2'], 'metadata': {'ls_provider': 'anthropic', 'ls_model_name': 'claude-3-5-sonnet-20240620', 'ls_model_type': 'chat', 'ls_temperature': 0.0, 'ls_max_tokens': 1024}, 'parent_ids': ['385f3612-417c-4a70-aae0-cce3a5ba6fb6']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content=';', id='run-f24ab147-0b82-4e63-810a-b12bd8d1fb42')}, 'run_id': 'f24ab147-0b82-4e63-810a-b12bd8d1fb42', 'name': 'ChatAnthropic', 'tags': ['seq:step:2'], 'metadata': {'ls_provider': 'anthropic', 'ls_model_name': 'claude-3-5-sonnet-20240620', 'ls_model_type': 'chat', 'ls_temperature': 0.0, 'ls_max_tokens': 1024}, 'parent_ids': ['385f3612-417c-4a70-aae0-cce3a5ba6fb6']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content=' Barry', id='run-f24ab147-0b82-4e63-810a-b12bd8d1fb42')}, 'run_id': 'f24ab147-0b82-4e63-810a-b12bd8d1fb42', 'name': 'ChatAnthropic', 'tags': ['seq:step:2'], 'metadata': {'ls_provider': 'anthropic', 'ls_model_name': 'claude-3-5-sonnet-20240620', 'ls_model_type': 'chat', 'ls_temperature': 0.0, 'ls_max_tokens': 1024}, 'parent_ids': ['385f3612-417c-4a70-aae0-cce3a5ba6fb6']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content=' cho', id='run-f24ab147-0b82-4e63-810a-b12bd8d1fb42')}, 'run_id': 'f24ab147-0b82-4e63-810a-b12bd8d1fb42', 'name': 'ChatAnthropic', 'tags': ['seq:step:2'], 'metadata': {'ls_provider': 'anthropic', 'ls_model_name': 'claude-3-5-sonnet-20240620', 'ls_model_type': 'chat', 'ls_temperature': 0.0, 'ls_max_tokens': 1024}, 'parent_ids': ['385f3612-417c-4a70-aae0-cce3a5ba6fb6']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content='oses outfit', id='run-f24ab147-0b82-4e63-810a-b12bd8d1fb42')}, 'run_id': 'f24ab147-0b82-4e63-810a-b12bd8d1fb42', 'name': 'ChatAnthropic', 'tags': ['seq:step:2'], 'metadata': {'ls_provider': 'anthropic', 'ls_model_name': 'claude-3-5-sonnet-20240620', 'ls_model_type': 'chat', 'ls_temperature': 0.0, 'ls_max_tokens': 1024}, 'parent_ids': ['385f3612-417c-4a70-aae0-cce3a5ba6fb6']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content=' for', id='run-f24ab147-0b82-4e63-810a-b12bd8d1fb42')}, 'run_id': 'f24ab147-0b82-4e63-810a-b12bd8d1fb42', 'name': 'ChatAnthropic', 'tags': ['seq:step:2'], 'metadata': {'ls_provider': 'anthropic', 'ls_model_name': 'claude-3-5-sonnet-20240620', 'ls_model_type': 'chat', 'ls_temperature': 0.0, 'ls_max_tokens': 1024}, 'parent_ids': ['385f3612-417c-4a70-aae0-cce3a5ba6fb6']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content=' graduation', id='run-f24ab147-0b82-4e63-810a-b12bd8d1fb42')}, 'run_id': 'f24ab147-0b82-4e63-810a-b12bd8d1fb42', 'name': 'ChatAnthropic', 'tags': ['seq:step:2'], 'metadata': {'ls_provider': 'anthropic', 'ls_model_name': 'claude-3-5-sonnet-20240620', 'ls_model_type': 'chat', 'ls_temperature': 0.0, 'ls_max_tokens': 1024}, 'parent_ids': ['385f3612-417c-4a70-aae0-cce3a5ba6fb6']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content=' day', id='run-f24ab147-0b82-4e63-810a-b12bd8d1fb42')}, 'run_id': 'f24ab147-0b82-4e63-810a-b12bd8d1fb42', 'name': 'ChatAnthropic', 'tags': ['seq:step:2'], 'metadata': {'ls_provider': 'anthropic', 'ls_model_name': 'claude-3-5-sonnet-20240620', 'ls_model_type': 'chat', 'ls_temperature': 0.0, 'ls_max_tokens': 1024}, 'parent_ids': ['385f3612-417c-4a70-aae0-cce3a5ba6fb6']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content='.', id='run-f24ab147-0b82-4e63-810a-b12bd8d1fb42')}, 'run_id': 'f24ab147-0b82-4e63-810a-b12bd8d1fb42', 'name': 'ChatAnthropic', 'tags': ['seq:step:2'], 'metadata': {'ls_provider': 'anthropic', 'ls_model_name': 'claude-3-5-sonnet-20240620', 'ls_model_type': 'chat', 'ls_temperature': 0.0, 'ls_max_tokens': 1024}, 'parent_ids': ['385f3612-417c-4a70-aae0-cce3a5ba6fb6']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content='', response_metadata={'stop_reason': 'end_turn', 'stop_sequence': None}, id='run-f24ab147-0b82-4e63-810a-b12bd8d1fb42', usage_metadata={'input_tokens': 0, 'output_tokens': 16, 'total_tokens': 16})}, 'run_id': 'f24ab147-0b82-4e63-810a-b12bd8d1fb42', 'name': 'ChatAnthropic', 'tags': ['seq:step:2'], 'metadata': {'ls_provider': 'anthropic', 'ls_model_name': 'claude-3-5-sonnet-20240620', 'ls_model_type': 'chat', 'ls_temperature': 0.0, 'ls_max_tokens': 1024}, 'parent_ids': ['385f3612-417c-4a70-aae0-cce3a5ba6fb6']}
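
The streamed chunks above can be stitched back into the final text by concatenating each chunk's content. Here is a minimal sketch of that step, using plain stand-in objects so it runs without a model. `FakeChunk` and `join_stream_content` are hypothetical names, and the event dicts only mimic the shape printed above:

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass
class FakeChunk:
    """Stand-in for AIMessageChunk: only the `content` field is modelled."""

    content: str


def join_stream_content(events: Iterable[Dict[str, Any]]) -> str:
    """Concatenate the content of every on_chat_model_stream event, in order."""
    parts = []
    for event in events:
        if event["event"] == "on_chat_model_stream":
            parts.append(event["data"]["chunk"].content)
    return "".join(parts)


# Chunk contents copied from the output above.
pieces = ["", "Bee", " def", "ies physics", ";", " Barry", " cho",
          "oses outfit", " for", " graduation", " day", ".", ""]
events = [
    {"event": "on_chat_model_stream", "data": {"chunk": FakeChunk(p)}}
    for p in pieces
]
summary = join_stream_content(events)
print(summary)
```

The joined text matches the content of the on_chat_model_end event shown earlier.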

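Why write it this way?

Suppose the final result is not all you want. You also want to know what exactly the model output, how long it took, how many tokens it used, and whether it is streaming tokens as it generates.

So you write: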

async for event in tool.astream_events(...):
    print(event)

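But none of the inner chat model's events show up (at most you see the events of the tool itself, such as on_tool_start and on_tool_end).

That is because the tool never handed the callbacks it received down to the chain it calls. The "listener" travels inside the config, and if the config is not passed along, the inner model runs outside the event stream.

✅ How do you fix it?

LangChain has a mechanism that is easy to miss:

To observe the intermediate steps of a chain (such as the model inside a tool), the listening setup has to be passed down through config. According to the LangChain documentation, this manual step is mainly needed for async code on Python 3.10 and earlier, where the config cannot be propagated automatically in many cases; passing it explicitly is safe on any version.

So the tool has to be rewritten like this: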

async def my_tool(input: str, config: RunnableConfig):
    return await chain.ainvoke({"input": input}, config=config)

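So the short summary of this chapter is:

A function wrapped as a LangChain tool is a black box by default. To observe the model, retriever, or other runnables inside it, pass config down to them; otherwise astream_events() will not surface their events.

Do you need to care about this chapter?

If you only call the model and take the result: no.
If you are building debugging, tracing, token-stream monitoring, or token-usage logging: this chapter is required reading.

Is it enough to just declare a config: RunnableConfig parameter?

✅ Declaring it is not enough; you also have to pass it on.

You need to do two things: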

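from langchain_core.runnables import RunnableConfig
from langchain_core.tools import tool

@tool
async def my_tool(x: str, config: RunnableConfig):  # ✅ 1. declare the config parameter
    """Run the inner chain on x."""  # @tool needs a docstring (or an explicit description)
    chain = some_prompt | model | parser  # placeholders for your own prompt, model, and parser
    return await chain.ainvoke({"x": x}, config=config)  # ✅ 2. pass config when calling the chain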

If you declare the parameter but forget to pass it in ainvoke(..., config=config), the inner events still will not show up!
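
Why forwarding matters can be seen in a framework-free toy model. The sketch below is not LangChain code: `toy_model`, `tool_without_config`, and `tool_with_config` are hypothetical, and the "config" is just a dict holding a list of callback functions. It only illustrates that a listener reaches the inner call when the config is handed down.

```python
from typing import Callable, Dict, List, Optional

Config = Dict[str, List[Callable[[str], None]]]


def toy_model(text: str, config: Optional[Config] = None) -> str:
    """Pretend model: notifies any callbacks found in config, then answers."""
    for callback in (config or {}).get("callbacks", []):
        callback("on_chat_model_end")
    return text.upper()


def tool_without_config(text: str, config: Optional[Config] = None) -> str:
    # config is accepted but never forwarded, so the model call is invisible
    return toy_model(text)


def tool_with_config(text: str, config: Optional[Config] = None) -> str:
    # config is forwarded, so the caller's callbacks see the model call
    return toy_model(text, config=config)


seen_without: List[str] = []
seen_with: List[str] = []
tool_without_config("hi", {"callbacks": [seen_without.append]})
tool_with_config("hi", {"callbacks": [seen_with.append]})
print(seen_without)
print(seen_with)
```

Both tools return the same result; only the second one lets the caller's listener observe the inner call.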

If I call astream_events() directly, does that also fail to produce events?

The usual call looks like this:

stream = tool.astream_events(input_dict)

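This does produce events, provided the runnables inside also take part in the event stream (that is, they receive the config).

So:

If you observe a plain model or a standard chain, it supports events by itself: ✅ works.
If you observe a chain wrapped in a tool, and the tool does not pass config inside: ❌ the inner events are missing.
If you observe a chain wrapped in a tool, and the tool declares config: RunnableConfig and passes it in: ✅ the inner events are available.

🎯 Quick check: will you get the events?

| How it is called | config passed inside the tool? | Inner events visible? |
| --- | --- | --- |
| Calling `.astream_events()` directly on the model | - | ✅ Yes |
| Model wrapped in a tool, config not passed | ❌ | ❌ No |
| Model wrapped in a tool, config passed | ✅ | ✅ Yes |

To make sure you can see what happens inside the chain, you can do this: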

from langchain_core.runnables import RunnableConfig

config = RunnableConfig(tags=["my_run"])

# Invoke a tool and listen for events (make sure the tool forwards config internally)
async for event in tool.astream_events({"x": "value"}, config=config, version="v2"):
    print(event)

Can I pass an empty RunnableConfig()?

At the top-level call, yes. astream_events() attaches its own event handler, so an empty config works there.

from langchain_core.runnables import RunnableConfig

config = RunnableConfig()
await chain.ainvoke({"input": "xxx"}, config=config)

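Even with an empty RunnableConfig() at the top level, LangChain still:

sets up the run tree used for tracing;
dispatches events to listeners, if any are attached;
lets the chain's child components (model, retriever, and so on) be tracked.

📌 The caveat is inside a tool: there you must forward the config object the tool received. Creating a fresh, empty RunnableConfig() inside the tool and passing that instead is not a reliable substitute, because it does not carry the caller's callbacks.

Not forwarding config means the inner call may lose the parent context, so its events are not attached to the outer trace or event stream.
Forwarding it keeps the inner runs linked to the outer run, even if you never put anything into the config yourself.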

How do the three cases differ? (one by one)

1️⃣ A plain model or standard chain: events can be observed directly

chain = prompt | model | output_parser

Call it directly:

async for event in chain.astream_events({"input": "xxx"}):
    print(event)

✅ You will see on_chat_model_start / on_chat_model_stream / on_chat_model_end and similar events.

The chain is built from LangChain's own components, which dispatch events natively.

2️⃣ A chain wrapped in a tool, without passing config inside

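@tool
async def tool_fn(input: str):  # ⚠️ no config parameter
    """Run the chain on the input."""  # @tool needs a docstring (or an explicit description)
    return await chain.ainvoke({"input": input})  # ⚠️ config is not passed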

Then you call:

async for event in tool_fn.astream_events({"input": "xxx"}):
    print(event)

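❌ The model's events are not printed! (You only get the tool's own events, such as on_tool_start and on_tool_end.)

Because the inner call is made without config, LangChain cannot propagate the listeners; you are observing only the outer tool shell, not the model itself.

3️⃣ A chain wrapped in a tool, with config declared and passed in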

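@tool
async def tool_fn(input: str, config: RunnableConfig):  # ✅ config parameter
    """Run the chain on the input."""  # @tool needs a docstring (or an explicit description)
    return await chain.ainvoke({"input": input}, config=config)  # ✅ config is passed in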

Call it again:

async for event in tool_fn.astream_events({"input": "xxx"}):
    print(event)

✅ Now the model's intermediate events can be observed, for example the token stream, the returned content, token usage, and run IDs.

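Returning artifacts from tools (keeping the tool artifact separate from the model-visible output)

"Artifact" is a heavily context-dependent term. Common meanings, with the usual Chinese rendering in parentheses:

| Context | What "artifact" usually means |
| --- | --- |
| Compile/build systems (e.g. Maven, Bazel) | A build output (工件) |
| Digital forensics / AI model training | An image anomaly or by-product (伪影) |
| Archaeology | A relic or man-made object (文物 / 人工制品) |
| Data engineering / AI pipelines | A data product or intermediate output (数据产物 / 中间产物) |

In LangChain, however, it means the full data object produced by running a tool (not necessarily text). Only part of it, a piece of text, is passed to the model, while the full object is kept for the rest of the system.

So a fitting rendering is "product", "full output", or "retained output" (产物 / 完整输出 / 保留输出).

Tools are utility functions a model can call, and by default their output is fed back to the model. In some cases, though, we want the full result of a tool run (an image, a DataFrame, a custom object) to be available to downstream components without exposing it to the language model itself.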

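For this, LangChain's Tool and ToolMessage interfaces let us distinguish:

Output visible to the model → ToolMessage.content
Extra artifact for the system only → ToolMessage.artifact

This mechanism is available starting with langchain-core >= 0.2.19.

Defining a tool that separates the two outputs

To use this feature, set response_format="content_and_artifact" when defining the tool, and make sure the function returns a (content, artifact) tuple.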

pip install -qU "langchain-core>=0.2.19"


import random
from typing import List, Tuple
from langchain_core.tools import tool

@tool(response_format="content_and_artifact")
def generate_random_ints(min: int, max: int, size: int) -> Tuple[str, List[int]]:
    """生成一定数量的随机整数。"""
    array = [random.randint(min, max) for _ in range(size)]
    content = f"成功生成了 {size} 个范围在 [{min}, {max}] 的随机整数。"
    return content, array

Invoking the tool with a ToolCall

Calling the tool with .invoke() and plain arguments returns only the content part:

generate_random_ints.invoke({"min": 0, "max": 9, "size": 10})
# Output:
'成功生成了 10 个范围在 [0, 9] 的随机整数。'

To also get the artifact (here, the generated list of random numbers), pass a dict that follows the ToolCall format:

generate_random_ints.invoke(
    {
        "name": "generate_random_ints",
        "args": {"min": 0, "max": 9, "size": 10},
        "id": "123",  # required
        "type": "tool_call",  # required
    }
)

# Output:
ToolMessage(
    content='成功生成了 10 个范围在 [0, 9] 的随机整数。',
    name='generate_random_ints',
    tool_call_id='123',
    artifact=[2, 8, 0, 6, 0, 0, 1, 5, 0, 0]
)
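
The difference between the two call styles can be modelled without LangChain. In this sketch, `ToyToolMessage` and `toy_invoke` are hypothetical stand-ins (not the library's classes): a plain args dict yields only the content, while a ToolCall-shaped dict yields a message object that also carries the artifact.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple, Union


@dataclass
class ToyToolMessage:
    content: str        # what the model would see
    artifact: Any       # kept for downstream code only
    tool_call_id: str


def toy_invoke(
    fn: Callable[..., Tuple[str, Any]], payload: Dict[str, Any]
) -> Union[str, ToyToolMessage]:
    """Return content only for plain args, a full message for a tool call."""
    if payload.get("type") == "tool_call":
        content, artifact = fn(**payload["args"])
        return ToyToolMessage(content, artifact, payload["id"])
    content, _artifact = fn(**payload)
    return content


def first_n_squares(n: int) -> Tuple[str, list]:
    """Toy tool returning a (content, artifact) pair."""
    squares = [i * i for i in range(n)]
    return f"Generated {n} squares.", squares


plain = toy_invoke(first_n_squares, {"n": 3})
full = toy_invoke(
    first_n_squares,
    {"name": "first_n_squares", "args": {"n": 3}, "id": "123", "type": "tool_call"},
)
print(plain)
print(full.artifact)
```

Downstream code reads `full.artifact`, while only `full.content` would be shown to the model.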

Using it with a model

We can bind the tool to a language model, which then generates tool calls from the prompt on its own:

from langchain_openai import ChatOpenAI
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass()
llm = ChatOpenAI(model="gpt-4o-mini")
llm_with_tools = llm.bind_tools([generate_random_ints])

Example call:

ai_msg = llm_with_tools.invoke("generate 6 positive ints less than 25")
ai_msg.tool_calls
# [{'name': 'generate_random_ints', 'args': {...}, 'id': ..., 'type': 'tool_call'}]

# Invoke the tool with the tool_call dict to get a ToolMessage (content + artifact)
generate_random_ints.invoke(ai_msg.tool_calls[0])

Passing only the args to .invoke() returns just the text content:

generate_random_ints.invoke(ai_msg.tool_calls[0]["args"])
# '成功生成了 6 个范围在 [1, 24] 的随机整数。'

To build the chain declaratively, we can do this:

from operator import attrgetter

chain = llm_with_tools | attrgetter("tool_calls") | generate_random_ints.map()
chain.invoke("give me a random number between 1 and 5")

# Output:
[ToolMessage(..., content='...', artifact=[5])]

Custom tool classes with `BaseTool`

Instead of the @tool decorator, you can also subclass BaseTool and define the tool class directly:

import random
from typing import List, Tuple

from langchain_core.tools import BaseTool

class GenerateRandomFloats(BaseTool):
    # Pydantic v2 (used by LangChain 0.3) requires type annotations
    # when overriding fields declared on BaseTool.
    name: str = "generate_random_floats"
    description: str = "生成随机浮点数"
    response_format: str = "content_and_artifact"
    ndigits: int = 2

    def _run(self, min: float, max: float, size: int) -> Tuple[str, List[float]]:
        range_ = max - min
        array = [round(min + range_ * random.random(), self.ndigits) for _ in range(size)]
        content = f"生成了 {size} 个 [{min}, {max}] 范围内的小数，保留 {self.ndigits} 位小数。"
        return content, array

rand_gen = GenerateRandomFloats(ndigits=4)

# Get the content string
rand_gen.invoke({"min": 0.1, "max": 3.3333, "size": 3})

# Get a ToolMessage (with content and artifact)
rand_gen.invoke({
    "name": "generate_random_floats",
    "args": {"min": 0.1, "max": 3.3333, "size": 3},
    "id": "123",
    "type": "tool_call",
})

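Summary: when to use the artifact mechanism?

✅ When the tool's output is complex data (images, structured data, DataFrames, and so on)
✅ When you want to feed only a summary or description back to the model and keep the full data for other parts of the system
✅ When you chain several tools and need to pass along intermediate results that are not plain text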

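Converting runnables to tools

Here we show how to turn a LangChain runnable into a tool that an agent, chain, or chat model can use.

LangChain tools are the interface through which an agent, chain, or chat model interacts with the outside world. The earlier sections on tool calling, built-in tools, and custom tools cover more usage.

A LangChain tool (an instance of BaseTool) is a Runnable with additional constraints that let a language model call it effectively:

Its input is restricted to serializable values, specifically strings or Python dicts;
It has a name and a description that indicate what the tool is for and when to use it;
It may carry a detailed args_schema that defines the structure and types of the input. In other words, although the tool's input is a dict, the args_schema should spell out which keys are required and what type each one has.

Any runnable that accepts a string or a dict as input can be converted to a tool with the .as_tool() method. The name, description, and argument schema can be specified during the conversion.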

Basic usage

With a typed dict input:
from typing import List

from langchain_core.runnables import RunnableLambda
from typing_extensions import TypedDict


class Args(TypedDict):
    a: int
    b: List[int]


def f(x: Args) -> str:
    return str(x["a"] * max(x["b"]))


runnable = RunnableLambda(f)
as_tool = runnable.as_tool(
    name="My tool",
    description="Explanation of when to use tool.",
)


print(as_tool.description)

as_tool.args_schema.model_json_schema()  # Pydantic v2 name; .schema() is deprecated


Explanation of when to use tool.


{'title': 'My tool',
 'type': 'object',
 'properties': {'a': {'title': 'A', 'type': 'integer'},
  'b': {'title': 'B', 'type': 'array', 'items': {'type': 'integer'}}},
 'required': ['a', 'b']}


as_tool.invoke({"a": 3, "b": [1, 2]})


'6'
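
The schema shown above comes from the type hints on `Args`. As a rough, framework-free illustration of that idea (an assumption about the general approach, not a description of how `as_tool` is implemented), the standard library can read the same hints. `toy_schema` and its small type mapping are hypothetical:

```python
from typing import Any, Dict, List, TypedDict, get_args, get_origin, get_type_hints


class Args(TypedDict):
    a: int
    b: List[int]


_JSON_NAMES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def _json_type(hint: Any) -> Dict[str, Any]:
    """Map a small subset of Python type hints to JSON-schema fragments."""
    if get_origin(hint) is list:
        (item,) = get_args(hint)
        return {"type": "array", "items": _json_type(item)}
    return {"type": _JSON_NAMES[hint]}


def toy_schema(typed_dict: type) -> Dict[str, Any]:
    """Build a JSON-schema-like dict from the hints of a TypedDict class."""
    hints = get_type_hints(typed_dict)
    return {
        "type": "object",
        "properties": {name: _json_type(hint) for name, hint in hints.items()},
        "required": list(hints),
    }


schema = toy_schema(Args)
print(schema)
```

This only handles a few scalar types and lists; a real schema generator covers far more cases.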

When no type information is available, the argument types can be given via arg_types:

from typing import Any, Dict


def g(x: Dict[str, Any]) -> str:
    return str(x["a"] * max(x["b"]))


runnable = RunnableLambda(g)
as_tool = runnable.as_tool(
    name="My tool",
    description="Explanation of when to use tool.",
    arg_types={"a": int, "b": List[int]},
)

Alternatively, the schema can be fully specified by passing the desired args_schema directly:

from pydantic import BaseModel, Field


class GSchema(BaseModel):
    """Apply a function to an integer and list of integers."""

    a: int = Field(..., description="Integer")
    b: List[int] = Field(..., description="List of ints")


runnable = RunnableLambda(g)
as_tool = runnable.as_tool(GSchema)

String input is supported too:

def f(x: str) -> str:
    return x + "a"


def g(x: str) -> str:
    return x + "z"


runnable = RunnableLambda(f) | g
as_tool = runnable.as_tool()


as_tool.invoke("b")


'baz'

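Integrating tools into an agent

Next we plug LangChain runnables into an agent application as tools. We demonstrate this with:

a document retriever;
a simple RAG chain that the agent can delegate relevant queries to.

We first instantiate a chat model that supports tool calling: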

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

Then we build a small in-memory vector store over two documents and a retriever on top of it:
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

documents = [
    Document(
        page_content="Dogs are great companions, known for their loyalty and friendliness.",
    ),
    Document(
        page_content="Cats are independent pets that often enjoy their own space.",
    ),
]

vectorstore = InMemoryVectorStore.from_documents(
    documents, embedding=OpenAIEmbeddings()
)

retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1},
)

Convert the retriever to a tool and add it to an agent:

from langgraph.prebuilt import create_react_agent

tools = [
    retriever.as_tool(
        name="pet_info_retriever",
        description="Get information about pets.",
    )
]
agent = create_react_agent(llm, tools)

for chunk in agent.stream({"messages": [("human", "What are dogs known for?")]}):
    print(chunk)
    print("----")


{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_W8cnfOjwqEn4cFcg19LN9mYD', 'function': {'arguments': '{"__arg1":"dogs"}', 'name': 'pet_info_retriever'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 60, 'total_tokens': 79}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d7f81de9-1fb7-4caf-81ed-16dcdb0b2ab4-0', tool_calls=[{'name': 'pet_info_retriever', 'args': {'__arg1': 'dogs'}, 'id': 'call_W8cnfOjwqEn4cFcg19LN9mYD'}], usage_metadata={'input_tokens': 60, 'output_tokens': 19, 'total_tokens': 79})]}}
----
{'tools': {'messages': [ToolMessage(content="[Document(id='86f835fe-4bbe-4ec6-aeb4-489a8b541707', page_content='Dogs are great companions, known for their loyalty and friendliness.')]", name='pet_info_retriever', tool_call_id='call_W8cnfOjwqEn4cFcg19LN9mYD')]}}
----
{'agent': {'messages': [AIMessage(content='Dogs are known for being great companions, known for their loyalty and friendliness.', response_metadata={'token_usage': {'completion_tokens': 18, 'prompt_tokens': 134, 'total_tokens': 152}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-9ca5847a-a5eb-44c0-a774-84cc2c5bbc5b-0', usage_metadata={'input_tokens': 134, 'output_tokens': 18, 'total_tokens': 152})]}}
----

Next we build a simple RAG chain that takes a style parameter, and convert it to a tool:

from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

system_prompt = """
You are an assistant for question-answering tasks.
Use the below context to answer the question. If
you don't know the answer, say you don't know.
Use three sentences maximum and keep the answer
concise.

Answer in the style of {answer_style}.

Question: {question}

Context: {context}
"""

prompt = ChatPromptTemplate.from_messages([("system", system_prompt)])

rag_chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
        "answer_style": itemgetter("answer_style"),
    }
    | prompt
    | llm
    | StrOutputParser()
)

Note that the chain's input schema already contains the required arguments, so it converts to a tool without further specification:

rag_chain.input_schema.model_json_schema()  # Pydantic v2 name; .schema() is deprecated


{'title': 'RunnableParallel<context,question,answer_style>Input',
 'type': 'object',
 'properties': {'question': {'title': 'Question'},
  'answer_style': {'title': 'Answer Style'}}}

Convert the chain to a tool:

rag_tool = rag_chain.as_tool(
    name="pet_expert",
    description="Get information about pets.",
)

Now we invoke the agent again. Note that the agent fills in the required arguments in its tool_calls:

agent = create_react_agent(llm, [rag_tool])

for chunk in agent.stream(
    {"messages": [("human", "What would a pirate say dogs are known for?")]}
):
    print(chunk)
    print("----")


{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_17iLPWvOD23zqwd1QVQ00Y63', 'function': {'arguments': '{"question":"What are dogs known for according to pirates?","answer_style":"quote"}', 'name': 'pet_expert'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 28, 'prompt_tokens': 59, 'total_tokens': 87}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-7fef44f3-7bba-4e63-8c51-2ad9c5e65e2e-0', tool_calls=[{'name': 'pet_expert', 'args': {'question': 'What are dogs known for according to pirates?', 'answer_style': 'quote'}, 'id': 'call_17iLPWvOD23zqwd1QVQ00Y63'}], usage_metadata={'input_tokens': 59, 'output_tokens': 28, 'total_tokens': 87})]}}
----
{'tools': {'messages': [ToolMessage(content='"Dogs are known for their loyalty and friendliness, making them great companions for pirates on long sea voyages."', name='pet_expert', tool_call_id='call_17iLPWvOD23zqwd1QVQ00Y63')]}}
----
{'agent': {'messages': [AIMessage(content='According to pirates, dogs are known for their loyalty and friendliness, making them great companions for pirates on long sea voyages.', response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 119, 'total_tokens': 146}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-5a30edc3-7be0-4743-b980-ca2f8cad9b8d-0', usage_metadata={'input_tokens': 119, 'output_tokens': 27, 'total_tokens': 146})]}}
----

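Adding ad-hoc tool calling capability to LLMs and chat models

The approach here does not rely on a model's native tool calling API; it prompts the model directly to pick a tool and its arguments.

First, let's create an add tool and a multiply tool.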

from langchain_core.tools import tool


@tool
def multiply(x: float, y: float) -> float:
    """Multiply two numbers together."""
    return x * y


@tool
def add(x: int, y: int) -> int:
    "Add two numbers."
    return x + y


tools = [multiply, add]

# Let's inspect the tools
for t in tools:
    print("--")
    print(t.name)
    print(t.description)
    print(t.args)

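We need to write a prompt that specifies the tools the model has access to, the arguments of those tools, and the desired output format. Here we instruct it to output a JSON object of the form {"name": "...", "arguments": {...}}.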

from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import render_text_description

rendered_tools = render_text_description(tools)
print(rendered_tools)


multiply(x: float, y: float) -> float - Multiply two numbers together.
add(x: int, y: int) -> int - Add two numbers.
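
The rendered lines above are essentially each tool's name, signature, and description. Here is a standard-library sketch that produces the same kind of line from a plain function. `describe` is a hypothetical helper, not the implementation of `render_text_description`:

```python
import inspect
from typing import Callable


def multiply(x: float, y: float) -> float:
    """Multiply two numbers together."""
    return x * y


def add(x: int, y: int) -> int:
    """Add two numbers."""
    return x + y


def describe(fn: Callable) -> str:
    """Render 'name(signature) - docstring' for one function."""
    return f"{fn.__name__}{inspect.signature(fn)} - {inspect.getdoc(fn)}"


rendered = "\n".join(describe(fn) for fn in (multiply, add))
print(rendered)
```

Whatever text is produced this way ends up verbatim in the prompt, so clear names and docstrings directly affect how well the model picks a tool.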


system_prompt = f"""\
You are an assistant that has access to the following set of tools. 
Here are the names and descriptions for each tool:

{rendered_tools}

Given the user input, return the name and input of the tool to use. 
Return your response as a JSON blob with 'name' and 'arguments' keys.

The `arguments` should be a dictionary, with keys corresponding 
to the argument names and the values corresponding to the requested values.
"""

prompt = ChatPromptTemplate.from_messages(
    [("system", system_prompt), ("user", "{input}")]
)

# `model` is any LLM or chat model instantiated beforehand (it is not defined in this snippet)
chain = prompt | model
message = chain.invoke({"input": "what's 3 plus 1132"})

# Let's take a look at the output from the model
# if the model is an LLM (not a chat model), the output will be a string.
if isinstance(message, str):
    print(message)
else:  # Otherwise it's a chat model
    print(message.content)


{
    "name": "add",
    "arguments": {
        "x": 3,
        "y": 1132
    }
}

Adding an output parser

We use JsonOutputParser to parse the model's output into JSON.
from langchain_core.output_parsers import JsonOutputParser

chain = prompt | model | JsonOutputParser()
chain.invoke({"input": "what's thirteen times 4"})

{'name': 'multiply', 'arguments': {'x': 13.0, 'y': 4.0}}
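
A model prompted this way can still return something other than the requested shape, so it can help to check the parsed dict before dispatching. Here is a minimal standard-library sketch. `parse_tool_request` is a hypothetical helper, and the check is an addition for illustration, not part of JsonOutputParser:

```python
import json
from typing import Any, Dict


def parse_tool_request(raw: str) -> Dict[str, Any]:
    """Parse the model's JSON text and verify the expected keys and types."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("missing or non-string 'name'")
    if not isinstance(data.get("arguments"), dict):
        raise ValueError("missing or non-dict 'arguments'")
    return data


good = parse_tool_request('{"name": "multiply", "arguments": {"x": 13.0, "y": 4.0}}')
print(good["name"], good["arguments"])

try:
    parse_tool_request('{"name": "multiply"}')
except ValueError as err:
    error_message = str(err)
    print("rejected:", error_message)
```

Failing early with a clear message makes it easier to retry the model call or report the problem.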

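Invoking the tool

Now that the model can request a tool call, we need a function that actually performs it: it selects the right tool by name and passes it the arguments the model chose.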

from typing import Any, Dict, Optional, TypedDict

from langchain_core.runnables import RunnableConfig


class ToolCallRequest(TypedDict):
    """A typed dict that shows the inputs into the invoke_tool function."""

    name: str
    arguments: Dict[str, Any]


def invoke_tool(
    tool_call_request: ToolCallRequest, config: Optional[RunnableConfig] = None
):
    """A function that we can use to perform a tool invocation.

    Args:
        tool_call_request: a dict that contains the keys name and arguments.
            The name must match the name of a tool that exists.
            The arguments are the arguments to that tool.
        config: This is configuration information that LangChain uses that contains
            things like callbacks, metadata, etc. See LCEL documentation about RunnableConfig.

    Returns:
        output from the requested tool
    """
    tool_name_to_tool = {tool.name: tool for tool in tools}
    name = tool_call_request["name"]
    requested_tool = tool_name_to_tool[name]
    return requested_tool.invoke(tool_call_request["arguments"], config=config)

Let's test it 🧪!

invoke_tool({"name": "multiply", "arguments": {"x": 3, "y": 5}})

15.0
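
invoke_tool above raises a KeyError if the model names a tool that does not exist. One possible way to surface that more clearly is sketched here with plain functions instead of LangChain tools. `REGISTRY` and `safe_invoke` are hypothetical names:

```python
from typing import Any, Callable, Dict


def multiply(x: float, y: float) -> float:
    return x * y


def add(x: int, y: int) -> int:
    return x + y


REGISTRY: Dict[str, Callable[..., Any]] = {"multiply": multiply, "add": add}


def safe_invoke(request: Dict[str, Any]) -> Any:
    """Look up the requested tool by name and call it with its arguments."""
    name = request["name"]
    if name not in REGISTRY:
        known = ", ".join(sorted(REGISTRY))
        raise ValueError(f"unknown tool {name!r}; available: {known}")
    return REGISTRY[name](**request["arguments"])


result = safe_invoke({"name": "multiply", "arguments": {"x": 3, "y": 5}})
print(result)

try:
    safe_invoke({"name": "divide", "arguments": {"x": 1, "y": 2}})
except ValueError as err:
    unknown_error = str(err)
    print(unknown_error)
```

The error text lists the available tools, which is handy if the message is fed back to the model for a retry.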

Let's put it together into a calculator chain that can add and multiply.

chain = prompt | model | JsonOutputParser() | invoke_tool
chain.invoke({"input": "what's thirteen times 4.14137281"})

53.83784653

Returning tool inputs

It is often helpful to return the tool input along with the tool output. RunnablePassthrough.assign makes this easy: it takes the input to the RunnablePassthrough step (assumed to be a dict), adds a new key holding the tool output, and still passes through everything that was already in the input:
from langchain_core.runnables import RunnablePassthrough

chain = (
    prompt | model | JsonOutputParser() | RunnablePassthrough.assign(output=invoke_tool)
)
chain.invoke({"input": "what's thirteen times 4.14137281"})


{'name': 'multiply',
 'arguments': {'x': 13, 'y': 4.14137281},
 'output': 53.83784653}
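
What this does to the dict can be pictured in plain Python: keep every incoming key and add one computed key. Here is a sketch under that reading. `assign_output` is hypothetical and only mimics the behavior described above:

```python
from typing import Any, Callable, Dict


def assign_output(
    data: Dict[str, Any], compute: Callable[[Dict[str, Any]], Any]
) -> Dict[str, Any]:
    """Return a copy of data with an extra 'output' key computed from it."""
    return {**data, "output": compute(data)}


def run_multiply(request: Dict[str, Any]) -> float:
    """Toy stand-in for invoke_tool: multiplies the two arguments."""
    args = request["arguments"]
    return args["x"] * args["y"]


request = {"name": "multiply", "arguments": {"x": 3, "y": 5}}
combined = assign_output(request, run_multiply)
print(combined)
```

The original request dict is left unchanged; the result is a new dict with one more key.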

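How does ad-hoc tool calling differ from bound tools (such as .bind_tools())?

Based on the full example above, here is a side-by-side summary:

✅ Bound tools vs ad-hoc tools: the core differences

| Aspect | Bound tools (`.bind_tools()`, `create_react_agent()`, etc.) | Ad-hoc tool calling |
| --- | --- | --- |
| How tools are added | Through LangChain interfaces such as `llm.bind_tools()` or `agent = create_react_agent(llm, tools)` | Custom prompt + output parsing + manual tool invocation (the approach built in the code above) |
| Structured integration | ✅ Tool schemas are passed to the model through its tool calling API | ❌ Tools are not registered with the model; the prompt merely tells it that it may use them |
| Call flow | LangChain parses the model's tool_calls; an agent (e.g. `create_react_agent`) routes them to the tools and returns the results | You do it by hand: 1. build the prompt → 2. parse the output → 3. match the tool name → 4. pass the arguments and invoke |
| Model requirements | The model must support tool calling (e.g. OpenAI function calling) | The model only needs to emit JSON with a name and arguments |
| Best for | Production use and long-lived agents or chain systems | Quick experiments, small task dispatch, custom control flow |
| Flexibility | Tools are usually bound once up front | Tools can be rendered and specified on demand, which supports ad-hoc injection |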

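The whole example is a hand-built ad-hoc tool calling flow. In essence:

Define the tool functions add and multiply (@tool)
Build a prompt (including the tool descriptions) that tells the model what JSON to return (name + arguments)
Have the model output a raw JSON string
Convert it to a dict with JsonOutputParser
Dispatch by hand: invoke_tool() maps the name to a tool and calls it

This is the "ad-hoc tool" pattern: it does not depend on the model's tool_call feature, only on the prompt you provide and on matching tool names.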

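✅ When to use ad-hoc tools?

The model does not support tool calling (e.g. some local models)
You want full control over tool descriptions, parsing, and invocation logic
The tool set changes frequently (tools are built on the fly)
The tool chain does not depend on LangChain's agent machinery

✅ When to use bound tools?

That is, when using .bind_tools(), create_react_agent(), or a tools= parameter in LangGraph:

The model natively supports tool calling (OpenAI, Anthropic, etc.)
You want LangChain to handle ToolMessage creation and invocation
The tool set is stable and the flow is clearly structured
It is a better fit for complex agent and workflow scenarios

🧠 In one sentence:

Bound tools are framework-driven automatic tool calling; ad-hoc tools are manual prompting plus hand-rolled dispatch.

The former is more standard and automatic, the latter more flexible and controllable. Pick one according to the project's needs, or combine them.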

Passing runtime secrets to runnables

We can pass secrets to our runnables at runtime using RunnableConfig. Specifically, we can pass secrets with a __ prefix in the configurable field. This ensures that these secrets are not traced as part of the invocation:
from langchain_core.runnables import RunnableConfig
from langchain_core.tools import tool


@tool
def foo(x: int, config: RunnableConfig) -> int:
    """Sum x and a secret int"""
    return x + config["configurable"]["__top_secret_int"]


foo.invoke({"x": 5}, {"configurable": {"__top_secret_int": 2, "traced_key": "bar"}})

7
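
The idea of keeping `__`-prefixed keys out of traces can be illustrated in a few lines. This is a toy filter written for this note, not LangChain's tracing code, and `traceable_view` is a hypothetical name:

```python
from typing import Any, Dict


def traceable_view(configurable: Dict[str, Any]) -> Dict[str, Any]:
    """Drop every key that starts with '__' before the dict is logged."""
    return {k: v for k, v in configurable.items() if not k.startswith("__")}


configurable = {"__top_secret_int": 2, "traced_key": "bar"}
result = 5 + configurable["__top_secret_int"]  # the tool can still read the secret
logged = traceable_view(configurable)
print(result)
print(logged)
```

The secret stays usable inside the function, while the view that would be logged contains only the non-secret key.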

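Chat models & LLMs

Pure text-in/text-out LLMs tend to be older or lower-level. Many popular newer models are best used as chat completion models, even for non-chat use cases.

Chat models are language models that take a sequence of messages as input and return a chat message as output. They are usually newer models that distinguish messages by role (system, user, assistant), which makes them a better fit for multi-turn conversation than traditional text-in models (LLMs).

Although the underlying model takes messages, LangChain's wrappers also accept a plain string as input. In that case the string is automatically wrapped in a HumanMessage before being passed to the model.

💡 This means you can use a ChatModel in place of an LLM for plain-text input.

✅ Standard parameters for constructing a ChatModel

| Parameter | Description |
| --- | --- |
| `model` | Name of the model |
| `temperature` | Controls the randomness of the generated text |
| `timeout` | Request timeout (seconds) |
| `max_tokens` | Maximum number of tokens to generate |
| `stop` | Token sequences at which generation stops |
| `max_retries` | Maximum number of retries when a request fails |
| `api_key` | API key for the model provider |
| `base_url` | Custom endpoint to send requests to (e.g. a proxy or a privately deployed model) |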

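⚠️ Note:

Not every model provider supports all of these parameters (some models do not support max_tokens, for example).

The standard parameters are only enforced for integrations that have their own package, such as langchain-openai and langchain-anthropic; models in langchain-community may not support them.

Each model may also accept additional provider-specific parameters; see its API documentation.

Some chat models have been fine-tuned for tool calling and expose a dedicated API for it. Such models generally do better at tool calling than models that were not fine-tuned, and they are recommended for use cases that need it.

Some chat models are multimodal and accept images, audio, or even video as input. These are still relatively rare, which means providers have not yet settled on a standard "best" way to define the API. Multimodal output is rarer still.

In LangChain, most chat models that support multimodal input also accept those values in OpenAI's content-block format. So far this is limited to image input. For models that support video and other byte inputs, such as Gemini, the API also supports the native, model-specific representation.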

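Calling tools with chat models

Tool calling lets a chat model respond to a given prompt by "calling a tool".

Remember that although the name "tool calling" suggests the model performs some action directly, it does not! The model only generates the arguments for a tool; actually running the tool (or not) is up to the user.

Tool calling is a general technique for getting structured output from a model, and you can use it even when you do not intend to invoke any tool. One example use case is extracting structured data from unstructured text.

If tool calls are included in an LLM response, they are attached to the corresponding message or message chunk as a list of tool call objects in the .tool_calls attribute. Note that a chat model can call several tools at once.

A ToolCall is a typed dict containing the tool name, a dict of argument values, and (optionally) an identifier. Messages without tool calls default this attribute to an empty list.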

query = "What is 3 * 12? Also, what is 11 + 49?"

llm_with_tools.invoke(query).tool_calls

[{'name': 'multiply',
  'args': {'a': 3, 'b': 12},
  'id': 'call_1fyhJAbJHuKQe6n0PacubGsL',
  'type': 'tool_call'},
 {'name': 'add',
  'args': {'a': 11, 'b': 49},
  'id': 'call_fc2jVkKzwuPWyU7kS9qn1hyG',
  'type': 'tool_call'}]

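The .tool_calls attribute should contain valid tool calls. Note that providers occasionally output malformed tool calls (for example, arguments that are not valid JSON). When parsing fails in those cases, InvalidToolCall instances are placed in the .invalid_tool_calls attribute. An InvalidToolCall can have a name, string arguments, an identifier, and an error message.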
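
The split between valid and invalid tool calls can be sketched with the standard library. The function below is a hypothetical illustration of the idea, not the provider integration code: raw calls whose argument string parses as a JSON object go to one list, and the rest go to another list together with the error.

```python
import json
from typing import Any, Dict, List, Tuple


def split_tool_calls(
    raw_calls: List[Dict[str, str]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate raw calls into parsed tool calls and invalid ones."""
    valid, invalid = [], []
    for raw in raw_calls:
        try:
            args = json.loads(raw["args"])
            if not isinstance(args, dict):
                raise ValueError("arguments must be a JSON object")
            valid.append({"name": raw["name"], "args": args, "id": raw["id"]})
        except ValueError as err:  # json.JSONDecodeError subclasses ValueError
            invalid.append({**raw, "error": str(err)})
    return valid, invalid


tool_calls, invalid_tool_calls = split_tool_calls([
    {"name": "multiply", "args": '{"a": 3, "b": 12}', "id": "call_1"},
    {"name": "add", "args": '{"a": 11, "b": ', "id": "call_2"},
])
print(tool_calls)
print([call["name"] for call in invalid_tool_calls])
```

The invalid entry keeps its raw argument string, so it can be logged or sent back to the model for correction.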

If needed, an output parser can process the output further. For example, PydanticToolsParser converts the values already populated in .tool_calls into Pydantic objects:
from langchain_core.output_parsers import PydanticToolsParser
from pydantic import BaseModel, Field


class add(BaseModel):
    """Add two integers."""

    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")


class multiply(BaseModel):
    """Multiply two integers."""

    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")


chain = llm_with_tools | PydanticToolsParser(tools=[add, multiply])
chain.invoke(query)


[multiply(a=3, b=12), add(a=11, b=49)]
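
Conceptually, this step looks up each tool call's name among the given classes and builds an instance from the args. Here is a dependency-free sketch of that mapping, using dataclasses instead of Pydantic models, so there is no validation. `to_objects` is a hypothetical helper:

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class add:
    a: int
    b: int


@dataclass
class multiply:
    a: int
    b: int


def to_objects(tool_calls: List[Dict[str, Any]], classes: List[type]) -> List[Any]:
    """Instantiate the class whose name matches each tool call."""
    by_name = {cls.__name__: cls for cls in classes}
    return [by_name[call["name"]](**call["args"]) for call in tool_calls]


calls = [
    {"name": "multiply", "args": {"a": 3, "b": 12}, "id": "call_1", "type": "tool_call"},
    {"name": "add", "args": {"a": 11, "b": 49}, "id": "call_2", "type": "tool_call"},
]
objects = to_objects(calls, [add, multiply])
print(objects)
```

The class names must match the tool names the model emits, which is why the classes here are lowercase like the tools.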

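Returning structured data from a model

It is often useful for a model to return output that matches a specific schema. A common use case is extracting data from text to insert into a database or to hand to another downstream system. This guide covers several strategies for getting structured output from a model.

The .with_structured_output() method

This is the simplest and most reliable way to get structured output. with_structured_output() is implemented for models that provide native APIs for structuring output, such as tool/function calling or JSON mode, and it uses those capabilities under the hood.

The method takes a schema as input, specifying the names, types, and descriptions of the desired output attributes. It returns a model-like runnable, except that instead of outputting strings or messages it outputs objects corresponding to the given schema. The schema can be a TypedDict class, a JSON Schema, or a Pydantic class. With a TypedDict or JSON Schema the runnable returns a dict; with a Pydantic class it returns a Pydantic object.

As an example, let's have a model generate a joke and separate the setup from the punchline!

Pydantic

If we want the model to return a Pydantic object, we just pass in the desired Pydantic class. The key advantage of Pydantic is that the model-generated output is validated: Pydantic raises an error if a required field is missing or a field has the wrong type.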

from typing import Optional

from pydantic import BaseModel, Field


# Pydantic
class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(
        default=None, description="How funny the joke is, from 1 to 10"
    )


structured_llm = llm.with_structured_output(Joke)

structured_llm.invoke("Tell me a joke about cats")

Joke(setup='Why was the cat sitting on the computer?', punchline='Because it wanted to keep an eye on the mouse!', rating=7)

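Beyond the structure of the Pydantic class, the class name, the docstring, and the names and descriptions of the parameters are also very important. Most of the time, with_structured_output uses the model's function/tool calling API, so you can effectively think of all of this information as being added to the model prompt.

TypedDict or JSON Schema

If you do not want to use Pydantic, explicitly do not want validation of the arguments, or want to be able to stream the model output, you can define your schema with a TypedDict class.

Optionally, we can use the special Annotated syntax supported by LangChain, which lets you specify a default value and a description for a field.

Note that the default value is not filled in automatically if the model does not generate it; it is only used in defining the schema that is passed to the model.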

from typing import Optional

from typing_extensions import Annotated, TypedDict


# TypedDict
class Joke(TypedDict):
    """Joke to tell user."""

    setup: Annotated[str, ..., "The setup of the joke"]

    # Alternatively, we could have specified setup as:

    # setup: str                    # no default, no description
    # setup: Annotated[str, ...]    # no default, no description
    # setup: Annotated[str, "foo"]  # default, no description

    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10"]


structured_llm = llm.with_structured_output(Joke)

structured_llm.invoke("Tell me a joke about cats")

{'setup': 'Why was the cat sitting on the computer?',
 'punchline': 'Because it wanted to keep an eye on the mouse!',
 'rating': 7}
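
Since defaults declared in a TypedDict schema are not filled in when the model omits a key, one option is to merge them in after the call. The sketch below shows that idea with plain dicts; `JOKE_DEFAULTS` and `apply_defaults` are hypothetical helpers, not part of LangChain.

```python
from typing import Any, Dict

# Hypothetical defaults mirroring the `rating` default in the Joke schema above.
JOKE_DEFAULTS: Dict[str, Any] = {"rating": None}


def apply_defaults(result: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where keys missing from the model output take their default."""
    # Later entries win, so values the model did produce override the defaults.
    return {**defaults, **result}


# A model output that left out the optional `rating` key.
without_rating = {"setup": "Why was the cat sitting on the computer?", "punchline": "..."}
with_rating = {**without_rating, "rating": 7}

filled = apply_defaults(without_rating, JOKE_DEFAULTS)
unchanged = apply_defaults(with_rating, JOKE_DEFAULTS)
```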

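Similarly, we can pass in a JSON Schema dict. This requires no imports or classes and makes it very clear how each parameter is documented, at the cost of being a bit more verbose.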

json_schema = {
    "title": "joke",
    "description": "Joke to tell user.",
    "type": "object",
    "properties": {
        "setup": {
            "type": "string",
            "description": "The setup of the joke",
        },
        "punchline": {
            "type": "string",
            "description": "The punchline to the joke",
        },
        "rating": {
            "type": "integer",
            "description": "How funny the joke is, from 1 to 10",
            "default": None,
        },
    },
    "required": ["setup", "punchline"],
}
structured_llm = llm.with_structured_output(json_schema)

structured_llm.invoke("Tell me a joke about cats")

{'setup': 'Why was the cat sitting on the computer?',
 'punchline': 'Because it wanted to keep an eye on the mouse!',
 'rating': 7}

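Choosing between multiple schemas

The simplest way to let the model choose from multiple schemas is to create a parent schema that has a Union-typed attribute: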

from typing import Optional, Union

from pydantic import BaseModel, Field


# Pydantic
class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(
        default=None, description="How funny the joke is, from 1 to 10"
    )


class ConversationalResponse(BaseModel):
    """Respond in a conversational manner. Be kind and helpful."""

    response: str = Field(description="A conversational response to the user's query")


class FinalResponse(BaseModel):
    final_output: Union[Joke, ConversationalResponse]


structured_llm = llm.with_structured_output(FinalResponse)

structured_llm.invoke("Tell me a joke about cats")

FinalResponse(final_output=Joke(setup='Why was the cat sitting on the computer?', punchline='Because it wanted to keep an eye on the mouse!', rating=7))

structured_llm.invoke("How are you today?")

FinalResponse(final_output=ConversationalResponse(response="I'm just a bunch of code, so I don't have feelings, but I'm here and ready to help you! How can I assist you today?"))

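Alternatively, you can use tool calling directly to let the model choose between options, if your chosen model supports it.

Streaming

When the output type is a dict (that is, when the schema is specified as a TypedDict class or a JSON Schema dict), we can stream outputs from the structured model.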

from typing import Optional

from typing_extensions import Annotated, TypedDict


# TypedDict
class Joke(TypedDict):
    """Joke to tell user."""

    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10"]


structured_llm = llm.with_structured_output(Joke)

for chunk in structured_llm.stream("Tell me a joke about cats"):
    print(chunk)

{}
{'setup': ''}
{'setup': 'Why'}
{'setup': 'Why was'}
{'setup': 'Why was the'}
{'setup': 'Why was the cat'}
{'setup': 'Why was the cat sitting'}
{'setup': 'Why was the cat sitting on'}
{'setup': 'Why was the cat sitting on the'}
{'setup': 'Why was the cat sitting on the computer'}
{'setup': 'Why was the cat sitting on the computer?'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': ''}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the mouse'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the mouse!'}
{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the mouse!', 'rating': 7}
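
Each streamed chunk above is a cumulative snapshot of the dict rather than a delta. If a UI only needs the newly arrived text, the difference between consecutive snapshots can be computed as in this sketch. `iter_deltas` is a hypothetical helper, and the snapshots are a shortened, hand-written version of the output above.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_deltas(snapshots: Iterable[Dict[str, Any]]) -> Iterator[Tuple[str, Any]]:
    """Yield (key, new piece) for every change between consecutive snapshots."""
    previous: Dict[str, Any] = {}
    for snapshot in snapshots:
        for key, value in snapshot.items():
            old = previous.get(key)
            if key in previous and value == old:
                continue  # nothing new for this key
            if isinstance(value, str) and isinstance(old, str) and value.startswith(old):
                # String grew: emit only the appended suffix.
                yield key, value[len(old):]
            else:
                # New key or non-string value: emit it whole.
                yield key, value
        previous = dict(snapshot)


snapshots = [
    {},
    {"setup": ""},
    {"setup": "Why"},
    {"setup": "Why was"},
    {"setup": "Why was", "rating": 7},
]
deltas = list(iter_deltas(snapshots))
```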

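Few-shot prompting

For more complex schemas, it is very useful to add few-shot examples to the prompt. This can be done in a few ways.

The simplest and most universal way is to add examples to a system message in the prompt: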

from langchain_core.prompts import ChatPromptTemplate

system = """You are a hilarious comedian. Your specialty is knock-knock jokes. \
Return a joke which has the setup (the response to "Who's there?") and the final punchline (the response to "<setup> who?").

Here are some examples of jokes:

example_user: Tell me a joke about planes
example_assistant: {{"setup": "Why don't planes ever get tired?", "punchline": "Because they have rest wings!", "rating": 2}}

example_user: Tell me another joke about planes
example_assistant: {{"setup": "Cargo", "punchline": "Cargo 'vroom vroom', but planes go 'zoom zoom'!", "rating": 10}}

example_user: Now about caterpillars
example_assistant: {{"setup": "Caterpillar", "punchline": "Caterpillar really slow, but watch me turn into a butterfly and steal the show!", "rating": 5}}"""

prompt = ChatPromptTemplate.from_messages([("system", system), ("human", "{input}")])

few_shot_structured_llm = prompt | structured_llm
few_shot_structured_llm.invoke("what's something funny about woodpeckers")

{'setup': 'Woodpecker',
 'punchline': "Woodpecker who? Woodpecker who can't find a tree is just a bird with a headache!",
 'rating': 7}

When the underlying method for structuring output is tool calling, we can pass in our examples as explicit tool calls.

from langchain_core.messages import AIMessage, HumanMessage, ToolMessage

examples = [
    HumanMessage("Tell me a joke about planes", name="example_user"),
    AIMessage(
        "",
        name="example_assistant",
        tool_calls=[
            {
                "name": "joke",
                "args": {
                    "setup": "Why don't planes ever get tired?",
                    "punchline": "Because they have rest wings!",
                    "rating": 2,
                },
                "id": "1",
            }
        ],
    ),
    # Most tool-calling models expect a ToolMessage(s) to follow an AIMessage with tool calls.
    ToolMessage("", tool_call_id="1"),
    # Some models also expect an AIMessage to follow any ToolMessages,
    # so you may need to add an AIMessage here.
    HumanMessage("Tell me another joke about planes", name="example_user"),
    AIMessage(
        "",
        name="example_assistant",
        tool_calls=[
            {
                "name": "joke",
                "args": {
                    "setup": "Cargo",
                    "punchline": "Cargo 'vroom vroom', but planes go 'zoom zoom'!",
                    "rating": 10,
                },
                "id": "2",
            }
        ],
    ),
    ToolMessage("", tool_call_id="2"),
    HumanMessage("Now about caterpillars", name="example_user"),
    AIMessage(
        "",
        tool_calls=[
            {
                "name": "joke",
                "args": {
                    "setup": "Caterpillar",
                    "punchline": "Caterpillar really slow, but watch me turn into a butterfly and steal the show!",
                    "rating": 5,
                },
                "id": "3",
            }
        ],
    ),
    ToolMessage("", tool_call_id="3"),
]
system = """You are a hilarious comedian. Your specialty is knock-knock jokes. \
Return a joke which has the setup (the response to "Who's there?") \
and the final punchline (the response to "<setup> who?")."""

prompt = ChatPromptTemplate.from_messages(
    [("system", system), ("placeholder", "{examples}"), ("human", "{input}")]
)
few_shot_structured_llm = prompt | structured_llm
few_shot_structured_llm.invoke({"input": "crocodiles", "examples": examples})

{'setup': 'Crocodile',
 'punchline': 'Crocodile be seeing you later, alligator!',
 'rating': 7}

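Specifying the method for structuring outputs (advanced)

For models that support more than one way of structuring output (that is, both tool calling and JSON mode), you can specify which method to use with the method= argument.

If you use JSON mode, you still have to specify the desired schema in the model prompt. The schema you pass to with_structured_output is only used for parsing the model output; it is not passed to the model the way it is with tool calling.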

structured_llm = llm.with_structured_output(None, method="json_mode")

structured_llm.invoke(
    "Tell me a joke about cats, respond in JSON with `setup` and `punchline` keys"
)

{'setup': 'Why was the cat sitting on the computer?',
 'punchline': 'Because it wanted to keep an eye on the mouse!'}

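Raw outputs (advanced)

LLMs are not perfect at generating structured output, especially as schemas become complex. You can avoid raising exceptions and handle the raw output yourself by passing include_raw=True. This changes the output format to contain the raw message output, the parsed value (if parsing succeeded), and any resulting error: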

structured_llm = llm.with_structured_output(Joke, include_raw=True)

structured_llm.invoke("Tell me a joke about cats")

{'raw': AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_f25ZRmh8u5vHlOWfTUw8sJFZ', 'function': {'arguments': '{"setup":"Why was the cat sitting on the computer?","punchline":"Because it wanted to keep an eye on the mouse!","rating":7}', 'name': 'Joke'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 33, 'prompt_tokens': 93, 'total_tokens': 126}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518', 'finish_reason': 'stop', 'logprobs': None}, id='run-d880d7e2-df08-4e9e-ad92-dfc29f2fd52f-0', tool_calls=[{'name': 'Joke', 'args': {'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the mouse!', 'rating': 7}, 'id': 'call_f25ZRmh8u5vHlOWfTUw8sJFZ', 'type': 'tool_call'}], usage_metadata={'input_tokens': 93, 'output_tokens': 33, 'total_tokens': 126}),
 'parsed': {'setup': 'Why was the cat sitting on the computer?',
  'punchline': 'Because it wanted to keep an eye on the mouse!',
  'rating': 7},
 'parsing_error': None}
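
With include_raw=True the caller decides what to do when parsing fails. A minimal sketch of that decision is shown below, operating on hand-built dicts with the same three keys as the output above. `unwrap_structured` is a hypothetical helper, and the string used for `raw` stands in for the real AIMessage.

```python
from typing import Any, Dict, Optional


def unwrap_structured(
    result: Dict[str, Any], fallback: Optional[Dict[str, Any]] = None
) -> Optional[Dict[str, Any]]:
    """Return the parsed value, or `fallback` when parsing failed or produced nothing."""
    if result.get("parsing_error") is None and result.get("parsed") is not None:
        return result["parsed"]
    return fallback


ok_result = {
    "raw": "<AIMessage ...>",  # placeholder for the raw message
    "parsed": {"setup": "s", "punchline": "p", "rating": 7},
    "parsing_error": None,
}
failed_result = {
    "raw": "<AIMessage ...>",
    "parsed": None,
    "parsing_error": ValueError("output did not match the schema"),
}

joke = unwrap_structured(ok_result)
recovered = unwrap_structured(failed_result, fallback={"setup": "", "punchline": ""})
```

In a real application the failure branch might instead log `raw`, retry the call, or re-prompt the model with the error message.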

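Prompting and parsing model outputs directly

Not all models support .with_structured_output(), since not all models have tool calling or JSON mode support. For such models, you need to prompt the model directly to use a specific format, and use an output parser to extract the structured response from the raw model output.

Using PydanticOutputParser

The following example uses the built-in PydanticOutputParser to parse the output of a chat model that has been prompted to match a given Pydantic schema. Note that we add format_instructions directly to the prompt from a method on the parser: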

from typing import List

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: str = Field(..., description="The name of the person")
    height_in_meters: float = Field(
        ..., description="The height of the person expressed in meters."
    )


class People(BaseModel):
    """Identifying information about all people in a text."""

    people: List[Person]


# Set up a parser
parser = PydanticOutputParser(pydantic_object=People)

# Prompt
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the user query. Wrap the output in `json` tags\n{format_instructions}",
        ),
        ("human", "{query}"),
    ]
).partial(format_instructions=parser.get_format_instructions())

Let's take a look at what information is sent to the model:

query = "Anna is 23 years old and she is 6 feet tall"

print(prompt.invoke(query).to_string())

System: Answer the user query. Wrap the output in `json` tags
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
\`\`\`
{"description": "Identifying information about all people in a text.", "properties": {"people": {"title": "People", "type": "array", "items": {"$ref": "#/definitions/Person"}}}, "required": ["people"], "definitions": {"Person": {"title": "Person", "description": "Information about a person.", "type": "object", "properties": {"name": {"title": "Name", "description": "The name of the person", "type": "string"}, "height_in_meters": {"title": "Height In Meters", "description": "The height of the person expressed in meters.", "type": "number"}}, "required": ["name", "height_in_meters"]}}}
\`\`\`
Human: Anna is 23 years old and she is 6 feet tall

And now let's invoke it:

chain = prompt | llm | parser

chain.invoke({"query": query})

People(people=[Person(name='Anna', height_in_meters=1.8288)])
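
The height in this output does not appear in the input text: the model converted "6 feet" to meters itself. A quick check of that arithmetic, using the exact definition of the international foot (1 ft = 0.3048 m):

```python
METERS_PER_FOOT = 0.3048  # exact by definition of the international foot

height_in_meters = 6 * METERS_PER_FOOT
# Rounded to four decimals to match the precision of the parsed output.
print(round(height_in_meters, 4))
```

Because the conversion is done by the model and not by code, it is worth validating numeric fields like this one when accuracy matters.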

Custom parsing

You can also create a custom prompt and parser with the LangChain Expression Language (LCEL), using a plain function to parse the output from the model:

import json
import re
from typing import List

from langchain_core.messages import AIMessage
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: str = Field(..., description="The name of the person")
    height_in_meters: float = Field(
        ..., description="The height of the person expressed in meters."
    )


class People(BaseModel):
    """Identifying information about all people in a text."""

    people: List[Person]


# Prompt
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Answer the user query. Output your answer as JSON that  "
            "matches the given schema: \`\`\`json\n{schema}\n\`\`\`. "
            "Make sure to wrap the answer in \`\`\`json and \`\`\` tags",
        ),
        ("human", "{query}"),
    ]
).partial(schema=People.schema())


# Custom parser
def extract_json(message: AIMessage) -> List[dict]:
    """Extract JSON objects embedded in json-tagged fenced blocks of the message content.

    Parameters:
        message (AIMessage): The model message whose content contains the JSON blocks.

    Returns:
        list: A list of parsed JSON objects, one per fenced block.
    """
    text = message.content
    # Define the regular expression pattern to match JSON blocks
    pattern = r"\`\`\`json(.*?)\`\`\`"

    # Find all non-overlapping matches of the pattern in the string
    matches = re.findall(pattern, text, re.DOTALL)

    # Return the list of matched JSON strings, stripping any leading or trailing whitespace
    try:
        return [json.loads(match.strip()) for match in matches]
    except Exception:
        raise ValueError(f"Failed to parse: {message}")

Here is the prompt sent to the model:

query = "Anna is 23 years old and she is 6 feet tall"

print(prompt.format_prompt(query=query).to_string())

System: Answer the user query. Output your answer as JSON that  matches the given schema: \`\`\`json
{'title': 'People', 'description': 'Identifying information about all people in a text.', 'type': 'object', 'properties': {'people': {'title': 'People', 'type': 'array', 'items': {'$ref': '#/definitions/Person'}}}, 'required': ['people'], 'definitions': {'Person': {'title': 'Person', 'description': 'Information about a person.', 'type': 'object', 'properties': {'name': {'title': 'Name', 'description': 'The name of the person', 'type': 'string'}, 'height_in_meters': {'title': 'Height In Meters', 'description': 'The height of the person expressed in meters.', 'type': 'number'}}, 'required': ['name', 'height_in_meters']}}}
\`\`\`. Make sure to wrap the answer in \`\`\`json and \`\`\` tags
Human: Anna is 23 years old and she is 6 feet tall

And here is what it looks like when we invoke it:

chain = prompt | llm | extract_json

chain.invoke({"query": query})

[{'people': [{'name': 'Anna', 'height_in_meters': 1.8288}]}]
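
The parsing step can be tried without a model by feeding it a hand-written reply. The sketch below is a stdlib-only variant of the custom parser that works on a plain string instead of an AIMessage. `extract_json_blocks` and the sample reply are assumptions for illustration, and the fence marker is built programmatically so that the listing itself contains no literal triple backticks.

```python
import json
import re
from typing import List

FENCE = "`" * 3  # three backticks, built at runtime


def extract_json_blocks(text: str) -> List[dict]:
    """Parse every json-tagged fenced block found in `text`."""
    pattern = re.escape(FENCE) + r"json(.*?)" + re.escape(FENCE)
    # re.DOTALL lets the non-greedy group span multiple lines.
    matches = re.findall(pattern, text, re.DOTALL)
    return [json.loads(match.strip()) for match in matches]


# Hypothetical model reply in the format the prompt asks for.
reply = (
    "Here is the result:\n"
    f"{FENCE}json\n"
    '{"people": [{"name": "Anna", "height_in_meters": 1.8288}]}\n'
    f"{FENCE}"
)
parsed = extract_json_blocks(reply)
```

A reply with no fenced block yields an empty list, so callers may want to treat that case as a parsing failure.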

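Caching chat model responses

LangChain provides an optional caching layer for chat models. This is useful for two main reasons:

- It can save you money by reducing the number of API calls you make to the LLM provider, if you often request the same completion multiple times. This is especially useful during app development.
- It can speed up your application by reducing the number of API calls you make to the LLM provider.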

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import ChatOpenAI
from langchain_core.globals import set_llm_cache

llm = ChatOpenAI(model="gpt-4o-mini")

In-memory cache

This is an ephemeral cache that stores model calls in memory. It is wiped when your environment restarts, and it is not shared across processes.

from langchain_core.caches import InMemoryCache

set_llm_cache(InMemoryCache())

# The first time, it is not yet in cache, so it should take longer
llm.invoke("Tell me a joke")

CPU times: user 645 ms, sys: 214 ms, total: 859 ms
Wall time: 829 ms

AIMessage(content="Why don't scientists trust atoms?\n\nBecause they make up everything!", response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 11, 'total_tokens': 24}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_c2295e73ad', 'finish_reason': 'stop', 'logprobs': None}, id='run-b6836bdd-8c30-436b-828f-0ac5fc9ab50e-0')

# The second time it is, so it goes faster
llm.invoke("Tell me a joke")

CPU times: user 822 µs, sys: 288 µs, total: 1.11 ms
Wall time: 1.06 ms

AIMessage(content="Why don't scientists trust atoms?\n\nBecause they make up everything!", response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 11, 'total_tokens': 24}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_c2295e73ad', 'finish_reason': 'stop', 'logprobs': None}, id='run-b6836bdd-8c30-436b-828f-0ac5fc9ab50e-0')
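
The identical second response (including the same run id) is what a cache hit looks like. The idea behind an in-memory cache can be sketched in a few lines of plain Python. This is only an illustration of the concept, not how InMemoryCache is implemented; `TinyPromptCache` and `fake_complete` are hypothetical names.

```python
from typing import Callable, Dict, Tuple


class TinyPromptCache:
    """Memoize completions by (model name, prompt); lives only in this process."""

    def __init__(self, complete: Callable[[str, str], str]) -> None:
        self._complete = complete
        self._store: Dict[Tuple[str, str], str] = {}
        self.misses = 0  # number of times the underlying call was actually made

    def invoke(self, model: str, prompt: str) -> str:
        key = (model, prompt)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._complete(model, prompt)
        return self._store[key]


def fake_complete(model: str, prompt: str) -> str:
    # Stand-in for a paid provider call.
    return f"[{model}] reply to: {prompt}"


cache = TinyPromptCache(fake_complete)
first = cache.invoke("demo-model", "Tell me a joke")
second = cache.invoke("demo-model", "Tell me a joke")  # served from the dict
```

Including the model name in the key keeps two differently configured models from sharing cached answers.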

SQLite cache

This cache implementation uses a SQLite database to store responses, so it survives process restarts.

# We can do the same thing with a SQLite cache
from langchain_community.cache import SQLiteCache

set_llm_cache(SQLiteCache(database_path=".langchain.db"))

# The first time, it is not yet in cache, so it should take longer
llm.invoke("Tell me a joke")

CPU times: user 9.91 ms, sys: 7.68 ms, total: 17.6 ms
Wall time: 657 ms

AIMessage(content='Why did the scarecrow win an award? Because he was outstanding in his field!', response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 11, 'total_tokens': 28}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_c2295e73ad', 'finish_reason': 'stop', 'logprobs': None}, id='run-39d9e1e8-7766-4970-b1d8-f50213fd94c5-0')

# The second time it is, so it goes faster
llm.invoke("Tell me a joke")

CPU times: user 52.2 ms, sys: 60.5 ms, total: 113 ms
Wall time: 127 ms

AIMessage(content='Why did the scarecrow win an award? Because he was outstanding in his field!', id='run-39d9e1e8-7766-4970-b1d8-f50213fd94c5-0')
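
What makes the SQLite variant different is persistence: the data outlives the object that wrote it. The sketch below shows that property with the standard-library `sqlite3` module. It is not SQLiteCache's actual schema; `SqlitePromptCache`, its table layout, and the temporary file path are assumptions for illustration.

```python
import os
import sqlite3
import tempfile
from typing import Optional


class SqlitePromptCache:
    """Persist prompt/response pairs in a SQLite file."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (prompt TEXT PRIMARY KEY, response TEXT)"
        )
        self._conn.commit()

    def lookup(self, prompt: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT response FROM cache WHERE prompt = ?", (prompt,)
        ).fetchone()
        return row[0] if row else None

    def update(self, prompt: str, response: str) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO cache (prompt, response) VALUES (?, ?)",
            (prompt, response),
        )
        self._conn.commit()

    def close(self) -> None:
        self._conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "demo_cache.db")

writer = SqlitePromptCache(db_path)
writer.update("Tell me a joke", "Why did the scarecrow win an award?")
writer.close()

# A fresh connection stands in for a restarted process.
reader = SqlitePromptCache(db_path)
cached = reader.lookup("Tell me a joke")
missing = reader.lookup("unseen prompt")
reader.close()
```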

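Getting log probabilities

Certain chat models can be configured to return token-level log probabilities, which represent the likelihood of a given token.

To have the OpenAI API return log probabilities, we need to configure the logprobs=True parameter. The log probabilities are then included on each output AIMessage as part of response_metadata: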

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini").bind(logprobs=True)

msg = llm.invoke(("human", "how are you today"))

msg.response_metadata["logprobs"]["content"][:5]

[{'token': 'I', 'bytes': [73], 'logprob': -0.26341408, 'top_logprobs': []},
 {'token': "'m",
  'bytes': [39, 109],
  'logprob': -0.48584133,
  'top_logprobs': []},
 {'token': ' just',
  'bytes': [32, 106, 117, 115, 116],
  'logprob': -0.23484154,
  'top_logprobs': []},
 {'token': ' a',
  'bytes': [32, 97],
  'logprob': -0.0018291725,
  'top_logprobs': []},
 {'token': ' computer',
  'bytes': [32, 99, 111, 109, 112, 117, 116, 101, 114],
  'logprob': -0.052299336,
  'top_logprobs': []}]
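
Log probabilities are easier to read once converted back to probabilities. Assuming the values are natural logarithms, `math.exp` recovers each token's probability, and summing the log probabilities gives the log probability of the whole prefix. The numbers below are copied from the sample output above.

```python
import math

# (token, logprob) pairs copied from the sample output above.
token_logprobs = [
    ("I", -0.26341408),
    ("'m", -0.48584133),
    (" just", -0.23484154),
    (" a", -0.0018291725),
    (" computer", -0.052299336),
]

# Per-token probability: p = e^logprob.
probabilities = [(token, math.exp(logprob)) for token, logprob in token_logprobs]

# Adding log probabilities is equivalent to multiplying probabilities.
sequence_logprob = sum(logprob for _, logprob in token_logprobs)
sequence_probability = math.exp(sequence_logprob)

for token, probability in probabilities:
    print(f"{token!r}: {probability:.3f}")
```

A log probability close to 0 (such as the one for " a") corresponds to a probability close to 1, meaning the model was nearly certain of that token.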

They are also part of streamed message chunks:

ct = 0
full = None
for chunk in llm.stream(("human", "how are you today")):
    if ct < 5:
        full = chunk if full is None else full + chunk
        if "logprobs" in full.response_metadata:
            print(full.response_metadata["logprobs"]["content"])
    else:
        break
    ct += 1

[]
[{'token': 'I', 'bytes': [73], 'logprob': -0.26593843, 'top_logprobs': []}]
[{'token': 'I', 'bytes': [73], 'logprob': -0.26593843, 'top_logprobs': []}, {'token': "'m", 'bytes': [39, 109], 'logprob': -0.3238896, 'top_logprobs': []}]
[{'token': 'I', 'bytes': [73], 'logprob': -0.26593843, 'top_logprobs': []}, {'token': "'m", 'bytes': [39, 109], 'logprob': -0.3238896, 'top_logprobs': []}, {'token': ' just', 'bytes': [32, 106, 117, 115, 116], 'logprob': -0.23778509, 'top_logprobs': []}]
[{'token': 'I', 'bytes': [73], 'logprob': -0.26593843, 'top_logprobs': []}, {'token': "'m", 'bytes': [39, 109], 'logprob': -0.3238896, 'top_logprobs': []}, {'token': ' just', 'bytes': [32, 106, 117, 115, 116], 'logprob': -0.23778509, 'top_logprobs': []}, {'token': ' a', 'bytes': [32, 97], 'logprob': -0.0022134194, 'top_logprobs': []}]

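Creating a custom chat model class

Wrapping your LLM with the standard BaseChatModel interface lets you use it in existing LangChain programs with minimal code changes.

As a bonus, your LLM automatically becomes a LangChain Runnable and benefits from some optimizations out of the box (for example, batching via a thread pool), async support, the astream_events API, and more.

Chat models take messages as input and return a message as output.

LangChain provides several built-in message types:

Message type	Description
`SystemMessage`	Primes AI behavior, usually passed in as the first message in the input sequence.
`HumanMessage`	Represents a message from a person interacting with the chat model.
`AIMessage`	Represents a reply from the AI model; it can be text or a request to invoke a tool.
`FunctionMessage`	Passes the result of a tool invocation back to the model; this is the legacy type that follows OpenAI's `function` role.
`ToolMessage`	Also passes the result of a tool invocation back to the model; it follows OpenAI's `tool` role.
`AIMessageChunk` etc.	Streaming (chunk) variants of each message type, used when streaming output.

💡 ToolMessage and FunctionMessage closely follow OpenAI's tool and function roles.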

from langchain_core.messages import (
    AIMessage,
    BaseMessage,
    FunctionMessage,
    HumanMessage,
    SystemMessage,
    ToolMessage,
)

Each message type has a corresponding chunk variant used for streaming:

from langchain_core.messages import (
    AIMessageChunk,
    FunctionMessageChunk,
    HumanMessageChunk,
    SystemMessageChunk,
    ToolMessageChunk,
)

These chunks are used when streaming output from chat models, and they all define an additive property: chunk message objects can be concatenated with +.

AIMessageChunk(content="Hello") + AIMessageChunk(content=" World!")
# => AIMessageChunk(content="Hello World!")

This is useful when receiving content from a model piece by piece, for example for incremental generation or streamed responses.
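
To make the additive behavior concrete, here is a small stand-in written with a dataclass. `TextChunk` is hypothetical and far simpler than AIMessageChunk (it carries only text), but it shows how defining `__add__` lets a consumer fold a stream of chunks into one object.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TextChunk:
    """Hypothetical stand-in for a message chunk; not a LangChain class."""

    content: str

    def __add__(self, other: "TextChunk") -> "TextChunk":
        # Concatenating two chunks produces a new chunk with the joined content.
        return TextChunk(content=self.content + other.content)


def accumulate(chunks: Iterable[TextChunk]) -> Optional[TextChunk]:
    """Fold a stream of chunks into a single chunk, or None for an empty stream."""
    total: Optional[TextChunk] = None
    for chunk in chunks:
        total = chunk if total is None else total + chunk
    return total


full = accumulate([TextChunk("Hello"), TextChunk(" World!")])
```

The `total = chunk if total is None else total + chunk` pattern is the same one used with real chunks elsewhere in these notes.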

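We will subclass BaseChatModel to create a custom model that returns the first n characters of the last message in the prompt.

✅ Methods and properties to implement

Method / property	Description	Required?
`_generate`	Core logic that generates a chat result from the prompt	✅ Required
`_llm_type`	String identifying the model type, used for logging	✅ Required
`_identifying_params`	Model parameters, used for tracing	⭕ Optional
`_stream`	Implements streaming output	⭕ Optional
`_agenerate`	Native async implementation of `_generate`	⭕ Optional
`_astream`	Native async implementation of `_stream`	⭕ Optional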

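💡 By default, _astream wraps _stream with run_in_executor, and falls back to _agenerate if _stream is not implemented. Implementing native async logic is recommended for better performance.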
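
The executor-based fallback can be pictured with plain asyncio. The sketch below is not LangChain's code; it only illustrates the general technique of driving a synchronous generator from async code by running each `next()` call in a worker thread. `sync_stream`, `astream_via_executor`, and `collect` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Iterator, List

_DONE = object()  # sentinel returned by next() when the generator is exhausted


def sync_stream(text: str) -> Iterator[str]:
    """A blocking generator, standing in for a synchronous streaming method."""
    for character in text:
        yield character


async def astream_via_executor(text: str) -> AsyncIterator[str]:
    """Expose the blocking generator as an async iterator."""
    loop = asyncio.get_running_loop()
    iterator = sync_stream(text)
    while True:
        # Each next() call runs in the default thread pool, so the event loop stays free.
        item = await loop.run_in_executor(None, next, iterator, _DONE)
        if item is _DONE:
            break
        yield item


async def collect(text: str) -> List[str]:
    return [chunk async for chunk in astream_via_executor(text)]


result = asyncio.run(collect("cat"))
```

Every chunk costs a hop to a worker thread and back, which is the overhead a native async implementation avoids.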

from typing import Any, AsyncIterator, Dict, Iterator, List, Optional

from langchain_core.callbacks import (
    AsyncCallbackManagerForLLMRun,
    CallbackManagerForLLMRun,
)
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import (
    AIMessage,
    AIMessageChunk,
    BaseMessage,
    HumanMessage,
)
from langchain_core.outputs import (
    ChatGeneration,
    ChatGenerationChunk,
    ChatResult,
)
from langchain_core.runnables import run_in_executor

📄 Model implementation

class CustomChatModelAdvanced(BaseChatModel):
    model_name: str
    n: int

    def _generate(self, messages: List[BaseMessage], stop=None, run_manager=None, **kwargs) -> ChatResult:
        last_message = messages[-1]
        tokens = last_message.content[:self.n]
        message = AIMessage(
            content=tokens,
            additional_kwargs={},
            response_metadata={"time_in_seconds": 3},
        )
        return ChatResult(generations=[ChatGeneration(message=message)])

    def _stream(self, messages: List[BaseMessage], stop=None, run_manager=None, **kwargs) -> Iterator[ChatGenerationChunk]:
        last_message = messages[-1]
        tokens = last_message.content[:self.n]
        for token in tokens:
            chunk = ChatGenerationChunk(message=AIMessageChunk(content=token))
            if run_manager:
                run_manager.on_llm_new_token(token, chunk=chunk)
            yield chunk
        yield ChatGenerationChunk(
            message=AIMessageChunk(content="", response_metadata={"time_in_seconds": 3})
        )

    @property
    def _llm_type(self) -> str:
        return "echoing-chat-model-advanced"

    @property
    def _identifying_params(self) -> Dict[str, Any]:
        return {"model_name": self.model_name}

🧪 Test cases

model = CustomChatModelAdvanced(n=3, model_name="my_custom_model")

# Invoke with a list of messages
model.invoke([
    HumanMessage(content="hello!"),
    AIMessage(content="Hi there human!"),
    HumanMessage(content="Meow!"),
])
# Output: AIMessage(content='Meo', ...)

# Shorthand input: a plain string
model.invoke("hello")
# Output: AIMessage(content='hel', ...)

# Batch
model.batch(["hello", "goodbye"])
# Output:
# [AIMessage(content='hel', ...),
#  AIMessage(content='goo', ...)]

# Streaming
for chunk in model.stream("cat"):
    print(chunk.content, end="|")
# Output: c|a|t||

⚡ Async streaming output

async for chunk in model.astream("cat"):
    print(chunk.content, end="|")
# Output: c|a|t||

🎯 Inspecting callback events

async for event in model.astream_events("cat", version="v1"):
    print(event)

Example output (partial):

{"event": "on_chat_model_start", ...}
{"event": "on_chat_model_stream", "data": {"chunk": AIMessageChunk(content='c')}}
{"event": "on_chat_model_stream", "data": {"chunk": AIMessageChunk(content='a')}}
{"event": "on_chat_model_stream", "data": {"chunk": AIMessageChunk(content='t')}}
{"event": "on_chat_model_end", "data": {"output": AIMessageChunk(content='cat')}}

Streaming chat model responses

All chat models implement LangChain's Runnable interface, which comes with default implementations of the following standard methods:

ainvoke
batch
abatch
stream
astream
astream_events

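The default streaming implementation returns an Iterator (or an AsyncIterator for async streaming) that yields a single value: the final output of the underlying chat model, not one token at a time.

💡 The default implementation does not support token-by-token streaming, but because the interface is uniform, models can be swapped for one another easily.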
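
The difference between the default behavior and real token streaming can be sketched with two tiny classes. These are hypothetical stand-ins, not LangChain classes: the base class streams by yielding its complete `invoke` result once, and a subclass overrides `stream` to emit smaller pieces.

```python
from typing import Iterator, List


class BaseEchoModel:
    """Hypothetical base class whose stream() falls back to one invoke() result."""

    def invoke(self, prompt: str) -> str:
        return prompt.upper()

    def stream(self, prompt: str) -> Iterator[str]:
        # Default: a single chunk containing the complete output.
        yield self.invoke(prompt)


class TokenStreamingModel(BaseEchoModel):
    def stream(self, prompt: str) -> Iterator[str]:
        # Override: emit one whitespace-separated piece at a time.
        for word in self.invoke(prompt).split():
            yield word


default_chunks: List[str] = list(BaseEchoModel().stream("hello moon"))
token_chunks: List[str] = list(TokenStreamingModel().stream("hello moon"))
```

Calling code iterates over `stream()` in the same way for both classes, which is why one model can replace the other without changes to the consumer.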

🔁 Synchronous streaming example

Each generated token is separated with |:

from langchain_anthropic.chat_models import ChatAnthropic

chat = ChatAnthropic(model="claude-3-haiku-20240307")

for chunk in chat.stream("Write me a 1 verse song about goldfish on the moon"):
    print(chunk.content, end="|", flush=True)

Example output:

Here| is| a| |1| verse| song| about| gol|dfish| on| the| moon|:|
Floating| up| in| the| star|ry| night|,|
Fins| a|-|gl|im|mer| in| the| pale| moon|light|.|
Gol|dfish| swimming|,| peaceful| an|d free|,|
Se|ren|ely| drif|ting| across| the| lunar| sea|.|

⚡ Async streaming example

from langchain_anthropic.chat_models import ChatAnthropic

chat = ChatAnthropic(model="claude-3-haiku-20240307")

async for chunk in chat.astream("Write me a 1 verse song about goldfish on the moon"):
    print(chunk.content, end="|", flush=True)

Example output:

Here| is| a| |1| verse| song| about| gol|dfish| on| the| moon|:|
Floating| up| above| the| Earth|,|
Gol|dfish| swim| in| alien| m|irth|.|
In| their| bowl| of| lunar| dust|,|
Gl|it|tering| scales| reflect| the| trust|
Of| swimming| free| in| this| new| worl|d,|
Where| their| aqu|atic| dream|'s| unf|ur|le|d.|

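📡 Listening to events with astream_events

The astream_events method lets you observe events from each stage, which is especially useful when building LLM applications with multiple steps (for example, a chain):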

from langchain_anthropic.chat_models import ChatAnthropic

chat = ChatAnthropic(model="claude-3-haiku-20240307")
idx = 0

async for event in chat.astream_events(
    "Write me a 1 verse song about goldfish on the moon", version="v1"
):
    idx += 1
    if idx >= 5:
        print("...Truncated")
        break
    print(event)

Partial example output:

{
  "event": "on_chat_model_start",
  "data": { "input": "Write me a 1 verse song about goldfish on the moon" }
}
{
  "event": "on_chat_model_stream",
  "data": { "chunk": AIMessageChunk(content='Here') }
}
{
  "event": "on_chat_model_stream",
  "data": { "chunk": AIMessageChunk(content="'s") }
}
{
  "event": "on_chat_model_stream",
  "data": { "chunk": AIMessageChunk(content=' a') }
}
...Truncated

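Tracking token usage in chat models

Tracking token usage to calculate cost is an important part of putting your app into production.

A number of model providers return token usage information as part of the chat generation response. When available, this information is included on the AIMessage objects produced by the corresponding model.

LangChain AIMessage objects include a usage_metadata attribute. When populated, this attribute is a UsageMetadata dict with standard keys (for example, "input_tokens" and "output_tokens").

Example with OpenAI: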

# # !pip install -qU langchain-openai

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
openai_response = llm.invoke("hello")
openai_response.usage_metadata

{'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17}

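Metadata from the model response is also included in the AIMessage response_metadata attribute. This data is typically not standardized. Note that different providers adopt different conventions for representing token counts: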

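from langchain_anthropic import ChatAnthropic
anthropic_response = ChatAnthropic(model="claude-3-haiku-20240307").invoke("hello")  # assumed; not defined earlier in these notes
print(f'OpenAI: {openai_response.response_metadata["token_usage"]}\n')
print(f'Anthropic: {anthropic_response.response_metadata["usage"]}')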

OpenAI: {'completion_tokens': 9, 'prompt_tokens': 8, 'total_tokens': 17}

Anthropic: {'input_tokens': 8, 'output_tokens': 12}
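
Because the two providers name their counters differently, code that reads response_metadata directly has to translate between the conventions. A minimal sketch of such a translation is shown below, using only the key names visible in the output above. `normalize_usage` is a hypothetical helper; the standardized usage_metadata attribute already does this job for supported providers.

```python
from typing import Dict


def normalize_usage(provider: str, raw: Dict[str, int]) -> Dict[str, int]:
    """Map provider-specific token counts onto input/output/total keys."""
    if provider == "openai":
        input_tokens = raw["prompt_tokens"]
        output_tokens = raw["completion_tokens"]
    elif provider == "anthropic":
        input_tokens = raw["input_tokens"]
        output_tokens = raw["output_tokens"]
    else:
        raise ValueError(f"unsupported provider: {provider}")
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        # Computed here because not every provider reports a total.
        "total_tokens": input_tokens + output_tokens,
    }


openai_usage = normalize_usage(
    "openai", {"completion_tokens": 9, "prompt_tokens": 8, "total_tokens": 17}
)
anthropic_usage = normalize_usage("anthropic", {"input_tokens": 8, "output_tokens": 12})
```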

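Streaming

Some providers support token count metadata in a streaming context.

For example, OpenAI returns a message chunk at the end of a stream with token usage information. This behavior is supported by langchain-openai >= 0.1.9 and can be enabled by setting stream_usage=True. This attribute can also be set when ChatOpenAI is instantiated.

By default, the last message chunk in a stream includes a "finish_reason" in the message's response_metadata attribute. If we include token usage in streaming mode, an additional chunk containing usage metadata is added to the end of the stream, so that "finish_reason" appears on the second-to-last message chunk.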

llm = ChatOpenAI(model="gpt-4o-mini")

aggregate = None
for chunk in llm.stream("hello", stream_usage=True):
    print(chunk)
    aggregate = chunk if aggregate is None else aggregate + chunk

content='' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content='Hello' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content='!' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content=' How' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content=' can' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content=' I' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content=' assist' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content=' you' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content=' today' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content='?' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content='' response_metadata={'finish_reason': 'stop', 'model_name': 'gpt-4o-mini'} id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623'
content='' id='run-adb20c31-60c7-43a2-99b2-d4a53ca5f623' usage_metadata={'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17}

Note that the usage metadata is included in the sum of the individual message chunks:

print(aggregate.content)
print(aggregate.usage_metadata)

Hello! How can I assist you today?
{'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17}

To disable streaming token counts for OpenAI, set stream_usage to False, or omit it from the parameters:

aggregate = None
for chunk in llm.stream("hello"):
    print(chunk)

content='' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content='Hello' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content='!' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content=' How' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content=' can' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content=' I' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content=' assist' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content=' you' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content=' today' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content='?' id='run-8e758550-94b0-4cca-a298-57482793c25d'
content='' response_metadata={'finish_reason': 'stop', 'model_name': 'gpt-4o-mini'} id='run-8e758550-94b0-4cca-a298-57482793c25d'

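You can also enable streaming token usage by setting stream_usage when instantiating the chat model. This is useful when the chat model is part of a LangChain chain: usage metadata can then be monitored while streaming intermediate steps or through tracing software such as LangSmith.

The example below returns output structured to a desired schema while still reporting the token usage streamed from an intermediate step. The top-level async for assumes an environment with a running event loop, such as a Jupyter notebook.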

from pydantic import BaseModel, Field


class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")


llm = ChatOpenAI(
    model="gpt-4o-mini",
    stream_usage=True,
)
# Under the hood, .with_structured_output binds tools to the
# chat model and appends a parser.
structured_llm = llm.with_structured_output(Joke)

async for event in structured_llm.astream_events("Tell me a joke", version="v2"):
    if event["event"] == "on_chat_model_end":
        print(f'Token usage: {event["data"]["output"].usage_metadata}\n')
    elif event["event"] == "on_chain_end":
        print(event["data"]["output"])
    else:
        pass


Token usage: {'input_tokens': 79, 'output_tokens': 23, 'total_tokens': 102}

setup='Why was the math book sad?' punchline='Because it had too many problems.'

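The token usage is also visible in the corresponding LangSmith trace, in the payload returned by the chat model.

Using callbacks

There are also some API-specific callback context managers that let you track token usage across multiple calls. They are currently implemented only for the OpenAI API and the Bedrock Anthropic API.

Let's first look at a very simple example that tracks token usage for a single chat model call.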

# !pip install -qU langchain-community wikipedia

from langchain_community.callbacks.manager import get_openai_callback

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    stream_usage=True,
)

with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
    print(cb)


Tokens Used: 27
	Prompt Tokens: 11
	Completion Tokens: 16
Successful Requests: 1
Total Cost (USD): $2.95e-05

Everything inside the context manager is tracked. Here it is used to track several calls in sequence.

with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
    result2 = llm.invoke("Tell me a joke")
    print(cb.total_tokens)

The same context manager can wrap a streamed call (the llm above was created with stream_usage=True):

with get_openai_callback() as cb:
    for chunk in llm.stream("Tell me a joke"):
        pass
    print(cb)


Tokens Used: 27
	Prompt Tokens: 11
	Completion Tokens: 16
Successful Requests: 1
Total Cost (USD): $2.95e-05

If a chain or agent with multiple steps is used, all of those steps are tracked.

from langchain.agents import AgentExecutor, create_tool_calling_agent, load_tools
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You're a helpful assistant"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)
tools = load_tools(["wikipedia"])
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)


with get_openai_callback() as cb:
    response = agent_executor.invoke(
        {
            "input": "What's a hummingbird's scientific name and what's the fastest bird species?"
        }
    )
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")

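Handling rate limits

You may find yourself rate limited by a provider's API because you are sending too many requests.

This can happen, for example, when you run many parallel queries to benchmark a chat model on a test dataset.

In that situation, a rate limiter helps match the rate at which you send requests to the rate the API allows.

LangChain ships with a built-in in-memory rate limiter. It is thread safe and can be shared by multiple threads in the same process.

The provided rate limiter can only limit the number of requests per unit of time. It does not help if you also need to limit based on the size of the requests.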

from langchain_core.rate_limiters import InMemoryRateLimiter

rate_limiter = InMemoryRateLimiter(
    requests_per_second=0.1,  # <-- Super slow! We can only make a request once every 10 seconds!!
    check_every_n_seconds=0.1,  # Wake up every 100 ms to check whether allowed to make a request,
    max_bucket_size=10,  # Controls the maximum burst size.
)
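
The three parameters map naturally onto a token bucket, so a small sketch may help build intuition. The `TokenBucket` class below is an illustration written for this note, not LangChain's implementation; starting with an empty bucket and the injectable `clock` argument are assumptions made so the example can run without real waiting.

```python
import time


class TokenBucket:
    """Illustrative token bucket; argument names mirror the parameters above."""

    def __init__(self, requests_per_second, max_bucket_size, clock=time.monotonic):
        self.rate = requests_per_second
        self.capacity = max_bucket_size
        self.clock = clock
        self.tokens = 0.0  # assumption: start empty, so the first request waits
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the maximum burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self, check_every_n_seconds=0.1, sleep=time.sleep):
        # Poll until a token is available, like waking up every 100 ms.
        while not self.try_acquire():
            sleep(check_every_n_seconds)


# Drive the bucket with a fake clock so the demo finishes instantly.
fake_now = [0.0]
bucket = TokenBucket(requests_per_second=0.1, max_bucket_size=10, clock=lambda: fake_now[0])
print(bucket.try_acquire())  # False: nothing has accrued yet
fake_now[0] += 10            # ten simulated seconds later, one token has accrued
print(bucket.try_acquire())  # True
```

With requests_per_second=0.1 a token accrues every 10 seconds, and max_bucket_size bounds how many unused tokens can pile up for a later burst.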

Choose any model and pass the rate limiter to it through the rate_limiter attribute.

import os
import time
from getpass import getpass

if "ANTHROPIC_API_KEY" not in os.environ:
    os.environ["ANTHROPIC_API_KEY"] = getpass()


from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(model_name="claude-3-opus-20240229", rate_limiter=rate_limiter)

Let's confirm that the rate limiter works. We should only be able to invoke the model once every 10 seconds.

for _ in range(5):
    tic = time.time()
    model.invoke("hello")
    toc = time.time()
    print(toc - tic)

11.599073648452759
10.7502121925354
10.244257926940918
8.83088755607605
11.645203590393066

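Using few-shot prompting with tool calling

For more complex tool use, it is very useful to add few-shot examples to the prompt. We can do this by adding AIMessages with ToolCalls and the corresponding ToolMessages to the prompt.

First, let's define our tools and model.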

from langchain_core.tools import tool


@tool
def add(a: int, b: int) -> int:
    """Adds a and b."""
    return a + b


@tool
def multiply(a: int, b: int) -> int:
    """Multiplies a and b."""
    return a * b


tools = [add, multiply]


import os
from getpass import getpass

from langchain_openai import ChatOpenAI

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass()

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools(tools)

Let's run the model. Even with explicit instructions, it can still trip up on the order of operations.

llm_with_tools.invoke(
    "Whats 119 times 8 minus 20. Don't do any math yourself, only use tools for math. Respect order of operations"
).tool_calls


[{'name': 'Multiply',
  'args': {'a': 119, 'b': 8},
  'id': 'call_T88XN6ECucTgbXXkyDeC2CQj'},
 {'name': 'Add',
  'args': {'a': 952, 'b': -20},
  'id': 'call_licdlmGsRqzup8rhqJSb1yZ4'}]

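The model should not be trying to add anything yet, since it technically cannot know the result of 119 * 8 at this point.

By adding a prompt with some examples, we can correct this behavior: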

from langchain_core.messages import AIMessage, HumanMessage, ToolMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

examples = [
    HumanMessage(
        "What's the product of 317253 and 128472 plus four", name="example_user"
    ),
    AIMessage(
        "",
        name="example_assistant",
        tool_calls=[
            # Tool names and argument keys match the tools defined above.
            {"name": "multiply", "args": {"a": 317253, "b": 128472}, "id": "1"}
        ],
    ),
    # 317253 * 128472 = 40758127416
    ToolMessage("40758127416", tool_call_id="1"),
    AIMessage(
        "",
        name="example_assistant",
        tool_calls=[{"name": "add", "args": {"a": 40758127416, "b": 4}, "id": "2"}],
    ),
    ToolMessage("40758127420", tool_call_id="2"),
    AIMessage(
        "The product of 317253 and 128472 plus four is 40758127420",
        name="example_assistant",
    ),
]

system = """You are bad at math but are an expert at using a calculator. 

Use past tool usage as an example of how to correctly use the tools."""
few_shot_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        *examples,
        ("human", "{query}"),
    ]
)

chain = {"query": RunnablePassthrough()} | few_shot_prompt | llm_with_tools
chain.invoke("Whats 119 times 8 minus 20").tool_calls

This time we get the correct output: the model requests only the multiplication as its first step.

[{'name': 'Multiply',
  'args': {'a': 119, 'b': 8},
  'id': 'call_9MvuwQqg7dlJupJcoTWiEsDo'}]
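
To see why the calls have to happen one after another, here is a minimal sketch that executes tool-call dictionaries of the shape shown above with plain Python functions. The `TOOLS` registry and `run_tool_call` helper are hypothetical names introduced for illustration; no model is involved, and the second call is written by hand to stand in for what the model would request after seeing the first result.

```python
def add(a: int, b: int) -> int:
    """Adds a and b."""
    return a + b


def multiply(a: int, b: int) -> int:
    """Multiplies a and b."""
    return a * b


# Hypothetical registry mapping tool names to plain functions.
TOOLS = {"add": add, "multiply": multiply}


def run_tool_call(tool_call: dict) -> dict:
    """Run one {'name', 'args', 'id'} tool call and return a tool-result record."""
    fn = TOOLS[tool_call["name"]]
    return {"tool_call_id": tool_call["id"], "content": str(fn(**tool_call["args"]))}


# Step 1: only the multiplication can be requested, because nothing else is known yet.
first = run_tool_call({"name": "multiply", "args": {"a": 119, "b": 8}, "id": "1"})

# Step 2: the subtraction depends on the result of step 1.
second = run_tool_call(
    {"name": "add", "args": {"a": int(first["content"]), "b": -20}, "id": "2"}
)

print(first["content"], second["content"])  # 952 932
```

The argument of the second call only exists once the first result has come back, which is exactly the dependency the few-shot examples teach the model to respect.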

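Binding model-specific tools

Providers adopt different conventions for formatting tool schemas.

OpenAI, for instance, uses the following format:

| Field | Description |
| --- | --- |
| `type` | The type of the tool; always `"function"` |
| `function` | An object containing the tool's detailed definition |
| `function.name` | The name of the tool |
| `function.description` | A short description of the tool |
| `function.parameters` | The tool's parameter definition, expressed as JSON Schema |

You can bind a schema in this provider-specific format directly to the model, as follows: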

from langchain_openai import ChatOpenAI

model = ChatOpenAI()

model_with_tools = model.bind(
    tools=[
        {
            "type": "function",
            "function": {
                "name": "multiply",
                "description": "Multiply two integers together.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "a": {"type": "number", "description": "First integer"},
                        "b": {"type": "number", "description": "Second integer"},
                    },
                    "required": ["a", "b"],
                },
            },
        }
    ]
)

response = model_with_tools.invoke("Whats 119 times 8?")

📥 Example response (abridged)

AIMessage(
    content='',
    additional_kwargs={
        'tool_calls': [
            {
                'id': 'call_mn4ELw1NbuE0DFYhIeK0GrPe',
                'function': {
                    'name': 'multiply',
                    'arguments': '{"a":119,"b":8}'
                },
                'type': 'function'
            }
        ]
    },
    response_metadata={
        'token_usage': {'completion_tokens': 17, 'prompt_tokens': 62, 'total_tokens': 79},
        'model_name': 'gpt-3.5-turbo',
        'finish_reason': 'tool_calls'
    },
    tool_calls=[
        {
            'name': 'multiply',
            'args': {'a': 119, 'b': 8},
            'id': 'call_mn4ELw1NbuE0DFYhIeK0GrPe'
        }
    ]
)

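📌 Additional notes

This approach is functionally equivalent to using the .bind_tools() method that LangChain provides.
Binding tools this way lets the model request function calls, which is how plugin-style capabilities are added.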
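
To show where a schema of this shape can come from, the sketch below derives one from a Python function's signature and docstring using only the standard library. The helper `to_openai_tool` and the `_JSON_TYPES` mapping are hypothetical and deliberately simplistic (unknown annotations fall back to "string"); this is an illustration of the format, not LangChain's conversion logic.

```python
import inspect

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def to_openai_tool(fn) -> dict:
    """Hypothetical helper: build an OpenAI-style tool schema from a function."""
    sig = inspect.signature(fn)
    properties = {
        name: {"type": _JSON_TYPES.get(param.annotation, "string")}
        for name, param in sig.parameters.items()
    }
    # Parameters without a default value are treated as required.
    required = [
        name
        for name, param in sig.parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def multiply(a: int, b: int) -> int:
    """Multiply two integers together."""
    return a * b


schema = to_openai_tool(multiply)
print(schema["function"]["name"])
print(schema["function"]["parameters"])
```

The resulting dictionary has the same top-level layout as the hand-written schema above, so it could be passed in the tools list in the same way.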

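Running models locally

The popularity of projects such as llama.cpp, Ollama, GPT4All, and llamafile underscores the demand for running large language models (LLMs) locally, on your own device.

Running an LLM locally has at least two notable benefits:

✅ Privacy: your data is not sent to a third party and is not subject to the terms of service of a commercial service.
💰 Cost: there is no inference fee, which matters for token-intensive applications (e.g. long-running simulations, summarization).

Running an LLM locally requires two things:

Open-source LLM

An LLM that can be freely modified and shared, for example:
- LLaMA
- Mistral
- Falcon
- Gemma

Inference

The ability to run this LLM on your own device with acceptable latency.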

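Open-source models can be assessed along two dimensions:

| Dimension | Description |
| --- | --- |
| Base model | Model architecture, size, and training corpus |
| Fine-tuning approach | Whether it was instruction-tuned (e.g. Alpaca, WizardLM) and whether it supports chat |

The relative performance of these models can be assessed using several leaderboards, including:

- LmSys
- GPT4All
- HuggingFace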

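Several mainstream inference frameworks have emerged to support running open-source LLMs locally:

| Framework | Description |
| --- | --- |
| llama.cpp | C++ implementation of the inference code, with support for quantization |
| gpt4all | Optimized C backend for inference |
| Ollama | Bundles model weights and environment into an app that runs on device and serves the LLM |
| llamafile | Bundles the model and runtime into a single file that runs directly, with no installation |

These tools generally provide:

📦 Quantization: shrinks the model and reduces its memory footprint
🚀 Efficient inference implementations: optimized for running on CPUs and GPUs

🔗 Recommended reading: an in-depth article on the importance of quantization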

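📉 Why does quantization matter?

Quantization converts the original FP16/FP32 weights to lower-precision formats such as 8-bit or 4-bit:

⬇️ Substantially lower memory use
🧠 Accuracy that usually stays close to the original, although some loss is expected
🖥️ Models that fit on consumer devices (e.g. laptop CPUs/GPUs)

💡 Example: GPU memory bandwidth also matters. Because of its larger GPU memory bandwidth, inference on an M2 Max is 5-6x faster than on an M1.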
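
The memory savings can be estimated with simple arithmetic: weight storage is roughly the parameter count times the bits per weight, divided by 8. The helper below is a back-of-the-envelope sketch; `weight_memory_gb` is a hypothetical name, the 13B parameter count is just an example, and the estimate ignores activations, the KV cache, and the per-block scale factors that real quantization formats add.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes) for a given precision."""
    return n_params * bits_per_weight / 8 / 1e9


# A 13-billion-parameter model at three precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(13e9, bits):.1f} GB")
```

Going from 16-bit to 4-bit weights cuts the footprint of a 13B model from about 26 GB to about 6.5 GB, which is the difference between needing a workstation GPU and fitting on a laptop.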

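🧾 Prompt formats for local models

Some local models expect a special prompt format or special tokens. Provider-specific chat model wrappers handle this for you, but if you use a plain text-in/text-out interface you may need to build the prompt yourself.

For example, the LLaMA 2 chat prompt format looks like this:

<s>[INST] Can you summarize this paragraph for me? [/INST]
Sure, here is the summary...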
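
When working with a plain text-in/text-out interface, a small helper can assemble this format. The function below is a sketch: `build_llama2_prompt` is a hypothetical name, it handles a single turn only, and it assumes the commonly documented LLaMA 2 layout with an optional `<<SYS>>` block. Some runtimes add the leading `<s>` token themselves, in which case it should be omitted.

```python
from typing import Optional


def build_llama2_prompt(user_message: str, system_prompt: Optional[str] = None) -> str:
    """Assemble a single-turn LLaMA 2 chat prompt (illustrative sketch)."""
    if system_prompt:
        # The system prompt is wrapped in <<SYS>> tags inside the first [INST] block.
        user_message = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message}"
    return f"<s>[INST] {user_message} [/INST]"


plain = build_llama2_prompt("Can you summarize this paragraph for me?")
with_system = build_llama2_prompt(
    "Can you summarize this paragraph for me?",
    system_prompt="You are a concise assistant.",
)
print(plain)
print(with_system)
```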

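Quickstart

Ollama is one way to easily run LLMs locally; the steps below use the macOS app.

1. Download and run the Ollama app.
2. Pull a model from the command line, for example:
```
ollama pull llama3.1:8b
```
3. Once the app is running, models are served automatically on http://localhost:11434.
4. Install the LangChain Ollama integration (in a notebook, prefix the command with %):
```
pip install -qU langchain-ollama
```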

🧠 Using the OllamaLLM model (plain-text interface)

from langchain_ollama import OllamaLLM

llm = OllamaLLM(model="llama3.1:8b")

llm.invoke("The first man on the moon was ...")

🔁 Example response:

...Neil Armstrong!

On July 20, 1969, Neil Armstrong became the first person to set foot on the lunar surface, famously declaring "That's one small step for man, one giant leap for mankind" as he stepped off the lunar module Eagle onto the Moon's surface.

Would you like to know more about the Apollo 11 mission or Neil Armstrong's achievements?

🔄 Streaming output (token by token)

for chunk in llm.stream("The first man on the moon was ..."):
    print(chunk, end="|", flush=True)

🔁 Example output:

Neil| Armstrong|,| an| American| astronaut|.| He| stepped| out| of| the| lunar| module| Eagle| and| onto| the| surface| of| the| Moon| on| July| 20|,| 1969|,| famously| declaring|:| "That's| one| small| step| for| man|,| one| giant| leap| for| mankind|"|

💬 Using the ChatOllama chat model (chat message format)

from langchain_ollama import ChatOllama

chat_model = ChatOllama(model="llama3.1:8b")

chat_model.invoke("Who was the first man on the moon?")

🔁 Example response:

The answer is a historic one!

The first man to walk on the Moon was Neil Armstrong, an American astronaut and commander of the Apollo 11 mission. On July 20, 1969, Armstrong stepped out of the lunar module Eagle onto the surface of the Moon, famously declaring:

"That's one small step for man, one giant leap for mankind."

Armstrong was followed by fellow astronaut Edwin "Buzz" Aldrin, who also walked on the Moon during the mission. Michael Collins remained in orbit around the Moon in the command module Columbia.

Neil Armstrong passed away on August 25, 2012, but his legacy as a pioneering astronaut and engineer continues to inspire people around the world!

🧾 Accompanying metadata:

{
  "model": "llama3.1:8b",
  "created_at": "2024-08-01T00:38:29.176717Z",
  "done_reason": "stop",
  "usage_metadata": {
    "input_tokens": 19,
    "output_tokens": 141,
    "total_tokens": 160
  }
}

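Environment

Inference speed is a challenge when running models locally (see above).

To minimize latency, it is desirable to run models locally on a GPU, which ships with many consumer laptops, e.g. Apple devices.

Ollama and llamafile automatically use the GPU on Apple devices.

Other frameworks require the user to set up the environment to use the Apple GPU.

For example, the llama.cpp Python bindings can be configured to use the GPU via Metal.

Metal is a graphics and compute API created by Apple that provides near-direct access to the GPU.

In particular, make sure that conda is using the correct virtual environment that you created (miniforge3).

For example, for me:

conda activate /Users/rlm/miniforge3/envs/llama

With the above confirmed, run: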

CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir

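LLMs

There are various ways to get access to quantized model weights.

- HuggingFace: many quantized models are available for download and can be run with frameworks such as llama.cpp. You can also download models in llamafile format from HuggingFace.
- gpt4all: the model explorer offers a leaderboard of metrics together with quantized models available for download.
- Ollama: several models can be accessed directly via pull.

Ollama

With Ollama, fetch a model via ollama pull <model family>:<tag>:

- For example, for Llama 2 7b, ollama pull llama2 downloads the most basic version of the model (e.g. smallest number of parameters and 4-bit quantization).
- We can also specify a particular version from the model list, e.g. ollama pull llama2:13b.
- See the API reference page for the full set of parameters.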

llm = OllamaLLM(model="llama2:13b")
llm.invoke("The first man on the moon was ... think step by step")


' Sure! Here\'s the answer, broken down step by step:\n\nThe first man on the moon was... Neil Armstrong.\n\nHere\'s how I arrived at that answer:\n\n1. The first manned mission to land on the moon was Apollo 11.\n2. The mission included three astronauts: Neil Armstrong, Edwin "Buzz" Aldrin, and Michael Collins.\n3. Neil Armstrong was the mission commander and the first person to set foot on the moon.\n4. On July 20, 1969, Armstrong stepped out of the lunar module Eagle and onto the moon\'s surface, famously declaring "That\'s one small step for man, one giant leap for mankind."\n\nSo, the first man on the moon was Neil Armstrong!'

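Llama.cpp

Llama.cpp is a high-performance local inference framework compatible with a broad set of open-source LLMs, and in particular with the quantized models published on HuggingFace (e.g. in GGUF/GGML format).

✅ Example: running a 4-bit quantized LLaMA2-13B model

The example uses a llama2-13b.gguf.q4_0 model file from HuggingFace and runs inference locally with Llama.cpp.

🔧 Recommended parameters (Llama.cpp API)

| Parameter | Example value | Meaning |
| --- | --- | --- |
| `n_gpu_layers` | `1` | Number of transformer layers loaded into GPU memory. Setting it to `1` is often sufficient and suits memory-constrained setups. |
| `n_batch` | `512` | Number of tokens processed in parallel per batch. Choose a value between `1` and `n_ctx`. |
| `n_ctx` | `2048` | Size of the context window, i.e. the maximum number of tokens the model considers at once. |
| `f16_kv` | `True` | Whether to store the KV cache in half precision (FP16), which substantially reduces memory use; Apple Metal only supports `True`. |

📌 Notes

A larger n_batch lets the model process more prompt tokens in parallel and a larger n_ctx allows longer inputs; both increase memory requirements.
If Apple Metal support is enabled (e.g. on macOS), f16_kv=True must be set.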
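
A quick estimate shows why n_ctx and f16_kv affect memory. The KV cache stores one key and one value vector per layer for every position in the context window. The sketch below assumes the published Llama-2-13B dimensions (40 layers, hidden size 5120) and ignores implementation overhead; `kv_cache_bytes` is a hypothetical helper name.

```python
def kv_cache_bytes(n_layers: int, n_ctx: int, n_embd: int, bytes_per_value: int) -> int:
    """Rough KV cache size: keys and values (factor 2) per layer and position."""
    return 2 * n_layers * n_ctx * n_embd * bytes_per_value


# Assumed Llama-2-13B dimensions; adjust for other models.
N_LAYERS, N_EMBD = 40, 5120

fp16 = kv_cache_bytes(N_LAYERS, 2048, N_EMBD, bytes_per_value=2)
fp32 = kv_cache_bytes(N_LAYERS, 2048, N_EMBD, bytes_per_value=4)

print(f"FP16 KV cache at n_ctx=2048: {fp16 / 2**30:.4f} GiB")
print(f"FP32 KV cache at n_ctx=2048: {fp32 / 2**30:.4f} GiB")
```

Under these assumptions the cache takes about 1.56 GiB in FP16 and twice that in FP32, and it grows linearly with n_ctx.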


%env CMAKE_ARGS="-DLLAMA_METAL=on"
%env FORCE_CMAKE=1
%pip install --upgrade --quiet llama-cpp-python --no-cache-dir


from langchain_community.llms import LlamaCpp
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler

llm = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/models/openorca-platypus2-13b.gguf.q4_0.bin",
    n_gpu_layers=1,
    n_batch=512,
    n_ctx=2048,
    f16_kv=True,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    verbose=True,
)

The console log shows the following to indicate that Metal was enabled properly:

ggml_metal_init: allocating
ggml_metal_init: using MPS


llm.invoke("The first man on the moon was ... Let's think step by step")


Llama.generate: prefix-match hit
 and use logical reasoning to figure out who the first man on the moon was.

Here are some clues:

1. The first man on the moon was an American.
2. He was part of the Apollo 11 mission.
3. He stepped out of the lunar module and became the first person to set foot on the moon's surface.
4. His last name is Armstrong.

Now, let's use our reasoning skills to figure out who the first man on the moon was. Based on clue #1, we know that the first man on the moon was an American. Clue #2 tells us that he was part of the Apollo 11 mission. Clue #3 reveals that he was the first person to set foot on the moon's surface. And finally, clue #4 gives us his last name: Armstrong.
Therefore, the first man on the moon was Neil Armstrong!

llama_print_timings:        load time =  9623.21 ms
llama_print_timings:      sample time =   143.77 ms /   203 runs   (    0.71 ms per token,  1412.01 tokens per second)
llama_print_timings: prompt eval time =   485.94 ms /     7 tokens (   69.42 ms per token,    14.40 tokens per second)
llama_print_timings:        eval time =  6385.16 ms /   202 runs   (   31.61 ms per token,    31.64 tokens per second)
llama_print_timings:       total time =  7279.28 ms

" and use logical reasoning to figure out who the first man on the moon was.\n\nHere are some clues:\n\n1. The first man on the moon was an American.\n2. He was part of the Apollo 11 mission.\n3. He stepped out of the lunar module and became the first person to set foot on the moon's surface.\n4. His last name is Armstrong.\n\nNow, let's use our reasoning skills to figure out who the first man on the moon was. Based on clue #1, we know that the first man on the moon was an American. Clue #2 tells us that he was part of the Apollo 11 mission. Clue #3 reveals that he was the first person to set foot on the moon's surface. And finally, clue #4 gives us his last name: Armstrong.\nTherefore, the first man on the moon was Neil Armstrong!"

GPT4All

We can use model weights downloaded from the GPT4All model explorer.

As shown above, we can run inference and use the API reference to set parameters of interest.

%pip install gpt4all


from langchain_community.llms import GPT4All

llm = GPT4All(
    model="/Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin"
)

llm.invoke("The first man on the moon was ... Let's think step by step")

".\n1) The United States decides to send a manned mission to the moon.2) They choose their best astronauts and train them for this specific mission.3) They build a spacecraft that can take humans to the moon, called the Lunar Module (LM).4) They also create a larger spacecraft, called the Saturn V rocket, which will launch both the LM and the Command Service Module (CSM), which will carry the astronauts into orbit.5) The mission is planned down to the smallest detail: from the trajectory of the rockets to the exact movements of the astronauts during their moon landing.6) On July 16, 1969, the Saturn V rocket launches from Kennedy Space Center in Florida, carrying the Apollo 11 mission crew into space.7) After one and a half orbits around the Earth, the LM separates from the CSM and begins its descent to the moon's surface.8) On July 20, 1969, at 2:56 pm EDT (GMT-4), Neil Armstrong becomes the first man on the moon. He speaks these"

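llamafile

One of the simplest ways to run an LLM locally is with a llamafile. All you need to do is:

1. Download a llamafile from HuggingFace
2. Make the file executable
3. Run the file

A llamafile bundles the model weights and a specially compiled version of llama.cpp into a single file that runs on most computers without any additional dependencies. It also comes with an embedded inference server that provides an API for interacting with your model.

Here is a simple bash script that shows all 3 setup steps: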

# Download a llamafile from HuggingFace
wget https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile

# Make the file executable. On Windows, instead just rename the file to end in ".exe".
chmod +x TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile

# Start the model server. Listens at http://localhost:8080 by default.
./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser

After running the setup steps above, you can interact with your model through LangChain:

from langchain_community.llms.llamafile import Llamafile

llm = Llamafile()

llm.invoke("The first man on the moon was ... Let's think step by step.")

Example output:

Firstly, let's imagine the scene where Neil Armstrong stepped onto the moon. This happened in 1969. The first man on the moon was Neil Armstrong. We already know that.
2nd, let's take a step back. Neil Armstrong didn't have any special powers. He had to land his spacecraft safely on the moon without injuring anyone or causing any damage. If he failed to do this, he would have been killed along with all those people who were on board the spacecraft.
3rd, let's imagine that Neil Armstrong successfully landed his spacecraft on the moon and made it back to Earth safely. The next step was for him to be hailed as a hero by his people back home. It took years before Neil Armstrong became an American hero.
4th, let's take another step back. Let's imagine that Neil Armstrong wasn't hailed as a hero, and instead, he was just forgotten. This happened in the 1970s. Neil Armstrong wasn't recognized for his remarkable achievement on the moon until after he died.
5th, let's take another step back. Let's imagine that Neil Armstrong didn't die in the 1970s and instead, lived to be a hundred years old. This happened in 2036. In the year 2036, Neil Armstrong would have been a centenarian.
Now, let's think about the present. Neil Armstrong is still alive. He turned 95 years old on July 20th, 2018. If he were to die now, his achievement of becoming the first human being to set foot on the moon would remain an unforgettable moment in history.
I hope this helps you understand the significance and importance of Neil Armstrong's achievement on the moon!

Prompts

Some LLMs benefit from specific prompts.

For example, LLaMA uses special tokens.

We can use ConditionalPromptSelector to set the prompt based on the model type.

# Set our LLM
llm = LlamaCpp(
    model_path="/Users/rlm/Desktop/Code/llama.cpp/models/openorca-platypus2-13b.gguf.q4_0.bin",
    n_gpu_layers=1,
    n_batch=512,
    n_ctx=2048,
    f16_kv=True,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    verbose=True,
)

Set the associated prompt based on the model version.

from langchain.chains.prompt_selector import ConditionalPromptSelector
from langchain_core.prompts import PromptTemplate

DEFAULT_LLAMA_SEARCH_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""<<SYS>> \n You are an assistant tasked with improving Google search \
results. \n <</SYS>> \n\n [INST] Generate THREE Google search queries that \
are similar to this question. The output should be a numbered list of questions \
and each should have a question mark at the end: \n\n {question} [/INST]""",
)

DEFAULT_SEARCH_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an assistant tasked with improving Google search \
results. Generate THREE Google search queries that are similar to \
this question. The output should be a numbered list of questions and each \
should have a question mark at the end: {question}""",
)

QUESTION_PROMPT_SELECTOR = ConditionalPromptSelector(
    default_prompt=DEFAULT_SEARCH_PROMPT,
    conditionals=[(lambda llm: isinstance(llm, LlamaCpp), DEFAULT_LLAMA_SEARCH_PROMPT)],
)

prompt = QUESTION_PROMPT_SELECTOR.get_prompt(llm)
prompt

Output:

PromptTemplate(input_variables=['question'], output_parser=None, partial_variables={}, template='<<SYS>> \n You are an assistant tasked with improving Google search results. \n <</SYS>> \n\n [INST] Generate THREE Google search queries that are similar to this question. The output should be a numbered list of questions and each should have a question mark at the end: \n\n {question} [/INST]', template_format='f-string', validate_template=True)
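
The selection rule itself is simple: return the prompt of the first condition that matches the model, otherwise fall back to the default. The sketch below re-creates that rule in plain Python for illustration; `SimplePromptSelector`, `FakeLlama`, and `FakeOther` are hypothetical names and do not come from LangChain, and the prompts are shortened strings rather than PromptTemplate objects.

```python
class SimplePromptSelector:
    """Illustrative re-creation of the selection rule; not LangChain's class."""

    def __init__(self, default_prompt, conditionals=()):
        self.default_prompt = default_prompt
        self.conditionals = list(conditionals)

    def get_prompt(self, llm):
        # Return the prompt of the first matching condition, else the default.
        for condition, prompt in self.conditionals:
            if condition(llm):
                return prompt
        return self.default_prompt


class FakeLlama:
    """Hypothetical stand-in for a LLaMA-style model wrapper."""


class FakeOther:
    """Hypothetical stand-in for any other model wrapper."""


selector = SimplePromptSelector(
    default_prompt="Generate THREE search queries for: {question}",
    conditionals=[
        (
            lambda llm: isinstance(llm, FakeLlama),
            "[INST] Generate THREE search queries for: {question} [/INST]",
        )
    ],
)

llama_prompt = selector.get_prompt(FakeLlama())
other_prompt = selector.get_prompt(FakeOther())
print(llama_prompt)
print(other_prompt)
```

Because conditions are checked in order, more specific model checks should come before more general ones.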

Chained invocation example:

# Chain
chain = prompt | llm
question = "What NFL team won the Super Bowl in the year that Justin Bieber was born?"
chain.invoke({"question": question})

Example output:

 Sure! Here are three similar search queries with a question mark at the end:

1. Which NBA team did LeBron James lead to a championship in the year he was drafted?
2. Who won the Grammy Awards for Best New Artist and Best Female Pop Vocal Performance in the same year that Lady Gaga was born?
3. What MLB team did Babe Ruth play for when he hit 60 home runs in a single season?

llama_print_timings:        load time = 14943.19 ms
llama_print_timings:      sample time =    72.93 ms /   101 runs   (    0.72 ms per token,  1384.87 tokens per second)
llama_print_timings: prompt eval time = 14942.95 ms /    93 tokens (  160.68 ms per token,     6.22 tokens per second)
llama_print_timings:        eval time =  3430.85 ms /   100 runs   (   34.31 ms per token,    29.15 tokens per second)
llama_print_timings:       total time = 18578.26 ms

'  Sure! Here are three similar search queries with a question mark at the end:\n\n1. Which NBA team did LeBron James lead to a championship in the year he was drafted?\n2. Who won the Grammy Awards for Best New Artist and Best Female Pop Vocal Performance in the same year that Lady Gaga was born?\n3. What MLB team did Babe Ruth play for when he hit 60 home runs in a single season?'

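We can also use the LangChain Prompt Hub to fetch and/or store model-specific prompts.

This works with your LangSmith API key.

For example, here is a prompt for RAG that uses LLaMA-specific tokens.

Use cases

Given an LLM created from one of the models above, you can use it for many use cases.

For example, here is a guide to RAG with local LLMs.

In general, use cases for local LLMs are driven by at least two factors:

- Privacy: private data (e.g. journals) that a user does not want to share
- Cost: text preprocessing (extraction/tagging), summarization, and agent simulations are token-intensive tasks

In addition, here is an overview of fine-tuning, which can make use of open-source LLMs.

Comparison of inference tools

For Windows and Linux users, the table below compares the main inference and deployment tools to make the choice clearer.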

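| Tool | Platforms | Main strengths | Quantization | GPU acceleration | Deployment complexity | Community activity | Typical use |
| --- | --- | --- | --- | --- | --- | --- | --- |
| llama.cpp | Windows, Linux, macOS | Lightweight, cross-platform, many quantization schemes | 4-bit and several other schemes | Yes (Metal on macOS; CUDA, Vulkan and other backends on Linux and Windows) | Medium; requires building and configuration | Very active | Lightweight deployment, research and development, resource-constrained environments |
| gpt4all | Windows, Linux, macOS | Large library of quantized models, unified API | Several schemes | Mainly CPU, limited GPU support | Simple (pip package plus model) | Active | Quick experiments, teaching, model exploration |
| llamafile | Windows, Linux, macOS | Single-file packaging (model plus inference engine), minimal deployment | Yes | Mainly CPU | Very simple (a single executable) | Fairly active | Fast deployment, no environment dependencies, edge devices |
| Exllama | Windows, Linux | Optimized for NVIDIA GPUs, efficient parallel inference | Yes | NVIDIA GPUs (CUDA) | Complex; needs GPU driver setup | Niche | High-performance inference on dedicated GPUs |
| FastChat | Windows, Linux | Open chat-system framework with multi-model support | Yes | Depends on the backend inference framework | Complex; several components to configure | Active | Chat system development, serving |
| Ollama | macOS, Linux, Windows | Simple model management with automatic GPU acceleration, including on Apple Silicon | Yes | Apple GPU (Metal); NVIDIA and AMD GPUs on other platforms | Simple | Active | Simple local serving on desktop platforms |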

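1. llama.cpp

Platforms: Windows, Linux, macOS
Inference characteristics:
- Supports several weight quantization schemes (4-bit, 8-bit, etc.), which greatly reduces memory use
- GPU acceleration is available through several backends chosen at build time (Metal on macOS; CUDA, Vulkan and others on Linux and Windows)
- Needs to be built and configured by the user; the command-line tools are simple
Best suited for:
- Cross-platform use
- Lightweight deployment without a large framework
- Research and development testing

2. GPT4All

Platforms: cross-platform
Inference characteristics:
- Ships with many pre-packaged quantized models
- Relies mainly on CPU inference, with limited GPU support (may improve in the future)
- Managed as a Python package, so integration is easy
Best suited for:
- Quick experiments and teaching
- Comparing multiple models
- Workloads that do not depend heavily on a GPU

3. llamafile

Platforms: Windows, Linux, macOS (no environment dependencies)
Inference characteristics:
- Model weights and inference engine are packaged into a single file
- Includes a local HTTP inference server, which makes it easy to interact with other programs
- Minimal deployment; on Windows, renaming the file to end in .exe is enough
Best suited for:
- A minimal deployment experience
- Resource-constrained environments
- Quickly standing up an API service

4. Exllama

Platforms: mainly Windows and Linux
Inference characteristics:
- Deeply optimized for NVIDIA GPUs, accelerated with CUDA
- Supports efficient parallel inference, suited to large models
- Requires the NVIDIA driver and a CUDA environment
Best suited for:
- Production environments that need high-performance inference
- Setups with dedicated NVIDIA GPU hardware
- Teams with the technical skills to maintain the environment

5. FastChat

Platforms: cross-platform
Inference characteristics:
- Focused on the architecture of chatbot systems
- Relies on various inference backends (e.g. llama.cpp, Exllama, GPT4All)
- Suited to complex applications and service deployment
Best suited for:
- Building chat system services
- Research and production environments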

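| Scenario | Recommended tool | Rationale |
| --- | --- | --- |
| Lightweight cross-platform deployment | llama.cpp | Rich community resources, Windows/Linux support, many quantization schemes, GPU backends |
| Fast deployment with no environment dependencies | llamafile | Single-file deployment, minimal setup, good for quickly standing up a local API |
| Multi-model testing and development | GPT4All | Large model library, simple API, supports a range of hardware |
| High-performance NVIDIA GPU inference | Exllama | Optimized for CUDA, high throughput, suited to large models and production |
| Building chatbots | FastChat | Complete chat framework with several inference backends, suited to complex applications |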

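Initializing any model in one line

Many LLM applications let end users specify which provider and model the application should be powered by. This requires writing some logic to initialize different chat models based on user configuration. The init_chat_model() helper makes it easy to initialize a number of different model integrations without having to worry about import paths and class names.

Basic usage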

from langchain.chat_models import init_chat_model

# Returns a langchain_openai.ChatOpenAI instance.
gpt_4o = init_chat_model("gpt-4o", model_provider="openai", temperature=0)

# Returns a langchain_anthropic.ChatAnthropic instance.
claude_opus = init_chat_model(
    "claude-3-opus-20240229", model_provider="anthropic", temperature=0
)

# Returns a langchain_google_vertexai.ChatVertexAI instance.
gemini_15 = init_chat_model(
    "gemini-1.5-pro", model_provider="google_vertexai", temperature=0
)

# All model integrations implement the ChatModel interface, so they can be used the same way.
print("GPT-4o: " + gpt_4o.invoke("what's your name").content + "\n")
print("Claude Opus: " + claude_opus.invoke("what's your name").content + "\n")
print("Gemini 1.5: " + gemini_15.invoke("what's your name").content + "\n")

Example output:

GPT-4o: I'm an AI created by OpenAI, and I don't have a personal name. How can I assist you today?
Claude Opus: My name is Claude. It's nice to meet you!
Gemini 1.5: I am a large language model, trained by Google. 
I don't have a name like a person does. You can call me Bard if you like! 😊

Inferring the model provider

For common and distinct model names, init_chat_model() attempts to infer the provider.

For example, any model name that starts with gpt-3... or gpt-4... is inferred to use openai.

gpt_4o = init_chat_model("gpt-4o", temperature=0)
claude_opus = init_chat_model("claude-3-opus-20240229", temperature=0)
gemini_15 = init_chat_model("gemini-1.5-pro", temperature=0)
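
Prefix-based inference of this kind can be sketched in a few lines. The mapping below only covers the three providers used in this section and is an assumption made for illustration; `infer_provider` is a hypothetical helper, not the function LangChain uses internally, and the real list of supported prefixes is in the API reference.

```python
from typing import Optional

# Assumed prefix table, limited to the providers that appear in this section.
_PROVIDER_PREFIXES = (
    ("gpt-3", "openai"),
    ("gpt-4", "openai"),
    ("claude", "anthropic"),
    ("gemini", "google_vertexai"),
)


def infer_provider(model_name: str) -> Optional[str]:
    """Return the provider whose prefix matches the model name, if any."""
    for prefix, provider in _PROVIDER_PREFIXES:
        if model_name.startswith(prefix):
            return provider
    return None  # unknown names need an explicit model_provider


for name in ("gpt-4o", "claude-3-opus-20240229", "gemini-1.5-pro", "my-local-model"):
    print(name, "->", infer_provider(name))
```

When the name does not match any known prefix, passing model_provider explicitly, as in the basic usage example, removes the ambiguity.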

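Creating a configurable model

You can also create a runtime-configurable model by specifying configurable_fields. If you do not specify a model value, then "model" and "model_provider" are configurable by default.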

configurable_model = init_chat_model(temperature=0)

response = configurable_model.invoke(
    "what's your name", config={"configurable": {"model": "gpt-4o"}}
)
print(response)

Example return value:

AIMessage(content="I'm an AI created by OpenAI, and I don't have a personal name. How can I assist you today?", ...)

You can switch to a different model:

response = configurable_model.invoke(
    "what's your name", config={"configurable": {"model": "claude-3-5-sonnet-20240620"}}
)
print(response)

Example output:

AIMessage(content="My name is Claude. It's nice to meet you!", additional_kwargs={}, response_metadata={'id': 'msg_01Fx9P74A7syoFkwE73CdMMY', 'model': 'claude-3-5-sonnet-20240620', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 11, 'output_tokens': 15}}, id='run-a0fd2bbd-3b7e-46bf-8d69-a48c7e60b03c-0', usage_metadata={'input_tokens': 11, 'output_tokens': 15, 'total_tokens': 26})

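Configurable model with default values

You can create a configurable model with default model values, specify which parameters are configurable, and add a prefix to the configurable parameters: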

first_llm = init_chat_model(
    model="gpt-4o",
    temperature=0,
    configurable_fields=("model", "model_provider", "temperature", "max_tokens"),
    config_prefix="first",  # useful when a chain contains multiple models
)

first_llm.invoke("what's your name")

Return value:

AIMessage(content="I'm an AI created by OpenAI, and I don't have a personal name. How can I assist you today?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 23, 'prompt_tokens': 11, 'total_tokens': 34}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_25624ae3a5', 'finish_reason': 'stop', 'logprobs': None}, id='run-3380f977-4b89-4f44-bc02-b64043b3166f-0', usage_metadata={'input_tokens': 11, 'output_tokens': 23, 'total_tokens': 34})


first_llm.invoke(
    "what's your name",
    config={
        "configurable": {
            "first_model": "claude-3-5-sonnet-20240620",
            "first_temperature": 0.5,
            "first_max_tokens": 100,
        }
    },
)

Return value:

AIMessage(content="My name is Claude. It's nice to meet you!", additional_kwargs={}, response_metadata={'id': 'msg_01EFKSWpmsn2PSYPQa4cNHWb', 'model': 'claude-3-5-sonnet-20240620', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 11, 'output_tokens': 15}}, id='run-3c58f47c-41b9-4e56-92e7-fb9602e3787c-0', usage_metadata={'input_tokens': 11, 'output_tokens': 15, 'total_tokens': 26})
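
To make the role of config_prefix concrete, here is a minimal standalone sketch of how prefixed keys in "configurable" could be mapped back onto model parameters. The helper `resolve_params` is hypothetical and is not how LangChain implements this internally; it only illustrates the naming rule `<prefix>_<field>` seen in the example above.

```python
from typing import Any, Dict, Iterable


def resolve_params(
    defaults: Dict[str, Any],
    configurable: Dict[str, Any],
    fields: Iterable[str],
    prefix: str = "",
) -> Dict[str, Any]:
    """Merge runtime overrides into defaults (hypothetical helper).

    Only keys named "<prefix>_<field>" for a declared field are applied;
    every other key in `configurable` is ignored.
    """
    key_prefix = f"{prefix}_" if prefix else ""
    params = dict(defaults)
    for field in fields:
        key = key_prefix + field
        if key in configurable:
            params[field] = configurable[key]
    return params


defaults = {"model": "gpt-4o", "temperature": 0}
fields = ("model", "model_provider", "temperature", "max_tokens")
overrides = {
    "first_model": "claude-3-5-sonnet-20240620",
    "first_temperature": 0.5,
    "first_max_tokens": 100,
    "second_model": "some-other-model",  # would belong to another model in the chain
}

resolved = resolve_params(defaults, overrides, fields, prefix="first")
print(resolved)
```

The prefix is what lets two configurable models in one chain receive different overrides from the same config dictionary.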

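Using a configurable model declaratively

We can call declarative operations such as bind_tools, with_structured_output, and with_configurable on a configurable model, and chain it in the same way as a regularly instantiated chat model object.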

from pydantic import BaseModel, Field


class GetWeather(BaseModel):
    """Get the current weather in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


class GetPopulation(BaseModel):
    """Get the current population in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


llm = init_chat_model(temperature=0)
llm_with_tools = llm.bind_tools([GetWeather, GetPopulation])

llm_with_tools.invoke(
    "what's bigger in 2024 LA or NYC", config={"configurable": {"model": "gpt-4o"}}
).tool_calls

Return value:

[{'name': 'GetPopulation',
  'args': {'location': 'Los Angeles, CA'},
  'id': 'call_Ga9m8FAArIyEjItHmztPYA22',
  'type': 'tool_call'},
 {'name': 'GetPopulation',
  'args': {'location': 'New York, NY'},
  'id': 'call_jh2dEvBaAHRaw5JUDthOs7rt',
  'type': 'tool_call'}]


llm_with_tools.invoke(
    "what's bigger in 2024 LA or NYC",
    config={"configurable": {"model": "claude-3-5-sonnet-20240620"}},
).tool_calls

Return value:

[{'name': 'GetPopulation',
  'args': {'location': 'Los Angeles, CA'},
  'id': 'toolu_01JMufPf4F4t2zLj7miFeqXp',
  'type': 'tool_call'},
 {'name': 'GetPopulation',
  'args': {'location': 'New York City, NY'},
  'id': 'toolu_01RQBHcE8kEEbYTuuS8WqY1u',
  'type': 'tool_call'}]

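Multimodal

Passing multimodal data directly to models

Here we demonstrate how to pass multimodal input directly to a model. All input is currently expected in the same format that OpenAI expects. For other model providers that support multimodal input, logic inside the integration class converts the input to the expected format.

In this example, we ask the model to describe an image.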

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o")

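The most commonly supported way to pass an image is as a base64-encoded byte string embedded in a data URL. This should work with most model integrations.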

import base64

import httpx

image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ],
)
response = model.invoke([message])
print(response.content)

Example output:

The weather in the image appears to be clear and pleasant. The sky is mostly blue with scattered, light clouds, suggesting a sunny day with minimal cloud cover. There is no indication of rain or strong winds, and the overall scene looks bright and calm. The lush green grass and clear visibility further indicate good weather conditions.
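
The encoding step above can be wrapped in a small helper. This is a minimal sketch using only the standard library; the function name `to_image_block` is hypothetical, and the placeholder bytes stand in for the contents of a real image file.

```python
import base64


def to_image_block(raw: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in an OpenAI-style image_url content block."""
    encoded = base64.b64encode(raw).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Placeholder bytes; in practice these would come from a file or an HTTP response.
raw_bytes = b"\xff\xd8\xff\xe0 not a real jpeg"
block = to_image_block(raw_bytes)

# Split the data URL into its header and its base64 payload.
header, _, payload = block["image_url"]["url"].partition(",")
print(header)
print(base64.b64decode(payload) == raw_bytes)
```

The resulting dictionary can be placed in the content list of a HumanMessage next to a text block, as in the example above.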

We can also pass an image URL directly in a content block of type "image_url". Note that only some model providers support this.

message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)
response = model.invoke([message])
print(response.content)

Example output:

The weather in the image appears to be clear and sunny. The sky is mostly blue with a few scattered clouds, suggesting good visibility and a likely pleasant temperature. The bright sunlight is casting distinct shadows on the grass and vegetation, indicating it is likely daytime, possibly late morning or early afternoon. The overall ambiance suggests a warm and inviting day, suitable for outdoor activities.

We can also pass in multiple images.

message = HumanMessage(
    content=[
        {"type": "text", "text": "are these two images the same?"},
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)
response = model.invoke([message])
print(response.content)

Yes, the two images are the same. They both depict a wooden boardwalk extending through a grassy field under a blue sky with light clouds. The scenery, lighting, and composition are identical.

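Tool calling

Some multimodal models also support tool calling. To call tools with such a model, bind the tools to it in the usual way and invoke the model with content blocks of the desired type (for example, one containing image data).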

from typing import Literal

from langchain_core.tools import tool


@tool
def weather_tool(weather: Literal["sunny", "cloudy", "rainy"]) -> None:
    """Describe the weather"""
    pass


model_with_tools = model.bind_tools([weather_tool])

message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)
response = model_with_tools.invoke([message])
print(response.tool_calls)

[{'name': 'weather_tool', 'args': {'weather': 'sunny'}, 'id': 'call_BSX4oq4SKnLlp2WlzDhToHBr'}]

Using multimodal prompts

Here we demonstrate how to use prompt templates to format multimodal input for a model.

We ask the model to describe an image:

import base64
import httpx

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")

Import the dependencies and create the model:

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o")

Build the prompt template:

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Describe the image provided"),
        (
            "user",
            [
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,{image_data}"},
                }
            ],
        ),
    ]
)

chain = prompt | model
response = chain.invoke({"image_data": image_data})
print(response.content)

Example output:

The image depicts a sunny day with a beautiful blue sky filled with scattered white clouds. The sky has varying shades of blue... [truncated]

We can also pass in multiple images and compare them:

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Compare the two pictures provided"),
        (
            "user",
            [
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,{image_data1}"},
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,{image_data2}"},
                },
            ],
        ),
    ]
)

chain = prompt | model

response = chain.invoke({
    "image_data1": image_data,
    "image_data2": image_data,  # the same image is reused here for demonstration
})
print(response.content)

Example output:

The two images provided are identical. Both images feature a wooden boardwalk path extending through a lush green field...

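✅ Prompt structure

Role	Content type	Content
system	Text	Tells the model what task to perform (e.g. Describe / Compare)
user	Image	Pass the image as `{ "type": "image_url", "image_url": {...} }`

Tip: make sure the model you use is a multimodal model that supports image input, such as gpt-4o.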
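
Conceptually, the prompt template fills the {image_data} placeholders inside the nested content blocks before the messages reach the model. The sketch below imitates that substitution with plain Python; `fill_blocks` is a hypothetical helper, not part of LangChain, and it assumes the template strings contain no literal braces other than placeholders.

```python
from typing import Any


def fill_blocks(template: Any, **variables: str) -> Any:
    """Recursively substitute {placeholders} in every string of a nested structure."""
    if isinstance(template, str):
        return template.format(**variables)
    if isinstance(template, list):
        return [fill_blocks(item, **variables) for item in template]
    if isinstance(template, dict):
        return {key: fill_blocks(value, **variables) for key, value in template.items()}
    return template


user_template = [
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{image_data1}"}},
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{image_data2}"}},
]

# Short placeholder strings stand in for real base64 payloads.
filled = fill_blocks(user_template, image_data1="AAAA", image_data2="BBBB")
for block in filled:
    print(block["image_url"]["url"])
```

The template itself is left untouched, so the same structure can be filled again with different images.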

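Callbacks

LangChain provides a callback system that lets you hook into the various stages of an LLM application, which is useful for logging, monitoring, streaming, and similar tasks.

You can subscribe to these events through the callbacks argument available throughout the API. The argument is a list of handler objects, each expected to implement one or more of the methods described below.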

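Event	Triggered when	Handler method
Chat model start	A chat model starts	`on_chat_model_start`
LLM start	An LLM starts	`on_llm_start`
LLM new token	An LLM or chat model emits a new token	`on_llm_new_token`
LLM end	An LLM or chat model returns its result	`on_llm_end`
LLM error	An LLM or chat model raises an error	`on_llm_error`
Chain start	A chain starts running	`on_chain_start`
Chain end	A chain finishes	`on_chain_end`
Chain error	A chain raises an error	`on_chain_error`
Tool start	A tool starts running	`on_tool_start`
Tool end	A tool finishes	`on_tool_end`
Tool error	A tool raises an error	`on_tool_error`
Agent action	An agent takes an action	`on_agent_action`
Agent finish	An agent finishes	`on_agent_finish`
Retriever start	A retriever starts	`on_retriever_start`
Retriever end	A retriever finishes	`on_retriever_end`
Retriever error	A retriever raises an error	`on_retriever_error`
Text	Arbitrary text is emitted	`on_text`
Retry	A retryable component triggers its retry logic	`on_retry`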

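Callback handlers can be synchronous or asynchronous:

Type	Interface to implement	Used for
Sync	`BaseCallbackHandler`	Callbacks executed synchronously (the default)
Async	`AsyncCallbackHandler`	Callbacks defined with async

LangChain automatically configures a CallbackManager or AsyncCallbackManager to dispatch events to the registered handlers.

On most objects in the API (models, tools, agents, and so on), callbacks can be supplied in two different places:

Location	Description
Constructor	Passed at initialization: `chain = SomeChain(callbacks=[handler])`
Request time	Passed to `.invoke()` or `.batch()` through config: `invoke(..., config={"callbacks": [...]})`

⚠️ Note:

Callbacks set in the constructor are not inherited by child objects; callbacks passed at request time propagate automatically to all child components.

When you create a custom Runnable or Tool, remember to propagate callbacks manually: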

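from langchain_core.runnables import Runnable


class MyRunnable(Runnable):
    def invoke(self, input, config=None, **kwargs):
        # self.child is assumed to be another Runnable; forwarding config carries the callbacks along
        return self.child.invoke(input, config=config)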
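
The difference between constructor callbacks and request-time callbacks can be modeled with a few lines of plain Python. The `Step` class below is a toy stand-in invented for this illustration, not a LangChain class, and it ignores everything except which handlers see which events.

```python
from typing import Callable, List, Optional

Handler = Callable[[str], None]


class Step:
    """Toy stand-in for a runnable (hypothetical, for illustration only)."""

    def __init__(
        self,
        name: str,
        children: Optional[List["Step"]] = None,
        callbacks: Optional[List[Handler]] = None,
    ) -> None:
        self.name = name
        self.children = children or []
        self.callbacks = callbacks or []  # constructor callbacks: this object only

    def invoke(self, request_callbacks: Optional[List[Handler]] = None) -> None:
        inherited = request_callbacks or []
        # The object's own constructor callbacks fire in addition to inherited ones.
        for handler in self.callbacks + inherited:
            handler(f"{self.name} started")
        for child in self.children:
            # Only request-time callbacks are handed down to children.
            child.invoke(inherited)


constructor_log: List[str] = []
request_log: List[str] = []

chain = Step(
    "chain",
    children=[Step("prompt"), Step("llm")],
    callbacks=[constructor_log.append],
)
chain.invoke(request_callbacks=[request_log.append])

print(constructor_log)  # ['chain started']
print(request_log)      # ['chain started', 'prompt started', 'llm started']
```

The constructor handler only ever sees events from the object it was attached to, while the request-time handler sees every nested step.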

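Passing callbacks at run time

In many cases it is more convenient to pass handlers when running an object. When handlers are passed through the callbacks key at execution time, those callbacks are issued by every nested object involved in the execution. For example, when a handler is passed to an agent, it is used for all callbacks related to the agent and for every object involved in the agent's execution, in this case the tools and the LLM.

This saves us from manually attaching the handlers to each individual nested object. Here is an example: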

from typing import Any, Dict, List

from langchain_anthropic import ChatAnthropic
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.messages import BaseMessage
from langchain_core.outputs import LLMResult
from langchain_core.prompts import ChatPromptTemplate


class LoggingHandler(BaseCallbackHandler):
    def on_chat_model_start(
        self, serialized: Dict[str, Any], messages: List[List[BaseMessage]], **kwargs
    ) -> None:
        print("Chat model started")

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        print(f"Chat model ended, response: {response}")

    def on_chain_start(
        self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs
    ) -> None:
        print(f"Chain {serialized.get('name')} started")

    def on_chain_end(self, outputs: Dict[str, Any], **kwargs) -> None:
        print(f"Chain ended, outputs: {outputs}")


callbacks = [LoggingHandler()]
llm = ChatAnthropic(model="claude-3-sonnet-20240229")
prompt = ChatPromptTemplate.from_template("What is 1 + {number}?")

chain = prompt | llm

chain.invoke({"number": "2"}, config={"callbacks": callbacks})

Example output:

Chain RunnableSequence started
Chain ChatPromptTemplate started
Chain ended, outputs: messages=[HumanMessage(content='What is 1 + 2?')]
Chat model started
Chat model ended, response: generations=[[ChatGeneration(text='1 + 2 = 3', message=AIMessage(content='1 + 2 = 3', response_metadata={'id': 'msg_01D8Tt5FdtBk5gLTfBPm2tac', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 16, 'output_tokens': 13}}, id='run-bb0dddd8-85f3-4e6b-8553-eaa79f859ef8-0'))]] llm_output={'id': 'msg_01D8Tt5FdtBk5gLTfBPm2tac', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 16, 'output_tokens': 13}} run=None
Chain ended, outputs: content='1 + 2 = 3' response_metadata={'id': 'msg_01D8Tt5FdtBk5gLTfBPm2tac', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 16, 'output_tokens': 13}} id='run-bb0dddd8-85f3-4e6b-8553-eaa79f859ef8-0'

Return value:

AIMessage(content='1 + 2 = 3', response_metadata={'id': 'msg_01D8Tt5FdtBk5gLTfBPm2tac', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 16, 'output_tokens': 13}}, id='run-bb0dddd8-85f3-4e6b-8553-eaa79f859ef8-0')

If the module already has callbacks attached, those run in addition to the ones passed in at run time.

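Attaching callbacks to a runnable

If you are composing a chain of runnables and want to reuse callbacks across multiple executions, you can attach them with the .with_config() method. This saves you from passing the callbacks in every time you invoke the chain.

with_config() binds a configuration that is interpreted as runtime configuration, so these callbacks propagate to all child components.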

from typing import Any, Dict, List

from langchain_anthropic import ChatAnthropic
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.messages import BaseMessage
from langchain_core.outputs import LLMResult
from langchain_core.prompts import ChatPromptTemplate


class LoggingHandler(BaseCallbackHandler):
    def on_chat_model_start(
        self, serialized: Dict[str, Any], messages: List[List[BaseMessage]], **kwargs
    ) -> None:
        print("Chat model started")

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        print(f"Chat model ended, response: {response}")

    def on_chain_start(
        self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs
    ) -> None:
        print(f"Chain {serialized.get('name')} started")

    def on_chain_end(self, outputs: Dict[str, Any], **kwargs) -> None:
        print(f"Chain ended, outputs: {outputs}")


callbacks = [LoggingHandler()]
llm = ChatAnthropic(model="claude-3-sonnet-20240229")
prompt = ChatPromptTemplate.from_template("What is 1 + {number}?")

chain = prompt | llm

chain_with_callbacks = chain.with_config(callbacks=callbacks)

chain_with_callbacks.invoke({"number": "2"})

Example output:

Chain RunnableSequence started
Chain ChatPromptTemplate started
Chain ended, outputs: messages=[HumanMessage(content='What is 1 + 2?')]
Chat model started
Chat model ended, response: generations=[[ChatGeneration(text='1 + 2 = 3', message=AIMessage(content='1 + 2 = 3', response_metadata={'id': 'msg_01NTYMsH9YxkoWsiPYs4Lemn', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 16, 'output_tokens': 13}}, id='run-d6bcfd72-9c94-466d-bac0-f39e456ad6e3-0'))]] llm_output={'id': 'msg_01NTYMsH9YxkoWsiPYs4Lemn', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 16, 'output_tokens': 13}} run=None
Chain ended, outputs: content='1 + 2 = 3' response_metadata={'id': 'msg_01NTYMsH9YxkoWsiPYs4Lemn', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 16, 'output_tokens': 13}} id='run-d6bcfd72-9c94-466d-bac0-f39e456ad6e3-0'

Return value:

AIMessage(content='1 + 2 = 3', response_metadata={'id': 'msg_01NTYMsH9YxkoWsiPYs4Lemn', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 16, 'output_tokens': 13}}, id='run-d6bcfd72-9c94-466d-bac0-f39e456ad6e3-0')

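The bound callbacks run for all nested module runs.

The two ways of passing callbacks covered so far are:

✅ Passing callbacks at run time
🧷 Binding callbacks with .with_config()

Their core function is similar (both let you listen to events during execution), but they differ slightly in typical use and in lifecycle management:

Aspect	Run-time callbacks	Callbacks bound with `.with_config(callbacks=...)`
How they are passed	`invoke(..., config={"callbacks": [...]})`	`runnable.with_config(callbacks=[...])`
Typical use	Different callbacks on each call	A chain or model reused many times with the same callbacks
Propagation	Automatically propagated to all child components	Automatically propagated to all child components
Interaction with other callbacks	Run in addition to constructor callbacks	Treated as run-time config; merged with callbacks passed to `invoke` rather than replacing them
Reusability	Must be passed on every call	Defined once, used across calls
Effect on the object	Does not modify the original chain/model	Returns a new object that carries the config
Recommended for	Ad hoc debugging, behavior that changes per call	Uniform logging and monitoring, long-lived production settings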

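✅ Suggested usage

Goal	Recommended approach	Reason
Change handlers dynamically	`config={"callbacks":...}`	More flexible; suits testing or handling each call differently
Give a chain uniform logging	`.with_config(callbacks=...)`	Reusable and concise; suits production deployments
Define callbacks only at the top level	Either approach	Both propagate down to child components
Give child components entirely different callbacks	Multiple `.with_config()` calls	Each chain or sub-model binds its own callbacks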

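📌 Important notes

If you use both .with_config() and invoke(..., config=...), the two callback lists are merged and both sets of handlers run; neither replaces the other.
.with_config() returns a new object with the callbacks bound; the original object is unchanged.
Callbacks supplied through .with_config() or at run time are inherited and triggered automatically by nested child components (such as the prompt, LLM, and tools inside a chain), with no manual propagation needed. Constructor callbacks are the exception, as described below.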

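✅ Summary

If you only want to observe a particular run (for example while debugging), use config={"callbacks":...}.
If you want callbacks such as logging enabled for a chain over the long term, use .with_config(callbacks=...).

By contrast:

"Constructor callbacks" are callbacks that you bind to an object when you create it, by passing a callbacks argument directly to its constructor (that is, __init__).

Callbacks bound this way apply only to that object itself and are not propagated automatically to its child components. This is the biggest difference from .with_config() and from run-time config={"callbacks":...}.

Example: binding callbacks in the constructor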

from langchain_anthropic import ChatAnthropic
from langchain_core.callbacks import BaseCallbackHandler

class MyHandler(BaseCallbackHandler):
    def on_llm_start(self, *args, **kwargs):
        print("LLM is starting...")

# Bind the callback in the constructor
llm = ChatAnthropic(model="claude-3-sonnet-20240229", callbacks=[MyHandler()])

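The callback takes effect on the model object llm itself.
If you put this llm inside a chain or agent, its callbacks do not propagate upward to the enclosing object.
To observe the enclosing chain or agent, you have to attach callbacks to it separately.

🆚 Comparison with the other approaches

Binding method	Propagates to child components	Recommended	Notes
Constructor	❌ No	❌ Rarely used	Applies only to the object itself, so child objects are easy to miss
`.with_config()`	✅ Yes	✅ Recommended	Propagates automatically and is highly reusable
`invoke(config=...)`	✅ Yes	✅ Recommended	Dynamic and flexible; suits testing and per-call settings

✅ Summary

"Constructor callbacks":

are passed as callbacks=[...] when constructing a model, chain, or tool object;
affect only the current object and do not propagate;
are not recommended on their own; prefer .with_config() or run-time config={"callbacks":...}.

If you bind callbacks only in the constructor and use no other propagation mechanism, callback events from child objects may never fire. This is a pitfall many people hit while debugging.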

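Propagating callbacks through the constructor

Most LangChain modules let you pass callbacks directly into the constructor (that is, the initializer). In this case, the callbacks are called only for that instance (and any nested runs). In the example below the handler is attached to the model only, so the chain-level events do not appear in the log.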

from typing import Any, Dict, List

from langchain_anthropic import ChatAnthropic
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.messages import BaseMessage
from langchain_core.outputs import LLMResult
from langchain_core.prompts import ChatPromptTemplate


class LoggingHandler(BaseCallbackHandler):
    def on_chat_model_start(
        self, serialized: Dict[str, Any], messages: List[List[BaseMessage]], **kwargs
    ) -> None:
        print("Chat model started")

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        print(f"Chat model ended, response: {response}")

    def on_chain_start(
        self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs
    ) -> None:
        print(f"Chain {serialized.get('name')} started")

    def on_chain_end(self, outputs: Dict[str, Any], **kwargs) -> None:
        print(f"Chain ended, outputs: {outputs}")


callbacks = [LoggingHandler()]
llm = ChatAnthropic(model="claude-3-sonnet-20240229", callbacks=callbacks)
prompt = ChatPromptTemplate.from_template("What is 1 + {number}?")

chain = prompt | llm

chain.invoke({"number": "2"})

Example output:

Chat model started
Chat model ended, response: generations=[[ChatGeneration(text='1 + 2 = 3', message=AIMessage(content='1 + 2 = 3', response_metadata={'id': 'msg_01CdKsRmeS9WRb8BWnHDEHm7', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 16, 'output_tokens': 13}}, id='run-2d7fdf2a-7405-4e17-97c0-67e6b2a65305-0'))]] llm_output={'id': 'msg_01CdKsRmeS9WRb8BWnHDEHm7', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 16, 'output_tokens': 13}} run=None

Return value:

AIMessage(content='1 + 2 = 3', response_metadata={'id': 'msg_01CdKsRmeS9WRb8BWnHDEHm7', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 16, 'output_tokens': 13}}, id='run-2d7fdf2a-7405-4e17-97c0-67e6b2a65305-0')

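Creating a custom callback handler

LangChain has some built-in callback handlers, but you will often want to create your own handlers with custom logic.

To create a custom callback handler, decide which events the handler should handle and what it should do when each event fires. Then attach the handler to the object, either through the constructor or at run time.

In the example below, we implement streaming with a custom handler.

In our custom callback handler MyCustomHandler, we implement on_llm_new_token to print the token we have just received. We then attach the custom handler to the model object as a constructor callback.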

from langchain_anthropic import ChatAnthropic
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate


class MyCustomHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"My custom handler, token: {token}")


prompt = ChatPromptTemplate.from_messages(["Tell me a joke about {animal}"])

# To enable streaming, we pass in `streaming=True` to the ChatModel constructor
# Additionally, we pass in our custom handler as a list to the callbacks parameter
model = ChatAnthropic(
    model="claude-3-sonnet-20240229", streaming=True, callbacks=[MyCustomHandler()]
)

chain = prompt | model

response = chain.invoke({"animal": "bears"})

Example output:

My custom handler, token: Here
My custom handler, token: 's
My custom handler, token:  a
My custom handler, token:  bear
My custom handler, token:  joke
My custom handler, token:  for
My custom handler, token:  you
My custom handler, token: :
My custom handler, token: 

Why
My custom handler, token:  di
My custom handler, token: d the
My custom handler, token:  bear
My custom handler, token:  dissol
My custom handler, token: ve
My custom handler, token:  in
My custom handler, token:  water
My custom handler, token: ?
My custom handler, token: 
Because
My custom handler, token:  it
My custom handler, token:  was
My custom handler, token:  a
My custom handler, token:  polar
My custom handler, token:  bear
My custom handler, token: !

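Using callbacks in an async environment

If you plan to use the async APIs, it is recommended to use and extend AsyncCallbackHandler to avoid blocking the event loop.

If you use a sync CallbackHandler while running your LLM / chain / tool / agent through async methods, it will still work. Under the hood, however, it is called via run_in_executor, which can cause problems if your CallbackHandler is not thread-safe.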

import asyncio
from typing import Any, Dict, List

from langchain_anthropic import ChatAnthropic
from langchain_core.callbacks import AsyncCallbackHandler, BaseCallbackHandler
from langchain_core.messages import HumanMessage
from langchain_core.outputs import LLMResult


class MyCustomSyncHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"Sync handler being called in a `thread_pool_executor`: token: {token}")


class MyCustomAsyncHandler(AsyncCallbackHandler):
    """Async callback handler that can be used to handle callbacks from langchain."""

    async def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> None:
        """Run when the LLM starts running."""
        print("zzzz....")
        await asyncio.sleep(0.3)
        print("Hi! I just woke up. Your llm is starting")

    async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Run when the LLM finishes running."""
        print("zzzz....")
        await asyncio.sleep(0.3)
        print("Hi! I just woke up. Your llm is ending")


# To enable streaming, we pass in `streaming=True` to the ChatModel constructor
# Additionally, we pass in a list with our custom handler
chat = ChatAnthropic(
    model="claude-3-sonnet-20240229",
    max_tokens=25,
    streaming=True,
    callbacks=[MyCustomSyncHandler(), MyCustomAsyncHandler()],
)

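# Top-level await works in a notebook; in a script, wrap this call in a coroutine run with asyncio.run()
await chat.agenerate([[HumanMessage(content="Tell me a joke")]])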
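
The thread-safety remark above comes from the fact that a sync handler invoked from async code is pushed onto a worker thread. The standalone sketch below shows that mechanism with the standard library only; the handler and `dispatch` functions are hypothetical and do not reproduce LangChain's callback manager.

```python
import asyncio
import threading
from typing import Dict


def sync_handler(token: str) -> int:
    """Blocking-style handler; returns the id of the thread it ran on."""
    return threading.get_ident()


async def async_handler(token: str) -> int:
    """Async handler; awaited directly on the event loop's thread."""
    await asyncio.sleep(0)
    return threading.get_ident()


async def dispatch(token: str) -> Dict[str, bool]:
    loop = asyncio.get_running_loop()
    loop_thread = threading.get_ident()
    # The sync handler goes to the default thread pool so it cannot block the loop.
    sync_thread = await loop.run_in_executor(None, sync_handler, token)
    async_thread = await async_handler(token)
    return {
        "sync_on_loop_thread": sync_thread == loop_thread,
        "async_on_loop_thread": async_thread == loop_thread,
    }


results = asyncio.run(dispatch("hello"))
print(results)
```

Because the sync handler runs on a different thread from the event loop, any state it shares with other code needs to be protected accordingly.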

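Dispatching custom callback events

In some cases you may want to dispatch a custom callback event from inside a Runnable so that it can be:

surfaced in a custom callback handler
captured through the astream events API
shown to end users as task progress

To dispatch a custom event, you need to decide on two attributes for it: name and data.

Attribute	Type	Description
name	str	User-defined name for the event
data	Any	Data associated with the event; ideally JSON-serializable

Note: dispatching custom events requires langchain-core>=0.2.15.

Custom events can only be dispatched from within an existing Runnable.
When using astream_events, you must specify version='v2' to see custom events.
LangSmith does not currently support sending or rendering custom callback events.

Python version compatibility

Python version	Config propagation	Notes
`>=3.11`	✅ Automatic	`RunnableConfig` propagates automatically to child runnables in async code
`<=3.10`	❌ Not automatic	You must pass `RunnableConfig` to child `Runnable` objects manually in async code

Observing events with `astream_events()`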

from langchain_core.callbacks.manager import adispatch_custom_event
from langchain_core.runnables import RunnableLambda

@RunnableLambda
async def foo(x: str) -> str:
    await adispatch_custom_event("event1", {"x": x})
    await adispatch_custom_event("event2", 5)
    return x

async for event in foo.astream_events("hello world", version="v2"):
    print(event)

Example output:

{'event': 'on_custom_event', 'name': 'event1', 'data': {'x': 'hello world'}, ...}
{'event': 'on_custom_event', 'name': 'event2', 'data': 5, ...}

Async callback handler example

from langchain_core.callbacks import AsyncCallbackHandler

class AsyncCustomCallbackHandler(AsyncCallbackHandler):
    async def on_custom_event(self, name, data, *, run_id, tags=None, metadata=None, **kwargs):
        print(f"Received event {name} with data: {data}")

# Register the handler and invoke the runnable
async_handler = AsyncCustomCallbackHandler()
await foo.ainvoke(1, {"callbacks": [async_handler], "tags": ["foo", "bar"]})

Sync callback handler example

from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.callbacks.manager import dispatch_custom_event
from langchain_core.runnables import RunnableConfig, RunnableLambda

class CustomHandler(BaseCallbackHandler):
    def on_custom_event(self, name, data, *, run_id, tags=None, metadata=None, **kwargs):
        print(f"Received event {name} with data: {data}")

@RunnableLambda
def foo(x: int, config: RunnableConfig) -> int:
    dispatch_custom_event("event1", {"x": x})
    dispatch_custom_event("event2", {"x": x})
    return x

handler = CustomHandler()
foo.invoke(1, {"callbacks": [handler], "tags": ["foo", "bar"]})

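Scenario	Method	Manual config propagation needed?
Async + Python ≥ 3.11	`adispatch_custom_event`	No
Async + Python ≤ 3.10	`adispatch_custom_event`	✅ Yes
Sync	`dispatch_custom_event`	No (the sync example above dispatches without passing config)
Consuming events (async)	`AsyncCallbackHandler`	No
Consuming events (sync)	`BaseCallbackHandler`	No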

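Indexing

Indexing is the process of keeping your vector store in sync with the underlying data source.

Using the LangChain indexing API

The indexing API lets you load documents from any source and keep them in sync with a vector store. Its main goals are to save time and money and to improve vector search results.

Specifically, the indexing API helps you:

✅ Avoid writing duplicated content into the vector store
✅ Avoid rewriting unchanged content
✅ Avoid recomputing embeddings for unchanged content

The indexing API works even when documents have gone through several transformation steps (such as text chunking) relative to the original source documents.

This means you can:

apply nested processing (for example clean → chunk → embed)
still keep the content consistent with, and deduplicated against, the original data

The indexing API uses a RecordManager to deduplicate content and track state.

Whenever documents are written to the vector store, the following is computed and recorded:

Document hash: a unique hash generated from page_content + metadata
Write time
Source ID: the metadata should contain an identifier of the document's source (for example a "source" field)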
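
The hash-based deduplication described in the list above can be sketched as follows. This is a simplified illustration: the choice of SHA-256 over a JSON dump is an assumption made for this example, documents are modeled as plain dictionaries, and LangChain's actual hashing scheme differs in its details.

```python
import hashlib
import json
from typing import Dict, List


def document_hash(page_content: str, metadata: Dict[str, str]) -> str:
    """Hash content and metadata together; key order does not affect the result."""
    payload = json.dumps(
        {"page_content": page_content, "metadata": metadata}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(docs: List[dict]) -> List[dict]:
    """Keep the first occurrence of each distinct (content, metadata) pair."""
    seen = set()
    unique = []
    for doc in docs:
        digest = document_hash(doc["page_content"], doc["metadata"])
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


kitty = {"page_content": "kitty", "metadata": {"source": "kitty.txt"}}
doggy = {"page_content": "doggy", "metadata": {"source": "doggy.txt"}}

unique_docs = deduplicate([kitty, dict(kitty), doggy, kitty])
print(len(unique_docs))  # 2
```

Because the metadata is part of the hash, two chunks with identical text but different sources are treated as different documents.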

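Deletion modes

When indexing documents into a vector store, it is often necessary to delete existing documents, especially when:

you want to replace older documents derived from the same source
you want to refresh all content wholesale

The indexing API provides three deletion (cleanup) modes so you can pick the behavior you need.

Cleanup mode	De-duplicates content	Parallelizable	Cleans up deleted source docs	Cleans up changed content	Cleanup timing
None	✅	✅	❌	❌	-
Incremental	✅	✅	❌	✅	Continuously
Full	✅	❌	✅	✅	At end of indexing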

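🔹 None

Performs no automatic cleanup
Leaves it to the user to manage old content manually
Suits cases where you want precise control over deletion logic

🔹 Incremental

Checks for changes continuously and deletes old versions as it writes
Does not delete documents that have been removed from the source
Better suited to real-time or long-running jobs, since it minimizes the window in which old and new versions coexist

🔹 Full

Cleans up in one pass after writing completes
Deletes all documents that have changed or been removed from the source
Does not support parallel writes; suited to batch-style jobs

Scenario	Incremental mode	Full mode
Document content changed (for example a PDF was modified)	✅	✅
Document removed from the source (deleted)	❌	✅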
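
The behavior of the three modes can be imitated with a small in-memory model. The sketch below is a deliberately simplified assumption: `toy_index` is a hypothetical function, documents are (page_content, source) tuples, and cleanup is decided by comparing keys rather than by the timestamps the real RecordManager uses.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Doc = Tuple[str, str]  # (page_content, source)


def _key(doc: Doc) -> str:
    """Stable key derived from source and content."""
    return hashlib.sha256(f"{doc[1]}\x00{doc[0]}".encode("utf-8")).hexdigest()


def toy_index(docs: List[Doc], store: Dict[str, Doc], cleanup: Optional[str]) -> Dict[str, int]:
    """Simplified model of the None / incremental / full cleanup modes."""
    stats = {"num_added": 0, "num_skipped": 0, "num_deleted": 0}
    batch = {_key(doc): doc for doc in docs}  # de-duplicates within the batch
    for key, doc in batch.items():
        if key in store:
            stats["num_skipped"] += 1
        else:
            store[key] = doc
            stats["num_added"] += 1
    if cleanup == "incremental":
        # Drop stale entries, but only for sources that appear in this batch.
        sources = {doc[1] for doc in batch.values()}
        stale = [k for k, d in store.items() if d[1] in sources and k not in batch]
    elif cleanup == "full":
        # Drop everything that was not part of this batch.
        stale = [k for k in store if k not in batch]
    else:
        stale = []
    for key in stale:
        del store[key]
    stats["num_deleted"] = len(stale)
    return stats


store: Dict[str, Doc] = {}
kitty, doggy = ("kitty", "kitty.txt"), ("doggy", "doggy.txt")

first = toy_index([kitty, doggy], store, cleanup="incremental")
changed = toy_index([("puppy", "doggy.txt")], store, cleanup="incremental")
full = toy_index([kitty], store, cleanup="full")

print(first)    # {'num_added': 2, 'num_skipped': 0, 'num_deleted': 0}
print(changed)  # {'num_added': 1, 'num_skipped': 0, 'num_deleted': 1}
print(full)     # {'num_added': 0, 'num_skipped': 1, 'num_deleted': 1}
```

Note how incremental mode only touches entries from sources present in the batch, while full mode removes everything the batch does not contain.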

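⚠️ Caveats

Do not use the incremental or full mode with a vector store that was pre-populated independently of the indexing API, because the RecordManager cannot track content it did not write.
The vector store you use must meet the following requirements:
- It supports adding documents by ID (add_documents with an ids argument)
- It supports deleting documents by ID (delete with an ids argument)

✅ Supported vector stores (partial list)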

FAISS, Chroma, Weaviate, Pinecone, Qdrant, Redis, Milvus,
PGVector, ElasticsearchStore, MongoDBAtlasVectorSearch, Supabase,
OpenSearchVectorSearch, VespaStore, Vald, TencentVectorDB, and others

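⏱ Timing-related cleanup limitation

In incremental and full cleanup modes, LangChain decides what to delete based on timestamps. Keep in mind:

If two jobs run very close together in time (for example within milliseconds), cleanup may fail to remove the right content
In practice this is unlikely, because:
- LangChain uses high-resolution timestamps
- Data usually changes at a low frequency
- Indexing jobs typically take a relatively long time

Quick start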

from langchain.indexes import SQLRecordManager, index
from langchain_core.documents import Document
from langchain_elasticsearch import ElasticsearchStore
from langchain_openai import OpenAIEmbeddings

Initialize the vector store and set up the embeddings:

collection_name = "test_index"

embedding = OpenAIEmbeddings()

vectorstore = ElasticsearchStore(
    es_url="http://localhost:9200", index_name="test_index", embedding=embedding
)

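Initialize a record manager with an appropriate namespace.

Suggestion: use a namespace that takes into account both the vector store and the collection name within it, for example 'redis/my_docs', 'chromadb/my_docs', or 'postgres/my_docs'.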

namespace = f"elasticsearch/{collection_name}"
record_manager = SQLRecordManager(
    namespace, db_url="sqlite:///record_manager_cache.sql"
)

Create a schema before using the record manager.

record_manager.create_schema()

Let's index some test documents:

doc1 = Document(page_content="kitty", metadata={"source": "kitty.txt"})
doc2 = Document(page_content="doggy", metadata={"source": "doggy.txt"})

Define a helper that empties the vector store, so that each example below indexes into an empty store:

def _clear():
    """Hacky helper method to clear content. See the `full` mode section to understand why it works."""
    index([], record_manager, vectorstore, cleanup="full", source_id_key="source")

None deletion mode

This mode does not automatically clean up old versions of content; it does, however, still take care of content de-duplication.

_clear()

index(
    [doc1, doc1, doc1, doc1, doc1],
    record_manager,
    vectorstore,
    cleanup=None,
    source_id_key="source",
)

Return value:

{'num_added': 1, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}


_clear()

index([doc1, doc2], record_manager, vectorstore, cleanup=None, source_id_key="source")

Return value:

{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

The second time around, all content is skipped:

index([doc1, doc2], record_manager, vectorstore, cleanup=None, source_id_key="source")

{'num_added': 0, 'num_updated': 0, 'num_skipped': 2, 'num_deleted': 0}

"incremental" deletion mode

_clear()

index(
    [doc1, doc2],
    record_manager,
    vectorstore,
    cleanup="incremental",
    source_id_key="source",
)

{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

Re-indexing should result in both documents being skipped, which also skips the embedding operation!

index(
    [doc1, doc2],
    record_manager,
    vectorstore,
    cleanup="incremental",
    source_id_key="source",
)

{'num_added': 0, 'num_updated': 0, 'num_skipped': 2, 'num_deleted': 0}

If we provide no documents in incremental indexing mode, nothing changes.

index([], record_manager, vectorstore, cleanup="incremental", source_id_key="source")

{'num_added': 0, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

If we mutate a document, the new version is written and all old versions sharing the same source are deleted.

changed_doc_2 = Document(page_content="puppy", metadata={"source": "doggy.txt"})

index(
    [changed_doc_2],
    record_manager,
    vectorstore,
    cleanup="incremental",
    source_id_key="source",
)

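{'num_added': 1, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 1}

"full" deletion mode

In full mode, the user should pass the full universe of content that should be indexed to the indexing function.

Any documents that are not passed to the indexing function but are present in the vector store will be deleted!

This behavior is useful for handling deletions of source documents.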

_clear()

all_docs = [doc1, doc2]

index(all_docs, record_manager, vectorstore, cleanup="full", source_id_key="source")

{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

Say someone deleted the first document:

del all_docs[0]

all_docs

[Document(page_content='doggy', metadata={'source': 'doggy.txt'})]

Using full mode cleans up the deleted content as well.

index(all_docs, record_manager, vectorstore, cleanup="full", source_id_key="source")

{'num_added': 0, 'num_updated': 0, 'num_skipped': 1, 'num_deleted': 1}

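Source

When the indexing API manages documents, each document's metadata should include a source field that identifies the ultimate origin of that document.

✅ Recommended practice:

Child documents split from the same parent document should share the same source.
Example: if several chunks come from the same PDF or TXT file, their metadata['source'] should point to that file's path or unique identifier.

❌ Avoid:

Consider setting the source to None only if you never intend to use incremental mode and truly cannot provide a source.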
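A quick way to apply the guidance above is to propagate the parent's source to every chunk and to check a batch before indexing it. The sketch below uses plain tuples instead of Document objects, and both `split_with_source` and `indexes_missing_source` are hypothetical helpers written for illustration.

```python
def split_with_source(text, source, chunk_size):
    """Cut text into fixed-size chunks; every chunk inherits the parent's source."""
    return [
        (text[i : i + chunk_size], {"source": source})
        for i in range(0, len(text), chunk_size)
    ]


def indexes_missing_source(docs, source_id_key="source"):
    """Return positions of docs whose metadata has no usable source value."""
    return [
        i for i, (_, metadata) in enumerate(docs) if not metadata.get(source_id_key)
    ]


chunks = split_with_source("kitty kitty kitty", "kitty.txt", chunk_size=6)
orphan = ("stray text", {})  # a document with no source at all
missing = indexes_missing_source(chunks + [orphan])
print(chunks)
print(missing)
```

Running a check like this before calling index() makes it obvious which documents would break source-based cleanup.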

Example: chunking + using the indexing API

from langchain_text_splitters import CharacterTextSplitter
from langchain_core.documents import Document

📄 Define the documents

doc1 = Document(page_content="kitty kitty kitty kitty kitty", metadata={"source": "kitty.txt"})
doc2 = Document(page_content="doggy doggy the doggy", metadata={"source": "doggy.txt"})

✂️ Split into chunks

new_docs = CharacterTextSplitter(
    separator="t", keep_separator=True, chunk_size=12, chunk_overlap=2
).split_documents([doc1, doc2])

📤 Output (partial)

[
 Document(page_content='kitty kit', metadata={'source': 'kitty.txt'}),
 Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
 Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
 Document(page_content='doggy doggy', metadata={'source': 'doggy.txt'}),
 Document(page_content='the doggy', metadata={'source': 'doggy.txt'})
]

🧠 Index the documents (first run)

index(
    new_docs,
    record_manager,
    vectorstore,
    cleanup="incremental",  # enable incremental cleanup
    source_id_key="source",
)

Result:

{'num_added': 5, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

🆕 Update the source document (for example, doggy.txt changes)

changed_doggy_docs = [
    Document(page_content="woof woof", metadata={"source": "doggy.txt"}),
    Document(page_content="woof woof woof", metadata={"source": "doggy.txt"}),
]

Re-index:

index(
    changed_doggy_docs,
    record_manager,
    vectorstore,
    cleanup="incremental",
    source_id_key="source",
)

Result:

{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 2}

🔍 Verify with a vector search

vectorstore.similarity_search("dog", k=30)

Result (for doggy.txt, only the new content remains):

[
 Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),
 Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'}),
 Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
 Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
 Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})
]

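Using the indexing API with loaders

The indexing API accepts either an iterable of documents or any document loader that conforms to the loader interface.

⚠️ Note: the loader must set the metadata["source"] field correctly!

Otherwise the cleanup logic in incremental or full mode will not work.

Custom loader example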

from langchain_core.document_loaders import BaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import CharacterTextSplitter

class MyCustomLoader(BaseLoader):
    def lazy_load(self):
        text_splitter = CharacterTextSplitter(
            separator="t", keep_separator=True, chunk_size=12, chunk_overlap=2
        )
        docs = [
            Document(page_content="woof woof", metadata={"source": "doggy.txt"}),
            Document(page_content="woof woof woof", metadata={"source": "doggy.txt"}),
        ]
        yield from text_splitter.split_documents(docs)

    def load(self):
        return list(self.lazy_load())

✅ Use the loader

loader = MyCustomLoader()
loader.load()

Output:

[
 Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),
 Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'})
]

🔄 Pass the loader to index

index(loader, record_manager, vectorstore, cleanup="full", source_id_key="source")

Result:

{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

🔍 Verify with a vector search

vectorstore.similarity_search("dog", k=30)

Search results:

[
 Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),
 Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'})
]

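To load documents from other origins such as local files, web pages, or databases, subclass BaseLoader, implement lazy_load() (or load()), and pass the loader to index() to get deduplication and incremental updates.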
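As one illustration of the local-file case, the generator below reads a text file lazily and tags every record with its source. It is a stdlib-only sketch: `lazy_load_lines` is a hypothetical function that yields plain tuples, whereas a real loader would subclass BaseLoader and yield Document objects from lazy_load().

```python
import tempfile
from pathlib import Path


def lazy_load_lines(path):
    """Yield one (page_content, metadata) pair per non-empty line of the file."""
    with open(path, encoding="utf-8") as fp:
        for line in fp:
            text = line.strip()
            if text:
                # Every record from this file shares the same source value.
                yield text, {"source": str(path)}


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "doggy.txt"
    sample.write_text("woof woof\n\nwoof woof woof\n", encoding="utf-8")
    loaded = list(lazy_load_lines(sample))

print([text for text, _ in loaded])
```

Because the function is a generator, a large file is streamed one line at a time instead of being read into memory up front.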

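Serialization: saving and loading LangChain objects

LangChain classes implement standard serialization methods. Serializing LangChain objects with these methods has some advantages:

Secrets, such as API keys, are kept separate from other parameters and can be loaded back onto the object on deserialization;

Deserialization stays compatible across package versions, so an object serialized with one version of LangChain can be properly deserialized with another.

To save and load LangChain objects with this system, use the dumpd, dumps, load, and loads functions in the load module of langchain-core. These functions support JSON and JSON-serializable objects.

All LangChain objects that inherit from Serializable are JSON-serializable. Examples include messages, document objects (for example, as returned from retrievers), and most runnables, such as chat models, retrievers, and chains implemented with the LangChain Expression Language.

Below we walk through an example with a simple LLM chain.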

from langchain_core.load import dumpd, dumps, load, loads
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Translate the following into {language}:"),
        ("user", "{text}"),
    ],
)

llm = ChatOpenAI(model="gpt-4o-mini", api_key="llm-api-key")

chain = prompt | llm

Saving objects

To JSON

string_representation = dumps(chain, pretty=True)
print(string_representation[:500])

{
  "lc": 1,
  "type": "constructor",
  "id": [
    "langchain",
    "schema",
    "runnable",
    "RunnableSequence"
  ],
  "kwargs": {
    "first": {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "prompts",
        "chat",
        "ChatPromptTemplate"
      ],
      "kwargs": {
        "input_variables": [
          "language",
          "text"
        ],
        "messages": [
          {
            "lc": 1,
            "type": "constructor",

To a JSON-serializable Python dict

dict_representation = dumpd(chain)

print(type(dict_representation))

<class 'dict'>

To disk

import json

with open("/tmp/chain.json", "w") as fp:
    json.dump(string_representation, fp)

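Note that the API key is withheld from the serialized representation. Parameters that are treated as secrets are specified by the .lc_secrets attribute of the LangChain object: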

chain.last.lc_secrets

{'openai_api_key': 'OPENAI_API_KEY'}

Loading objects

Specifying secrets_map in load and loads loads the corresponding secrets onto the deserialized LangChain object.

From a string

chain = loads(string_representation, secrets_map={"OPENAI_API_KEY": "llm-api-key"})

From a dict

chain = load(dict_representation, secrets_map={"OPENAI_API_KEY": "llm-api-key"})

From disk

with open("/tmp/chain.json", "r") as fp:
    chain = loads(json.load(fp), secrets_map={"OPENAI_API_KEY": "llm-api-key"})
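One detail of the disk round trip is worth spelling out: string_representation is already a JSON string, so json.dump writes it as a quoted JSON string literal, and json.load hands back a str that loads() then parses. The stdlib-only sketch below shows this with a small stand-in string in place of dumps(chain); the temporary path is an assumption for the demo.

```python
import json
import os
import tempfile

# Stand-in for dumps(chain): any JSON text behaves the same way here.
string_representation = json.dumps({"lc": 1, "type": "constructor"})

path = os.path.join(tempfile.mkdtemp(), "chain.json")
with open(path, "w") as fp:
    json.dump(string_representation, fp)  # writes a JSON *string literal*

with open(path) as fp:
    raw_text = fp.read()  # what is physically on disk
with open(path) as fp:
    loaded = json.load(fp)  # decodes one level, back to the original str

print(type(loaded).__name__)
print(loaded == string_representation)
```

An alternative that avoids the extra encoding layer is to write the string with fp.write(string_representation) and read it back with fp.read(); either approach works as long as saving and loading match.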

Note that we recover the API key specified at the start of this guide:

chain.last.openai_api_key.get_secret_value()

'llm-api-key'
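The hide-then-restore flow in this section can be modeled in a few lines of plain Python. This is a simplified sketch, not the library's code: `toy_dump`, `toy_load`, and the placeholder layout are assumptions made for illustration, with SECRET_FIELDS playing the role of lc_secrets.

```python
import json

# Field name -> secret id, playing the role of lc_secrets in this toy model.
SECRET_FIELDS = {"openai_api_key": "OPENAI_API_KEY"}


def toy_dump(kwargs):
    """Serialize kwargs, replacing each secret value with a named placeholder."""
    safe = {}
    for name, value in kwargs.items():
        if name in SECRET_FIELDS:
            safe[name] = {"type": "secret", "id": [SECRET_FIELDS[name]]}
        else:
            safe[name] = value
    return json.dumps(safe)


def toy_load(text, secrets_map):
    """Rebuild kwargs, filling each placeholder from secrets_map."""
    restored = {}
    for name, value in json.loads(text).items():
        if isinstance(value, dict) and value.get("type") == "secret":
            restored[name] = secrets_map[value["id"][0]]
        else:
            restored[name] = value
    return restored


text = toy_dump({"model_name": "gpt-4o-mini", "openai_api_key": "llm-api-key"})
restored = toy_load(text, secrets_map={"OPENAI_API_KEY": "llm-api-key"})
print("llm-api-key" in text)
print(restored["openai_api_key"])
```

The serialized text never contains the key itself, only the id under which the loader should look it up, so the file can be stored or shared without exposing the secret.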
