LangChain Models: A Study Guide

Preface

When learning LangChain, we keep running into code like this:

model.invoke("你好")
model.stream("请介绍一下 LangChain")
model.bind_tools([get_weather])
model.with_structured_output(Movie)

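These calls look unrelated, but they all revolve around the same core object:

the LangChain Chat Model.

A Chat Model is the model interface LangChain exposes after wrapping different LLM providers behind one common abstraction.

Whether the underlying provider is OpenAI, Anthropic, Google Gemini, AWS Bedrock, or any other supported provider, we can handle the following in a similar way:

plain question answering;
multi-turn conversation;
streaming output;
batch calls;
tool calling;
structured output;
multimodal input and output;
reading reasoning content;
token usage tracking;
switching models at runtime.

This guide proceeds in the order "get to know the model, then call it, then use its advanced capabilities".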

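1. What a Model Is in LangChain

1.1 What large language models can do

Large language models can understand and generate natural language, so they can handle:

content creation;
translation;
summarization;
question answering;
information extraction;
classification.

Today's models usually do more than generate text; they may also support the following capabilities.

Capability	Meaning
Tool calling	Requesting calls to external tools such as databases, search, or calculators
Structured output	Returning results in a specified data structure
Multimodality	Processing non-text content such as images, audio, and video
Reasoning	Solving complex problems through multi-step reasoning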

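1.2 The model is the agent's reasoning engine

In an agent system, the model decides:

how to interpret the user's question;
whether a tool needs to be called;
which tool to call;
which arguments to pass to the tool;
how to interpret the tool's result;
when to stop and give the final answer.

You can picture an agent as a worker who can carry out tasks:

Model = the brain
Tools = the tools it can use
Agent loop = the workflow of thinking, acting, observing the result, and thinking again

The model's capabilities directly shape the agent's baseline performance. For example, some models are better at following complex instructions, some are better at reasoning, and others have a larger context window.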

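1.3 Why LangChain provides a unified model interface

Native SDKs from different providers tend to differ in:

initialization;
parameter names;
message formats;
return value formats;
streaming interfaces;
tool-calling protocols.

LangChain provides a unified Chat Model interface on top of these providers.

As a result, an application can be written around the same core methods: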

response = model.invoke("你好")

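When switching providers, what mainly changes is the model initialization; the business logic that calls the model can usually be reused as-is.

2. Two Ways to Use a Model

The official LangChain Models documentation describes two main ways to use a model.

2.1 Inside an agent

When creating an agent, pass the model in as its reasoning engine: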

from langchain.agents import create_agent
from langchain.chat_models import init_chat_model

model = init_chat_model("openai:gpt-5.5")

agent = create_agent(
    model=model,
    tools=[],
)

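Here the agent is responsible for orchestrating model calls, tool execution, and message passing.

2.2 Standalone

If the task is a single question, classification, summary, or extraction, you can call the model directly: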

from langchain.chat_models import init_chat_model

model = init_chat_model("openai:gpt-5.5")
response = model.invoke("用一句话解释什么是 LangChain。")

print(response.text)

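Both scenarios use the same model interface.

This means we can start with standalone model calls and later plug the same model into an agent.

3. Initializing a Chat Model

3.1 Using `init_chat_model`

LangChain officially recommends init_chat_model for initializing a Chat Model: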

from langchain.chat_models import init_chat_model

model = init_chat_model("openai:gpt-5.5")

The string can be written as:

provider:model

For example:

openai:gpt-5.5

where:

openai is the model provider;
gpt-5.5 is the model name.

The two can also be passed separately:

model = init_chat_model(
    model="gpt-5.5",
    model_provider="openai",
)

3.2 Installing provider integrations

LangChain connects to each model provider through a separate integration package.

OpenAI:

pip install -U "langchain[openai]"

Anthropic:

pip install -U "langchain[anthropic]"

Google Gemini:

pip install -U "langchain[google-genai]"

AWS Bedrock:

pip install -U "langchain[aws]"

Hugging Face:

pip install -U "langchain[huggingface]"

OpenRouter:

pip install -U langchain-openrouter

3.3 Storing the API key in an environment variable

Taking OpenAI as an example:

import os

from langchain.chat_models import init_chat_model

os.environ["OPENAI_API_KEY"] = "sk-..."

model = init_chat_model("openai:gpt-5.5")

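Real projects usually set the environment variable outside the program and let the model integration read it automatically, which also keeps the key out of the source code.

PowerShell example: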

$env:OPENAI_API_KEY="sk-..."
python app.py

3.4 Initialization examples for different providers

Anthropic:

import os

from langchain.chat_models import init_chat_model

os.environ["ANTHROPIC_API_KEY"] = "sk-..."

model = init_chat_model("anthropic:claude-sonnet-4-6")

Google Gemini:

import os

from langchain.chat_models import init_chat_model

os.environ["GOOGLE_API_KEY"] = "..."

model = init_chat_model("google_genai:gemini-2.5-flash-lite")

AWS Bedrock:

from langchain.chat_models import init_chat_model

model = init_chat_model(
    "anthropic.claude-3-5-sonnet-20240620-v1:0",
    model_provider="bedrock_converse",
)

Hugging Face:

import os

from langchain.chat_models import init_chat_model

os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."

model = init_chat_model(
    "microsoft/Phi-3-mini-4k-instruct",
    model_provider="huggingface",
    temperature=0.7,
    max_tokens=1024,
)

OpenRouter:

import os

from langchain.chat_models import init_chat_model

os.environ["OPENROUTER_API_KEY"] = "sk-..."

model = init_chat_model(
    "auto",
    model_provider="openrouter",
)

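LangChain passes the model name through to the corresponding provider, so when a provider releases a new model you usually do not need to wait for the LangChain core package to add a class of the same name.

4. Common Model Parameters

The full set of parameters differs between providers, but the official page lists several common ones.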

from langchain.chat_models import init_chat_model

model = init_chat_model(
    "anthropic:claude-sonnet-4-6",
    temperature=0.7,
    max_tokens=1000,
    timeout=30,
    max_retries=6,
)

4.1 `model`

Specifies the concrete model name:

model = init_chat_model(
    model="gpt-5.5",
    model_provider="openai",
)

4.2 `api_key`

Authenticates the caller with the model provider.

It can be supplied through an environment variable or passed as an initialization parameter:

model = init_chat_model(
    model="gpt-5.5",
    model_provider="openai",
    api_key="YOUR_API_KEY",
)

4.3 `temperature`

temperature adjusts how random the output is.

model = init_chat_model(
    "openai:gpt-5.5",
    temperature=0,
)

A way to think about it:

lower temperature  → more stable, more deterministic output
higher temperature → more varied, more random output

4.4 `max_tokens`

max_tokens caps how many tokens the model may generate:

model = init_chat_model(
    "openai:gpt-5.5",
    max_tokens=500,
)

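It mainly controls the output length; it does not represent the size of the input context window.

4.5 `timeout`

timeout is the maximum time to wait for a single model request, in seconds: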

model = init_chat_model(
    "openai:gpt-5.5",
    timeout=60,
)

4.6 `max_retries`

max_retries sets the maximum number of retries after a request fails:

model = init_chat_model(
    "openai:gpt-5.5",
    max_retries=10,
)

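LangChain Chat Models retry up to 6 times by default, using exponential backoff with random jitter.

Automatic retries usually apply to:

network errors;
rate-limit errors, that is HTTP 429;
server errors, that is HTTP 5xx.

Errors such as HTTP 401 (authentication failure) and HTTP 404 (resource not found) are not retried, because retrying cannot fix them.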
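
The retry policy described above can be sketched in plain Python. This is only an illustration of "exponential backoff with jitter" and of the retryable/non-retryable split; the function names, the base delay, the cap, and the use of None for a network error are assumptions of this sketch, not LangChain's internal implementation.

```python
import random
from typing import Optional


def is_retryable(status_code: Optional[int]) -> bool:
    """Decide whether a failed request is worth retrying.

    A network error with no HTTP response is modelled as None
    (an assumption of this sketch).
    """
    if status_code is None:
        return True
    # 429 = rate limited, 5xx = server error; 401/404 fall through to False.
    return status_code == 429 or 500 <= status_code < 600


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))  # exponential growth, capped
    return rng.uniform(0.0, ceiling)           # jitter: a random point below the ceiling
```

The jitter spreads retries from many clients over time, so they do not all hit the provider again at the same instant.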

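5. The Three Core Invocation Methods

The three most important invocation methods on a LangChain Chat Model are:

Method	Purpose	Return behavior
`invoke()`	Single call	Returns once the complete result has been generated
`stream()`	Streaming call	Yields text or content chunks as they are produced
`batch()`	Batch call	Submits several independent inputs at once

Keep this rule of thumb in mind:

one input, wait for the full result   → invoke()
one input, display while generating   → stream()
several mutually independent inputs   → batch()

6. Using `invoke()` for a Single Model Call

6.1 Passing a string directly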

from langchain.chat_models import init_chat_model

model = init_chat_model("openai:gpt-5.5")

response = model.invoke("为什么鹦鹉的羽毛颜色很丰富？")

print(response)
print(response.text)

A Chat Model usually returns an AIMessage rather than a plain string.

On that message, the property

response.text

reads the text content.

6.2 Passing a list of message dictionaries

A multi-turn conversation can be represented as a list of messages:

conversation = [
    {
        "role": "system",
        "content": "你是一位中英翻译助手。",
    },
    {
        "role": "user",
        "content": "Translate: I love programming.",
    },
    {
        "role": "assistant",
        "content": "我喜欢编程。",
    },
    {
        "role": "user",
        "content": "Translate: I love building applications.",
    },
]

response = model.invoke(conversation)
print(response.text)

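The role of a message indicates where it came from:

role	Meaning
`system`	System instruction that defines the model's behavior
`user`	User input
`assistant`	An earlier reply from the model

6.3 Using message objects

LangChain also provides corresponding message classes: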

from langchain.messages import AIMessage, HumanMessage, SystemMessage

conversation = [
    SystemMessage("你是一位中英翻译助手。"),
    HumanMessage("Translate: I love programming."),
    AIMessage("我喜欢编程。"),
    HumanMessage("Translate: I love building applications."),
]

response = model.invoke(conversation)
print(response.text)

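Dictionary messages and message objects express the same kind of conversation information.

7. Using `stream()` to Receive Output in Real Time

7.1 Why streaming output is needed

invoke() waits until the model has generated the complete answer and returns it all at once.

When the answer is long, the user has to wait a while before seeing anything.

stream() splits the generation into many small chunks, so the program can display each piece as soon as the model produces it.

7.2 Basic streaming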

from langchain.chat_models import init_chat_model

model = init_chat_model("openai:gpt-5.5")

for chunk in model.stream("为什么鹦鹉的羽毛颜色很丰富？"):
    print(chunk.text, end="", flush=True)

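Each chunk is usually an AIMessageChunk.

You can think of it as:

AIMessage      = the complete parcel
AIMessageChunk = one small piece of the parcel's contents

7.3 Merging streamed chunks

Streamed chunks can be merged with the addition operator: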

full = None

for chunk in model.stream("天空通常是什么颜色？"):
    full = chunk if full is None else full + chunk
    print(full.text)

print(full.content_blocks)

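After the loop finishes, full is the merged, complete message.

The merged message can be used to:

append to the conversation history;
read the full text;
inspect the content blocks;
read the tool calls;
inspect the response metadata.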
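
To make the "chunks add up to a message" idea concrete, here is a toy sketch that does not call any model. TextChunk is a hypothetical stand-in that only carries text; the real AIMessageChunk also merges content blocks, tool call chunks, and metadata.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TextChunk:
    """Hypothetical stand-in for a streamed chunk; not LangChain's AIMessageChunk."""

    text: str

    def __add__(self, other: "TextChunk") -> "TextChunk":
        # Adding two chunks concatenates their text, left to right.
        return TextChunk(self.text + other.text)


def merge(chunks: Iterable[TextChunk]) -> Optional[TextChunk]:
    """Fold a stream of chunks into one; returns None for an empty stream."""
    full: Optional[TextChunk] = None
    for chunk in chunks:
        full = chunk if full is None else full + chunk
    return full


merged = merge([TextChunk("The sky "), TextChunk("is "), TextChunk("blue.")])
```

Note that the accumulator stays None when the stream yields nothing, so code that reads attributes of the merged result after the loop may want to check for that case first.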

7.4 Observing semantic events with `astream_events()`

Besides reading text chunks directly, you can also listen to the model's run events:

import asyncio

from langchain.chat_models import init_chat_model


async def main() -> None:
    model = init_chat_model("openai:gpt-5.5")

    async for event in model.astream_events("你好"):
        event_type = event["event"]

        if event_type == "on_chat_model_start":
            print(f"输入：{event['data']['input']}")

        elif event_type == "on_chat_model_stream":
            print(
                event["data"]["chunk"].text,
                end="",
                flush=True,
            )

        elif event_type == "on_chat_model_end":
            print(f"\n完整回答：{event['data']['output'].text}")


asyncio.run(main())

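The three events correspond to:

on_chat_model_start  → the model call starts
on_chat_model_stream → the model is generating content
on_chat_model_end    → the model call ends

7.5 Automatic streaming

In an overall streaming runtime such as a LangGraph agent, even if a node internally calls: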

model.invoke(...)

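LangChain may still switch to the underlying streaming mode automatically and keep emitting new-token events through the callback system.

The business code that calls invoke() still receives the complete result at the end, while the outer streaming system can observe the intermediate generation in real time.

8. Using `batch()` to Process Inputs in Bulk

8.1 Batch calls

If several questions are independent of each other, you can use batch():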

from langchain.chat_models import init_chat_model

model = init_chat_model("openai:gpt-5.5")

responses = model.batch(
    [
        "为什么鹦鹉的羽毛颜色很丰富？",
        "飞机为什么能够飞行？",
        "什么是量子计算？",
    ]
)

for response in responses:
    print(response.text)

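LangChain's batch() runs the model calls in parallel on the client side.

It is not the same thing as the offline Batch API that some providers offer themselves.

8.2 Receiving results in completion order

batch() usually returns only after the whole batch has finished.

If you want to handle each result as soon as its input finishes, you can use: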

inputs = [
    "为什么鹦鹉的羽毛颜色很丰富？",
    "飞机为什么能够飞行？",
    "什么是量子计算？",
]

for index, response in model.batch_as_completed(inputs):
    print(f"输入序号：{index}")
    print(f"回答：{response.text}")

Results may arrive in a different order from the inputs, so each result comes with the index of its input.

8.3 Limiting maximum concurrency

responses = model.batch(
    inputs,
    config={
        "max_concurrency": 5,
    },
)

max_concurrency=5 means at most 5 model requests run at the same time.
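
The behavior of client-side batching, completion-order results, and a concurrency cap can be imitated with the standard library. The sketch below is an analogy only: run_as_completed and restore_order are hypothetical helpers, and str.upper stands in for a model call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Iterator, List, Sequence, Tuple


def run_as_completed(
    func: Callable[[Any], Any],
    inputs: Sequence[Any],
    max_concurrency: int = 5,
) -> Iterator[Tuple[int, Any]]:
    """Yield (input_index, result) pairs as each call finishes."""
    # max_workers plays the role of max_concurrency: at most that many calls run at once.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = {pool.submit(func, item): index for index, item in enumerate(inputs)}
        for future in as_completed(futures):
            yield futures[future], future.result()


def restore_order(pairs: Sequence[Tuple[int, Any]], size: int) -> List[Any]:
    """Put completion-order results back into input order using their indices."""
    ordered: List[Any] = [None] * size
    for index, result in pairs:
        ordered[index] = result
    return ordered


pairs = list(run_as_completed(str.upper, ["a", "b", "c"], max_concurrency=2))
in_input_order = restore_order(pairs, 3)
```

Carrying the index alongside each result is what makes it safe to process answers as they arrive and still match them to their questions afterwards.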

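9. Tool Calling: Letting the Model Request Tool Calls

9.1 What tool calling means

A tool consists of two parts:

a description of the tool and its parameter schema;
the function or coroutine that actually performs the task.

A model does not execute a Python function just because it has seen one.

The tool first has to be bound to the model: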

model_with_tools = model.bind_tools([get_weather])

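Once bound, the model can decide, based on the user's question, whether to generate a tool call request.

"Tool calling" and "function calling" usually mean the same thing.

9.2 Defining and binding a tool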

from langchain.chat_models import init_chat_model
from langchain.tools import tool


@tool
def get_weather(location: str) -> str:
    """查询指定地点的天气。"""
    return f"{location} 天气晴朗，温度 25℃。"


model = init_chat_model("openai:gpt-5.5")
model_with_tools = model.bind_tools([get_weather])

response = model_with_tools.invoke("北京今天天气怎么样？")

for tool_call in response.tool_calls:
    print(f"工具名称：{tool_call['name']}")
    print(f"工具参数：{tool_call['args']}")

The model's response may contain:

{
    "name": "get_weather",
    "args": {"location": "北京"},
    "id": "call_123",
    "type": "tool_call",
}

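This means the model has issued a tool call request.

9.3 The complete tool execution loop

When using a Chat Model on its own, your program is responsible for executing the tool and handing the result back to the model.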

from langchain.chat_models import init_chat_model
from langchain.tools import tool


@tool
def get_weather(location: str) -> str:
    """查询指定地点的天气。"""
    return f"{location} 天气晴朗，温度 25℃。"


model = init_chat_model("openai:gpt-5.5")
model_with_tools = model.bind_tools([get_weather])

# Step 1: hand the user message to the model.
messages = [
    {
        "role": "user",
        "content": "北京今天天气怎么样？",
    }
]

# Step 2: the model generates a tool call request.
ai_message = model_with_tools.invoke(messages)
messages.append(ai_message)

# Step 3: the program executes the tool the model asked for.
for tool_call in ai_message.tool_calls:
    tool_result = get_weather.invoke(tool_call)
    messages.append(tool_result)

# Step 4: the model writes a natural-language answer from the tool result.
final_response = model_with_tools.invoke(messages)
print(final_response.text)

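The ToolMessage produced by running the tool carries a tool_call_id that matches the ID in the model's original tool call.

That is how the model knows which tool call each tool result belongs to.

An agent runs this tool execution loop automatically; when you call the model on its own, the application has to run it.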
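
When more than one tool is bound, the application also has to route each request to the right function. The sketch below shows that routing step in plain Python, without calling a model: the tool call dictionaries follow the name/args/id shape shown earlier, while get_time, the registry, and the dictionary-shaped tool results are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List


def get_weather(location: str) -> str:
    """Toy tool with a canned answer."""
    return f"{location}: sunny, 25°C."


def get_time(city: str) -> str:
    """Second toy tool (hypothetical), added to show dispatch by name."""
    return f"{city}: 09:00."


def run_tool_calls(
    tool_calls: List[Dict[str, Any]],
    registry: Dict[str, Callable[..., str]],
) -> List[Dict[str, str]]:
    """Execute each requested tool and pair the result with its call ID."""
    results = []
    for call in tool_calls:
        func = registry.get(call["name"])
        if func is None:
            # Report unknown tools back instead of crashing the loop.
            content = f"Unknown tool: {call['name']}"
        else:
            content = func(**call["args"])
        # tool_call_id ties this result to the request that produced it.
        results.append({"role": "tool", "tool_call_id": call["id"], "content": content})
    return results


calls = [
    {"name": "get_weather", "args": {"location": "Beijing"}, "id": "call_1"},
    {"name": "get_time", "args": {"city": "Tokyo"}, "id": "call_2"},
]
tool_results = run_tool_calls(calls, {"get_weather": get_weather, "get_time": get_time})
```

In a real loop these results would be appended to the message list before the next model call, as in the four-step example above.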

9.4 Specifying tool choice

By default, the model decides for itself whether to call a tool.

You can require the model to call one of the available tools:

model_with_tools = model.bind_tools(
    [get_weather],
    tool_choice="any",
)

Which tool_choice settings are available depends on the model provider.

9.5 Parallel tool calls

A single question may involve several independent locations at once:

response = model_with_tools.invoke(
    "北京和东京今天天气分别怎么样？"
)

print(response.tool_calls)

Models that support parallel tool calling may return several calls at once:

[
    {
        "name": "get_weather",
        "args": {"location": "北京"},
        "id": "call_1",
    },
    {
        "name": "get_weather",
        "args": {"location": "东京"},
        "id": "call_2",
    },
]

Some providers allow parallel tool calling to be turned off:

model_with_tools = model.bind_tools(
    [get_weather],
    parallel_tool_calls=False,
)

9.6 Streaming tool calls

The tool name, ID, and arguments can also be generated piece by piece:

for chunk in model_with_tools.stream(
    "北京和东京今天天气分别怎么样？"
):
    for tool_chunk in chunk.tool_call_chunks:
        if name := tool_chunk.get("name"):
            print(f"工具：{name}")

        if tool_call_id := tool_chunk.get("id"):
            print(f"调用 ID：{tool_call_id}")

        if args := tool_chunk.get("args"):
            print(f"参数片段：{args}")

You can also merge all message chunks first and then read the complete tool calls:

gathered = None

for chunk in model_with_tools.stream("北京今天天气怎么样？"):
    gathered = chunk if gathered is None else gathered + chunk

print(gathered.tool_calls)
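
What merging does for tool calls can be sketched by hand. The example assumes each fragment is a dictionary with name, id, args, and index keys, where args is a piece of a JSON string; assemble_tool_calls is a hypothetical helper written for this guide, not a LangChain function.

```python
import json
from typing import Any, Dict, Iterable, List, Optional


def assemble_tool_calls(chunks: Iterable[Dict[str, Optional[Any]]]) -> List[Dict[str, Any]]:
    """Concatenate streamed fragments per call index, then parse the arguments."""
    calls: Dict[Any, Dict[str, str]] = {}
    for chunk in chunks:
        slot = calls.setdefault(chunk["index"], {"name": "", "id": "", "args": ""})
        for key in ("name", "id", "args"):
            if chunk.get(key):  # later fragments often carry None for name and id
                slot[key] += chunk[key]
    return [
        {"name": call["name"], "id": call["id"], "args": json.loads(call["args"] or "{}")}
        for _, call in sorted(calls.items())
    ]


fragments = [
    {"index": 0, "name": "get_weather", "id": "call_1", "args": ""},
    {"index": 0, "name": None, "id": None, "args": '{"loc'},
    {"index": 0, "name": None, "id": None, "args": 'ation": "Beijing"}'},
]
assembled = assemble_tool_calls(fragments)
```

The argument string is only valid JSON once every fragment has arrived, which is why parsing happens after the loop rather than per chunk.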

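10. Structured Output: Getting the Model to Return a Fixed Structure

10.1 Why structured output is needed

An ordinary model answer might be:

Inception was released in 2010 and directed by Christopher Nolan...

A person can read this easily, but a program would still have to extract the fields from natural language.

Structured output makes the result conform directly to a specified schema, for example: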

Movie(
    title="Inception",
    year=2010,
    director="Christopher Nolan",
    rating=8.8,
)

The LangChain Models page shows three kinds of schema:

Pydantic;
TypedDict;
JSON Schema.

10.2 Using Pydantic

Pydantic supports field descriptions, nested structures, and runtime validation.

from pydantic import BaseModel, Field


class Movie(BaseModel):
    """电影信息。"""

    title: str = Field(description="电影名称")
    year: int = Field(description="上映年份")
    director: str = Field(description="导演姓名")
    rating: float = Field(description="十分制评分")


model_with_structure = model.with_structured_output(Movie)

response = model_with_structure.invoke(
    "请提供电影《盗梦空间》的详细信息。"
)

print(response)
print(response.title)
print(response.year)

The return value is a Movie object.

10.3 Using TypedDict

TypedDict is more lightweight, and the result is a plain dictionary:

from typing_extensions import Annotated, TypedDict


class MovieDict(TypedDict):
    """电影信息。"""

    title: Annotated[str, ..., "电影名称"]
    year: Annotated[int, ..., "上映年份"]
    director: Annotated[str, ..., "导演姓名"]
    rating: Annotated[float, ..., "十分制评分"]


model_with_structure = model.with_structured_output(MovieDict)

response = model_with_structure.invoke(
    "请提供电影《盗梦空间》的详细信息。"
)

print(response["title"])
print(response["year"])

10.4 Using JSON Schema

JSON Schema suits cases where the data structure has to be shared across languages or systems:

movie_schema = {
    "title": "Movie",
    "description": "电影信息",
    "type": "object",
    "properties": {
        "title": {
            "type": "string",
            "description": "电影名称",
        },
        "year": {
            "type": "integer",
            "description": "上映年份",
        },
        "director": {
            "type": "string",
            "description": "导演姓名",
        },
        "rating": {
            "type": "number",
            "description": "十分制评分",
        },
    },
    "required": [
        "title",
        "year",
        "director",
        "rating",
    ],
}

model_with_structure = model.with_structured_output(
    movie_schema,
    method="json_schema",
)

response = model_with_structure.invoke(
    "请提供电影《盗梦空间》的详细信息。"
)

print(response)

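10.5 Structured output methods

Providers may support different implementations of structured output.

method	Meaning
`json_schema`	Uses the provider's native structured output capability
`function_calling`	Converts the schema into a tool-call structure
`json_mode`	Requires valid JSON to be returned, with the schema described in the prompt

Which methods are supported depends on the model provider and the model's capabilities.

10.6 Getting both the parsed result and the raw message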

model_with_structure = model.with_structured_output(
    Movie,
    include_raw=True,
)

response = model_with_structure.invoke(
    "请提供电影《盗梦空间》的详细信息。"
)

print(response["raw"])
print(response["parsed"])
print(response["parsing_error"])

The result contains:

raw           → the original AIMessage
parsed        → the parsed structured object
parsing_error → the parsing exception; usually None on success
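
One way to consume such a result is to check parsing_error before trusting parsed. The helper below is a minimal sketch that works on a plain dictionary with the three keys listed above; unwrap_structured is a hypothetical name, and raising ValueError is just one possible policy.

```python
from typing import Any, Dict


def unwrap_structured(result: Dict[str, Any]) -> Any:
    """Return the parsed object, or raise with the raw output attached for debugging."""
    error = result.get("parsing_error")
    if error is not None:
        raise ValueError(
            f"Structured output failed: {error!r}; raw={result.get('raw')!r}"
        )
    return result["parsed"]


good = {"raw": "<AIMessage>", "parsed": {"title": "Inception"}, "parsing_error": None}
movie = unwrap_structured(good)
```

Keeping the raw message in the error makes it easier to see what the model actually returned when parsing fails.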

10.7 Nested structures

from pydantic import BaseModel, Field


class Actor(BaseModel):
    """演员信息。"""

    name: str
    role: str


class MovieDetails(BaseModel):
    """包含演员表的电影信息。"""

    title: str
    year: int
    cast: list[Actor]
    genres: list[str]
    budget: float | None = Field(
        default=None,
        description="预算，单位为百万美元",
    )


model_with_structure = model.with_structured_output(MovieDetails)
response = model_with_structure.invoke(
    "请整理电影《盗梦空间》的信息和主要演员。"
)

print(response.cast)

11. Model Profile: Reading Model Capability Information

Model Profile requires langchain>=1.1.

Some Chat Models expose capability information through the profile attribute:

print(model.profile)

This might give:

{
    "max_input_tokens": 400000,
    "image_inputs": True,
    "reasoning_output": True,
    "tool_calling": True,
}

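These fields help a program determine:

the model's maximum input length;
whether image input is supported;
whether reasoning content is supported;
whether tool calling is supported;
whether structured output is supported.

An application can adapt its behavior to the profile, for example:

deciding when to compact messages based on the context window;
choosing a structured output strategy based on model capabilities;
deciding whether to accept images based on input modalities;
filtering for models that support tool calling.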
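
As a rough sketch of such profile-driven decisions, the functions below read a profile as a plain dictionary. The keys (max_input_tokens, tool_calling, structured_output) are the ones used in this guide's examples; the fallback order and the 20% reserve are assumptions chosen for illustration.

```python
from typing import Any, Dict


def choose_structured_output_method(profile: Dict[str, Any]) -> str:
    """Pick a structured output method from the capabilities a profile reports."""
    if profile.get("structured_output"):
        return "json_schema"       # native support reported
    if profile.get("tool_calling"):
        return "function_calling"  # fall back to a tool-call shaped schema
    return "json_mode"             # last resort: ask for JSON in the prompt


def needs_compaction(profile: Dict[str, Any], used_tokens: int, reserve: float = 0.2) -> bool:
    """True when the history is close enough to the input limit to compact it."""
    limit = profile.get("max_input_tokens")
    if limit is None:
        return False  # unknown limit: make no assumption
    return used_tokens > limit * (1 - reserve)


profile = {"max_input_tokens": 400000, "tool_calling": True}
method = choose_structured_output_method(profile)
```

Using .get() keeps the checks tolerant of profiles that omit a field, which matters while the profile format is still subject to change.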

11.1 Custom profile

from langchain.chat_models import init_chat_model

custom_profile = {
    "max_input_tokens": 100_000,
    "tool_calling": True,
    "structured_output": True,
}

model = init_chat_model(
    "...",
    profile=custom_profile,
)

11.2 Copying and updating a profile

new_profile = model.profile | {
    "tool_calling": True,
}

new_model = model.model_copy(
    update={"profile": new_profile},
)

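Model Profile is currently a beta feature, and its format may change in the future.

12. Multimodal Models

Some models can handle not only text but also:

images;
audio;
video.

LangChain uses content blocks to represent the different kinds of data.

If a model can generate multimodal results, they can be read through content_blocks: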

response = model.invoke("生成一张猫的图片")

print(response.content_blocks)

The result might look like:

[
    {
        "type": "text",
        "text": "这是一张猫的图片。",
    },
    {
        "type": "image",
        "base64": "...",
        "mime_type": "image/jpeg",
    },
]

content_blocks represents text, images, and other content uniformly as data blocks that each carry a type.
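
Because every block carries a type, a consumer can branch on it. The sketch below separates text from base64 images in a list shaped like the example above; split_blocks is a hypothetical helper, and the fallback MIME type is an assumption.

```python
import base64
from typing import Any, Dict, List, Tuple


def split_blocks(content_blocks: List[Dict[str, Any]]) -> Tuple[str, List[Tuple[str, bytes]]]:
    """Return the joined text and a list of (mime_type, raw_bytes) images."""
    texts: List[str] = []
    images: List[Tuple[str, bytes]] = []
    for block in content_blocks:
        if block["type"] == "text":
            texts.append(block["text"])
        elif block["type"] == "image" and "base64" in block:
            mime_type = block.get("mime_type", "application/octet-stream")
            images.append((mime_type, base64.b64decode(block["base64"])))
        # Other block types are ignored in this sketch.
    return "".join(texts), images


blocks = [
    {"type": "text", "text": "Here is a picture of a cat."},
    {
        "type": "image",
        "base64": base64.b64encode(b"fake-image-bytes").decode("ascii"),
        "mime_type": "image/jpeg",
    },
]
text, images = split_blocks(blocks)
```

The decoded bytes could then be written to a file whose extension matches the MIME type.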

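13. Reasoning Content

Some models can perform multi-step reasoning when answering complex questions.

If the underlying model supports emitting reasoning content, you can look for it in the content_blocks of the streamed result: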

for chunk in model.stream(
    "为什么鹦鹉拥有颜色丰富的羽毛？"
):
    reasoning_steps = [
        block
        for block in chunk.content_blocks
        if block["type"] == "reasoning"
    ]

    if reasoning_steps:
        print(reasoning_steps)
    else:
        print(chunk.text, end="", flush=True)

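Models differ in how reasoning effort is configured.

Some models use level-style parameters such as:

low / high

while other models use a reasoning token budget.

14. Local Models

LangChain supports running models on local hardware.

Local models are a good fit when:

data privacy requirements are high;
you need to run a custom model;
you want to reduce the cost of cloud model calls.

The official Models page presents Ollama as one of the easier ways to run local chat models and embedding models.

Once a local model is connected to LangChain, you can still use the unified Chat Model call form as far as possible: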

response = model.invoke("你好")

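Which tool calling, structured output, and multimodal capabilities are actually available depends on the local model being run and on its integration.

15. Prompt Caching

Prompt caching reuses the processing of repeated input content, which lowers latency and cost.

The official page divides caching into three levels.

15.1 Implicit provider caching

Some providers detect repeated content automatically, with no extra configuration in the application.

15.2 Explicit provider cache control

Some providers let the program mark cache points explicitly or supply a cache key.

These settings map onto the underlying provider API.

15.3 Agent middleware caching

In agent scenarios, middleware can help cache content that is relatively stable:

the system prompt;
tool definitions.

Whether caching takes effect usually also depends on whether the input tokens reach the minimum threshold set by the provider.

Cache usage may show up in the usage metadata of the model response.

16. Server-Side Tool Calling

The ordinary tool calling described earlier is client-side tool calling:

The model requests a tool call
    ↓
Your Python program executes the tool
    ↓
The ToolMessage is handed back to the model

Some providers support server-side tool calling, for example:

web search;
code interpreter;
other tools hosted by the provider.

Server-side tools are executed by the model provider within a single conversation turn: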

from langchain.chat_models import init_chat_model

model = init_chat_model("openai:gpt-5.4-mini")

web_search = {
    "type": "web_search",
}

model_with_tools = model.bind_tools([web_search])

response = model_with_tools.invoke(
    "今天有什么积极的新闻？"
)

print(response.content_blocks)

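The returned content blocks may include:

server_tool_call   → the server starts calling the tool
server_tool_result → the server-side tool returns its result
text               → the answer the model generates from the result

This kind of tool call happens on the provider's servers, so within the same conversation turn the application does not need to pass in a matching ToolMessage as it does for client-side tool calling.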

17. Rate Limiting

Model providers usually limit the number of requests allowed within a period of time.

LangChain provides an optional InMemoryRateLimiter:

from langchain.chat_models import init_chat_model
from langchain_core.rate_limiters import InMemoryRateLimiter

rate_limiter = InMemoryRateLimiter(
    requests_per_second=0.1,
    check_every_n_seconds=0.1,
    max_bucket_size=10,
)

model = init_chat_model(
    model="gpt-5.5",
    model_provider="openai",
    rate_limiter=rate_limiter,
)

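Parameter meanings:

Parameter	Meaning
`requests_per_second`	Average number of requests allowed per second
`check_every_n_seconds`	Polling interval while waiting for permission to send a request
`max_bucket_size`	Maximum burst allowance that can accumulate

In the example: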

requests_per_second=0.1

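means one request is allowed every 10 seconds on average.

This rate limiter controls the number of requests; it does not throttle by the number of tokens in each request.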
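
The interplay between the average rate and the burst allowance is easier to see in a small token-bucket sketch. TokenBucket below is a hypothetical teaching class, not the InMemoryRateLimiter source; it assumes the bucket starts empty and takes the current time as an argument so the behavior is deterministic.

```python
class TokenBucket:
    """Minimal token bucket: credit accrues over time, each request spends one token."""

    def __init__(self, requests_per_second: float, max_bucket_size: float, start: float = 0.0):
        self.rate = requests_per_second
        self.capacity = max_bucket_size
        self.tokens = 0.0   # assumption: start with no accumulated credit
        self.last = start

    def try_acquire(self, now: float) -> bool:
        """Return True if a request may be sent at time `now` (seconds)."""
        elapsed = now - self.last
        # Credit grows with elapsed time but never beyond the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(requests_per_second=0.1, max_bucket_size=10)
too_early = bucket.try_acquire(now=5.0)     # only half a token has accrued
on_time = bucket.try_acquire(now=10.0)      # one full token after 10 seconds
burst = sum(bucket.try_acquire(now=1010.0) for _ in range(12))  # long idle period, then 12 requests
```

After a long idle period the credit is capped at the bucket size, so at most 10 of the 12 back-to-back requests are admitted before the limiter falls back to the average rate.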

18. Custom Base URL and Proxy

18.1 Custom base URL

For services that implement the OpenAI Chat Completions-compatible protocol, you can configure:

from langchain.chat_models import init_chat_model

model = init_chat_model(
    model="MODEL_NAME",
    model_provider="openai",
    base_url="BASE_URL",
    api_key="YOUR_API_KEY",
)

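Here:

model_provider="openai" means using the LangChain integration that speaks the OpenAI interface;
base_url points to the address that actually serves the model;
model is the model name accepted by that service.

For OpenRouter and LiteLLM, the official page recommends preferring their dedicated LangChain integrations.

18.2 HTTP proxy

Some model integrations support an HTTP proxy.

Taking ChatOpenAI as an example: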

from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="gpt-5.5",
    openai_proxy="http://proxy.example.com:8080",
)

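The proxy parameter name and the level of support depend on the specific model integration.

19. Reading Log Probabilities

Some models can return the log probability of each token.

A log probability is the logarithm of the probability the model assigned to a token when it generated that token.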

from langchain.chat_models import init_chat_model

model = init_chat_model(
    model="gpt-5.5",
    model_provider="openai",
).bind(logprobs=True)

response = model.invoke("为什么鹦鹉会说话？")

print(response.response_metadata["logprobs"])

Whether logprobs is supported depends on the specific model and provider.

20. Tracking Token Usage

Some providers return token usage in the model response.

For a single response, check:

response = model.invoke("你好")

print(response.usage_metadata)

It may contain:

{
    "input_tokens": 8,
    "output_tokens": 10,
    "total_tokens": 18,
}

20.1 Aggregating usage with a callback handler

from langchain.chat_models import init_chat_model
from langchain_core.callbacks import UsageMetadataCallbackHandler

model_1 = init_chat_model(model="gpt-5.4-mini")
model_2 = init_chat_model(model="claude-haiku-4-5-20251001")

callback = UsageMetadataCallbackHandler()

model_1.invoke(
    "你好",
    config={"callbacks": [callback]},
)
model_2.invoke(
    "你好",
    config={"callbacks": [callback]},
)

print(callback.usage_metadata)

20.2 Aggregating usage with a context manager

from langchain.chat_models import init_chat_model
from langchain_core.callbacks import get_usage_metadata_callback

model_1 = init_chat_model(model="gpt-5.4-mini")
model_2 = init_chat_model(model="claude-haiku-4-5-20251001")

with get_usage_metadata_callback() as callback:
    model_1.invoke("你好")
    model_2.invoke("你好")

    print(callback.usage_metadata)

Model calls made inside the context manager's scope are counted together.
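
The idea behind aggregation is a per-model running sum of the usage dictionaries. The sketch below shows that bookkeeping on plain data; aggregate_usage and the sample numbers are illustrative assumptions, and the exact structure the LangChain callbacks return is not reproduced here.

```python
from typing import Dict, Iterable, Tuple

USAGE_KEYS = ("input_tokens", "output_tokens", "total_tokens")


def aggregate_usage(records: Iterable[Tuple[str, Dict[str, int]]]) -> Dict[str, Dict[str, int]]:
    """Sum token counts per model name from (model_name, usage) pairs."""
    totals: Dict[str, Dict[str, int]] = {}
    for model_name, usage in records:
        bucket = totals.setdefault(model_name, dict.fromkeys(USAGE_KEYS, 0))
        for key in USAGE_KEYS:
            bucket[key] += usage.get(key, 0)  # tolerate providers that omit a field
    return totals


usage_totals = aggregate_usage(
    [
        ("model-a", {"input_tokens": 8, "output_tokens": 10, "total_tokens": 18}),
        ("model-b", {"input_tokens": 8, "output_tokens": 21, "total_tokens": 29}),
        ("model-a", {"input_tokens": 5, "output_tokens": 7, "total_tokens": 12}),
    ]
)
```

Keeping totals per model rather than one grand total makes it possible to apply different prices to different models later.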

21. Passing `config` at Call Time

Model initialization parameters control the model itself, while the config argument of invoke() controls the current run.

response = model.invoke(
    "讲一个笑话。",
    config={
        "run_name": "joke_generation",
        "tags": ["humor", "demo"],
        "metadata": {
            "user_id": "123",
        },
        "callbacks": [my_callback_handler],
    },
)

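Common settings listed on the official page include:

Setting	Meaning
`run_name`	Name of the current call
`tags`	Labels for filtering and organizing run records
`metadata`	Custom metadata passed along with the call
`max_concurrency`	Maximum concurrency during batch processing
`callbacks`	Callbacks that listen to model execution events
`recursion_limit`	Maximum recursion depth allowed in complex call flows

This information can be used to:

trace a single model call;
organize logs and monitoring data;
handle run events with callbacks;
control resource usage during batch processing;
tag business context in complex flows.

22. Runtime-Configurable Models

22.1 Choosing the model at call time

Create the model without specifying a concrete model: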

from langchain.chat_models import init_chat_model

configurable_model = init_chat_model(
    temperature=0,
)

In this case, model and model_provider can by default be supplied through the run configuration.

The first call selects GPT:

response = configurable_model.invoke(
    "你叫什么名字？",
    config={
        "configurable": {
            "model": "gpt-5-nano",
        }
    },
)

The next call selects Claude:

response = configurable_model.invoke(
    "你叫什么名字？",
    config={
        "configurable": {
            "model": "claude-sonnet-4-6",
        }
    },
)

The same configurable model object can select a different underlying model on each call.

22.2 Specifying which fields can be configured dynamically

first_model = init_chat_model(
    model="gpt-5.4-mini",
    temperature=0,
    configurable_fields=(
        "model",
        "model_provider",
        "temperature",
        "max_tokens",
    ),
    config_prefix="first",
)

Without runtime configuration, the defaults from initialization are used:

response = first_model.invoke("你叫什么名字？")

Overriding some parameters at call time:

response = first_model.invoke(
    "你叫什么名字？",
    config={
        "configurable": {
            "first_model": "claude-sonnet-4-6",
            "first_temperature": 0.5,
            "first_max_tokens": 100,
        }
    },
)

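config_prefix="first" adds the first_ prefix to the dynamic parameters.

When a chain contains several configurable models, different prefixes can be used to tell them apart.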
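
How prefixed keys map back to model fields can be illustrated with a few lines of plain Python. resolve_fields is a hypothetical helper that mimics the naming rule described above (prefix, underscore, field name); it is not how LangChain implements configurable models.

```python
from typing import Any, Dict, Sequence


def resolve_fields(
    defaults: Dict[str, Any],
    configurable: Dict[str, Any],
    prefix: str,
    allowed: Sequence[str],
) -> Dict[str, Any]:
    """Overlay run-time values onto defaults, honoring only allowed, prefixed keys."""
    resolved = dict(defaults)  # copy so the defaults stay untouched
    for field in allowed:
        key = f"{prefix}_{field}" if prefix else field
        if key in configurable:
            resolved[field] = configurable[key]
    return resolved


defaults = {"model": "gpt-5.4-mini", "temperature": 0}
run_config = {
    "first_model": "claude-sonnet-4-6",
    "first_temperature": 0.5,
    "second_model": "belongs-to-another-model",  # different prefix: ignored
    "first_api_key": "not-allowed",               # not in the allowed list: ignored
}
resolved = resolve_fields(defaults, run_config, "first", ("model", "temperature", "max_tokens"))
```

Restricting overrides to an explicit list of fields keeps sensitive settings such as credentials from being changed through run-time configuration.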

22.3 Configurable models also support declarative operations

A configurable model can still be used with:

model.bind_tools(...)
model.with_structured_output(...)

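In other words, you can first define the capabilities the model needs and then choose the concrete model on each run.

23. Dynamic Model Selection

With a configurable model, the caller selects the model explicitly through config.

With dynamic model selection, the program decides automatically which model to use based on the current state.

The official page shows an implementation based on agent middleware: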

from langchain.agents import create_agent
from langchain.agents.middleware import (
    ModelRequest,
    ModelResponse,
    wrap_model_call,
)
from langchain_openai import ChatOpenAI

basic_model = ChatOpenAI(model="gpt-5.4-mini")
advanced_model = ChatOpenAI(model="gpt-5.5")


@wrap_model_call
def dynamic_model_selection(
    request: ModelRequest,
    handler,
) -> ModelResponse:
    """根据对话长度选择模型。"""
    message_count = len(request.state["messages"])

    if message_count > 10:
        selected_model = advanced_model
    else:
        selected_model = basic_model

    return handler(
        request.override(model=selected_model)
    )


agent = create_agent(
    model=basic_model,
    tools=tools,
    middleware=[dynamic_model_selection],
)

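The strategy this code expresses is:

no more than 10 messages → use the basic model
more than 10 messages    → use the more capable model

The selection can also draw on the current state and the runtime context, enabling model routing and cost control.

When dynamic model selection is combined with structured output, the model passed to the middleware must not be a pre-bound model on which bind_tools() has already been called.

24. Putting It All Together

The learning path for LangChain Chat Models can be summarized in four layers.

Layer 1: Create the model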

model = init_chat_model(
    "openai:gpt-5.5",
    temperature=0,
    timeout=60,
    max_retries=6,
)

Layer 2: Choose an invocation method

model.invoke(...)
model.stream(...)
model.batch(...)

Layer 3: Add output capabilities to the model

Tool calling:

model_with_tools = model.bind_tools([get_weather])

Structured output:

model_with_structure = model.with_structured_output(Movie)

Other capabilities:

model.profile
response.content_blocks
response.usage_metadata

Layer 4: Control the run

model.invoke(
    "你好",
    config={
        "run_name": "demo",
        "tags": ["learning"],
        "metadata": {"chapter": 1},
    },
)

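You can also switch models at runtime through a configurable model or agent middleware.

25. Summary of Key Points

25.1 Model interface

LangChain uses a unified Chat Model interface to connect to multiple model providers.

A model can be called on its own or serve as the reasoning engine of an agent.

25.2 Initialization

Recommended entry point: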

from langchain.chat_models import init_chat_model

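A model can be specified in the format:

provider:model

25.3 Invocation methods

invoke()             → wait for the complete answer
stream()             → receive answer chunks in real time
batch()              → process a batch in parallel
batch_as_completed() → handle batch results in completion order

25.4 Model return value

A Chat Model usually returns an AIMessage.

Commonly used data includes: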

response.text
response.content_blocks
response.tool_calls
response.usage_metadata
response.response_metadata

25.5 Extending model capabilities

model.bind_tools(...)
model.with_structured_output(...)
model.bind(...)

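These are used, respectively, to bind tools, constrain the output structure, and bind other model parameters.

25.6 Run control

Model initialization parameters set the model's behavior.

The config passed at call time sets:

the run name;
tags;
metadata;
callbacks;
maximum concurrency;
the recursion limit;
runtime-configurable fields.

25.7 Advanced capabilities

The official Models page also covers:

Model Profile;
multimodality;
reasoning content;
local models;
prompt caching;
server-side tool calling;
rate limiting;
base URL and proxy;
log probabilities;
token usage tracking;
configurable models;
dynamic model selection.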
