LangChain v1.0 | Core Modules: Models


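Models

LLMs are powerful AI tools that can interpret and generate text much as humans do. They are versatile enough to write content, translate languages, summarize, and answer questions without task-specific training.

In addition to text generation, many models support:

Tool calling - calling external tools (such as database queries or API calls) and using the results in their responses.
Structured output - constraining the model's response to follow a defined format.
Multimodality - processing and returning data other than text, such as images, audio, and video.
Reasoning - performing multi-step reasoning to reach a conclusion.

Models are the reasoning engine of agents. They drive the agent's decision-making: which tools to call, how to interpret results, and when to give a final answer.

The quality and capabilities of the model you choose directly affect the baseline reliability and performance of your agent.

Different models excel at different tasks: some are better at following complex instructions, others at structured reasoning, and some support larger context windows for handling more information.

LangChain's standard model interface gives you access to many provider integrations, which makes it easy to experiment with and switch between models to find the best fit for your use case.

Basic usage

Models can be used in two ways:

With agents - the model can be specified dynamically when creating an agent.
Standalone - the model can be called directly (outside the agent loop) for tasks such as text generation, classification, or extraction, without an agent framework.

The same model interface works in both contexts, so you can start simple and scale up to more complex agent-based workflows as needed.

Initialize a model

The simplest way to use a standalone model in LangChain is to call init_chat_model to initialize one from the chat model provider of your choice (example below):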

python

# `config` is the author's local settings module (it holds the API base URL and key).
from config import config
from langchain.chat_models import init_chat_model

custom_profile = {
    "max_input_tokens": 100_000,
    "tool_calling": True,
    "max_output_tokens": 12800,
}

model = init_chat_model(
    # Alternatives tried by the author:
    # model="deepseek-ai/DeepSeek-V3.2",
    # Code model MiniMaxAI/MiniMax-M2: 192K context window, 8 CNY/million
    # Qwen3-Coder-30B-A3B-Instruct: 30B, 256K context window, 2.8 CNY/million
    # model="MiniMaxAI/MiniMax-M2",
    model="Qwen/Qwen3-Coder-30B-A3B-Instruct",
    # model_provider must be set to "openai" here, otherwise an error is raised,
    # because the endpoint is OpenAI-compatible.
    model_provider="openai",
    # Base URL of the model endpoint
    base_url=config.SCII_API_BASE_URL,
    # API key
    api_key=config.SCII_API_KEY,
    # Sampling temperature: higher values give more random output
    temperature=0.5,
    # Timeout: 5 minutes
    timeout=60 * 5,
    # Maximum number of output tokens; the allowed limit varies by model
    max_tokens=12800,
    profile=custom_profile,
)

python

def test_stream():
    chunks = []
    for chunk in model.stream("Write a bubble sort in Python"):
        chunks.append(chunk)
        print(chunk.text, end="", flush=True)

test_stream()

Key methods

Invoke

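The most direct way to call a model is the invoke() method, passing a single message or a list of messages.

Stream

Most models can stream their output while it is being generated. By showing output progressively, streaming noticeably improves the user experience, especially for longer responses.

Calling stream() returns an iterator that yields output chunks as they are produced. You can use a loop to process each chunk in real time.

Unlike invoke(), stream() returns multiple AIMessageChunk objects, each containing part of the output text. Importantly, the chunks in a stream can be merged into a complete message by summing them.

The resulting message can be handled just like one produced by invoke(); for example, it can be added to the message history and passed back to the model as conversation context.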

Batch

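Parameters

Parameter	Type	Required	Description
`model`	string	yes	The name or identifier of the specific model to use. You can also specify the provider and the model together using the `{model_provider}:{model}` format, for example `'openai:o1'`.
`api_key`	string	no	The key used to authenticate with the model provider. It is usually issued when you sign up for the service and is often supplied through an environment variable.
`temperature`	number	no	Controls the randomness of the model's output. Higher values make responses more creative; lower values make them more deterministic and conservative.
`max_tokens`	number	no	Limits the maximum number of tokens in the response, which controls the output length.
`timeout`	number	no	The maximum time (in seconds) to wait for a response from the model before the request is canceled.
`max_retries`	number	no	The maximum number of times the request is retried when it fails because of issues such as network timeouts or rate limits.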

python

# Call the model.
# The most direct way is invoke(), passing a single message or a list of messages.
response = model.invoke("Why do parrots have colorful feathers?")
print(response.content)

python

# A chat model accepts a list of messages representing the conversation history.
# Each message has a role that tells the model who sent it:
#   "system"    - system instructions or configuration
#   "user"      - messages sent by the user
#   "assistant" - messages sent by the assistant (the model)
conversation = [
    {"role": "system", "content": "You are a helpful assistant that translates English to Chinese."},
    {"role": "user", "content": "Translate: I love programming."},
    {"role": "assistant", "content": "我爱编程."},
    {"role": "user", "content": "Translate: I love building applications."}
]
response = model.invoke(conversation)
print(response.content)

python

# Same as above, but passing message object instances to invoke()
from langchain.messages import SystemMessage, HumanMessage, AIMessage

conversation = [
    SystemMessage('You are a helpful assistant that translates English to Chinese.'),
    HumanMessage('I love programming.'),
    AIMessage('我爱编程.'),
    HumanMessage('I love building applications.'),
]
response = model.invoke(conversation)
print(response.content)

Calling stream()

python

# Basic call
for chunk in model.stream("Why do parrots have colorful feathers?"):
    print(chunk.text, end="|", flush=True)

python

# Streaming
# Handle output according to block type, e.g. reasoning, tool_call_chunk, or text.
for chunk in model.stream("What color is the sky?"):
    for block in chunk.content_blocks:
        if block["type"] == "reasoning" and (reasoning := block.get("reasoning")):
            print(f"Reasoning: {reasoning}")
        elif block["type"] == "tool_call_chunk":
            print(f"Tool call chunk: {block}")
        elif block["type"] == "text":
            print(block["text"])
        else:
            print(f"Unknown block type: {block['type']}")

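python

# Unlike invoke(), stream() returns multiple AIMessageChunk objects, each containing part of the output text.
# Importantly, each chunk in the stream is designed to be merged into a complete message by summing.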
full = None  # None | AIMessageChunk
for chunk in model.stream("What color is the sky?"):
    full = chunk if full is None else full + chunk
    print(full.text)

# The
# The sky
# The sky is
# The sky is typically
# The sky is typically blue
# ...

print(full.content_blocks)
# [{"type": "text", "text": "The sky is typically blue..."}]

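Stream events: the `astream_events` method

Streams semantic events asynchronously.

astream_events returns an async iterator that yields dictionaries representing events that occur while the model generates. Each event contains an event type (event) and associated data (data).

on_chat_model_start event

Fired when the model starts generating a response.

The event data (data) contains the model's configuration and the input messages.

on_chat_model_stream event

Fired for each token of the response as the model generates it.

The event data (data) contains the generated token.

on_chat_model_end event

Fired when the model finishes generating the response.

The event data (data) contains the complete response message.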

python

import asyncio

async def main():
    # astream_events is async, so it must be consumed inside a coroutine
    async for event in model.astream_events("Hello"):
        if event["event"] == "on_chat_model_start":
            print(f"Input: {event['data']['input']}")
        elif event["event"] == "on_chat_model_stream":
            print(f"Token: {event['data']['chunk'].text}")
        elif event["event"] == "on_chat_model_end":
            print(f"Full message: {event['data']['output'].text}")

# In a script; inside a notebook, use `await main()` instead
asyncio.run(main())

Batch

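Batching a collection of independent requests to a model can significantly improve performance and reduce cost, because the processing can run in parallel.

By default, batch() returns only the final outputs for the whole batch. To receive the output for each input as soon as it finishes, stream the results with batch_as_completed().

With batch_as_completed(), results may arrive out of order. Each result includes the input index, so you can match results to inputs and rebuild the original order as needed.

When processing a large number of inputs with batch() or batch_as_completed(), you may want to limit the maximum number of parallel calls. Do this by setting the max_concurrency attribute in the RunnableConfig dictionary.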

python

responses = model.batch([
    "Why do parrots have colorful feathers?",
    "How do airplanes fly?",
    "What is quantum computing?"
])
for response in responses:
    print(response)

python

for response in model.batch_as_completed(
    ["Why do parrots have colorful feathers?",
    "How do airplanes fly?",
    "What is quantum computing?"]
):
    print(response)
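
Because batch_as_completed() yields results as (index, result) pairs in completion order, restoring the input order is a small bookkeeping step. The sketch below illustrates it with simulated pairs instead of a live model call; restore_order is a hypothetical helper name, not part of LangChain.

```python
def restore_order(indexed_results, total):
    """Place (index, result) pairs back into their original input order."""
    ordered = [None] * total
    for index, result in indexed_results:
        ordered[index] = result
    return ordered


# Simulated completion order; with a real model these pairs would come from
# model.batch_as_completed(inputs).
completed = [(2, "answer C"), (0, "answer A"), (1, "answer B")]
ordered = restore_order(completed, total=3)
print(ordered)
```

With a real model, total would be len(inputs), and each result would be an AIMessage rather than a string.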

python

model.batch(
    ["Why do parrots have colorful feathers?",
    "How do airplanes fly?",
    "What is quantum computing?"],
    config={
        'max_concurrency': 5,  # Limit to 5 parallel calls
    }
)

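Tool calling

Models can request calls to tools that perform tasks such as fetching data from a database, searching the web, or running code. A tool is a pairing of:

A schema, including the tool's name, a description, and/or argument definitions (usually a JSON schema).
A function or coroutine to execute.

Here is the basic tool calling flow between a user and a model:

Tool calling flow

[Diagram: tool calling flow]

To make the tools you define available to a model, you must bind them with bind_tools. In subsequent calls, the model can choose to call any of the bound tools as needed.

Some model providers offer built-in tools that can be enabled through model or invocation parameters (for example ChatOpenAI, ChatAnthropic). See the respective provider reference for details.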

python

# Bind a tool
from langchain.tools import tool

# Define a weather tool
@tool
def get_weather(location: str) -> str:
    """Get the weather for a location."""
    return f"The weather in {location} is sunny"

model_with_tools = model.bind_tools([get_weather])

response = model_with_tools.invoke("What's the weather like in Boston?")
for tool_call in response.tool_calls:
    print("Tool:", tool_call["name"])
    print("Args:", tool_call["args"])

# Output:
# Tool: get_weather
# Args: {'location': 'Boston'}

Study checkpoint, ended 2026-01-15 00:58:00

Checkpoint complete.

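Common tool calling patterns

When you bind user-defined tools, the model's response includes a request to execute a tool. When you use the model separately from an agent, it is your responsibility to execute the requested tool and return the result to the model for use in subsequent reasoning. When you use an agent, the agent loop handles the tool execution loop for you.

Tool execution loop

When a model returns tool calls, you need to execute the tools and pass the results back to the model. This creates a conversation loop in which the model can use the tool results to produce a final response. LangChain includes agent abstractions that handle this orchestration for you.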
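
[Diagram: the user sends a request to the LangChain agent, which calls the LLM. If a tool call is needed, the agent executes the tool and passes the tool result back to the model; if not, the model generates the final response, which is returned to the user.]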

Each ToolMessage returned by a tool includes a tool_call_id that matches the original tool call, which helps the model associate the result with the request.

python

from langchain.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get the current weather in a given location"""
    # Example only; a real tool would call a weather API.
    # The Chinese text reads "Weather info: <location>, temperature 72°F, sunny".
    return f"天气信息: {location} 温度 72°F 天气 晴朗"

# Bind one or more tools to the model
model_with_tools = model.bind_tools([get_weather])
# Step 1: The model generates tool calls
messages = [{"role": "user", "content": "What's the weather in Boston?"}]
ai_msg = model_with_tools.invoke(messages)
messages.append(ai_msg)
print("ai_msg:", ai_msg)
# Step 2: Execute tools and collect results
for tool_call in ai_msg.tool_calls:
    # Invoking the tool with the full tool call returns a ToolMessage
    print("tool_call:", type(tool_call), tool_call)
    tool_result = get_weather.invoke(tool_call)
    messages.append(tool_result)

print("messages:", messages)

# Step 3: Pass results back to model for final response
final_response = model_with_tools.invoke(messages)
print(final_response.id, final_response.text)
# "The current weather in Boston is 72°F and sunny."

Output:

ai_msg: content='' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 283, 'total_tokens': 304, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'Qwen/Qwen3-Coder-30B-A3B-Instruct', 'system_fingerprint': '', 'id': '019bc221824b0baf717d1419731c3ba5', 'finish_reason': 'tool_calls', 'logprobs': None} id='lc_run--019bc221-80e7-7ac3-a44c-4b68bb24b9cb-0' tool_calls=[{'name': 'get_weather', 'args': {'location': 'Boston'}, 'id': '019bc22186d93f4567e03e8248861f42', 'type': 'tool_call'}] invalid_tool_calls=[] usage_metadata={'input_tokens': 283, 'output_tokens': 21, 'total_tokens': 304, 'input_token_details': {}, 'output_token_details': {'reasoning': 0}}
tool_call: <class 'dict'> {'name': 'get_weather', 'args': {'location': 'Boston'}, 'id': '019bc22186d93f4567e03e8248861f42', 'type': 'tool_call'}
messages: [{'role': 'user', 'content': "What's the weather in Boston?"}, AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 283, 'total_tokens': 304, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'Qwen/Qwen3-Coder-30B-A3B-Instruct', 'system_fingerprint': '', 'id': '019bc221824b0baf717d1419731c3ba5', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--019bc221-80e7-7ac3-a44c-4b68bb24b9cb-0', tool_calls=[{'name': 'get_weather', 'args': {'location': 'Boston'}, 'id': '019bc22186d93f4567e03e8248861f42', 'type': 'tool_call'}], invalid_tool_calls=[], usage_metadata={'input_tokens': 283, 'output_tokens': 21, 'total_tokens': 304, 'input_token_details': {}, 'output_token_details': {'reasoning': 0}}), ToolMessage(content='天气信息: Boston 温度 72°F 天气 晴朗', name='get_weather', tool_call_id='019bc22186d93f4567e03e8248861f42')]
当前 Boston 的天气是晴朗，温度为 72°F。
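
The three steps above run the loop exactly once. When a model may request tools over several rounds, the same idea generalizes to a loop that stops as soon as a response carries no tool calls. The following is a minimal sketch of that control flow only: FakeAIMessage and FakeModel are hypothetical stand-ins so the example runs without an API key, and plain dicts stand in for ToolMessage objects.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAIMessage:
    """Hypothetical stand-in for AIMessage: text plus requested tool calls."""
    text: str = ""
    tool_calls: list = field(default_factory=list)


class FakeModel:
    """Hypothetical stand-in for a tool-bound model: one tool request, then an answer."""

    def invoke(self, messages):
        tool_results = [
            m for m in messages if isinstance(m, dict) and m.get("role") == "tool"
        ]
        if not tool_results:
            call = {"name": "get_weather", "args": {"location": "Boston"}, "id": "call_1"}
            return FakeAIMessage(tool_calls=[call])
        return FakeAIMessage(text=f"Final answer based on: {tool_results[-1]['content']}")


def get_weather(location: str) -> str:
    return f"{location}: 72F and sunny"


def run_tool_loop(model, tools, messages, max_rounds=5):
    """Call the model until it stops requesting tools or max_rounds is reached."""
    for _ in range(max_rounds):
        ai_msg = model.invoke(messages)
        messages.append(ai_msg)
        if not ai_msg.tool_calls:
            return ai_msg
        for call in ai_msg.tool_calls:
            result = tools[call["name"]](**call["args"])
            # Echo the call id so each result can be paired with its request.
            messages.append({"role": "tool", "content": result, "tool_call_id": call["id"]})
    raise RuntimeError("tool loop did not finish within max_rounds")


messages = [{"role": "user", "content": "What's the weather in Boston?"}]
final = run_tool_loop(FakeModel(), {"get_weather": get_weather}, messages)
print(final.text)
```

The max_rounds cap is a design choice of this sketch: it keeps a model that never stops requesting tools from looping forever.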

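Forcing tool calls

tool_choice can force the model to call a tool instead of generating text.

tool_choice="any": force the model to use any one of the bound tools

model_with_tools = model.bind_tools([tool_1], tool_choice="any")

tool_choice="<tool name>": force the model to use a specific tool

model_with_tools = model.bind_tools([tool_1], tool_choice="tool_1")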

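Parallel tool calls

Many models support calling multiple tools in parallel when appropriate. This lets the model gather information from different sources at the same time.

Most models that support tool calling enable parallel tool calls by default. Some (including OpenAI and Anthropic) let you disable this feature. To do so, set parallel_tool_calls=False:

python

model_with_tools = model.bind_tools([tool_1, tool_2], parallel_tool_calls=False)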

python

model_with_tools = model.bind_tools([get_weather])

response = model_with_tools.invoke(
    "What's the weather in Boston and Tokyo?"
)

# The model may generate multiple tool calls
# print(response.tool_calls)
# [
#   {'name': 'get_weather', 'args': {'location': 'Boston'}, 'id': 'call_1'},
#   {'name': 'get_weather', 'args': {'location': 'Tokyo'}, 'id': 'call_2'},
# ]

# Execute all tools (can be done in parallel with async)
results = []
for tool_call in response.tool_calls:
    if tool_call["name"] == "get_weather":
        result = get_weather.invoke(tool_call)
        print(result)
        # Append inside the branch so an unmatched tool name never reuses a stale result
        results.append(result)

# results now holds one ToolMessage per executed tool call

Output:

content='天气信息: Boston 温度 72°F 天气 晴朗' name='get_weather' tool_call_id='019bc22a2d57ea8f7540304353f6744e'
content='天气信息: Tokyo 温度 72°F 天气 晴朗' name='get_weather' tool_call_id='019bc22a2d57ea8f7540304353f6744f'

Streaming tool calls

When streaming responses, tool calls are built up progressively through ToolCallChunk. This lets you see tool calls as they are generated instead of waiting for the complete response.

python

for chunk in model_with_tools.stream("boston and new york's weather?"):
    # Tool call chunks arrive incrementally
    for tool_chunk in chunk.tool_call_chunks:
        if name := tool_chunk.get("name"):
            print(f"Tool: {name}")
        if id_ := tool_chunk.get("id"):
            print(f"ID: {id_}")
        if args := tool_chunk.get("args"):
            print(f"Args: {args}")
"""
Output:

ID: 019bc231e6eb70b682ef2a334f30e300
Args: {
Args: "location": "Boston"
Args: }
Tool: get_weather
ID: 019bc231e8ce71df8ff8d05db628e71c
Args: {
Args: "location": "New York"
Args: }
"""

You can accumulate chunks to build up complete tool calls:

python

gathered = None
for chunk in model_with_tools.stream("What's the weather in Boston?"):
    gathered = chunk if gathered is None else gathered + chunk
    print(gathered.tool_calls)

Output:

[]
[{'name': 'get_weather', 'args': {}, 'id': '019bc2337189e90d6fd374b68a9618f5', 'type': 'tool_call'}]
[{'name': 'get_weather', 'args': {}, 'id': '019bc2337189e90d6fd374b68a9618f5', 'type': 'tool_call'}]
[{'name': 'get_weather', 'args': {'location': 'Boston'}, 'id': '019bc2337189e90d6fd374b68a9618f5', 'type': 'tool_call'}]
[{'name': 'get_weather', 'args': {'location': 'Boston'}, 'id': '019bc2337189e90d6fd374b68a9618f5', 'type': 'tool_call'}]
[{'name': 'get_weather', 'args': {'location': 'Boston'}, 'id': '019bc2337189e90d6fd374b68a9618f5', 'type': 'tool_call'}]
[{'name': 'get_weather', 'args': {'location': 'Boston'}, 'id': '019bc2337189e90d6fd374b68a9618f5', 'type': 'tool_call'}]

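Structured output

Models can be asked to provide their response in a format that matches a given schema. This is useful for making sure the output can be parsed easily and used in later processing. LangChain supports several schema types and methods for enforcing structured output.

Pydantic models: provide the richest feature set, including field validation, descriptions, and nested structures.
TypedDict: Python's TypedDict is a simpler alternative to Pydantic models, ideal when you do not need runtime validation.
JSON Schema: a general-purpose schema format that can describe API responses, configuration files, and other structured data. Provide a JSON Schema for maximum control and interoperability.

Key considerations for structured output

Method parameter: some providers support different methods for generating structured output:
- 'json_schema': uses the provider's dedicated structured output feature.
- 'function_calling': derives structured output by forcing a tool call that follows the given schema.
- 'json_mode': a precursor to 'json_schema' offered by some providers. It generates valid JSON, but the schema must be described in the prompt.
Include raw: set include_raw=True to get both the parsed output and the raw AI message.
Validation: Pydantic models provide automatic validation. TypedDict and JSON Schema require manual validation.
See your provider's integration page for supported methods and configuration options.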
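
Since TypedDict and JSON Schema outputs come back as plain dicts, the manual validation mentioned above can be as simple as checking required keys and types before using the result. The sketch below shows one way to do that with the standard library only; validate_movie is a hypothetical helper, and the field names mirror the movie examples used in this section.

```python
def validate_movie(data: dict) -> list[str]:
    """Return a list of problems found in a movie dict (an empty list means valid)."""
    expected = {"title": str, "year": int, "director": str, "rating": (int, float)}
    problems = []
    for key, expected_type in expected.items():
        if key not in data:
            problems.append(f"missing field: {key}")
        # bool is a subclass of int, so reject it explicitly
        elif isinstance(data[key], bool) or not isinstance(data[key], expected_type):
            problems.append(f"wrong type for {key}: {type(data[key]).__name__}")
    return problems


good = {"title": "Oppenheimer", "year": 2023, "director": "Christopher Nolan", "rating": 8.5}
bad = {"title": "Oppenheimer", "year": "2023", "rating": 8.5}
print(validate_movie(good))
print(validate_movie(bad))
```

For anything beyond a few flat fields, a Pydantic model is usually the simpler route, because validation then happens automatically.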

python

# Pydantic model
from pydantic import BaseModel, Field
# The author's local module that exposes the initialized chat model
from langchain_01.models.scii_deepseekv3 import model

# Define a simple Pydantic model
class Movie(BaseModel):
    """A movie with details."""
    title: str = Field(description="The title of the movie")
    year: int = Field(description="The release year of the movie")
    director: str = Field(description="The director of the movie")
    rating: float = Field(description="The rating of the movie")

model_with_pydantic = model.with_structured_output(Movie)

prompt = [
    # The prompt asks: "Please recommend a movie from 2023"
    {"role": "user", "content": "请推荐一部2023年的电影"}
]
response = model_with_pydantic.invoke(prompt)
print(type(response), response, end="")

# <class '__main__.Movie'> title='长安三万里' year=2023 director='赵霁' rating=7.8

python

# TypedDict
from typing_extensions import TypedDict, Annotated

class MovieDict(TypedDict):
    """A movie with details."""
    title: Annotated[str, ..., "The title of the movie"]
    year: Annotated[int, ..., "The release year of the movie"]
    director: Annotated[str, ..., "The director of the movie"]
    rating: Annotated[float, ..., "The rating of the movie"]

model_with_typeddict = model.with_structured_output(MovieDict)

prompt = [
    # The prompt asks: "Please recommend a movie from 2023"
    {"role": "user", "content": "请推荐一部2023年的电影"}
]
response = model_with_typeddict.invoke(prompt)
print(type(response), response, end="")
# <class 'dict'> {'director': '克里斯托弗·诺兰', 'rating': 8.5, 'title': '奥本海默', 'year': 2023}

python

# JSON Schema
json_schema = {
    "title": "Movie",
    "description": "A movie with details",
    "type": "object",
    "properties": {
        "title": {
            "type": "string",
            "description": "The title of the movie"
        },
        "year": {
            "type": "integer",
            "description": "The year the movie was released"
        },
        "director": {
            "type": "string",
            "description": "The director of the movie"
        },
        "rating": {
            "type": "number",
            "description": "The movie's rating out of 10"
        }
    },
    "required": ["title", "year", "director", "rating"]
}

model_with_json_schema = model.with_structured_output(json_schema,method="json_schema")

prompt = [
    {"role":"user","content":"请推荐一部2023年的电影"}
]
response = model_with_json_schema.invoke(prompt)
print(type(response),response,end="")
# <class 'dict'> {'director': '克里斯托弗·诺兰', 'rating': 8.5, 'title': '奥本海默', 'year': 2023}

python

# Example: message output alongside the parsed structure
"""
Returning the raw AIMessage object alongside the parsed representation can be useful for accessing response metadata such as token counts.

To do this, set include_raw=True when calling with_structured_output:
"""

model_with_pydantic = model.with_structured_output(Movie, include_raw=True)
prompt = [
    {"role": "user", "content": "请推荐一部2023年的电影"}
]
response = model_with_pydantic.invoke(prompt)
print(response, end="")
# {'raw': AIMessage(content='{\n  "director": "克里斯托弗·诺兰",\n  "rating": 9.0,\n  "title": "奥本海默",\n  "year": 2023\n}', additional_kwargs={'parsed': Movie(title='奥本海默', year=2023, director='克里斯托弗·诺兰', rating=9.0), 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 43, 'prompt_tokens': 17, 'total_tokens': 60, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'Qwen/Qwen3-Coder-30B-A3B-Instruct', 'system_fingerprint': '', 'id': '019bc242cec7d02123cda0670b0390d6', 'finish_reason': 'stop', 'logprobs': None}, id='lc_run--019bc242-cd63-7853-b095-f75415778d94-0', tool_calls=[], invalid_tool_calls=[], usage_metadata={'input_tokens': 17, 'output_tokens': 43, 'total_tokens': 60, 'input_token_details': {}, 'output_token_details': {}}), 'parsed': Movie(title='奥本海默', year=2023, director='克里斯托弗·诺兰', rating=9.0), 'parsing_error': None}

# {
#     "raw": AIMessage(...),
#     "parsed": Movie(title=..., year=..., ...),
#     "parsing_error": None,
# }

Output:

{'raw': AIMessage(content='{\n  "director": "丹尼斯·维伦纽瓦",\n  "rating": 8.5,\n  "title": "沙丘2",\n  "year": 2023\n}', additional_kwargs={'parsed': Movie(title='沙丘2', year=2023, director='丹尼斯·维伦纽瓦', rating=8.5), 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 43, 'prompt_tokens': 17, 'total_tokens': 60, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'Qwen/Qwen3-Coder-30B-A3B-Instruct', 'system_fingerprint': '', 'id': '019bc245f39b0986a6e8e19a0a87b62b', 'finish_reason': 'stop', 'logprobs': None}, id='lc_run--019bc245-f17a-79d3-81ee-c153814d301c-0', tool_calls=[], invalid_tool_calls=[], usage_metadata={'input_tokens': 17, 'output_tokens': 43, 'total_tokens': 60, 'input_token_details': {}, 'output_token_details': {}}), 'parsed': Movie(title='沙丘2', year=2023, director='丹尼斯·维伦纽瓦', rating=8.5), 'parsing_error': None}

python

# Example: nested structures

class Actor(BaseModel):
    # Actor's name
    name: str = Field(..., description="The name of the actor")
    # Actor's role in the movie
    role: str = Field(..., description="The role of the actor in the movie")

class MovieDetails(BaseModel):
    # Movie title
    title: str = Field(..., description="The title of the movie")
    # Release year
    year: int = Field(..., description="The year the movie was released")
    # Cast list
    cast: list[Actor] = Field(..., description="The list of actors in the movie")
    # Genres
    genres: list[str] = Field(..., description="The genres of the movie")
    # Budget, in millions of US dollars
    budget: float | None = Field(None, description="Budget in millions USD")

model_with_structure = model.with_structured_output(MovieDetails)

prompt = [
    {"role":"user","content":"请推荐一部2023年的电影"}
]
response = model_with_structure.invoke(prompt)
print(type(response),response,end="")
# <class '__main__.MovieDetails'> title='明日边缘' year=2023 cast=[Actor(name='汤姆·汉克斯', role='主角'), Actor(name='斯嘉丽·约翰逊', role='女主角')] genres=['剧情', '科幻'] budget=100000000.0

python

from typing_extensions import Annotated, TypedDict

class Actor(TypedDict):
    name: str
    role: str

class MovieDetails(TypedDict):
    title: str
    year: int
    cast: list[Actor]
    genres: list[str]
    budget: Annotated[float | None, ..., "Budget in millions USD"]

model_with_structure = model.with_structured_output(MovieDetails)

prompt = [
    {"role":"user","content":"请推荐一部2023年的电影"}
]
response = model_with_structure.invoke(prompt)
print(type(response),response,end="")
# <class 'dict'> {'budget': 100000000, 'cast': [{'name': '汤姆·汉克斯', 'role': '主角'}, {'name': '斯嘉丽·约翰逊', 'role': '女主角'}], 'genres': ['剧情', '科幻'], 'title': '明日边缘', 'year': 2023}

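Advanced topics

Model profiles

LangChain chat models can expose a dictionary of supported features and capabilities through a .profile attribute:

The profile is a regular dict and can be updated in place. If the model instance is shared, consider using model_copy to avoid mutating shared state.

If model profile data is missing, outdated, or incorrect, you can override it: model = init_chat_model("...", profile=custom_profile)

Model profiles are a beta feature. The profile format is subject to change.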

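python

model.profile
# The profile is a regular dict and can be updated in place. If the model instance is shared, consider using model_copy to avoid mutating shared state.
new_profile = model.profile | {"key": "value"}
# model_copy returns a new instance; assign it to a variable to keep using it
model.model_copy(update={"profile": new_profile})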
# ChatOpenAI(profile={'max_input_tokens': 100000, 'max_output_tokens': 12800, 'tool_calling': True, 'key': 'value'}, client=<openai.resources.chat.completions.completions.Completions object at 0x000002712AD8F140>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000002712B354320>, root_client=<openai.OpenAI object at 0x000002712B312AB0>, root_async_client=<openai.AsyncOpenAI object at 0x000002712B354CE0>, model_name='Qwen/Qwen3-Coder-30B-A3B-Instruct', temperature=0.5, model_kwargs={}, openai_api_key=SecretStr('**********'), openai_api_base='https://api.siliconflow.cn/v1', request_timeout=300.0, max_tokens=12800)

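Multimodal models

Some models can process and return non-text data such as images, audio, and video. You pass non-text data to a model by providing content blocks.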
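
As a rough illustration of what "providing content blocks" can look like, the sketch below builds a user message whose content is a list with one text block and one base64-encoded image block. The block keys used here ("type", "base64", "mime_type") are an assumption based on LangChain's standard content block convention, and image_message is a hypothetical helper; check your provider's integration page for the formats it accepts.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message that pairs a text block with an inline image block."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image",
                "base64": base64.b64encode(image_bytes).decode("ascii"),
                "mime_type": mime_type,
            },
        ],
    }


# Placeholder bytes; in practice, read them from an image file.
message = image_message("Describe this image.", b"\x89PNG placeholder bytes")
# With a multimodal model: response = model.invoke([message])
print(message["content"][1]["type"])
```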

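Reasoning

Many models can perform multi-step reasoning to reach a conclusion. This involves breaking a complex problem into smaller, more manageable steps.

If the underlying model supports it, you can surface this reasoning process to better understand how the model arrived at its final answer.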
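
One way to surface the reasoning is to separate reasoning blocks from text blocks in a response's content_blocks, using the same block shape as the streaming example earlier in this article. The sketch below works on sample data rather than a live response; split_reasoning is a hypothetical helper, and whether any reasoning blocks appear at all depends on the model and provider.

```python
def split_reasoning(content_blocks):
    """Separate reasoning steps from the answer text in a list of content blocks."""
    reasoning = [
        block.get("reasoning", "")
        for block in content_blocks
        if block.get("type") == "reasoning"
    ]
    text = "".join(
        block.get("text", "") for block in content_blocks if block.get("type") == "text"
    )
    return reasoning, text


# Sample blocks; with a real model, pass response.content_blocks instead.
sample_blocks = [
    {"type": "reasoning", "reasoning": "Sunlight scatters in the atmosphere."},
    {"type": "text", "text": "The sky is blue."},
]
steps, answer = split_reasoning(sample_blocks)
print(steps)
print(answer)
```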

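Local models

LangChain supports running models locally on your own hardware. This is useful when data privacy is critical, when you want to call a custom model, or when you want to avoid the cost of cloud-based models.

Ollama is one of the easiest ways to run chat and embedding models locally.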

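Prompt caching

Many providers offer prompt caching to reduce the latency and cost of repeatedly processing the same tokens. These features can be implicit or explicit:

Implicit prompt caching: the provider automatically passes on cost savings when a request hits the cache. Examples: OpenAI and Gemini.

Explicit caching: the provider lets you manually mark cache points for more control or to guarantee cost savings.

Prompt caching usually takes effect only when the number of input tokens exceeds a minimum threshold. See the provider pages for details.

Cache usage is reflected in the usage metadata of the model response.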
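
To check whether a request actually hit the cache, you can inspect the usage metadata on the response. The sketch below assumes the cached-token count appears under input_token_details with the key "cache_read"; that key name is an assumption here, providers that do not report cache usage simply leave the details empty, and cached_input_tokens is a hypothetical helper.

```python
def cached_input_tokens(usage_metadata: dict) -> int:
    """Return the number of input tokens served from cache, or 0 if not reported."""
    details = usage_metadata.get("input_token_details") or {}
    return details.get("cache_read", 0)


# Sample metadata; with a real model, pass response.usage_metadata instead.
with_cache = {
    "input_tokens": 1200,
    "output_tokens": 50,
    "total_tokens": 1250,
    "input_token_details": {"cache_read": 1024},
}
without_cache = {"input_tokens": 9, "output_tokens": 9, "total_tokens": 18, "input_token_details": {}}
print(cached_input_tokens(with_cache))
print(cached_input_tokens(without_cache))
```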

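Server-side tool use

Some providers support server-side tool calling loops: the model can interact with web search, code interpreters, and other tools and analyze the results within a single conversational turn.

Rate limiting

Many chat model providers limit the number of calls that can be made in a given period. If you hit a rate limit, you will typically receive a rate limit error response from the provider and need to wait before making more requests.

To help manage rate limits, chat model integrations accept a rate_limiter parameter that can be supplied at initialization to control the rate at which requests are sent.

The provided rate limiter can only limit the number of requests per unit of time. It will not help if you also need to limit based on request size.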

python

from langchain_core.rate_limiters import InMemoryRateLimiter

rate_limiter = InMemoryRateLimiter(
    requests_per_second=0.1,  # 1 request every 10s
    check_every_n_seconds=0.1,  # Check every 100ms whether allowed to make a request
    max_bucket_size=10,  # Controls the maximum burst size.
)

model = init_chat_model(
    model="gpt-5",
    model_provider="openai",
    rate_limiter=rate_limiter  
)
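
To build intuition for how requests_per_second and max_bucket_size interact, here is a minimal token-bucket sketch. It is an illustration of the general technique, not LangChain's implementation; the TokenBucket class and its explicit now argument are hypothetical and exist only so the behavior can be followed without real waiting.

```python
class TokenBucket:
    """Toy token bucket: tokens accrue at a fixed rate up to a maximum burst size."""

    def __init__(self, rate_per_second, max_bucket_size, now=0.0):
        self.rate = rate_per_second
        self.capacity = max_bucket_size
        self.tokens = 0.0
        self.last = now

    def try_acquire(self, now):
        """Refill according to elapsed time, then spend one token if available."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate_per_second=0.1, max_bucket_size=10)
print(bucket.try_acquire(now=5.0))   # only half a token has accrued so far
print(bucket.try_acquire(now=10.0))  # a full token is available after 10 seconds
print(bucket.try_acquire(now=10.0))  # the token was just spent
```

After a long idle period the bucket holds at most max_bucket_size tokens, which is why that setting bounds the size of a burst.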

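Base URL or proxy

For many chat model integrations you can configure the base URL for API requests, which lets you use model providers that expose an OpenAI-compatible API, or route requests through a proxy server.

Many model providers offer OpenAI-compatible APIs (for example Together AI and vLLM). You can use init_chat_model with these providers by specifying the appropriate base_url parameter.

Proxy configuration

For deployments that require an HTTP proxy, some model integrations support proxy configuration: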

python

from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="gpt-4o",
    openai_proxy="http://proxy.example.com:8080"
)

Token usage

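Many model providers return token usage information as part of the invocation response. When available, this information is included on the AIMessage objects produced by the corresponding model. See the messages guide for details.

Some provider APIs, notably OpenAI and Azure OpenAI chat completions, require users to opt in to receive token usage data in streaming contexts.

You can track aggregate token counts across models in your application using either a callback or a context manager, as shown below:

python

# Get token usage with a callback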
from langchain_core.callbacks import UsageMetadataCallbackHandler

callback = UsageMetadataCallbackHandler()
result_1 = model.invoke("Hello", config={"callbacks": [callback]})
callback.usage_metadata

Output:

{'Qwen/Qwen3-Coder-30B-A3B-Instruct': {'input_tokens': 9,
  'output_tokens': 9,
  'total_tokens': 18,
  'input_token_details': {},
  'output_token_details': {}}}

python

# Get token usage with a context manager
from langchain_core.callbacks import get_usage_metadata_callback


with get_usage_metadata_callback() as cb:
    print(model.invoke("Hello"))
    print(model.invoke("你是谁"))  # prompt: "Who are you?"
    print(cb.usage_metadata)  # totals across both requests

Output:

content='Hello! How can I help you today?' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 9, 'total_tokens': 18, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'Qwen/Qwen3-Coder-30B-A3B-Instruct', 'system_fingerprint': '', 'id': '019bc2696d76bc9cacffa1e03f716ffc', 'finish_reason': 'stop', 'logprobs': None} id='lc_run--019bc269-6b62-7391-bacf-23282af73dfd-0' tool_calls=[] invalid_tool_calls=[] usage_metadata={'input_tokens': 9, 'output_tokens': 9, 'total_tokens': 18, 'input_token_details': {}, 'output_token_details': {}}
content='我是通义千问，阿里巴巴集团旗下的通义实验室自主研发的超大规模语言模型。我能够回答问题、创作文字，如写故事、写公文、写邮件、写剧本、逻辑推理、编程等，还能表达观点，玩游戏等。如果你有任何问题或需要帮助，欢迎随时告诉我！' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 65, 'prompt_tokens': 10, 'total_tokens': 75, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'Qwen/Qwen3-Coder-30B-A3B-Instruct', 'system_fingerprint': '', 'id': '019bc269701a3bc24e90cd278fb13015', 'finish_reason': 'stop', 'logprobs': None} id='lc_run--019bc269-6ebd-7930-b14c-26aec3f2ff17-0' tool_calls=[] invalid_tool_calls=[] usage_metadata={'input_tokens': 10, 'output_tokens': 65, 'total_tokens': 75, 'input_token_details': {}, 'output_token_details': {}}
{'Qwen/Qwen3-Coder-30B-A3B-Instruct': {'output_tokens': 74, 'total_tokens': 93, 'input_tokens': 19, 'output_token_details': {}, 'input_token_details': {}}}

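Invocation config

When invoking a model, you can pass additional configuration through the config parameter using a RunnableConfig dictionary. This gives run-time control over execution behavior, callbacks, and metadata tracking.

python

# Common configuration options include: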
from langchain_core.callbacks import UsageMetadataCallbackHandler
callback = UsageMetadataCallbackHandler()
response = model.invoke(
    "Tell me a joke",
    config={
        "run_name": "joke_generation",      # Custom name for this run
        "tags": ["humor", "demo"],          # Tags for categorization
        "metadata": {"user_id": "123"},     # Custom metadata
        "callbacks": [callback], # Callback handlers
    }
)
print(response)
print(callback.usage_metadata)

Output:

content="Why don't scientists trust atoms?\n\nBecause they make up everything!" additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 12, 'total_tokens': 25, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'Qwen/Qwen3-Coder-30B-A3B-Instruct', 'system_fingerprint': '', 'id': '019bc26d3bf10d14650ba43517519688', 'finish_reason': 'stop', 'logprobs': None} id='lc_run--019bc26d-3a18-74b0-a380-b1d33afb4052-0' tool_calls=[] invalid_tool_calls=[] usage_metadata={'input_tokens': 12, 'output_tokens': 13, 'total_tokens': 25, 'input_token_details': {}, 'output_token_details': {}}
{'Qwen/Qwen3-Coder-30B-A3B-Instruct': {'input_tokens': 12, 'output_tokens': 13, 'total_tokens': 25, 'input_token_details': {}, 'output_token_details': {}}}

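Configurable models

You can also create a runtime-configurable model by specifying configurable_fields. If you don't specify a model value, then 'model' and 'model_provider' are configurable by default.

Configurable model with default values

We can create a configurable model with default model values, specify which parameters are configurable, and add a prefix to the configurable parameters: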

python

from langchain.chat_models import init_chat_model

# `config` and `custom_profile` are the objects defined in the initialization example.
first_model = init_chat_model(
    # Alternatives tried by the author:
    # model="deepseek-ai/DeepSeek-V3.2",
    # model="MiniMaxAI/MiniMax-M2",
    model="Qwen/Qwen3-Coder-30B-A3B-Instruct",
    # model_provider must be "openai" because the endpoint is OpenAI-compatible
    model_provider="openai",
    # Base URL of the model endpoint
    base_url=config.SCII_API_BASE_URL,
    # API key
    api_key=config.SCII_API_KEY,
    # Sampling temperature: higher values give more random output
    temperature=0.5,
    # Timeout: 5 minutes
    timeout=60 * 5,
    # Maximum number of output tokens; the allowed limit varies by model
    max_tokens=12800,
    profile=custom_profile,
    # Fields that can be overridden at run time
    configurable_fields=("model", "model_provider", "temperature", "max_tokens"),
    # Prefix for the configurable keys; useful when you have a chain with multiple models
    config_prefix="first",
)
# Override at run time: use deepseek-ai/DeepSeek-V3.2 with temperature 0.5 and at most 100 output tokens
first_model.invoke(
    "what's your name",
    config={
        "configurable": {
            "first_model": "deepseek-ai/DeepSeek-V3.2",
            "first_temperature": 0.5,
            "first_max_tokens": 100,
        }
    },
)
#AIMessage(content="I'm DeepSeek, an AI assistant created by DeepSeek Company! I'm here to help you with a wide variety of questions and tasks. 😊\n\nI'm a text-based model, but I can process uploaded files like images, PDFs, Word documents, and more to read and analyze the text content. I'm completely free to use and don't have any voice features - just pure text-based assistance!\n\nIs there anything specific I can help you with today?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 97, 'prompt_tokens': 8, 'total_tokens': 105, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'deepseek-ai/DeepSeek-V3.2', 'system_fingerprint': '', 'id': '019bc276cc42d4176679721e9f178176', 'finish_reason': 'stop', 'logprobs': None}, id='lc_run--019bc275-08ac-7080-9714-1b52fe6b2a9f-0', tool_calls=[], invalid_tool_calls=[], usage_metadata={'input_tokens': 8, 'output_tokens': 97, 'total_tokens': 105, 'input_token_details': {}, 'output_token_details': {'reasoning': 0}})

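Using a configurable model declaratively

We can call declarative operations such as bind_tools, with_structured_output, and with_configurable on a configurable model, and chain a configurable model in the same way we would a regularly instantiated chat model object.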

python

from pydantic import BaseModel, Field


class GetWeather(BaseModel):
    """Get the current weather in a given location"""
    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


class GetPopulation(BaseModel):
    """Get the current population in a given location"""
    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


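# No model value is given, so "model" and "model_provider" are configurable at run time
model = init_chat_model(
    temperature=0,
    # model_provider must be "openai" because the endpoint is OpenAI-compatible
    model_provider="openai",
    # Base URL of the model endpoint
    base_url=config.SCII_API_BASE_URL,
    # API key
    api_key=config.SCII_API_KEY,
)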
model_with_tools = model.bind_tools([GetWeather, GetPopulation])

model_with_tools.invoke(
    "what's bigger in 2024 LA or NYC", config={"configurable": {"model": "deepseek-ai/DeepSeek-V3.2"}}
).tool_calls

Output:

[{'name': 'GetPopulation',
  'args': {'location': 'Los Angeles, CA'},
  'id': '019bc2798c081643928c8a55d25a3c95',
  'type': 'tool_call'}]
