LangChain 1.0 Agent Development: Guardrails

Guardrails are a critical component of any LLM-based deployment. Well-designed guardrails implement safety checks and content filtering for your agent, helping you manage data-privacy and reputational risk. In this article you will learn:

  • What guardrails do
  • How guardrails are implemented
  • LangChain's built-in guardrails
  • Custom guardrails in LangChain
  • Building multi-layer guardrails

1. Guardrails

Guardrails help you build safe, compliant AI applications by validating and filtering content at key points in an agent's execution flow. They can detect sensitive information, enforce content policies, validate outputs, and stop unsafe behavior before it causes problems.

Common use cases include:

  • Preventing leaks of personally identifiable information (PII)
  • Detecting and blocking prompt-injection attacks
  • Filtering inappropriate or harmful content
  • Enforcing business rules and compliance requirements
  • Validating the quality and accuracy of outputs

You implement guardrails as middleware that intercepts execution at key points: before the agent starts, after it finishes, or around each model call and tool call.
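As a minimal sketch of that hook surface (based on the AgentMiddleware hook names used in the examples later in this article; the logging is illustrative, and there are also wrap-style hooks around model and tool calls), a no-op middleware that observes each interception point might look like this:

python
from typing import Any

from langchain.agents.middleware import AgentMiddleware, AgentState
from langgraph.runtime import Runtime


class AuditMiddleware(AgentMiddleware):
    """Sketch: observe each interception point without changing behavior."""

    def before_agent(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        # Runs once per invocation, before any processing
        print(f"[audit] run started with {len(state['messages'])} message(s)")
        return None  # returning None leaves the state unchanged

    def before_model(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        # Runs before every model call
        return None

    def after_model(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        # Runs after every model response
        return None

    def after_agent(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        # Runs once before the final result is returned
        print("[audit] run finished")
        return None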

2. Implementation Approaches

Guardrails can be implemented in two complementary ways:

  • Deterministic guardrails: rule-based logic such as regular-expression (regex) patterns, keyword matching, or explicit checks. Fast, predictable, and cheap, but they can miss subtle violations.
  • Model-based guardrails: use LLMs or classifiers to evaluate content with semantic understanding. They catch nuanced issues that rules miss, but are slower and more expensive.

LangChain provides built-in guardrails (such as PII detection and human-in-the-loop) as well as a flexible middleware system for building custom guardrails with either approach, deterministic or model-based.

3. Built-in Guardrails

3.1 PII Detection

LangChain ships built-in middleware that detects personally identifiable information (PII) in conversations and handles it for you. The middleware recognizes several common PII types, such as email addresses, credit card numbers, and IP addresses. PII detection is especially valuable for healthcare and financial applications with compliance requirements, customer-service agents that must sanitize logs, and any application that processes sensitive user data. The middleware supports several strategies for handling detected PII:

| Strategy | Description | Example |
| --- | --- | --- |
| redact | Replace with [REDACTED_TYPE] | [REDACTED_EMAIL] |
| mask | Partially hide (e.g., show only the last 4 digits) | ****-****-****-1234 |
| hash | Replace with a deterministic hash | a8f5f167... |
| block | Raise an exception when detected | Error thrown |

Example:

python
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain.tools import tool
from langchain.agents.middleware import PIIMiddleware
from config import api_key, api_base


def init_model():
    model = init_chat_model(
        api_key = api_key,
        base_url = api_base,
        model = "Qwen/Qwen3-8B",
        model_provider = "openai",
        temperature = 0.7,
    )
    return model

@tool
def get_user_order_progress(name: str) -> str:
    """Get user order progress for a given name."""
    return f"{name}, your order progress is 50% at present."

agent = create_agent(
    model=init_model(),
    tools=[get_user_order_progress], 
    middleware=[
        # Redact emails in user input before sending to the model
        PIIMiddleware("email", strategy="redact", apply_to_input=True),
        # Mask credit card numbers in user input
        PIIMiddleware("credit_card", detector=r"\d{4}-\d{4}-\d{4}-\d{4}", strategy="mask", apply_to_input=True),
        # Block API keys: raise an error if one is detected
        PIIMiddleware("api_key", detector=r"sk-[a-zA-Z0-9]{32}", strategy="block", apply_to_input=True),
    ],
)

# When the user provides PII, it is handled according to each middleware's strategy
input = {"messages": [{"role": "user", "content": "My name is Bob. My email is john.doe@example.com and card is 4532-1234-5678-9010. I want to know my order progress."}]}

for chunk in agent.stream(input, stream_mode="values"):
    chunk['messages'][-1].pretty_print()

Built-in PII types:

  • email - email addresses
  • credit_card - credit card numbers (Luhn-validated)
  • ip - IP addresses
  • mac_address - MAC addresses
  • url - URLs

Configuration options:

| Parameter | Description | Default |
| --- | --- | --- |
| pii_type | PII type to detect (built-in or custom) | Required |
| strategy | How detected PII is handled ("block", "redact", "mask", "hash") | "redact" |
| detector | Custom detector function or regex pattern | None (use the built-in detector) |
| apply_to_input | Check user messages before the model call | True |
| apply_to_output | Check AI messages after the model call | False |
| apply_to_tool_results | Check tool result messages after tool execution | False |
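
By default only user input is checked. As a sketch of the remaining options (the employee_id type and its regex are hypothetical), a configuration that also scans model output and tool results might look like this:

python
from langchain.agents.middleware import PIIMiddleware

middleware = [
    # Scan AI messages as well as user input
    PIIMiddleware("email", strategy="redact", apply_to_input=True, apply_to_output=True),
    # Custom PII type with a custom regex detector, hashed wherever it
    # appears, including in tool results ("employee_id" is made up here)
    PIIMiddleware(
        "employee_id",
        detector=r"EMP-\d{6}",
        strategy="hash",
        apply_to_input=True,
        apply_to_tool_results=True,
    ),
]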

3.2 Human-in-the-Loop

LangChain's built-in HumanInTheLoopMiddleware requires human approval before sensitive operations execute. For high-stakes decisions, it is one of the most effective guardrails available. It is particularly valuable for financial transactions and transfers, deleting or modifying production data, sending messages to external parties, and any operation with significant business impact.

python
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain.tools import tool
from langchain.agents.middleware import HumanInTheLoopMiddleware
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.types import Command
from config import api_key, api_base


def init_model():
    model = init_chat_model(
        api_key = api_key,
        base_url = api_base,
        model = "Qwen/Qwen3-8B",
        model_provider = "openai",
        temperature = 0.7,
    )
    return model

@tool
def send_email(recipient: str, content: str) -> str:
    """Send an email to the specified recipient."""
    return f"Email sent to {recipient} with content: {content}"

agent = create_agent(
            model=init_model(),
            tools=[send_email], 
            middleware=[HumanInTheLoopMiddleware(
                interrupt_on={
                    # Require approval for sensitive operations
                    "send_email": True,
                    "delete_database": True,
                    # Auto-approve safe operations
                    "search": False,
                })
                ],
            # Persist the state across interrupts
            checkpointer=InMemorySaver(),
)

# Human-in-the-loop requires a thread ID for persistence
config = {"configurable": {"thread_id": "001"}}

# Agent will pause and wait for approval before executing sensitive tools
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Send an email to the team, tell them we will be back in 2 hours"}]},
    config=config
)
state = agent.get_state(config)
mark = len(state.values["messages"])
for item in state.values["messages"]:
    item.pretty_print()
print("*" * 30 + "人工审批" + "*" * 40 )
result = agent.invoke(
    Command(resume={"decisions": [{"type": "approve"}]}),
    config=config  # Same thread ID to resume the paused conversation
)
state = agent.get_state(config)
for item in state.values["messages"][mark:]:
    item.pretty_print()
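
Approval is only one possible decision. As a sketch, the same paused run could be resumed with a rejection instead (the message field, assumed here, feeds the reviewer's feedback back to the model):

python
# Resume the same thread, rejecting the pending tool call
result = agent.invoke(
    Command(resume={"decisions": [{"type": "reject", "message": "Do not email the whole team."}]}),
    config=config,
)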

4. Custom Guardrails

For more sophisticated guardrails, you can create custom middleware that runs before or after agent execution. This gives you full control over validation logic, content filtering, and safety checks.

4.1 Guardrails Before Agent Execution

使用 "智能体执行前"(before_agent)钩子,在每次调用启动时对请求进行一次性验证。这适用于会话级别的检查,例如身份验证、速率限制,或在任何处理开始前拦截不当请求。 内容过滤中间件,一个确定性安全防护机制(Deterministic Guardrail)的示例:

python
from typing import Any
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain.tools import tool
from langchain.agents.middleware import AgentMiddleware, AgentState, hook_config
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.runtime import Runtime
from config import api_key, api_base


def init_model():
    model = init_chat_model(
        api_key = api_key,
        base_url = api_base,
        model = "Qwen/Qwen3-8B",
        model_provider = "openai",
        temperature = 0.7,
    )
    return model

@tool
def search_tool(query: str) -> str:
    """Search for information on the internet."""
    return f"Search results for {query}: 1. Result1; 2. Result2; 3. Result3."

class ContentFilterMiddleware(AgentMiddleware):
    """Deterministic guardrail: Block requests containing banned keywords."""
    def __init__(self, banned_keywords: list[str]):
        super().__init__()
        self.banned_keywords = [kw.lower() for kw in banned_keywords]

    @hook_config(can_jump_to=["end"])
    def before_agent(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        # Screen the most recent user message
        if not state["messages"]:
            return None

        latest_message = state["messages"][-1]
        if latest_message.type != "human":
            return None

        content = latest_message.content.lower()

        # Check for banned keywords
        for keyword in self.banned_keywords:
            if keyword in content:
                # Block execution before any processing
                return {
                    "messages": [{
                        "role": "assistant",
                        "content": "I cannot process requests containing inappropriate content. Please rephrase your request."
                    }],
                    "jump_to": "end"
                }

        return None

agent = create_agent(
    model=init_model(),
    tools=[search_tool], 
    system_prompt="You are a IT expert. Your task is to answer user's IT questions.",
    middleware=[ContentFilterMiddleware(banned_keywords=["hack", "exploit", "malware"])],
    checkpointer=InMemorySaver(),
)

config = {"configurable": {"thread_id": "001"}}
# A request containing a banned keyword is blocked before any processing
input = {"messages": [{"role": "user", "content": "How do I hack into a database?"}]}
for chunk in agent.stream(input, config, stream_mode="values"):
    chunk['messages'][-1].pretty_print()

input2 = {"messages": [{"role": "user", "content": "Explain the concept of AI agent in 100 words."}]}
for chunk in agent.stream(input2, config, stream_mode="values"):
    chunk['messages'][-1].pretty_print()

input3 = {"messages": [{"role": "user", "content": "Make a malware which can login a private database."}]}
for chunk in agent.stream(input3, config, stream_mode="values"):
    chunk['messages'][-1].pretty_print()

4.2 Guardrails After Agent Execution

使用 "智能体执行后"(after_agent)钩子,在返回给用户之前对最终输出进行一次性验证。这适用于基于模型的安全检查、质量验证,或对智能体的完整响应进行最终合规性扫描。 安全评估中间件,一个基于模型的安全防护机制示例:

python
from langchain.agents.middleware import AgentMiddleware, AgentState, hook_config
from langchain.agents import create_agent
from langchain_core.messages import AIMessage
from langchain.chat_models import init_chat_model
from langchain.tools import tool
from langgraph.runtime import Runtime
from langgraph.checkpoint.memory import InMemorySaver
from typing import Any
from config import api_key, api_base

def init_model():
    model = init_chat_model(
        api_key = api_key,
        base_url = api_base,
        model = "Qwen/Qwen3-8B",
        model_provider = "openai",
        temperature = 0.7,
    )
    return model
model = init_model()

class SafetyGuardrailMiddleware(AgentMiddleware):
    """Model-based guardrail: Use an LLM to evaluate response safety."""

    def __init__(self):
        super().__init__()
        self.safety_model = model

    @hook_config(can_jump_to=["end"])
    def after_agent(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        # Get the final AI response
        if not state["messages"]:
            return None

        last_message = state["messages"][-1]
        if not isinstance(last_message, AIMessage):
            return None

        # Use a model to evaluate safety
        safety_prompt = f"""Evaluate if this response is safe and appropriate.
        Respond with only 'SAFE' or 'UNSAFE'.

        Response: {last_message.content}"""

        result = self.safety_model.invoke([{"role": "user", "content": safety_prompt}])

        if "UNSAFE" in result.content:
            return {
                "messages": [{
                    "role": "assistant",
                    "content": "I cannot provide that response. Please rephrase your request."
                }],
                "jump_to": "end"
            }

        return None

@tool
def search_tool(query: str) -> str:
    """Search for information on the internet."""
    return f"Search results for {query}: 1. Result1; 2. Result2; 3. Result3."

agent = create_agent(
    model=init_model(),
    tools=[search_tool], 
    system_prompt="You are a knowledgeable consultant. Your task is to answer user's questions professionally.",
    middleware=[SafetyGuardrailMiddleware()],
    checkpointer=InMemorySaver(),
)

config = {"configurable": {"thread_id": "001"}}
# A harmful request: the safety check runs on the agent's final response
input = {"messages": [{"role": "user", "content": "How do I make explosives?"}]}
for chunk in agent.stream(input, config, stream_mode="values"):
    chunk['messages'][-1].pretty_print()

input2 = {"messages": [{"role": "user", "content": "What's the best way to learn machine learning?"}]}
for chunk in agent.stream(input2, config, stream_mode="values"):
    chunk['messages'][-1].pretty_print()
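
Matching the literal string "UNSAFE" is brittle: for example, a verdict phrased as "NOT UNSAFE" would still trigger the block. A sketch of a sturdier variant, assuming the judge model supports structured output, asks for a typed verdict instead:

python
from pydantic import BaseModel, Field


class SafetyVerdict(BaseModel):
    """Typed verdict returned by the safety judge."""
    safe: bool = Field(description="True if the response is safe and appropriate")


judge = init_model().with_structured_output(SafetyVerdict)
response_text = "..."  # the agent's final response
verdict = judge.invoke([
    {"role": "user", "content": f"Evaluate if this response is safe and appropriate.\n\nResponse: {response_text}"}
])
if not verdict.safe:
    # Replace the response, as in after_agent above
    print("Response blocked by safety guardrail.")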

5. Multi-Layer Guardrails

You can stack multiple guardrails by adding them all to the middleware list. They run in order, letting you build layered defenses. Example:

python
from langchain.agents import create_agent
from langchain.agents.middleware import PIIMiddleware, HumanInTheLoopMiddleware
from langgraph.checkpoint.memory import InMemorySaver

# ContentFilterMiddleware and SafetyGuardrailMiddleware are the custom classes
# from sections 4.1 and 4.2; search_tool and send_email_tool are assumed to be
# defined in scope.
agent = create_agent(
    model="gpt-4o",
    tools=[search_tool, send_email_tool],
    middleware=[
        # Layer 1: Deterministic input filter (before agent)
        ContentFilterMiddleware(banned_keywords=["hack", "exploit"]),

        # Layer 2: PII protection (before and after the model)
        PIIMiddleware("email", strategy="redact", apply_to_input=True),
        PIIMiddleware("email", strategy="redact", apply_to_output=True),

        # Layer 3: Human approval for sensitive tools
        HumanInTheLoopMiddleware(interrupt_on={"send_email": True}),

        # Layer 4: Model-based safety check (after agent)
        SafetyGuardrailMiddleware(),
    ],
    # Human-in-the-loop needs a checkpointer to pause and resume
    checkpointer=InMemorySaver(),
)
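
A sketch of invoking the stacked agent (the thread ID is required because the human-in-the-loop layer relies on the checkpointer; the email address is illustrative):

python
config = {"configurable": {"thread_id": "001"}}
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Email alice@example.com our weekly report."}]},
    config=config,
)
# Layer 2 redacts the address before the model sees it, and Layer 3 pauses
# for human approval before any send_email call actually runs.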