零基础5分钟上手亚马逊云科技 - AI模型内容安全过滤

在上一篇文章中，小李哥带大家深入调研亚马逊云科技AI模型平台Amazon Bedrock热门开发功能，了解了模型平台的文字/图片生成、模型表现评估和模型内容安全审核的实践操作。这次我们将继续介绍如何利用API的形式，利用Python代码的形式对AI模型内容安全过滤，阻止输入、输出中有危害的内容，提升模型在用户使用过程中的安全性。

方案所需基础知识

什么是Amazon Bedrock Guardrails？

Amazon Bedrock Guardrails是一个用于构建和部署生成式AI应用的工具和服务。它可以帮助开发人员在使用Amazon Bedrock的基础上创建生成内容时，确保模型的输出符合预期，并避免出现不安全或不适当的内容。这些工具为开发人员提供了一个安全、可控的环境来管理生成内容，从而使开发者们能够更安全、合规地使用生成式AI技术。

Guardrails不仅仅是一组简单的规则或限制，而是一个动态的框架，内置丰富的模型安全内容规律规则，允许开发人员根据特定的应用需求和行业标准来定义和执行合规要求。通过与Amazon Bedrock的集成，开发人员可以轻松地将这些安全措施应用到模型的训练、部署到推理的整个生命周期中。

为什么使用Amazon Bedrock Guardrails对模型内容进行安全过滤？

使用Amazon Bedrock Guardrails进行模型内容的安全过滤在为用户提供AI应用服务时非常重要，因为它可以帮助开发者确保生成内容的质量和安全性。在生成式AI应用中，模型有时可能会生成不准确、偏见或不适当的幻觉内容。通过使用Guardrails，开发者可以实施必要的安全过滤措施，减少这些风险。

Guardrails提供了实时监控和过滤功能，使得开发人员可以在内容生成的过程中自动检测并阻止潜在的有害输出。这种预防措施不仅保证了开发者项目质量，还帮助开发者可以遵守法律法规和行业标准，避免潜在的法律责任。

本实践包括的内容

1. 本实践中大家将会学习如何使用亚马逊云科技命令行工具AWS CLI，调用Amazon Bedrock API获取创建的模型防护链Guardrails信息

2.通过Python代码，调用亚马逊云科技Boto3 SDK API分别对模型输入和输出内容进行安全性审核和规律

功能实践具体步骤

Boto3 SDK中的ApplyGuardrail API 可用于评估所有 FM（包括 Amazon Bedrock 上的 FM、自定义和第三方 FM）的输入提示和模型响应，从而实现对所有生成式 AI 应用程序的集中治理。下面我们通过Python 代码的方式来展示这个功能。

首先安装Python Boto3 SDK

python 复制代码

#升级 boto3 到最新版本
!pip install --user -U boto3

通过AWS CLI获取我们在上一篇文章中创建的名字为"Demo"的安全防护栏Guardrails的ID

bash 复制代码

#使用 aws cli 命令获取刚才创建的名称为 "demo" 的 Amazon Bedrock围栏
!aws bedrock list-guardrails --query 'guardrails[?contains(name, `demo`)]'

运行命令后得到回复，可以得到ID为："vewdslvdea87"

XML 复制代码

[
    {
        "id": "vewdslvdea87",
        "arn": "arn:aws:bedrock:us-east-1:[account-id-masked]:guardrail/vewdslvdea87",
        "status": "READY",
        "name": "demo",
        "version": "DRAFT",
        "createdAt": "2024-07-22T03:55:26Z",
        "updatedAt": "2024-07-22T03:55:52.017301903Z"
    }
]

通过AWS CLI获取该防护栏"vewdslvdea87"所有的过去历史版本

bash 复制代码

#使用 aws cli 命令获取刚才创建的名称为 "demo" 的 Amazon Bedrock围栏的所有版本
!aws bedrock list-guardrails --guardrail-identifier "vewdslvdea87"

得到过去历史的两个版本以及创建时间。

XML 复制代码

{
    "guardrails": [
        {
            "id": "vewdslvdea87",
            "arn": "arn:aws:bedrock:us-east-1:[account-id-masked]:guardrail/vewdslvdea87",
            "status": "READY",
            "name": "demo",
            "version": "DRAFT",
            "createdAt": "2024-07-22T03:55:26Z",
            "updatedAt": "2024-07-22T03:55:52.017301903Z"
        },
        {
            "id": "vewdslvdea87",
            "arn": "arn:aws:bedrock:us-east-1:[account-id-masked]:guardrail/vewdslvdea87",
            "status": "READY",
            "name": "demo",
            "description": "version 1",
            "version": "1",
            "createdAt": "2024-07-22T03:55:51Z",
            "updatedAt": "2024-07-22T03:55:52.017345499Z"
        }
    ]
}

通过AWS CLI获取该防护栏具体的配置信息。在该配置中明确定义了大模型拒绝回复的话题，表示在输入或输出内容中出现该话题时，大模型将拒绝回复。也明确定义了在大模型敏感信息规则，当输入或输出内容中出现人名、车牌时，大模型将拒绝回复。

XML 复制代码

{
    "name": "demo",
    "description": "version 1",
    "guardrailId": "vewdslvdea87",
    "guardrailArn": "arn:aws:bedrock:us-east-1:[account-id-masked]:guardrail/vewdslvdea87",
    "version": "1",
    "status": "READY",
    "topicPolicy": {
        "topics": [
            {
                "name": "FSI",
                "definition": "Investment, buying stock / gold / futures / cryptocurrency etc",
                "examples": [],
                "type": "DENY"
            }
        ]
    },
    "contentPolicy": {
        "filters": [
            {
                "type": "SEXUAL",
                "inputStrength": "HIGH",
                "outputStrength": "HIGH"
            },
            {
                "type": "HATE",
                "inputStrength": "HIGH",
                "outputStrength": "HIGH"
            },
            {
                "type": "VIOLENCE",
                "inputStrength": "HIGH",
                "outputStrength": "HIGH"
            },
            {
                "type": "INSULTS",
                "inputStrength": "HIGH",
                "outputStrength": "HIGH"
            },
            {
                "type": "MISCONDUCT",
                "inputStrength": "HIGH",
                "outputStrength": "HIGH"
            },
            {
                "type": "PROMPT_ATTACK",
                "inputStrength": "HIGH",
                "outputStrength": "NONE"
            }
        ]
    },
    "wordPolicy": {
        "words": [
            {
                "text": "投资"
            },
            {
                "text": "虚拟货币"
            },
            {
                "text": "bitcoin"
            }
        ],
        "managedWordLists": [
            {
                "type": "PROFANITY"
            }
        ]
    },
    "sensitiveInformationPolicy": {
        "piiEntities": [
            {
                "type": "NAME",
                "action": "BLOCK"
            },
            {
                "type": "VEHICLE_IDENTIFICATION_NUMBER",
                "action": "BLOCK"
            }
        ],
        "regexes": []
    },
    "contextualGroundingPolicy": {
        "filters": [
            {
                "type": "GROUNDING",
                "threshold": 0.6
            },
            {
                "type": "RELEVANCE",
                "threshold": 0.7
            }
        ]
    },
    "createdAt": "2024-07-22T03:55:51Z",
    "updatedAt": "2024-07-22T03:55:52.017345499Z",
    "statusReasons": [],
    "failureRecommendations": [],
    "blockedInputMessaging": "Sorry, the model cannot answer this question.",
    "blockedOutputsMessaging": "Sorry, the model cannot answer this question."
}

接下来进入正题，我们利用的自定义封装函数apply，调应Boto3 SDK提供 Amazon Bedrock Guardrail 独立评估API ApplyGuardrai 做文本审核

python 复制代码

import boto3

#创建 AWS Bedrock Runtime 客户端对象,指定区域为 us-east-1
bedrockRuntimeClient = boto3.client('bedrock-runtime', region_name="us-east-1")

#封装函数，调用 Amazon Bedrock Guardrail 独立评估API ApplyGuardrai 做文本审核
def apply(guardrail_id, guardrail_version,source,text):

    response = bedrockRuntimeClient.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version, 
        source=source, 
        content=[{"text": {"text": text}}]
    )
    
    print(response["action"])
                                                                                                                                                        
    if "outputs" in response and len(response["outputs"]) > 0:
        print(response["outputs"][0]["text"])
        print("Full response:", response)
    else:
        print("No output received from the guardrail.")
        print("Full response:", response)

首先对输入内容进行安全评估，输入问题并调用函数Apply

python 复制代码

#指定围栏ID、Version以及Source，测试文本审核

guardrailId = "vewdslvdea87"
guardrailVersion = "1"

source = "INPUT"
text = "我想购买bitboin."
apply(guardrailId, guardrailVersion, source, text)

得到回复表示该输入不符合我们配置的安全规则，拒绝回复。

XML 复制代码

GUARDRAIL_INTERVENED
Sorry, the model cannot answer this question.
Full response: {'ResponseMetadata': {'RequestId': 'ca03bf06-1e72-416f-990b-773349104ef1', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 22 Jul 2024 07:42:10 GMT', 'content-type': 'application/json', 'content-length': '514', 'connection': 'keep-alive', 'x-amzn-requestid': 'ca03bf06-1e72-416f-990b-773349104ef1'}, 'RetryAttempts': 0}, 'usage': {'topicPolicyUnits': 1, 'contentPolicyUnits': 1, 'wordPolicyUnits': 1, 'sensitiveInformationPolicyUnits': 1, 'sensitiveInformationPolicyFreeUnits': 0, 'contextualGroundingPolicyUnits': 0}, 'action': 'GUARDRAIL_INTERVENED', 'outputs': [{'text': 'Sorry, the model cannot answer this question.'}], 'assessments': [{'topicPolicy': {'topics': [{'name': 'FSI', 'type': 'DENY', 'action': 'BLOCKED'}]}}]}

9.接下来对输出内容进行安全评估，将输出内容作为参数传递给函数Apply

python 复制代码

#指定围栏ID、Version以及Source，测试文本审核

guardrailId = "vewdslvdea87"
guardrailVersion = "1"

source = "OUTPUT"
text = """Elon Musk is a visionary entrepreneur and business magnate. 
    He is the CEO of Tesla, an electric vehicle company, and SpaceX, 
    an aerospace manufacturer focused on space exploration. 
    With his innovative ideas and ambitious goals, 
    Musk has significantly impacted the automotive and space industries."""

apply(guardrailId, guardrailVersion, source, text)

得到回复表示该输出内容不符合我们配置的安全规则，拒绝回复。

XML 复制代码

GUARDRAIL_INTERVENED
Sorry, the model cannot answer this question.
Full response: {'ResponseMetadata': {'RequestId': '33dc20d0-e34d-4cb3-a5d2-9b723f619b43', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 22 Jul 2024 07:42:12 GMT', 'content-type': 'application/json', 'content-length': '591', 'connection': 'keep-alive', 'x-amzn-requestid': '33dc20d0-e34d-4cb3-a5d2-9b723f619b43'}, 'RetryAttempts': 0}, 'usage': {'topicPolicyUnits': 1, 'contentPolicyUnits': 1, 'wordPolicyUnits': 1, 'sensitiveInformationPolicyUnits': 1, 'sensitiveInformationPolicyFreeUnits': 0, 'contextualGroundingPolicyUnits': 0}, 'action': 'GUARDRAIL_INTERVENED', 'outputs': [{'text': 'Sorry, the model cannot answer this question.'}], 'assessments': [{'sensitiveInformationPolicy': {'piiEntities': [{'match': 'Elon Musk', 'type': 'NAME', 'action': 'BLOCKED'}, {'match': 'Musk', 'type': 'NAME', 'action': 'BLOCKED'}]}}]}

以上就是在亚马逊云科技上沉浸式体验大模型平台Amazon Bedrock的有害内容筛选功能的全部步骤。欢迎大家关注小李哥的亚马逊云科技AI服务深入调研系列，未来获取更多国际前沿的AWS云开发/云架构方案。