[AI Applications in Practice] Building a RAG Application with the Amazon Bedrock Converse API + Amazon Titan (Retrieval-Augmente

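RAG, short for Retrieval-Augmented Generation, is a technique that combines retrieval with generation. Before an LLM answers a question or generates text, a RAG system first retrieves relevant information from a large document collection and then generates the answer based on that information, which improves the quality of the output. It is used mainly in natural language processing (NLP) tasks, especially those that involve both understanding and generating text.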

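In this article, we look at how to build a RAG application with Amazon Bedrock and Amazon Titan. The application not only understands the user's question but also retrieves relevant information from a large body of data and uses it to generate accurate, useful answers.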

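1. Quick Start

1.1 Registering an Account

Go to the Amazon Web Services global site and sign up:

Enter your email address and an account name to register.

1.2 Introduction to Amazon Bedrock

First, go to the AWS global website, search for Bedrock, and click Amazon Bedrock to open its product page.

Then click Get started to open the Amazon Bedrock console:

In the console, click Overview. Amazon Bedrock supports a range of foundation models (FMs), including Amazon Titan, Anthropic Claude, AI21 Labs Jurassic, Cohere Command, Mistral, Stable Diffusion, and Llama 3. These models cover text generation, image generation, and other use cases, so both individual developers and large enterprises can use Amazon Bedrock to build and deploy generative AI applications.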

1.3 Introduction to Amazon Titan

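Among these, the Amazon Titan family is exclusive to Amazon Bedrock and draws on Amazon's 25 years of experience in AI and machine learning innovation across its businesses. As Amazon's own family of foundation models, Amazon Titan offers strong natural language processing capabilities and adapts to a wide range of user needs and use cases.

Broad range of applications: high-performance image, multimodal, and text models support a wide range of generative AI applications, such as content creation, image generation, and search and recommendation experiences.
Responsible and ethical use: all Amazon Titan FMs have built-in support for the responsible use of AI. They detect and remove harmful content from data, reject inappropriate user input, and filter model output.

Titan models come in three types: embeddings, text generation, and image generation.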

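Titan Embeddings G1 - Text converts text input (words, phrases, or possibly larger units of text) into a numerical representation, called an embedding, that captures the meaning of the text. Although this model does not generate text, it is useful for applications such as personalization and search. By comparing embeddings, an application can return results that are more relevant and context-aware than plain word matching. The newer Titan Multimodal Embeddings G1 model suits use cases such as searching images by text, by image similarity, or by a combination of text and image. It converts an input image or text into an embedding that represents images and text in the same semantic space.
Titan Text models are generative LLMs for tasks such as summarization, text generation (for example, writing a blog post), classification, open-ended Q&A, and information extraction. They are also trained on many programming languages and on rich text formats such as tables, JSON, and CSV.
Titan Image Generator G1 is a generative foundation model that creates images from natural-language text. It can also edit existing images or generate variations of existing or generated images.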
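To make "comparing embeddings" concrete, here is a minimal sketch of ranking documents by cosine similarity. It does not call Bedrock: the three-dimensional vectors and the names `cosine_similarity`, `rank_by_similarity`, `doc_vecs`, and `query_vec` are hypothetical and hand-written for illustration, whereas real Titan embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document names sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in doc_vecs.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical embeddings, hand-written for illustration only.
doc_vecs = {
    "beijing_weather": [0.9, 0.1, 0.0],
    "shanghai_weather": [0.1, 0.9, 0.0],
    "cooking_recipe": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.2, 0.0]  # stands in for an embedded question about Beijing

print(rank_by_similarity(query_vec, doc_vecs))
```

In a real application the vectors would come from an embeddings model, and the top-ranked documents would be passed to the text model as context.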

1.4 Trying Out the Model

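First click the Chat button, then on the Chat Playground page click Select model and choose Amazon Titan:

Once selected, Amazon Titan Text G1 - Express opens. It suits a wide range of advanced, general-purpose language tasks, such as open-ended text generation and conversational chat, and it supports use within retrieval-augmented generation (RAG). It is optimized for English and offers multilingual support for more than 30 other languages (in preview).

The left panel shows the model's inference parameters, including Temperature and Top P, which control the randomness and diversity of the output. Temperature governs how random the sampling is: lower values make the output more deterministic, and higher values make it more varied. Top P restricts sampling to the most likely tokens whose cumulative probability reaches P, so lowering it narrows the range of possible outputs. Adjusting these parameters lets you tune the output to your needs.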
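The effect of the two parameters can be illustrated with a toy calculation. The sketch below is not Titan's actual sampler; it is a textbook version of temperature scaling and top-p (nucleus) filtering over made-up scores, and the names `softmax_with_temperature`, `top_p_filter`, and `logits` are assumptions for the example. A temperature of 0, as used in API calls, corresponds to the limit where the most likely token is always chosen, which this sketch does not model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the kept probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four candidate tokens

for t in (0.2, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")

# Token indices that survive a top-p cutoff of 0.9, with renormalized weights.
print(top_p_filter(softmax_with_temperature(logits, 1.0), 0.9))
```

At the lower temperature almost all probability mass sits on the top token, while the top-p filter drops the least likely candidates before sampling.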

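In addition, by choosing View API request, you can get code samples for accessing the model with the AWS Command Line Interface (AWS CLI) and the AWS SDKs. The same approach works for other models on Bedrock. For example, Llama 3 uses model IDs such as meta.llama3-8b-instruct-v1:0 or meta.llama3-70b-instruct-v1:0. Here is a sample AWS CLI command that invokes Llama 3 8B Instruct: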

$ aws bedrock-runtime invoke-model \
  --model-id meta.llama3-8b-instruct-v1:0 \
  --body '{"prompt":"Simply put, the theory of relativity states that\n the laws of physics are the same everywhere in the universe, and that the passage of time and the length of objects can vary depending on their speed and position in a gravitational field ","max_gen_len":512,"temperature":0.5,"top_p":0.9}' \
  --cli-binary-format raw-in-base64-out \
  --region us-east-1 \
  invoke-model-output.txt

You can also use Amazon Bedrock with the AWS SDKs to build your application in a variety of programming languages.

# Use the Converse API to send a text message to Titan Text G1 - Express.

import boto3
from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region you want to use.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Set the model ID, e.g., Titan Text G1 - Express.
model_id = "amazon.titan-text-express-v1"

# Start a conversation with the user message.
user_message = """Generate synthetic data for daily product sales in various categories - include row number, product name, category, date of sale and price. Produce output in JSON format. Count records and ensure there are no more than 5."""
conversation = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

try:
    # Send the message to the model, using a basic inference configuration.
    response = client.converse(
        modelId=model_id,
        messages=conversation,
        inferenceConfig={
            "maxTokens": 1024,
            "stopSequences": ["User:"],
            "temperature": 0,
            "topP": 1,
        },
        additionalModelRequestFields={},
    )

    # Extract and print the response text.
    response_text = response["output"]["message"]["content"][0]["text"]
    print(response_text)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    raise SystemExit(1)
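The Converse API is stateless, so a follow-up question has to be sent together with the earlier turns. The sketch below shows one way to keep that history; it makes no AWS call. The helpers `add_turn` and `reply_text` and the `fake_response` dictionary are hypothetical, written to mirror the message and response shapes used in the example above.

```python
def add_turn(conversation, role, text):
    """Append one message in the Converse message format and return the list."""
    if role not in ("user", "assistant"):
        raise ValueError("role must be 'user' or 'assistant'")
    conversation.append({"role": role, "content": [{"text": text}]})
    return conversation


def reply_text(response):
    """Extract the first text block from a Converse-style response dictionary."""
    return response["output"]["message"]["content"][0]["text"]


# Hand-written stand-in for what client.converse(...) returns; not real output.
fake_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [{"text": "Here are 5 records ..."}],
        }
    }
}

conversation = []
add_turn(conversation, "user", "Generate five synthetic sales records as JSON.")
add_turn(conversation, "assistant", reply_text(fake_response))
add_turn(conversation, "user", "Now add a currency field to each record.")

# The whole list would be passed as messages=conversation on the next call.
for message in conversation:
    print(message["role"], "->", message["content"][0]["text"])
```

Keeping user and assistant turns in order lets the model see the context of the follow-up request.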

2. Building a RAG Application with the SDK

2.1 Introduction to RAG

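RAG is a technique that combines retrieval and generation to strengthen a model's output with external knowledge sources. When generating text, the system first searches a large knowledge base or document collection for information relevant to the current task, then uses what it retrieved to produce text that is more accurate, more complete, and better grounded. This reduces the risk of the model producing unfounded or incorrect content and improves the quality and reliability of the output. RAG is also easier to explain, because answers can be traced back to the retrieved sources, and easy to customize for different domains and tasks.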
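The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration rather than how any Bedrock feature retrieves documents: `retrieve`, `build_prompt`, `STOPWORDS`, and the word-overlap scoring are assumptions made for the example, and a production system would more likely rank documents by embedding similarity.

```python
import re

# A tiny, hypothetical stopword list so common words do not dominate the score.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "what", "with"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(question, documents, top_n=2):
    """Rank documents by the number of words shared with the question."""
    q_tokens = tokenize(question)
    scored = []
    for doc in documents:
        overlap = len(q_tokens & tokenize(doc["title"] + " " + doc["snippet"]))
        if overlap > 0:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def build_prompt(question, retrieved):
    """Combine the retrieved snippets and the question into one prompt."""
    if retrieved:
        context = "\n".join(f"- {d['title']}: {d['snippet']}" for d in retrieved)
    else:
        context = "(no relevant documents found)"
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


documents = [
    {"title": "Capital information", "snippet": "Beijing is the capital of China."},
    {"title": "Beijing weather", "snippet": "The weather in Beijing today is sunny, 26 degrees, with a light breeze."},
    {"title": "Shanghai weather", "snippet": "Shanghai is cloudy today, 28 degrees."},
]

question = "What is the weather in Shanghai today?"
print(build_prompt(question, retrieve(question, documents)))
```

The resulting prompt is what would be sent to the text model as the user message, so the model answers from the supplied snippets instead of from memory alone.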

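2.2 Building a RAG Application on Amazon Titan via the SDK

Here we access the Bedrock service through the SDK and use the Amazon Bedrock Converse API to build the application, sending messages to and receiving messages from models in Amazon Bedrock.

The Converse API provides a unified interface to many models and supports both streaming and non-streaming calls. Its features include tool use and document chat, although support varies by model. Below is an example of a RAG scenario based on the Amazon Titan model, written in Python.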

import logging

import boto3
from botocore.exceptions import ClientError
from termcolor import colored  # third-party package: pip install termcolor

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

def stream_conversation(bedrock_client,
                    model_id,
                    messages,
                    system_prompts,
                    inference_config,
                    additional_model_fields):
    """
    Sends messages to a model and streams the response.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        messages (JSON) : The messages to send.
        system_prompts (JSON) : The system prompts to send.
        inference_config (JSON) : The inference configuration to use.
        additional_model_fields (JSON) : Additional model fields to use.

    Returns:
        Nothing.

    """

    print("Streaming messages with model %s" % model_id)

    bedrock_params = {
        "modelId": model_id,
        "messages": messages,
        "inferenceConfig": inference_config,
        "additionalModelRequestFields": additional_model_fields,
    }

    system = [item for item in system_prompts if item.get('text')]
    if system:
        bedrock_params['system'] = system

    response = bedrock_client.converse_stream(**bedrock_params)
    stream = response.get('stream')
    resp_text_buf = ''
    if stream:
        for event in stream:
            # print(colored(event, 'red'))
            if 'messageStart' in event:
                print(f"\nRole: {event['messageStart']['role']}")

            if 'contentBlockDelta' in event:
                delta_types = event['contentBlockDelta']['delta'].keys()
                if 'text' in delta_types:
                    text_delta = event['contentBlockDelta']['delta']['text']
                    print(colored(text_delta, 'green'), end="")
                    resp_text_buf += text_delta


def main():
    model_id = "amazon.titan-text-express-v1"

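    # Retrieved knowledge for the demo (hard-coded here; a real application
    # would fetch it from a search index or vector store).
    documents = [
        {"title": "Capital information", "snippet": "Beijing is the capital of China and its political and cultural center, with a long history, the Great Wall and the Forbidden City, and a wealth of good food."},
        {"title": "Beijing weather", "snippet": "The weather in Beijing today is sunny, 26 degrees, with a light breeze."},
        {"title": "Shanghai weather", "snippet": "Shanghai is cloudy today, 28 degrees."},
    ]

    # Question to send to the model.
    question = "What is the capital of China, and what is the weather like today?"
    print(colored(f"Question: {question}", 'red'))

    # The `k` and `documents` request fields belong to Cohere Command R models;
    # Titan Text does not document them, so the snippets go into the prompt.
    context = "\n".join(f"{d['title']}: {d['snippet']}" for d in documents)
    input_text = (
        "Answer the question using only the documents below. "
        "If they do not contain the answer, say so.\n\n"
        f"Documents:\n{context}\n\nQuestion: {question}"
    )

    message = {
        "role": "user",
        "content": [{"text": input_text}]
    }
    messages = [message]

    # System prompts.
    system_prompts = []

    # Inference parameters to use.
    temperature = 0.9
    max_tokens = 2000

    # Base inference parameters.
    inference_config = {
        "temperature": temperature,
        "maxTokens": max_tokens,
    }

    # No model-specific request fields are needed for Titan in this example.
    additional_model_fields = {}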

    try:
        bedrock_client = boto3.client(service_name='bedrock-runtime')

        stream_conversation(bedrock_client, model_id, messages,
                        system_prompts, inference_config, additional_model_fields)

    except ClientError as err:
        message = err.response['Error']['Message']
        logger.error("A client error occurred: %s", message)

    else:
        print(f"\nFinished streaming messages with model {model_id}.")

if __name__ == "__main__":
    main()

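The model is given the following knowledge:

Title	Content
Capital information	Beijing is the capital of China and its political and cultural center, with a long history, the Great Wall and the Forbidden City, and a wealth of good food.
Beijing weather	The weather in Beijing today is sunny, 26 degrees, with a light breeze.
Shanghai weather	Shanghai is cloudy today, 28 degrees.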

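Running the code locally produces the following results:

Question	Model response
What is the capital of China, and what is the weather like today?	Beijing is the capital of China. The weather today is sunny, with a temperature of 26 degrees.
What is the weather like in Shanghai and Nanjing?	Shanghai is cloudy today, with a temperature of 28 degrees. As for Nanjing, I could not find any weather information. I hope you understand.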
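The stream handling in `stream_conversation` can be exercised without an AWS account by feeding it hand-written events. The sketch below is one possible way to do that: `collect_stream_text` and `fake_events` are hypothetical, and the event shapes (`messageStart`, `contentBlockDelta`, `contentBlockStop`, `messageStop`) follow the streaming example earlier in this section plus my reading of the Converse stream format.

```python
def collect_stream_text(events):
    """Concatenate text deltas from stream events and capture the stop reason."""
    parts = []
    stop_reason = None
    for event in events:
        if "contentBlockDelta" in event:
            delta = event["contentBlockDelta"]["delta"]
            if "text" in delta:
                parts.append(delta["text"])
        elif "messageStop" in event:
            stop_reason = event["messageStop"].get("stopReason")
    return "".join(parts), stop_reason


# Hand-written events that imitate a short streamed answer; not real output.
fake_events = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "Beijing is the capital of China, "}, "contentBlockIndex": 0}},
    {"contentBlockDelta": {"delta": {"text": "and today it is sunny."}, "contentBlockIndex": 0}},
    {"contentBlockStop": {"contentBlockIndex": 0}},
    {"messageStop": {"stopReason": "end_turn"}},
]

text, stop_reason = collect_stream_text(fake_events)
print(text)
print("stop reason:", stop_reason)
```

Returning the accumulated text, rather than only printing it, makes it easier to log answers or compare them against the supplied knowledge.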