使用亚马逊 Bedrock：Serverless LLM apps with Amazon Bedrock

Serverless LLM apps with Amazon Bedrock

本文是学习 https://www.deeplearning.ai/short-courses/serverless-llm-apps-amazon-bedrock/ 这门课的学习笔记。

What you'll learn in this course

In this course, you'll learn how to deploy a large language model-based application into production using serverless technology.

A serverless architecture enables you to quickly deploy your applications without the need to manage and scale the infrastructure that it runs on.

You'll learn to summarize audio files by pairing an LLM with an automatic speech recognition (ASR) model. Through hands-on exercises, you'll build an event-driven system that automatically detects incoming customer inquiries, transcribes them with ASR and summarizes them with an LLM, using Amazon Bedrock.

After course completion, you'll know how to:

Prompt an LLM and customize its responses using Amazon Bedrock.
Convert audio recordings into written transcripts with Amazon Transcribe, and then summarize these transcripts using an LLM, Amazon Titan.
Enable logging for all the calls you make to LLMs to help you maintain security, audit, and compliance standards.
Deploy this audio summarizer as an event-driven serverless workflow using AWS Lambda.

You'll work with the Amazon Titan model, but in practice Amazon Bedrock allows you to use any model you prefer.

Start building serverless LLM applications with Amazon Bedrock and deploy your apps in just days.

文章目录

[Serverless LLM apps with Amazon Bedrock](#Serverless LLM apps with Amazon Bedrock)
- [What you'll learn in this course](#What you’ll learn in this course)
[Lesson 1 - Your first generations with Amazon Bedrock](#Lesson 1 - Your first generations with Amazon Bedrock)
- - [Import all needed packages](#Import all needed packages)
  - [Setup the Bedrock runtime](#Setup the Bedrock runtime)
  - [Generation Configuration](#Generation Configuration)
  - [Working with other type of data](#Working with other type of data)
[Lesson 2: Summarize an audio file](#Lesson 2: Summarize an audio file)
- - [Import all needed packages](#Import all needed packages)
  - [Let's start with transcribing an audio file](#Let's start with transcribing an audio file)
  - [Now, let's use an LLM](#Now, let's use an LLM)
[Lesson 3: Enable Logging](#Lesson 3: Enable Logging)
[Lesson 4: Deploy an AWS Lambda function](#Lesson 4: Deploy an AWS Lambda function)
- - [Create the Lambda function](#Create the Lambda function)
[Lesson 5: Event-driven generation](#Lesson 5: Event-driven generation)
- - [Deploy your lambda function](#Deploy your lambda function)
后记

Lesson 1 - Your first generations with Amazon Bedrock

Welcome to Lesson 1.

You'll start with using Amazon Bedrock to prompt a model and customize how it generates its response.

Note: To access the requirements.txt file, go to File and click on Open. Here, you will also find all helpers functions and datasets used in each lesson.

I hope you enjoy this course!

requirements.txt

py 复制代码

# Note: To run the code in these files in your own environment you will need to install 
# some Python dependencies, and also have access to an AWS environment.  

# AWS Development Environment Setup:
# ----------------------------------
# 1. Create AWS Account and an IAM user.  Follow this guide for more details: https://docs.aws.amazon.com/SetUp/latest/UserGuide/setup-overview.html
# 2. Generate some access keys for your IAM user. For more information see here: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey
# 3. Install the AWS CLI. Follow this guide for more details: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
# 4. Configure AWS CLI: Run 'aws configure' with your access keys, set default region to 'us-west-2' for this code. For more information see here: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html#cli-configure-files-methods 
# 5. Use Boto3 for AWS SDK in Python. Your locally run code should interact with your AWS account. https://boto3.amazonaws.com/v1/documentation/api/latest/index.html

# Note: If using a company AWS account, consult your AWS administrator before proceeding.


# requirements file
# This code was developed and tested on Python 3.10


boto3==1.28.68

Display_Helper.py

py 复制代码

from IPython.display import HTML, display
import html

class Display_Helper: 

    def wrap(self, long_string):
        escaped_string = html.escape(long_string)
        html_string = escaped_string.replace("\n", "<br>")
        display(HTML(f"<div style='width: 600px; word-wrap: break-word;'>{html_string}</div>"))

    def text_file(self, file_path):
        try:
            # Open the file and read its contents into a string
            with open(file_path, 'r', encoding='utf-8') as file:
                text_string = file.read()

            # Print the contents (optional)
            print(f"{file_path}:")
            self.wrap(text_string)

        except FileNotFoundError:
            print(f"The file {file_path} was not found.")
        except Exception as e:
            print(f"An error occurred: {e}")

Import all needed packages

py 复制代码

import boto3
import json

Setup the Bedrock runtime

py 复制代码

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-west-2')

prompt = "Write a one sentence summary of Las Vegas."

kwargs = {
    "modelId": "amazon.titan-text-lite-v1",
    "contentType": "application/json",
    "accept": "*/*",
    "body": json.dumps(
        {
            "inputText": prompt
        }
    )
}

py 复制代码

response = bedrock_runtime.invoke_model(**kwargs)
response

Output

json 复制代码

{'ResponseMetadata': {'RequestId': 'a03a2d5a-6610-42af-bef7-b08d48bd9ad4',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Thu, 06 Jun 2024 05:10:52 GMT',
   'content-type': 'application/json',
   'content-length': '166',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'a03a2d5a-6610-42af-bef7-b08d48bd9ad4',
   'x-amzn-bedrock-invocation-latency': '1003',
   'x-amzn-bedrock-output-token-count': '15',
   'x-amzn-bedrock-input-token-count': '9'},
  'RetryAttempts': 0},
 'contentType': 'application/json',
 'body': <botocore.response.StreamingBody at 0x7fa06041d270>}

py 复制代码

response_body = json.loads(response.get('body').read())
print(json.dumps(response_body, indent=4))

Output

json 复制代码

{
    "inputTextTokenCount": 9,
    "results": [
        {
            "tokenCount": 15,
            "outputText": "\nLas Vegas is known for its casinos, entertainment, and nightlife.",
            "completionReason": "FINISH"
        }
    ]
}

py 复制代码

print(response_body['results'][0]['outputText'])

Output

py 复制代码

print(response_body['results'][0]['outputText'])

Generation Configuration

py 复制代码

prompt = "Write a summary of Las Vegas."
kwargs = {
    "modelId": "amazon.titan-text-express-v1",
    "contentType": "application/json",
    "accept": "*/*",
    "body" : json.dumps(
        {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 100,
                "temperature": 0.7,
                "topP": 0.9
            }
        }
    )
}

py 复制代码

response = bedrock_runtime.invoke_model(**kwargs)
response_body = json.loads(response.get('body').read())

generation = response_body['results'][0]['outputText']
print(generation)

Output

json 复制代码

Las Vegas is the 28th-most populated city in the United States, the most populated city in the state of Nevada, and the county seat of Clark County. It is an internationally renowned major resort city, known primarily for its gambling, shopping, fine dining, entertainment, and nightlife. Las Vegas is often referred to as the Entertainment Capital of the World and is famous for its luxurious casino-hotels, such as the Bellagio, Caesars Palace, and the Venetian.

py 复制代码

print(json.dumps(response_body, indent=4))

Output

json 复制代码

{
    "inputTextTokenCount": 7,
    "results": [
        {
            "tokenCount": 100,
            "outputText": "\nLas Vegas is the 28th-most populated city in the United States, the most populated city in the state of Nevada, and the county seat of Clark County. It is an internationally renowned major resort city, known primarily for its gambling, shopping, fine dining, entertainment, and nightlife. Las Vegas is often referred to as the Entertainment Capital of the World and is famous for its luxurious casino-hotels, such as the Bellagio, Caesars Palace, and the Venetian.",
            "completionReason": "LENGTH"
        }
    ]
}

py 复制代码

kwargs = {
    "modelId": "amazon.titan-text-express-v1",
    "contentType": "application/json",
    "accept": "*/*",
    "body" : json.dumps(
        {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 500,
                "temperature": 0.7,
                "topP": 0.9
            }
        }
    )
}

py 复制代码

response = bedrock_runtime.invoke_model(**kwargs)
response_body = json.loads(response.get('body').read())

generation = response_body['results'][0]['outputText']
print(generation)

Output

py 复制代码

Las Vegas is a city in Nevada known for its luxurious casinos, hotels, and nightlife. It is a popular tourist destination and a hub for business and entertainment. Las Vegas is home to many famous landmarks, including the Las Vegas Strip, the Bellagio, and the Venetian. The city is known for its high-energy atmosphere, world-class entertainment, and luxurious accommodations. Las Vegas is also a popular destination for weddings and conventions, with many hotels and resorts offering specialized services and amenities for these events.

py 复制代码

print(json.dumps(response_body, indent=4))

Output

py 复制代码

{
    "inputTextTokenCount": 7,
    "results": [
        {
            "tokenCount": 104,
            "outputText": "\nLas Vegas is a city in Nevada known for its luxurious casinos, hotels, and nightlife. It is a popular tourist destination and a hub for business and entertainment. Las Vegas is home to many famous landmarks, including the Las Vegas Strip, the Bellagio, and the Venetian. The city is known for its high-energy atmosphere, world-class entertainment, and luxurious accommodations. Las Vegas is also a popular destination for weddings and conventions, with many hotels and resorts offering specialized services and amenities for these events.",
            "completionReason": "FINISH"
        }
    ]
}

Working with other type of data

py 复制代码

from IPython.display import Audio
audio = Audio(filename="dialog.mp3")
display(audio)

with open('transcript.txt', "r") as file:
    dialogue_text = file.read()
    
print(dialogue_text)

Output

py 复制代码

spk_0: Hi, is this the Crystal Heights Hotel in Singapore? 
spk_1: Yes, it is. Good afternoon. How may I assist you today? 
spk_0: Fantastic, good afternoon. I was looking to book a room for my 10th wedding anniversary. Ive heard your hotel offers exceptional views and services. Could you tell me more? 
spk_1: Absolutely, Alex and congratulations on your upcoming anniversary. Thats a significant milestone and wed be honored to make it a special occasion for you. We have several room types that offer stunning views of the city skyline and the fictional Sapphire Bay. Our special diamond suite even comes with exclusive access to the moonlit pool and star deck. We also have in house spa services, world class dining options and a shopping arcade. 
spk_0: That sounds heavenly. I think my spouse would love the moonlit pool. Can you help me make a reservation for one of your diamond suites with a sapphire bay view? 
spk_1: Of course. May I know the dates you planning to visit? 
spk_0: Sure. It would be from October 10th to 17th. 
spk_1: Excellent. Let me check the availability. Ah It looks like we have a diamond suite available for those dates. Would you like to proceed with the reservation? 
spk_0: Definitely. Whats included in the package? 
spk_1: Wonderful. The package includes breakfast, complimentary access to the moonlit pool and star deck. A one time spa treatment for two and a special romantic dinner at our cloud nine restaurant. 
spk_0: You making it impossible to resist. Lets go ahead with the booking. 
spk_1: Great. I'll need some personal information for the reservation. Can I get your full name, contact details and a credit card for the preauthorizations? 
spk_0: Certainly. My full name is Alexander Thompson. My contact number is 12345678910. And the credit card is, wait, did you say pre authorization? How much would that be? 
spk_1: Ah, I should have mentioned that earlier. My apologies. A pre authorization amount of '$1000' will be held on your card which would be released upon checkout. 
spk_0: '$1000'. That seems a bit excessive. Don't you think 
spk_1: I understand your concern, Alex. The pre authorization is a standard procedure to cover any incidental expenses you may incur during your stay. However, I assure you its only a hold and not an actual charge. 
spk_0: Thats still a lot. Are there any additional charges that I should know about? 
spk_1: Well, there is a 10% service charge and a 7% fantasy tax applied to the room rate. 
spk_0: Mm. You know what its a special occasion. So lets go ahead. 
spk_1: Thank you, Alex for understanding. Will ensure that your experience at Crystal Heights is well worth it.

py 复制代码

prompt = f"""The text between the <transcript> XML tags is a transcript of a conversation. 
Write a short summary of the conversation.

<transcript>
{dialogue_text}
</transcript>

Here is a summary of the conversation in the transcript:"""

py 复制代码

kwargs = {
    "modelId": "amazon.titan-text-express-v1",
    "contentType": "application/json",
    "accept": "*/*",
    "body": json.dumps(
        {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 512,
                "temperature": 0,
                "topP": 0.9
            }
        }
    )
}

response = bedrock_runtime.invoke_model(**kwargs)

response_body = json.loads(response.get('body').read())
generation = response_body['results'][0]['outputText']

print(generation)

Output

py 复制代码

Alex is looking to book a room for his 10th wedding anniversary at the Crystal Heights Hotel in Singapore. The hotel offers several room types that offer stunning views of the city skyline and the fictional Sapphire Bay. The special diamond suite even comes with exclusive access to the moonlit pool and star deck. The package includes breakfast, complimentary access to the moonlit pool and star deck, a one-time spa treatment for two, and a special romantic dinner at the cloud nine restaurant. A pre-authorization amount of $1000 will be held on the card, which will be released upon checkout. There is a 10% service charge and a 7% fantasy tax applied to the room rate.

Lesson 2: Summarize an audio file

Import all needed packages

py 复制代码

import os
from IPython.display import Audio
import boto3
import uuid
import time
import json
from jinja2 import Template

Let's start with transcribing an audio file

py 复制代码

audio = Audio(filename="dialog.mp3")
display(audio)

2 client

py 复制代码

s3_client = boto3.client('s3', region_name='us-west-2')
bucket_name = os.environ['BucketName']
file_name = 'dialog.mp3'
s3_client.upload_file(file_name, bucket_name, file_name)


transcribe_client = boto3.client('transcribe', region_name='us-west-2')
job_name = 'transcription-job-' + str(uuid.uuid4())

py 复制代码

response = transcribe_client.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': f's3://{bucket_name}/{file_name}'},
    MediaFormat='mp3',
    LanguageCode='en-US',
    OutputBucketName=bucket_name,
    Settings={
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 2
    }
)

py 复制代码

while True:
    status = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    time.sleep(2)
print(status['TranscriptionJob']['TranscriptionJobStatus'])

Output

py 复制代码

COMPLETED

py 复制代码

if status['TranscriptionJob']['TranscriptionJobStatus'] == 'COMPLETED':
    
    # Load the transcript from S3.
    transcript_key = f"{job_name}.json"
    transcript_obj = s3_client.get_object(Bucket=bucket_name, Key=transcript_key)
    transcript_text = transcript_obj['Body'].read().decode('utf-8')
    transcript_json = json.loads(transcript_text)
    
    output_text = ""
    current_speaker = None
    
    items = transcript_json['results']['items']
    
    for item in items:
        
        speaker_label = item.get('speaker_label', None)
        content = item['alternatives'][0]['content']
        
        # Start the line with the speaker label:
        if speaker_label is not None and speaker_label != current_speaker:
            current_speaker = speaker_label
            output_text += f"\n{current_speaker}: "
            
        # Add the speech content:
        if item['type'] == 'punctuation':
            output_text = output_text.rstrip()
            
        output_text += f"{content} "
        
    # Save the transcript to a text file
    with open(f'{job_name}.txt', 'w') as f:
        f.write(output_text)

Now, let's use an LLM

py 复制代码

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-west-2')
with open(f'{job_name}.txt', "r") as file:
    transcript = file.read()

py 复制代码

%%writefile prompt_template.txt
I need to summarize a conversation. The transcript of the 
conversation is between the <data> XML like tags.

<data>
{{transcript}}
</data>

The summary must contain a one word sentiment analysis, and 
a list of issues, problems or causes of friction
during the conversation. The output must be provided in 
JSON format shown in the following example. 

Example output:
{
    "sentiment": <sentiment>,
    "issues": [
        {
            "topic": <topic>,
            "summary": <issue_summary>,
        }
    ]
}

Write the JSON output and nothing more.

Here is the JSON output:

py 复制代码

with open('prompt_template.txt', "r") as file:
    template_string = file.read()
    
data = {
    'transcript' : transcript
}
template = Template(template_string)
prompt = template.render(data)

json 复制代码

kwargs = {
    "modelId": "amazon.titan-text-express-v1",
    "contentType": "application/json",
    "accept": "*/*",
    "body": json.dumps(
        {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 512,
                "temperature": 0,
                "topP": 0.9
            }
        }
    )
}

py 复制代码

response = bedrock_runtime.invoke_model(**kwargs)
response_body = json.loads(response.get('body').read())
generation = response_body['results'][0]['outputText']
print(generation)

Output

json 复制代码

{
    "sentiment": "Positive",
    "issues": []
}

Lesson 3: Enable Logging

py 复制代码

import boto3
import json
import os

bedrock = boto3.client('bedrock', region_name="us-west-2")

from helpers.CloudWatchHelper import CloudWatch_Helper
cloudwatch = CloudWatch_Helper()

py 复制代码

log_group_name = '/my/amazon/bedrock/logs'
cloudwatch.create_log_group(log_group_name)

Output

py 复制代码

Log group '/my/amazon/bedrock/logs' created successfully

Permissions

py 复制代码

loggingConfig = {
    'cloudWatchConfig': {
        'logGroupName': log_group_name,
        'roleArn': os.environ['LOGGINGROLEARN'],
        'largeDataDeliveryS3Config': {
            'bucketName': os.environ['LOGGINGBUCKETNAME'],
            'keyPrefix': 'amazon_bedrock_large_data_delivery',
        }
    },
    's3Config': {
        'bucketName': os.environ['LOGGINGBUCKETNAME'],
        'keyPrefix': 'amazon_bedrock_logs',
    },
    'textDataDeliveryEnabled': True,
}

py 复制代码

bedrock.put_model_invocation_logging_configuration(loggingConfig=loggingConfig)

Output

json 复制代码

{'ResponseMetadata': {'RequestId': 'ff946ac9-3b70-46b3-813a-3df74e5a3249',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Thu, 06 Jun 2024 05:54:13 GMT',
   'content-type': 'application/json',
   'content-length': '2',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'ff946ac9-3b70-46b3-813a-3df74e5a3249'},
  'RetryAttempts': 0}}

py 复制代码

bedrock.get_model_invocation_logging_configuration()

Output

json 复制代码

{'ResponseMetadata': {'RequestId': '9d7ed35e-e290-42ad-bd7b-af4e4cf0adc7',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Thu, 06 Jun 2024 05:54:13 GMT',
   'content-type': 'application/json',
   'content-length': '572',
   'connection': 'keep-alive',
   'x-amzn-requestid': '9d7ed35e-e290-42ad-bd7b-af4e4cf0adc7'},
  'RetryAttempts': 0},
 'loggingConfig': {'cloudWatchConfig': {'logGroupName': '/my/amazon/bedrock/logs',
   'roleArn': 'arn:aws:iam::359221361672:role/c99355a2566044l6746327t1w43396592481-LoggingIAMRole-02xH6ovfIJxX',
   'largeDataDeliveryS3Config': {'bucketName': 'c99355a2566044l6746327t1w433965924-loggings3bucket-ujar4vdntz1x',
    'keyPrefix': 'amazon_bedrock_large_data_delivery'}},
  's3Config': {'bucketName': 'c99355a2566044l6746327t1w433965924-loggings3bucket-ujar4vdntz1x',
   'keyPrefix': 'amazon_bedrock_logs'},
  'textDataDeliveryEnabled': True,
  'imageDataDeliveryEnabled': True,
  'embeddingDataDeliveryEnabled': True}}

py 复制代码

bedrock_runtime = boto3.client('bedrock-runtime', region_name="us-west-2")

prompt = "Write an article about the fictional planet Foobar."

kwargs = {
    "modelId": "amazon.titan-text-express-v1",
    "contentType": "application/json",
    "accept": "*/*",
    "body": json.dumps(
        {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 512,
                "temperature": 0.7,
                "topP": 0.9
            }
        }
    )
}

response = bedrock_runtime.invoke_model(**kwargs)
response_body = json.loads(response.get('body').read())

generation = response_body['results'][0]['outputText']

print(generation)

Output

json 复制代码

Foobar is a fictional planet from the television series "Futurama." It is a planet that is home to a race of creatures known as the Foobars. The Foobars are a peaceful, intelligent species that are known for their advanced technology and culture.

One of the most notable features of Foobar is its technology. The planet is home to a number of advanced technologies, including flying cars, teleporters, and robots. The Foobars have also developed a form of energy that is renewable and sustainable, which they use to power their cities and devices.

Another important aspect of Foobar culture is its art and music. The Foobars are known for their vibrant and colorful art, which often depicts their advanced technology and natural landscapes. Music is also an important part of Foobar culture, with a variety of genres and styles being played throughout the planet.

Despite its advanced technology and peaceful culture, Foobar is not without its challenges. The planet is located in a remote part of the galaxy, making it difficult for other species to reach and interact with. Additionally, Foobar is threatened by a number of natural disasters, including hurricanes and earthquakes.

Despite these challenges, the Foobars continue to thrive and grow. They are a proud and resilient species, and their technology and culture continue to inspire and amaze other species in the galaxy. Foobar is a fictional planet that has captured the imagination of millions of people around the world. Its advanced technology, peaceful culture, and unique art and music make it a fascinating and exciting place to explore. Whether you are a fan of "Futurama" or just looking for a new and exciting world to explore, Foobar is a great choice.

py 复制代码

cloudwatch.print_recent_logs(log_group_name)

Output

py 复制代码

Permissions are correctly set for Amazon Bedrock logs.

Lesson 4: Deploy an AWS Lambda function

Lambda_Helper.py

py 复制代码

import boto3
import zipfile
import json
import os




class Lambda_Helper: 

    def __init__(self):
        
        # Get the account ID being used
        sts_client = boto3.client('sts')
        response = sts_client.get_caller_identity()
        account_id = response['Account']
        
        # Create a Boto3 client for the Lambda service
        self.lambda_client = boto3.client('lambda', region_name='us-west-2')  
        self.function_name = 'LambdaFunctionDLAICourse'
        self.role_arn = f"arn:aws:iam::{account_id}:role/LambdaRoleDLAICourse"
        self.function_description = "Lambda function uploaded by a notebook in a DLAI course."
        self.lambda_arn = ""
        self.lambda_environ_variables = {}
        self.filter_rules_suffix = ""
        
        self.s3_client = boto3.client('s3', region_name='us-west-2')
            
    def deploy_function(self, code_file_names, function_name=""):

        if function_name:
            self.function_name = function_name
        else:
            print(f"Using function name: {self.function_name}")
        
        print('Zipping function...')
        zip_file_path = 'lambda_function.zip'

        with zipfile.ZipFile(zip_file_path, 'w') as zipf:
            for code_file_name in code_file_names:
                zipf.write(code_file_name, arcname=code_file_name)

        try:
            print('Looking for existing function...')
            # Try to get the details of the Lambda function
            self.lambda_client.get_function(FunctionName=self.function_name)

            # If the function exists, update its code
            print(f"Function {self.function_name} exists. Updating code...")
            response = self.lambda_client.update_function_code(
                FunctionName=self.function_name,
                ZipFile=open(zip_file_path, 'rb').read()  # Read the ZIP file and provide its content
            )
            print(f"Function {self.function_name} code updated: {response['LastModified']}")
            self.lambda_arn = response['FunctionArn']
            print("Done.")

        except self.lambda_client.exceptions.ResourceNotFoundException:
            # If the function does not exist, create a new one
            print(f"Function {self.function_name} does not exist. Creating...")
            response = self.lambda_client.create_function(
                FunctionName=self.function_name,
                Runtime='python3.11',
                Role=self.role_arn,
                Handler='lambda_function.lambda_handler',
                Description=self.function_description,
                Layers=[os.environ['LAMBDALAYERVERSIONARN']],
                Timeout=120,
                Code={
                    'ZipFile': open(zip_file_path, 'rb').read()
                },
                Environment= {'Variables': self.lambda_environ_variables }
            )
            print(f"Function {self.function_name} created: {response['FunctionArn']}")
            self.lambda_arn = response['FunctionArn']
            print("Done.")

        except Exception as e:
            # Handle other potential exceptions
            print(f"An error occurred: {e}")
            self.lambda_arn = ""
            print("Done, with error.")
            

    def add_lambda_trigger(self, bucket_name, function_name=""):

        if function_name:
            self.function_name = function_name
        else:
            print(f"Using function name of deployed function: {self.function_name}")
        
        # Check and remove existing permissions for the specific source (S3 bucket)
        try:
            policy = self.lambda_client.get_policy(FunctionName=self.function_name)['Policy']
            policy_dict = json.loads(policy)
        
            for statement in policy_dict['Statement']:
                if statement['Action'] == 'lambda:InvokeFunction' and self.lambda_arn in statement['Resource']:
                    self.lambda_client.remove_permission(
                        FunctionName=self.function_name,
                        StatementId=statement['Sid']
                    )
                    print(f"Removed existing permission: {statement['Sid']}")

        except self.lambda_client.exceptions.ResourceNotFoundException:
            # Handle if the policy is not found - might mean function has no permissions set
            pass

        except Exception as e:
            # Handle other exceptions
            print(f"Error checking for existing permissions: {e}")
            return

        # Grant permission to S3 to invoke the Lambda function
        try:
            response = self.lambda_client.add_permission(
                FunctionName=self.function_name,
                Action='lambda:InvokeFunction',
                Principal='s3.amazonaws.com',
                StatementId='s3-trigger-permission',  # A unique statement ID
                SourceArn=f"arn:aws:s3:::{bucket_name}"
            )
            print_out = json.dumps(json.loads(response['Statement']), indent=4)
            print(f"Permission added with Statement: {print_out}")

        except Exception as e:
            print(f"Error adding Lambda permission: {e}")
            return

        # Add bucket notification to trigger the Lambda function
        lambda_arn = self.lambda_client.get_function(
            FunctionName=self.function_name
        )['Configuration']['FunctionArn']
                
        notification_configuration = {
            'LambdaFunctionConfigurations': [
                {
                    'LambdaFunctionArn': lambda_arn,
                    'Events': ['s3:ObjectCreated:*'],
                    'Filter': {
                        'Key': {
                            'FilterRules': [
                                {
                                    'Name': 'suffix',
                                    'Value': self.filter_rules_suffix
                                }
                            ]
                        }
                    }
                }
            ]
        }
        
        try:
            self.s3_client.put_bucket_notification_configuration(
                Bucket=bucket_name,
                NotificationConfiguration=notification_configuration
            )
            print(f"Trigger added for {bucket_name} -> {self.function_name}")

        except Exception as e:
            print(f"Error setting S3 notification: {e}")

S3_Helper.py

py 复制代码

import boto3
import json
import os

class S3_Helper: 

    def __init__(self):
        
        # Get the account ID being used
        sts_client = boto3.client('sts')
        response = sts_client.get_caller_identity()
        account_id = response['Account']
        
        # Create a Boto3 client for the S3 service      
        self.s3_client = boto3.client('s3', region_name='us-west-2')

    def list_objects(self, bucket_name):
        try:
            # List objects within the bucket
            response = self.s3_client.list_objects_v2(Bucket=bucket_name)

            # Check if the bucket has any objects
            if 'Contents' in response:
                for obj in response['Contents']:
                    key = obj['Key']
                    creation_time = obj['LastModified']
                    print(f"Object: {key}, Created on: {creation_time}")
            else:
                print(f"No objects found in the bucket: {bucket_name}")
    
        except Exception as e:
            print(f"Error: {str(e)}")

    def upload_file(self, bucket_name, file_name):
        try:
            # Upload file to an S3 object from the specified local path
            self.s3_client.upload_file(file_name, bucket_name, file_name)
            print(f"Object '{file_name}' uploaded to bucket '{bucket_name}'")
        except Exception as e:
            print(f"Error: {str(e)}")
            
    def download_object(self, bucket_name, object_key):
        try:
            # Download the object from S3 to the specified local path
            self.s3_client.download_file(bucket_name, object_key, f"./{object_key}")
            print(f"Object '{object_key}' from bucket '{bucket_name}' to './{object_key}'")
        except Exception as e:
            print(f"Error: {str(e)}")

py 复制代码

import boto3, os
from helpers.Lambda_Helper import Lambda_Helper
from helpers.S3_Helper import S3_Helper
from helpers.Display_Helper import Display_Helper

lambda_helper = Lambda_Helper()
# deploy_function
# add_lambda_trigger

s3_helper = S3_Helper()
# upload_file
# download_object 
# list_objects

display_helper = Display_Helper()
# text_file
# json_file

bucket_name_text = os.environ['LEARNERS3BUCKETNAMETEXT']

py 复制代码

%%writefile prompt_template.txt
I need to summarize a conversation. The transcript of the conversation is between the <data> XML like tags.

<data>
{{transcript}}
</data>

The summary must contain a one word sentiment analysis, and a list of issues, problems or causes of friction
during the conversation. The output must be provided in JSON format shown in the following example. 

Example output:
{
    "version": 0.1,
    "sentiment": <sentiment>,
    "issues": [
        {
            "topic": <topic>,
            "summary": <issue_summary>,
        }
    ]
}

An `issue_summary` must only be one of:
{%- for topic in topics %}
 - `{{topic}}`
{% endfor %}

Write the JSON output and nothing more.

Here is the JSON output:

Create the Lambda function

py 复制代码

%%writefile lambda_function.py


#############################################################
#
# This Lambda function is written to a file by the notebook 
# It does not run in the notebook!
#
#############################################################

import boto3
import json 
from jinja2 import Template

s3_client = boto3.client('s3')
bedrock_runtime = boto3.client('bedrock-runtime', 'us-west-2')

def lambda_handler(event, context):
    
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    
    # One of a few different checks to ensure we don't end up in a recursive loop.
    if "-transcript.json" not in key: 
        print("This demo only works with *-transcript.json.")
        return
    
    try: 
        file_content = ""
        
        response = s3_client.get_object(Bucket=bucket, Key=key)
        
        file_content = response['Body'].read().decode('utf-8')
        
        transcript = extract_transcript_from_textract(file_content)

        print(f"Successfully read file {key} from bucket {bucket}.")

        print(f"Transcript: {transcript}")
        
        summary = bedrock_summarisation(transcript)
        
        s3_client.put_object(
            Bucket=bucket,
            Key='results.txt',
            Body=summary,
            ContentType='text/plain'
        )
        
    except Exception as e:
        print(f"Error occurred: {e}")
        return {
            'statusCode': 500,
            'body': json.dumps(f"Error occurred: {e}")
        }

    return {
        'statusCode': 200,
        'body': json.dumps(f"Successfully summarized {key} from bucket {bucket}. Summary: {summary}")
    }
        
        
        
def extract_transcript_from_textract(file_content):

    transcript_json = json.loads(file_content)

    output_text = ""
    current_speaker = None

    items = transcript_json['results']['items']

    # Iterate through the content word by word:
    for item in items:
        speaker_label = item.get('speaker_label', None)
        content = item['alternatives'][0]['content']
        
        # Start the line with the speaker label:
        if speaker_label is not None and speaker_label != current_speaker:
            current_speaker = speaker_label
            output_text += f"\n{current_speaker}: "
        
        # Add the speech content:
        if item['type'] == 'punctuation':
            output_text = output_text.rstrip()  # Remove the last space
        
        output_text += f"{content} "
        
    return output_text
        

def bedrock_summarisation(transcript):
    
    with open('prompt_template.txt', "r") as file:
        template_string = file.read()

    data = {
        'transcript': transcript,
        'topics': ['charges', 'location', 'availability']
    }
    
    template = Template(template_string)
    prompt = template.render(data)
    
    print(prompt)
    
    kwargs = {
        "modelId": "amazon.titan-text-express-v1",
        "contentType": "application/json",
        "accept": "*/*",
        "body": json.dumps(
            {
                "inputText": prompt,
                "textGenerationConfig": {
                    "maxTokenCount": 2048,
                    "stopSequences": [],
                    "temperature": 0,
                    "topP": 0.9
                }
            }
        )
    }
    
    response = bedrock_runtime.invoke_model(**kwargs)

    summary = json.loads(response.get('body').read()).get('results')[0].get('outputText')    
    return summary

py 复制代码

lambda_helper.deploy_function(
    ["lambda_function.py", "prompt_template.txt"],
    function_name="LambdaFunctionSummarize"
)

Output

py 复制代码

Zipping function...
Looking for existing function...
Function LambdaFunctionSummarize exists. Updating code...
Function LambdaFunctionSummarize code updated: 2024-06-06T06:45:47.000+0000
Done.

py 复制代码

lambda_helper.filter_rules_suffix = "json"
lambda_helper.add_lambda_trigger(bucket_name_text)

Output

json 复制代码

Using function name of deployed function: LambdaFunctionSummarize
Removed existing permission: s3-trigger-permission
Permission added with Statement: {
    "Sid": "s3-trigger-permission",
    "Effect": "Allow",
    "Principal": {
        "Service": "s3.amazonaws.com"
    },
    "Action": "lambda:InvokeFunction",
    "Resource": "arn:aws:lambda:us-west-2:880695732601:function:LambdaFunctionSummarize",
    "Condition": {
        "ArnLike": {
            "AWS:SourceArn": "arn:aws:s3:::c99355a2566046l6746328t1w394701795-learners3bucket-gjbuopo2mr67"
        }
    }
}
Trigger added for c99355a2566046l6746328t1w394701795-learners3bucket-gjbuopo2mr67 -> LambdaFunctionSummarize

py 复制代码

s3_helper.upload_file(bucket_name_text, 'demo-transcript.json')
s3_helper.download_object(bucket_name_text, "results.txt")
display_helper.text_file('results.txt')

Output

Lesson 5: Event-driven generation

py 复制代码

import boto3, os

from helpers.Lambda_Helper import Lambda_Helper
from helpers.S3_Helper import S3_Helper

lambda_helper = Lambda_Helper()
s3_helper = S3_Helper()

bucket_name_text = os.environ['LEARNERS3BUCKETNAMETEXT']
bucket_name_audio = os.environ['LEARNERS3BUCKETNAMEAUDIO']

Deploy your lambda function

py 复制代码

%%writefile lambda_function.py

#############################################################
#
# This Lambda function is written to a file by the notebook 
# It does not run in the notebook!
#
#############################################################

import json
import boto3
import uuid
import os

s3_client = boto3.client('s3')
transcribe_client = boto3.client('transcribe', region_name='us-west-2')

def lambda_handler(event, context):
    # Extract the bucket name and key from the incoming event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    # One of a few different checks to ensure we don't end up in a recursive loop.
    if key != "dialog.mp3": 
        print("This demo only works with dialog.mp3.")
        return

    try:
        
        job_name = 'transcription-job-' + str(uuid.uuid4()) # Needs to be a unique name

        response = transcribe_client.start_transcription_job(
            TranscriptionJobName=job_name,
            Media={'MediaFileUri': f's3://{bucket}/{key}'},
            MediaFormat='mp3',
            LanguageCode='en-US',
            OutputBucketName= os.environ['S3BUCKETNAMETEXT'],  # specify the output bucket
            OutputKey=f'{job_name}-transcript.json',
            Settings={
                'ShowSpeakerLabels': True,
                'MaxSpeakerLabels': 2
            }
        )
        
    except Exception as e:
        print(f"Error occurred: {e}")
        return {
            'statusCode': 500,
            'body': json.dumps(f"Error occurred: {e}")
        }

    return {
        'statusCode': 200,
        'body': json.dumps(f"Submitted transcription job for {key} from bucket {bucket}.")
    }

Output

py 复制代码

Writing lambda_function.py

py 复制代码

lambda_helper.lambda_environ_variables = {'S3BUCKETNAMETEXT' : bucket_name_text}
lambda_helper.deploy_function(["lambda_function.py"], function_name="LambdaFunctionTranscribe")

Output

py 复制代码

Zipping function...
Looking for existing function...
Function LambdaFunctionTranscribe does not exist. Creating...
Function LambdaFunctionTranscribe created: arn:aws:lambda:us-west-2:169601639897:function:LambdaFunctionTranscribe
Done.

py 复制代码

lambda_helper.filter_rules_suffix = "mp3"
lambda_helper.add_lambda_trigger(bucket_name_audio, function_name="LambdaFunctionTranscribe")

Output

json 复制代码

Permission added with Statement: {
    "Sid": "s3-trigger-permission",
    "Effect": "Allow",
    "Principal": {
        "Service": "s3.amazonaws.com"
    },
    "Action": "lambda:InvokeFunction",
    "Resource": "arn:aws:lambda:us-west-2:169601639897:function:LambdaFunctionTranscribe",
    "Condition": {
        "ArnLike": {
            "AWS:SourceArn": "arn:aws:s3:::c99355a2566048l6746330t1w5617-learneraudios3bucket-yn6dipq6a3lu"
        }
    }
}
Trigger added for c99355a2566048l6746330t1w5617-learneraudios3bucket-yn6dipq6a3lu -> LambdaFunctionTranscribe

py 复制代码

s3_helper.upload_file(bucket_name_audio, 'dialog.mp3')

Output

py 复制代码

Object 'dialog.mp3' uploaded to bucket 'c99355a2566048l6746330t1w5617-learneraudios3bucket-yn6dipq6a3lu'

py 复制代码

s3_helper.list_objects(bucket_name_audio)

Output

py 复制代码

Object: dialog.mp3, Created on: 2024-06-06 07:15:08+00:00

py 复制代码

s3_helper.list_objects(bucket_name_text)

Output

py 复制代码

Object: .write_access_check_file.temp, Created on: 2024-06-06 07:15:10+00:00
Object: results.txt, Created on: 2024-06-06 07:15:36+00:00
Object: transcription-job-dab704ca-7fa6-4200-961e-16d4b60e6212-transcript.json, Created on: 2024-06-06 07:15:30+00:00

py 复制代码

s3_helper.download_object(bucket_name_text, 'results.txt')

Output

py 复制代码

Object 'results.txt' from bucket 'c99355a2566048l6746330t1w56172-learnertexts3bucket-dccaxvbtixou' to './results.txt'

py 复制代码

from helpers.Display_Helper import Display_Helper
display_helper = Display_Helper()
display_helper.text_file('results.txt')

Output

json 复制代码

results.txt:

{
    "version": 0.1,
    "sentiment": "positive",
    "issues": [
        {
            "topic": "charges",
            "summary": "pre authorization of $1000 is excessive"
        }
    ]
}

Extra resources:

后记

2024年6月6日15点24分完成这门short course。基于Amazon Bedrock使用大模型以及event-driven serverless workflow using AWS Lambda。