Performing natural language processing tasks with LLMs on ROCm running on AMD GPUs

In this blog you will learn how to use ROCm, running on AMD Instinct GPUs, to perform a range of popular and useful natural language processing (NLP) tasks with different large language models (LLMs). The blog includes a simple, hands-on guide that shows you how to implement core NLP applications, from text generation and sentiment analysis to extractive question answering (QA) and solving math problems.

General-purpose LLMs such as GPT and Llama can perform many different tasks reasonably well. However, some tasks require either fine-tuning or a different model architecture to support specific use cases. The machine learning community has developed many models that are designed or fine-tuned for particular tasks to complement the general-purpose ones. In this blog we cover both general-purpose and special-purpose LLMs, and show you how to use them on ROCm running on AMD GPUs to carry out several common tasks.

Introduction

Since OpenAI released ChatGPT in late 2022, millions of people have experienced the power of generative AI. While general-purpose LLMs deliver good performance on tasks such as answering quick questions and solving problems, they often fall short when the prompt is highly specialized or requires a skill the LLM was not specifically trained for. Prompt engineering, which adds specific instructions or examples to the prompt, can help mitigate this. However, the skill required to craft prompts, together with context-length limits, often keeps LLMs from reaching their full potential.

To address these issues, general-purpose LLMs keep getting larger (Grok-1, for example, has hundreds of billions of parameters) and more capable. At the same time, the machine learning community has developed many special-purpose models that perform very well on certain tasks, at the cost of lower performance on others.

HuggingFace lists roughly a dozen different NLP tasks that LLMs can perform, including text generation, question answering, translation, and more. In this blog, we demonstrate how to use several general-purpose and special-purpose LLMs on ROCm running on AMD GPUs for the following NLP tasks:

  • Text generation

  • Extractive question answering

  • Solving math problems

  • Sentiment analysis

  • Summarization

  • Information retrieval

Prerequisites

Getting started

First, check whether the GPUs on the server can be detected.

bash
rocm-smi
bash
========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device  [Model : Revision]    Temp        Power     Partitions      SCLK    MCLK    Fan  Perf  PwrCap  VRAM%  GPU%
        Name (20 chars)       (Junction)  (Socket)  (Mem, Compute)
====================================================================================================================
0       [0x74a1 : 0x00]       35.0°C      140.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
1       [0x74a1 : 0x00]       37.0°C      138.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
2       [0x74a1 : 0x00]       40.0°C      141.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
3       [0x74a1 : 0x00]       36.0°C      139.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
4       [0x74a1 : 0x00]       38.0°C      143.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
5       [0x74a1 : 0x00]       35.0°C      139.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
6       [0x74a1 : 0x00]       39.0°C      142.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
7       [0x74a1 : 0x00]       37.0°C      137.0W    NPS1, SPX       132Mhz  900Mhz  0%   auto  750.0W    0%   0%
        AMD Instinct MI300X
====================================================================================================================
=============================================== End of ROCm SMI Log ================================================

All 8 GPUs on the MI300X system are available. Launch a Docker container with ROCm and PyTorch support (the image below uses ROCm 6.1.3), and install the required packages.

bash
docker run -it --ipc=host --network=host --device=/dev/kfd  --device=/dev/dri -v $HOME/dockerx:/dockerx --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --name=llm-tasks rocm/pytorch:rocm6.1.3_ubuntu22.04_py3.10_pytorch_release-2.1.2 /bin/bash
bash
pip install --upgrade pip
pip install transformers accelerate einops

The following sections demonstrate how to run LLMs on ROCm to perform various NLP tasks.

Text generation

Text generation is probably the first task most people associate with LLMs. Given a text prompt, the LLM generates text that responds to that prompt. Several ROCm blogs cover how to perform this task with popular models, including Llama 2, GPT-3, OLMo, and Mixtral. This blog covers four more high-profile models.

C4AI Command-R

After publishing the seminal paper "Attention Is All You Need" with his team at Google Brain, Aidan Gomez left Google and co-founded Cohere. Cohere has developed several state-of-the-art LLMs, including the C4AI Command-R and C4AI Command-R Plus model families, and released them on HuggingFace.

This test uses a mid-sized model, c4ai-command-r-v01, with 35 billion parameters, for text generation on ROCm.

Note

The c4ai-command-r-v01 model is gated, which means you must request access on HuggingFace before you can use it. Download the model with your HuggingFace token, replacing the variable `token` in the code block with your own token.

python
from transformers import AutoTokenizer, AutoModelForCausalLM

token = "your HuggingFace user access token here"
model_name = "CohereForAI/c4ai-command-r-v01"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, token=token)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, token=token)

prompt = "Write a poem about artificial intelligence in Shakespeare style."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt")

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Here is the response generated for the prompt:

bash
In days of yore, when mortals' minds did roam,
A wondrous birth, a thought-borne gem,
From human intellect, a progeny did bloom,
AI, a brain-child, bright and new.

From bits and bytes, a creature formed, so keen,
To serve and aid, a helpful hand,
With algorithms, it thinks, and learns, and sees,
A clever clone, a mental clone.

It parses speech, solves problems hard,
With speed beyond compare,
It understands, assists, and guides,
A thoughtful, digital friend.

Here is another example of using C4AI Command-R for text generation, in this case answering a question:

python
prompt = "Which countries are the biggest rare earth metal producer?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt")

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

C4AI Command-R is able to give a detailed answer to the question.

bash
As of 2022, the top three countries that are the biggest producers of rare earth metals are:
1. China: China is the world's largest producer of rare earth metals, accounting for over 58% of the global production. China's production share is even larger when it comes to the more valuable and technologically important rare earth oxides. The country has a strong hold on the supply chain, from mining to processing and manufacturing of rare earth metals and products.

2. Australia: Australia is the second-largest producer of rare earth metals. It has significant reserves and several operational mines producing rare earth elements. Lyn

Qwen

Although the media pays the most attention to models developed by American and European companies, such as Llama, GPT, and Mistral, there are notable contenders from Chinese companies as well. The best known of these is the Qwen series from Alibaba Cloud. Qwen models are general-purpose, Transformer-based LLM AI assistants trained on a diverse mix of web text, books, code samples, and other materials.

The latest version of the Qwen series is the Qwen2 family of models. All models in the Qwen2 family use Grouped Query Attention (GQA) for lower latency and lower memory usage during inference. In terms of context length, the Qwen2-7B and Qwen2-72B models support up to 128K tokens. The first-generation Qwen models were trained only on English and Chinese text, whereas Qwen2 adds training data in 27 additional languages from different regions of the world, which makes it perform much better on multilingual tasks.

python
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto

model_name = "Qwen/Qwen2-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Once the Qwen2 model and tokenizer are ready, you can ask the model a question.

python
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Here is Qwen2's answer:

bash
A Large Language Model (LLM) is a type of artificial intelligence model that has been trained on vast amounts of text data to understand and generate human-like language. These models are capable of performing various natural language processing tasks such as text translation, summarization, question answering, text generation, etc. 

LLMs typically use deep learning techniques, often involving transformer architectures, which allow the model to understand context and relationships between words in sentences. This makes them very powerful tools for generating coherent and contextually relevant responses, even when given complex or nuanced prompts.

One of the most famous examples of an LLM is the GPT series created by OpenAI, including GPT-2 and GPT-3. However, it's worth noting that these models can also be used for potentially harmful purposes if not handled responsibly due to their ability to create realistic but false information. Therefore, they need to be used ethically and with appropriate safeguards in place.

OPT

OPT (Open Pre-trained Transformer Language Models) is a suite of pretrained transformer models, ranging from 125M to 175B parameters, introduced by Meta in the paper Open Pre-trained Transformer Language Models. The goal of OPT is to give the research community a set of high-performing pretrained LLMs for further development and for reproducing results produced by the community.

This example tests the 125M-parameter version of OPT, 'opt-125m', one of the most popular versions because of its small size. The test runs on ROCm and uses HuggingFace's text-generation pipeline to generate text from a prompt. It also sets do_sample=True to enable top-k sampling, which makes the generated text more interesting.

python
from transformers import pipeline, set_seed

set_seed(32)
text_generator = pipeline('text-generation', model="facebook/opt-125m", max_new_tokens=256, do_sample=True, device='cuda')

output = text_generator("Provide a few suggestions for family activities this weekend.")
print(output[0]['generated_text'])
bash
Provide a few suggestions for family activities this weekend.

The summer schedule is a great opportunity to spend some time enjoying the summer with those who might otherwise be working from home or working from a remote location. You will discover new and interesting places to eat out and spend some time together. There are things you'll do in different weathers (in particular you'll learn what it's like to enjoy a hot summer summer outside. For example you may see rainbows, waves crashing against a cliff, an iceberg exploding out of the sky, and a meteor shower rolling through the sky.

I've tried to share some ideas on how to spend all summer on our own rather than with a larger family. In addition to family activities, here are several ways to stay warm for the holidays during a time of national emergency.

...

OPT tends to be verbose and rambling rather than giving concise, relevant answers. There are many fine-tuned versions of OPT on HuggingFace; you can explore those models or fine-tune your own.

MPT

Generating instructions for how to complete a task, such as a cooking recipe, is one of the common use cases for LLMs. Although prompt engineering can steer a general-purpose LLM toward generating instructions, the prompt must be carefully designed to achieve the desired output.

The MPT series released by Mosaic Research (now part of Databricks) is a family of decoder-style transformer models that includes two base models: MPT-7B and MPT-30B. The MPT-7B-Instruct model is an LLM in this series fine-tuned from MPT-7B on a dataset derived from the Databricks Dolly-15k and Anthropic Helpful and Harmless (HH-RLHF) datasets. The model is supported by HuggingFace's text-generation pipeline and is easy to use on ROCm.

python
import torch
import transformers
from transformers import pipeline

model_name = 'mosaicml/mpt-7b-instruct'

config = transformers.AutoConfig.from_pretrained(model_name, trust_remote_code=True)
config.max_seq_len = 4096

model = transformers.AutoModelForCausalLM.from_pretrained(
  model_name,
  config=config,
  trust_remote_code=True
)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

text_generator = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

prompt = "Here is the instruction to change the oil filter in your car:\n"
with torch.autocast('cuda', dtype=torch.bfloat16):
    instruction = text_generator(prompt,
                                 max_new_tokens=512,
                                 do_sample=True,
                                 use_cache=True)

print(instruction[0]['generated_text'])

Here is the text MPT-7B-Instruct generated for the prompt "Here is the instruction to change the oil filter in your car":

bash
Here is the instruction to change the oil filter in your car:
1. Open the hood. 2. Find the oil filter. 3. Look to the right underneath the cap to find the oil filter. 4. Screw the oil filter cap off from the bottom.5. Pull oil filter out from the bottom of the engine.
What is the oil filter? The oil filter is a part that catches particles from your engine oil as it travels through your engine. It traps most of the particles and keeps them from passing straight into your engine. This keeps your engine from getting damaged because of those particles. How many oil filters are there?
There is one oil filter for the entire vehicle. However different types of vehicles have different requirements that can change the oil more often than others.
When should you change the oil filter? It is recommended to change oil filters between 30,000 to 60,000 miles. However some engine types are harder on filters and may require changing every 15,000 miles instead of 30,000.
What can you get at your local automotive store before changing your oil filter: 5-10 quarts 5-10 oil filter, a drain pan, and oil filter wrench.
Step 1. Drain the oil. 2. Check the oil filter to be sure that it is still in good shape. 3. Install the new oil filter. 4. Fill the reservoir with the proper amount of oil.

Extractive question answering

When people think about how an LLM answers questions, they usually imagine an oracle-like chatbot that can answer any question they come up with, as in the earlier text generation examples. There are, however, many LLMs trained specifically to perform what is called extractive question answering. The idea is that the input to the LLM includes not only the question but also the context in which the answer can be found, and the model's answer must consist of part of that context. The main use case for extractive QA is when the user knows the answer lies within some known context, for example, identifying a specific customer's preferences from their purchase history. Extracting the answer from the context limits the chance of the LLM hallucinating and making up an incorrect answer, even when the context was part of its training data.

The following are tests of two popular LLMs fine-tuned to perform extractive question answering.

DistilBERT

One challenge in deploying LLMs is that their large size leads to high compute requirements, latency, and power consumption. An active area of research is training smaller models on the outputs of larger trained models while retaining most of the performance, a process known as knowledge distillation. A well-known example of such a model is DistilBERT, proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT. DistilBERT is a small, fast, cheap, and light transformer model trained by distilling BERT base, which means it was pretrained using only the inputs and labels produced by the BERT base model. It has 40% fewer parameters than the bert-base-uncased model and runs 60% faster, while preserving more than 95% of BERT's performance on the GLUE language understanding benchmark.
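To make the idea of knowledge distillation concrete, below is a minimal sketch of a typical distillation loss, in which a student is trained to match the teacher's softened output distribution in addition to the ground-truth labels. This is an illustrative example under common assumptions (temperature scaling and a mixing weight alpha), not the exact recipe used to train DistilBERT.

python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: the student mimics the teacher's softened distribution
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: standard cross-entropy against the true labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random logits for a batch of 4 examples and 3 classes
student_logits = torch.randn(4, 3)
teacher_logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
print(distillation_loss(student_logits, teacher_logits, labels))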

This example tests a version of DistilBERT, 'distilbert-base-cased-distilled-squad', a checkpoint of DistilBERT-base-cased fine-tuned with knowledge distillation on the SQuAD v1.1 dataset. The task is to find the birthplace of Marie Curie's doctoral advisor from a context containing four facts, only one of which holds the answer to the question.

python
from transformers import pipeline
question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

context = """Gabriel Lippmann, who supervised Marie Curie's doctoral research, was born in Bonnevoie, Luxembourg. 
        Marie Curie was born in Warsaw, Poland in what was then the Kingdom of Poland, part of the Russian Empire.
        Maria Sklodowska, later known as Marie Curie, was born on November 7, 1867. 
        Born in Paris on 15 May 1859, Pierre Curie was the son of Eugène Curie, a doctor of French Catholic origin from Alsace."""
question = "Where was Marie Curie's doctoral advisor Gabriel Lippmann born?"

result = question_answerer(question=question, context=context)
print(f"Answer: '{result['answer']}'\n Score: {round(result['score'], 4)},\n start token: {result['start']}, end token: {result['end']}")

DistilBERT is able to find the correct answer with high confidence.

bash
Answer: 'Bonnevoie, Luxembourg'
 Score: 0.9714,
 start token: 78, end token: 99

Longformer

One of the main limitations of transformer models is that the self-attention operation scales quadratically with the length of the input sequence, which makes it hard to scale them to long inputs. The Longformer model from Allen AI, proposed in Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, and Arman Cohan, tries to alleviate this problem by replacing full self-attention with local windowed attention combined with task-motivated global attention.
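As a rough illustration of how this sparse attention pattern is exposed in the transformers API, the sketch below gives only the first token global attention while every other token uses local windowed attention. It uses the base allenai/longformer-base-4096 checkpoint purely as an example; for the question-answering checkpoint used later, the library is documented to set global attention on the question tokens automatically when no mask is provided.

python
import torch
from transformers import AutoTokenizer, LongformerModel

tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

# A moderately long input (well under the 4096-token limit)
text = "Long documents can be processed without quadratic attention cost. " * 50
inputs = tokenizer(text, return_tensors="pt")

# 0 = local windowed attention, 1 = global attention
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1  # give the <s> token global attention

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)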

Allen AI has trained several models based on the Longformer architecture for various tasks. This example shows the ability of the LongformerForQuestionAnswering model to extract the answer to a question from a context.

The model takes the context and the question as inputs, and outputs span-start logits and span-end logits for each token in the encoded input. The best answer to the question can then be extracted from the span logits.

python
from transformers import AutoTokenizer, LongformerForQuestionAnswering
import torch

# setup the tokenizer and the model
model_name = "allenai/longformer-large-4096-finetuned-triviaqa"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = LongformerForQuestionAnswering.from_pretrained(model_name)

# context and question
context = """Gabriel Lippmann, who supervised Marie Curie's doctoral research, was born in Bonnevoie, Luxembourg. 
        Marie Curie was born in Warsaw, Poland in what was then the Kingdom of Poland, part of the Russian Empire.
        Maria Sklodowska, later known as Marie Curie, was born on November 7, 1867. 
        Born in Paris on 15 May 1859, Pierre Curie was the son of Eugène Curie, a doctor of French Catholic origin from Alsace."""
question = "Where was Marie Curie's doctoral advisor Gabriel Lippmann born?"

# encode the question and the context
encoded_input = tokenizer(question, context, return_tensors="pt")
input_ids = encoded_input["input_ids"]

# Generate the output masks
outputs = model(input_ids)
# find the beginning and end index of the answer in the encoded input
start_idx = torch.argmax(outputs.start_logits)
end_idx = torch.argmax(outputs.end_logits)

# Convert the input ids to tokens
all_tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())

# extract the answer tokens and decode it
answer_tokens = all_tokens[start_idx : end_idx + 1]
answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens))

print(answer)

Longformer returns the city, "Bonnevoie", which is the correct answer.

bash
 Bonnevoie

Solving math problems

The ability to understand a problem and provide an answer through logical reasoning has always been one of the main goals of artificial intelligence, and solving math problems is a typical use case. Even general-purpose LLMs such as GPT-4 show notable performance on simple math problems. This section explores using a fine-tuned version of the Phi-3 model on AMD GPUs to solve math problems.

Phi-3

The Phi-3 collection is the next generation of Microsoft's popular Phi-2 models. This example uses the fine-tuned Phi-3-Mini-4K-Instruct version, a 3.8-billion-parameter model trained on carefully selected, high-quality educational data and code, along with synthetic, textbook-like data covering topics such as math, coding, and common-sense reasoning.

First, set up the Phi-3 model with the `text-generation` pipeline.

python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

model_name = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name, 
    device_map="cuda", 
    torch_dtype="auto", 
    trust_remote_code=True, 
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 1024,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

Then ask Phi-3 to find the Taylor series of `sin(x) + ln(x)`, the sum of two simple functions.

python
messages = [
    {"role": "user", "content": "What is the Taylor series expansion of sin(x) + ln(x)? about a point x=a"},
]

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])
bash
 The Taylor series expansion of a function f(x) about a point x=a is given by:

f(x) = f(a) + f'(a)(x-a) + f''(a)(x-a)^2/2! + f'''(a)(x-a)^3/3! +...

For the function sin(x) + ln(x), we need to find the derivatives and evaluate them at x=a.

First, let's find the derivatives of sin(x) and ln(x):

1. sin(x):
   f(x) = sin(x)
   f'(x) = cos(x)
   f''(x) = -sin(x)
   f'''(x) = -cos(x)
  ...

2. ln(x):
   f(x) = ln(x)
   f'(x) = 1/x
   f''(x) = -1/x^2
   f'''(x) = 2/x^3
  ...

Now, let's evaluate these derivatives at x=a:

1. sin(a):
   f(a) = sin(a)
   f'(a) = cos(a)
   f''(a) = -sin(a)
   f'''(a) = -cos(a)
  ...

2. ln(a):
   f(a) = ln(a)
   f'(a) = 1/a
   f''(a) = -1/a^2
   f'''(a) = 2/a^3
  ...

Now, we can write the Taylor series expansion of sin(x) + ln(x) about x=a:

sin(x) + ln(x) = (sin(a) + ln(a)) + (cos(a)(x-a) + (1/a)(x-a)) + (-sin(a)(x-a)^2/2! + (-1/a^2)(x-a)^2/2!) + (-cos(a)(x-a)^3/3! + (2/a^3)(x-a)^3/3!) +...

This is the Taylor series expansion of sin(x) + ln(x) about x=a.

Not bad. Next, have Phi-3 do the same for a slightly more complicated function, `sin(x) + 1/cos(x)`.

python
messages = [
    {"role": "user", "content": "What is the Taylor series expansion of sin(x) + 1/cos(x) about a point x=a?"},
]

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])
bash
 The Taylor series expansion of a function f(x) about a point x=a is given by:

f(x) = f(a) + f'(a)(x-a) + f''(a)(x-a)^2/2! + f'''(a)(x-a)^3/3! +...

First, let's find the Taylor series expansion of sin(x) and 1/cos(x) separately about x=a.

For sin(x), the derivatives are:
sin'(x) = cos(x)
sin''(x) = -sin(x)
sin'''(x) = -cos(x)
sin''''(x) = sin(x)
...

The Taylor series expansion of sin(x) about x=a is:
sin(x) = sin(a) + cos(a)(x-a) - sin(a)(x-a)^2/2! - cos(a)(x-a)^3/3! + sin(a)(x-a)^4/4! +...

For 1/cos(x), the derivatives are:
(1/cos(x))' = sin(x)/cos^2(x)
(1/cos(x))'' = (cos(x) + sin^2(x))/cos^3(x)
(1/cos(x))''' = (-2cos(x)sin(x) + 3sin^2(x))/cos^4(x)
...

The Taylor series expansion of 1/cos(x) about x=a is:
1/cos(x) = 1/cos(a) + (sin(a)/cos^2(a))(x-a) + (cos(a)(sin^2(a) - 1)/cos^3(a))(x-a)^2/2! + (2cos(a)(sin^3(a) - 3sin(a))/cos^4(a))(x-a)^3/3! +...

Now, we can find the Taylor series expansion of sin(x) + 1/cos(x) by adding the two series:

sin(x) + 1/cos(x) = (sin(a) + 1/cos(a)) + (cos(a) + sin(a)/cos^2(a))(x-a) - (sin(a)(x-a)^2/2! + 1/cos^3(a)(x-a)^2/2!) +...

This is the Taylor series expansion of sin(x) + 1/cos(x) about x=a.

Although Phi-3 follows the standard procedure of finding the derivatives of each term and adding the Taylor series of the two terms, it fails to compute the higher-order derivatives of `1/cos(x)` correctly and to add the terms correctly in the last step. For example, the second derivative of 1/cos(x) should be `(1 + sin^2(x))/cos^3(x)` rather than `(cos(x) + sin^2(x))/cos^3(x)`. This illustrates a limitation of using LLMs to solve such problems: an LLM essentially generates answers from patterns learned during training rather than by carrying out the symbolic derivation itself.
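If you want to check this correction yourself, a computer algebra system can compute the derivative symbolically. The short sketch below assumes sympy is installed (for example, with pip install sympy) and verifies the second derivative of 1/cos(x).

python
import sympy as sp

x = sp.symbols('x')
f = 1 / sp.cos(x)

# Second derivative of 1/cos(x)
second = sp.simplify(sp.diff(f, x, 2))
print(second)  # an expression equivalent to (1 + sin(x)**2)/cos(x)**3

# Confirm it matches the expected form: the simplified difference is 0
expected = (1 + sp.sin(x)**2) / sp.cos(x)**3
print(sp.simplify(second - expected))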

Sentiment analysis

Sentiment analysis has long been an active research topic in the machine learning community because of its wide range of applications. Transformer-based LLMs open up new opportunities to improve sentiment analysis models because they can take into account the context of large amounts of related text. In particular, there is strong interest in using LLMs to understand the sentiment of financial news, which is extremely valuable for making investment decisions. The examples below test two well-known sentiment analysis models. Both use HuggingFace's `sentiment-analysis` pipeline.

DistilRoBERTa

The DistilRoberta-financial-sentiment model is a lightweight, distilled version of the RoBERTa-base model with only 82 million parameters. Because of its smaller size, the model runs twice as fast as RoBERTa-base. It was trained on a polar sentiment dataset of sentences from financial news, annotated by 5 to 8 human annotators.

Set up the model and use it to determine the sentiment of four financial news statements.

python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3, device_map="cuda")
sentiment_analyzer = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

sentences = ["there is a shortage of capital, and we need extra financing",  
             "growth is strong and we have plenty of liquidity", 
             "there are doubts about our finances", 
             "profits are flat"]

for sentence in sentences:
    result = sentiment_analyzer(sentence)
    print(f"Input sentence: \"{sentence}\"")
    print(f"Sentiment: '{result[0]['label']}'\n Score: {round(result[0]['score'], 4)}\n")
bash
Input sentence: "there is a shortage of capital, and we need extra financing"
Sentiment: 'negative'
 Score: 0.666

Input sentence: "growth is strong and we have plenty of liquidity"
Sentiment: 'positive'
 Score: 0.9996

Input sentence: "there are doubts about our finances"
Sentiment: 'neutral'
 Score: 0.6857

Input sentence: "profits are flat"
Sentiment: 'neutral'
 Score: 0.9999

The sentiments determined by the model look reasonable. One could argue that the third statement, "there are doubts about our finances", should be considered negative. On the other hand, the model's confidence in the 'neutral' rating is relatively low at 0.6857, which suggests that a slightly different threshold could tip the rating toward 'negative'.
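If you want to apply your own decision threshold, you can ask the pipeline for the scores of all three classes rather than just the top one. The sketch below assumes a recent transformers version that supports the top_k=None call argument, and the 0.75 cutoff is an arbitrary, hypothetical threshold.

python
# Retrieve the full score distribution for the third statement
all_scores = sentiment_analyzer("there are doubts about our finances", top_k=None)
print(all_scores)

# Apply a custom rule: accept the top label only if it clears the threshold
top = max(all_scores, key=lambda s: s["score"])
if top["score"] >= 0.75:
    print(f"Sentiment: {top['label']}")
else:
    print("Low confidence - flag for manual review")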

FinBERT

Researchers at the Hong Kong University of Science and Technology proposed FinBERT in the paper FinBERT: A Pretrained Language Model for Financial Communications. It is a BERT-based model pretrained on financial communication text. The training data comprises three financial communication corpora with a total size of 4.9 billion tokens.

The finbert-tone model used here is a FinBERT model fine-tuned on 10,000 manually annotated (positive, negative, neutral) sentences from analyst reports.

This example uses FinBERT to analyze the sentiment of the same financial statements analyzed above with DistilRoBERTa.

python
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import pipeline

model_name = "yiyanghkust/finbert-tone"
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=3, device_map="cuda")
tokenizer = BertTokenizer.from_pretrained(model_name)

sentiment_analyzer = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

sentences = ["there is a shortage of capital, and we need extra financing",  
             "growth is strong and we have plenty of liquidity", 
             "there are doubts about our finances", 
             "profits are flat"]

for sentence in sentences:
    result = sentiment_analyzer(sentence)
    print(f"Input sentence: \"{sentence}\"")
    print(f"Sentiment: '{result[0]['label']}'\n Score: {round(result[0]['score'], 4)}\n")
bash
Input sentence: "there is a shortage of capital, and we need extra financing"
Sentiment: 'Negative'
 Score: 0.9966

Input sentence: "growth is strong and we have plenty of liquidity"
Sentiment: 'Positive'
 Score: 1.0

Input sentence: "there are doubts about our finances"
Sentiment: 'Negative'
 Score: 1.0

Input sentence: "profits are flat"
Sentiment: 'Neutral'
 Score: 0.9889

The only difference between the outputs of the DistilRoBERTa and FinBERT models is in the third case, where FinBERT rates the sentiment as negative rather than neutral.

Summarization

Early approaches to text summarization focused on extracting keywords or key phrases from the text and assembling them into a summary using manually defined rules. LLMs have changed how summarization is done because they can capture the relationships between words across long sequences of text. There are many well-known LLMs trained specifically for this task; this section demonstrates two of them.

BART

BART, from Facebook, was introduced in the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. BART uses a transformer-based architecture that pairs a denoising bidirectional encoder with a GPT-like autoregressive decoder in a sequence-to-sequence model. BART is pretrained in two steps: it first corrupts the training text with arbitrary noise, then trains the model to reconstruct the original text from the corrupted text. This approach provides great flexibility in generating training data, including changing the text length and word order.

The BART base model can be used for text infilling, but it is not suitable for most tasks of interest. BART shines when it is fine-tuned for a specific task, such as summarization. This example uses a version of BART fine-tuned on the CNN Daily Mail document-summary pair dataset for the summarization task.
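For reference, here is a minimal sketch of the text-infilling behavior of the base (not summarization-tuned) checkpoint, facebook/bart-large, following the usage shown on its model card: the model fills in the <mask> token in the input sentence.

python
from transformers import BartForConditionalGeneration, BartTokenizer

# Base BART checkpoint used for text infilling
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large", forced_bos_token_id=0)
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")

example = "UN Chief Says There Is No <mask> in Syria"
inputs = tokenizer(example, return_tensors="pt")

# Generate the completed sentence with the mask filled in
generated_ids = model.generate(inputs["input_ids"], max_new_tokens=20)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])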

python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device="cuda")

ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18.
"""

print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)[0]['summary_text'])
bash
Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002. She is believed to still be married to four men.

Pegasus

Another LLM well known for summarization is Google's Pegasus, introduced in the paper PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. Pegasus masks key sentences from the training documents and trains the model to generate those missing sentences. According to the authors, this approach is particularly well suited to abstractive summarization because it forces the model to understand the context of the entire document.

This example uses the Pegasus model to summarize the same text, `ARTICLE`, that the BART model processed earlier.

python
from transformers import AutoTokenizer, PegasusForConditionalGeneration

model_name = "google/pegasus-xsum"
model = PegasusForConditionalGeneration.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
 
inputs = tokenizer(ARTICLE, max_length=1024, return_tensors="pt")
summary_ids = model.generate(inputs["input_ids"])

print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
bash
A New York woman who has been married 10 times has been charged with marriage fraud.

Information retrieval

The advent of generative AI might seem to spell the end of information retrieval, because many people don't care about the original source as long as the model provides the information they need. However, there are still use cases, such as fact-checking and legal approval, where specific documents need to be retrieved from a corpus. Among the most prominent of the recent models is Meta's Contriever model.

Contriever

There have been many attempts to train deep neural network models for information retrieval with supervised learning. However, these approaches are limited in most real-life applications by the lack of training samples: they require a large number of human-generated labels indicating the most relevant documents for each query in the training dataset. The key idea behind Contriever is to use an auxiliary task that approximates retrieval, so the model can be trained without annotated data. Specifically, for a given document in the training corpus, it generates a synthetic query for which that document is a perfect answer, and these pairs are then used to train the model. In addition, contrastive learning improves the model's ability to discriminate between relevant and irrelevant results. The details of Contriever's approach can be found in the paper Unsupervised Dense Information Retrieval with Contrastive Learning.
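To illustrate the contrastive learning component, the following is a minimal sketch of an InfoNCE-style loss over a query embedding, one positive document embedding, and several negative document embeddings. It is a generic illustration of contrastive training, not Contriever's exact training objective or hyperparameters.

python
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb, pos_emb, neg_embs, temperature=0.05):
    # Similarity of each query to its positive document and to each negative document
    pos_sim = (query_emb * pos_emb).sum(dim=-1, keepdim=True)   # (batch, 1)
    neg_sim = query_emb @ neg_embs.T                             # (batch, num_negatives)
    logits = torch.cat([pos_sim, neg_sim], dim=-1) / temperature
    # The positive document is always at index 0
    labels = torch.zeros(logits.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)

# Toy usage with random embeddings
query_emb = torch.randn(2, 768)
pos_emb = torch.randn(2, 768)
neg_embs = torch.randn(8, 768)
print(contrastive_loss(query_emb, pos_emb, neg_embs))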

You can use the same example from the extractive question answering section to show how Contriever retrieves the most relevant document from a corpus. First, score the documents using the model's outputs.

python
import tqdm
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "facebook/contriever"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

query = ["Where was Marie Curie born?"]

docs = [
    "Gabriel Lippmann, who supervised Marie Curie's doctoral research, was born in Bonnevoie, Luxembourg.",
    "Marie Curie was born in Warsaw, in what was then the Kingdom of Poland, part of the Russian Empire",
    "Maria Sklodowska, later known as Marie Curie, was born on November 7, 1867.",
    "Born in Paris on 15 May 1859, Pierre Curie was the son of Eugène Curie, a doctor of French Catholic origin from Alsace."
]

corpus = query + docs

# Apply tokenizer
inputs = tokenizer(corpus, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
outputs = model(**inputs)

# Mean pooling
def mean_pooling(token_embeddings, mask):
    token_embeddings = token_embeddings.masked_fill(~mask[..., None].bool(), 0.)
    sentence_embeddings = token_embeddings.sum(dim=1) / mask.sum(dim=1)[..., None]
    return sentence_embeddings
embeddings = mean_pooling(outputs[0], inputs['attention_mask'])

score = [0]*len(docs)
for i in range(len(docs)):
    score[i] = (embeddings[0] @ embeddings[i+1]).item()

print(score) 
bash
[0.9390654563903809, 1.1304867267608643, 1.0473244190216064, 1.0094892978668213]

Then print the query and the best-matching document to see whether Contriever got it right.

python
print("Most relevant document to the query \"", query[0], "\" is")
docs[score.index(max(score))]
bash
Most relevant document to the query " Where was Marie Curie born? " is
'Marie Curie was born in Warsaw, in what was then the Kingdom of Poland, part of the Russian Empire'

Contriever is able to pick out the correct document, even though the other three documents look very similar.

Summary

In this blog, you learned how to use several popular LLMs with ROCm running on AMD GPUs to easily perform a variety of NLP tasks such as text generation, summarization, and solving math problems. If you are interested in improving the performance of these models, check out the ROCm blogs on fine-tuning Llama 2 and Starcoder.
