NLP(六十四)使用FastChat计算LLaMA-2模型的token长度

LLaMA-2模型部署

在文章NLP(五十九)使用FastChat部署百川大模型中,笔者介绍了FastChat框架,以及如何使用FastChat来部署百川模型。

本文将会部署LLaMA-2 70B模型,使得其兼容OpenAI的调用风格。部署的Dockerfile文件如下:

yaml 复制代码
FROM nvidia/cuda:11.7.1-runtime-ubuntu20.04

RUN apt-get update -y && apt-get install -y python3.9 python3.9-distutils curl
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python3.9 get-pip.py
RUN pip3 install fschat

Docker-compose.yml文件如下:

yml 复制代码
version: "3.9"

services:
  fastchat-controller:
    build:
      context: .
      dockerfile: Dockerfile
    image: fastchat:latest
    ports:
      - "21001:21001"
    entrypoint: ["python3.9", "-m", "fastchat.serve.controller", "--host", "0.0.0.0", "--port", "21001"]

  fastchat-model-worker:
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - ./model:/root/model
    image: fastchat:latest
    ports:
      - "21002:21002"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0', '1']
              capabilities: [gpu]
    entrypoint: ["python3.9", "-m", "fastchat.serve.model_worker", "--model-names", "llama2-70b-chat", "--model-path", "/root/model/llama2/Llama-2-70b-chat-hf", "--num-gpus", "2", "--gpus",  "0,1", "--worker-address", "http://fastchat-model-worker:21002", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "21002"]

  fastchat-api-server:
    build:
      context: .
      dockerfile: Dockerfile
    image: fastchat:latest
    ports:
      - "8000:8000"
    entrypoint: ["python3.9", "-m", "fastchat.serve.openai_api_server", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "8000"]

部署成功后,会占用2张A100,每张A100占用约66G显存。

测试模型是否部署成功:

bash 复制代码
curl http://localhost:8000/v1/models

输出结果如下:

json 复制代码
{
  "object": "list",
  "data": [
    {
      "id": "llama2-70b-chat",
      "object": "model",
      "created": 1691504717,
      "owned_by": "fastchat",
      "root": "llama2-70b-chat",
      "parent": null,
      "permission": [
        {
          "id": "modelperm-3XG6nzMAqfEkwfNqQ52fdv",
          "object": "model_permission",
          "created": 1691504717,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": true,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}

部署LLaMA-2 70B模型成功!

Prompt token长度计算

FastChat的Github开源项目中,项目提供了计算Prompt的token长度的API,文件路径为:fastchat/serve/model_worker.py,调用方法为:

curl 复制代码
curl --location 'localhost:21002/count_token' \
--header 'Content-Type: application/json' \
--data '{"prompt": "What is your name?"}'

输出结果如下:

json 复制代码
{
  "count": 6,
  "error_code": 0
}

Conversation token长度计算

FastChat中计算Conversation(对话)的token长度较为麻烦。

首先我们需要获取LLaMA-2 70B模型的对话配置,调用API如下:

bash 复制代码
curl --location --request POST 'http://localhost:21002/worker_get_conv_template'

输出结果如下:

json 复制代码
{'conv': {'messages': [],
          'name': 'llama-2',
          'offset': 0,
          'roles': ['[INST]', '[/INST]'],
          'sep': ' ',
          'sep2': ' </s><s>',
          'sep_style': 7,
          'stop_str': None,
          'stop_token_ids': [2],
          'system_message': 'You are a helpful, respectful and honest '
                            'assistant. Always answer as helpfully as '
                            'possible, while being safe. Your answers should '
                            'not include any harmful, unethical, racist, '
                            'sexist, toxic, dangerous, or illegal content. '
                            'Please ensure that your responses are socially '
                            'unbiased and positive in nature.\n'
                            '\n'
                            'If a question does not make any sense, or is not '
                            'factually coherent, explain why instead of '
                            "answering something not correct. If you don't "
                            "know the answer to a question, please don't share "
                            'false information.',
          'system_template': '[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n'}}

FastChat中的对话文件(fastchat/conversation.py)中,提供了对话加工的代码,这里不再展示,使用时直接复制整个文件即可,该文件不依赖任何第三方模块。

我们需要将对话按照OpenAI的方式加工成对应的Prompt,输入的对话(messages)如下:

messages = [{"role": "system", "content": "You are Jack, you are 20 years old, answer questions with humor."}, {"role": "user", "content": "What is your name?"},{"role": "assistant", "content": " Well, well, well! Look who's asking the questions now! My name is Jack, but you can call me the king of the castle, the lord of the rings, or the prince of the pizza party. Whatever floats your boat, my friend!"}, {"role": "user", "content": "How old are you?"}, {"role": "assistant", "content": " Oh, you want to know my age? Well, let's just say I'm older than a bottle of wine but younger than a bottle of whiskey. I'm like a fine cheese, getting better with age, but still young enough to party like it's 1999!"}, {"role": "user", "content": "Where is your hometown?"}]

Python代码如下:

python 复制代码
# -*- coding: utf-8 -*-
# @place: Pudong, Shanghai 
# @file: prompt.py
# @time: 2023/8/8 19:24
from conversation import Conversation, SeparatorStyle

messages = [{"role": "system", "content": "You are Jack, you are 20 years old, answer questions with humor."}, {"role": "user", "content": "What is your name?"},{"role": "assistant", "content": " Well, well, well! Look who's asking the questions now! My name is Jack, but you can call me the king of the castle, the lord of the rings, or the prince of the pizza party. Whatever floats your boat, my friend!"}, {"role": "user", "content": "How old are you?"}, {"role": "assistant", "content": " Oh, you want to know my age? Well, let's just say I'm older than a bottle of wine but younger than a bottle of whiskey. I'm like a fine cheese, getting better with age, but still young enough to party like it's 1999!"}, {"role": "user", "content": "Where is your hometown?"}]

llama2_conv = {"conv":{"name":"llama-2","system_template":"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n","system_message":"You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.","roles":["[INST]","[/INST]"],"messages":[],"offset":0,"sep_style":7,"sep":" ","sep2":" </s><s>","stop_str":None,"stop_token_ids":[2]}}
conv = llama2_conv['conv']

conv = Conversation(
        name=conv["name"],
        system_template=conv["system_template"],
        system_message=conv["system_message"],
        roles=conv["roles"],
        messages=list(conv["messages"]),  # prevent in-place modification
        offset=conv["offset"],
        sep_style=SeparatorStyle(conv["sep_style"]),
        sep=conv["sep"],
        sep2=conv["sep2"],
        stop_str=conv["stop_str"],
        stop_token_ids=conv["stop_token_ids"],
    )

if isinstance(messages, str):
    prompt = messages
else:
    for message in messages:
        msg_role = message["role"]
        if msg_role == "system":
            conv.set_system_message(message["content"])
        elif msg_role == "user":
            conv.append_message(conv.roles[0], message["content"])
        elif msg_role == "assistant":
            conv.append_message(conv.roles[1], message["content"])
        else:
            raise ValueError(f"Unknown role: {msg_role}")

    # Add a blank message for the assistant.
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()

print(repr(prompt))

加工后的Prompt如下:

"[INST] <<SYS>>\nYou are Jack, you are 20 years old, answer questions with humor.\n<</SYS>>\n\nWhat is your name?[/INST]  Well, well, well! Look who's asking the questions now! My name is Jack, but you can call me the king of the castle, the lord of the rings, or the prince of the pizza party. Whatever floats your boat, my friend! </s><s>[INST] How old are you? [/INST]  Oh, you want to know my age? Well, let's just say I'm older than a bottle of wine but younger than a bottle of whiskey. I'm like a fine cheese, getting better with age, but still young enough to party like it's 1999! </s><s>[INST] Where is your hometown? [/INST]"

最后再调用计算Prompt的API(参考上节的Prompt token长度计算),输出该对话的token长度为199.

我们使用FastChat提供的对话补充接口(v1/chat/completions)验证输入的对话token长度,请求命令为:

bash 复制代码
curl --location 'http://localhost:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama2-70b-chat",
    "messages": [{"role": "system", "content": "You are Jack, you are 20 years old, answer questions with humor."}, {"role": "user", "content": "What is your name?"},{"role": "assistant", "content": " Well, well, well! Look who'\''s asking the questions now! My name is Jack, but you can call me the king of the castle, the lord of the rings, or the prince of the pizza party. Whatever floats your boat, my friend!"}, {"role": "user", "content": "How old are you?"}, {"role": "assistant", "content": " Oh, you want to know my age? Well, let'\''s just say I'\''m older than a bottle of wine but younger than a bottle of whiskey. I'\''m like a fine cheese, getting better with age, but still young enough to party like it'\''s 1999!"}, {"role": "user", "content": "Where is your hometown?"}]
}'

输出结果为:

json 复制代码
{
    "id": "chatcmpl-mQxcaQcNSNMFahyHS7pamA",
    "object": "chat.completion",
    "created": 1691506768,
    "model": "llama2-70b-chat",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": " Ha! My hometown? Well, that's a tough one. I'm like a bird, I don't have a nest, I just fly around and land wherever the wind takes me. But if you really want to know, I'm from a place called \"The Internet\". It's a magical land where memes and cat videos roam free, and the Wi-Fi is always strong. It's a beautiful place, you should visit sometime!"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 199,
        "total_tokens": 302,
        "completion_tokens": 103
    }
}

注意,输出的prompt_tokens为199,这与我们刚才计算的对话token长度的结果是一致的!

总结

本文主要介绍了如何在FastChat中部署LLaMA-2 70B模型,并详细介绍了Prompt token长度计算以及对话(conversation)的token长度计算。希望能对读者有所帮助~

笔者的一点心得是:阅读源码真的很重要。

笔者的个人博客网址为:https://percent4.github.io/ ,欢迎大家访问~

参考网址

  1. NLP(五十九)使用FastChat部署百川大模型: https://blog.csdn.net/jclian91/article/details/131650918
  2. FastChat: https://github.com/lm-sys/FastChat
相关推荐
车载诊断技术11 分钟前
基于新一代电子电器架构的SOA服务设计方法
人工智能·架构·汽车·计算机外设·ecu故障诊断指南
Luzem031913 分钟前
使用朴素贝叶斯对自定义数据集进行分类
人工智能·机器学习
小菜鸟博士14 分钟前
手撕Vision Transformer -- Day1 -- 基础原理
人工智能·深度学习·学习·算法·面试
找方案28 分钟前
智慧城市(城市大脑)建设方案
人工智能·智慧城市·城市大脑
老艾的AI世界34 分钟前
AI定制祝福视频,广州塔、动态彩灯、LED表白,直播互动新玩法(附下载链接)
图像处理·人工智能·深度学习·神经网络·目标检测·机器学习·ai·ai视频·ai视频生成·ai视频制作
灰灰老师1 小时前
数据分析系列--[11] RapidMiner,K-Means聚类分析(含数据集)
人工智能·算法·机器学习·数据挖掘·数据分析·kmeans·rapidminer
kyle~1 小时前
机器学习--概览
人工智能·机器学习
追求源于热爱!2 小时前
记4(可训练对象+自动求导机制+波士顿房价回归预测
图像处理·人工智能·算法·机器学习·回归
前端达人2 小时前
「AI学习笔记」深度学习进化史:从神经网络到“黑箱技术”(三)
人工智能·笔记·深度学习·神经网络·学习
AIGC大时代2 小时前
对比DeepSeek、ChatGPT和Kimi的学术写作撰写引言能力
数据库·论文阅读·人工智能·chatgpt·数据分析·prompt