Problem Description
Start vLLM on Linux:

```bash
python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --model /model/Baichuan2-7B-Chat --trust-remote-code --gpu-memory-utilization 0.80
```
Testing with the following command produces an error:

```bash
curl -X 'POST' \
  'http://127.0.0.1:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "/model/Baichuan2-7B-Chat",
    "messages": [
      {
        "role": "system",
        "content": "你是我的小助理"
      },
      {
        "role": "user",
        "content": "告诉我你是谁"
      }
    ],
    "max_tokens": 512
  }'
```
The returned error message is:

```json
{
  "object": "error",
  "message": "Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating",
  "type": "BadRequestError",
  "param": null,
  "code": 400
}
```
Problem Analysis
As the error message indicates, the request fails because no chat template is specified: the model's tokenizer does not set tokenizer.chat_template, and no template argument was passed at startup.
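You can confirm this from Python with transformers (a minimal check; it assumes transformers is installed and the model path matches the one above):

```python
from transformers import AutoTokenizer

# Baichuan2 ships a custom tokenizer class, so trust_remote_code is required.
tok = AutoTokenizer.from_pretrained(
    "/model/Baichuan2-7B-Chat", trust_remote_code=True
)

# Prints None here, which is exactly why vLLM rejects
# /v1/chat/completions requests for this model.
print(tok.chat_template)
```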
Where can we get a chat template? I took the one from https://github.com/vllm-project/vllm/blob/main/examples/template_llava.jinja and verified that it works. Its content is as follows:
```jinja
{%- if messages[0]['role'] == 'system' -%}
    {%- set system_message = messages[0]['content'] -%}
    {%- set messages = messages[1:] -%}
{%- else -%}
    {% set system_message = '' -%}
{%- endif -%}
{{ bos_token + system_message }}
{%- for message in messages -%}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
        {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
    {%- endif -%}
    {%- if message['role'] == 'user' -%}
        {{ 'USER: ' + message['content'] + '\n' }}
    {%- elif message['role'] == 'assistant' -%}
        {{ 'ASSISTANT: ' + message['content'] + eos_token + '\n' }}
    {%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{ 'ASSISTANT:' }}
{% endif %}
```
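Before wiring the template into vLLM, it can be sanity-checked offline with apply_chat_template (a sketch; it assumes the template has been saved locally as template_llava.jinja):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "/model/Baichuan2-7B-Chat", trust_remote_code=True
)

# Attach the template for this session only and render a test conversation.
with open("template_llava.jinja") as f:
    tok.chat_template = f.read()

messages = [
    {"role": "system", "content": "你是我的小助理"},
    {"role": "user", "content": "告诉我你是谁"},
]
prompt = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# Roughly: bos_token + system message, then "USER: ...\nASSISTANT:".
print(prompt)
```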
There are three ways to fix this; each is described below.
Solutions
Option 1: add a chat_template field to the model's tokenizer_config.json
```json
{
  .....
  # Leave the existing keys unchanged; add a new chat_template key:
  "chat_template":"{%- if messages[0]['role'] == 'system' -%} {%- set system_message = messages[0]['content'] -%} {%- set messages = messages[1:] -%}{%- else -%} {% set system_message = '' -%}{%- endif -%}{{ bos_token + system_message }}{%- for message in messages -%} {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {%- endif -%} {%- if message['role'] == 'user' -%} {{ 'USER: ' + message['content'] + '\n' }} {%- elif message['role'] == 'assistant' -%} {{ 'ASSISTANT: ' + message['content'] + eos_token + '\n' }} {%- endif -%}{%- endfor -%}{%- if add_generation_prompt -%} {{ 'ASSISTANT:' }} {% endif %}"
}
```
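Escaping the whole Jinja template into a single JSON string by hand is error-prone; a small script can patch the file instead (a sketch, assuming the template is saved as template_llava.jinja; back up tokenizer_config.json first):

```python
import json

CONFIG = "/model/Baichuan2-7B-Chat/tokenizer_config.json"

with open(CONFIG) as f:
    cfg = json.load(f)

# json.dump takes care of quoting and escaping the template text.
with open("template_llava.jinja") as f:
    cfg["chat_template"] = f.read()

with open(CONFIG, "w") as f:
    json.dump(cfg, f, ensure_ascii=False, indent=2)
```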
Option 2: pass the full template content on the vLLM command line (--chat_template)

```bash
python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --model /model/Baichuan2-7B-Chat --trust-remote-code --gpu-memory-utilization 0.9 --chat_template "{%- if messages[0]['role'] == 'system' -%} {%- set system_message = messages[0]['content'] -%} {%- set messages = messages[1:] -%}{%- else -%} {% set system_message = '' -%}{%- endif -%}{{ bos_token + system_message }}{%- for message in messages -%} {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%} {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }} {%- endif -%} {%- if message['role'] == 'user' -%} {{ 'USER: ' + message['content'] + '\n' }} {%- elif message['role'] == 'assistant' -%} {{ 'ASSISTANT: ' + message['content'] + eos_token + '\n' }} {%- endif -%}{%- endfor -%}{%- if add_generation_prompt -%} {{ 'ASSISTANT:' }} {% endif %}"
```
Option 3: pass the path of a template file on the vLLM command line (--chat_template)

```bash
python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --model /model/Baichuan2-7B-Chat --trust-remote-code --gpu-memory-utilization 0.9 --chat_template ./template_llava.jinja
```
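If the template file is not present yet, it can be fetched from the vLLM repository first (a sketch using the raw-file counterpart of the URL referenced above; the path may move as the repository evolves):

```python
import urllib.request

# Raw-file counterpart of the GitHub page linked in the analysis section.
URL = ("https://raw.githubusercontent.com/vllm-project/vllm/"
       "main/examples/template_llava.jinja")
urllib.request.urlretrieve(URL, "template_llava.jinja")
```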
Testing
Test command:
```bash
curl -X 'POST' \
  'http://127.0.0.1:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "/model/Baichuan2-7B-Chat",
    "messages": [
      {
        "role": "system",
        "content": "你是我的小助理"
      },
      {
        "role": "user",
        "content": "告诉我你是谁"
      }
    ],
    "max_tokens": 512
  }'
```
The response:

```json
{"id":"chat-15c280f5f54e4128abaeec95daf32e39","object":"chat.completion","created":1728906010,"model":"/model/Baichuan2-7B-Chat","choices":[{"index":0,"message":{"role":"assistant","content":"我是一个聊天机器人,USER,可以帮助你解决问题、提供建议、回答问题等。请随时向我提问,我会尽力帮助你。","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":15,"total_tokens":41,"completion_tokens":26}}
```
References
https://github.com/vllm-project/vllm/blob/main/examples/template_llava.jinja