前面已经测试过了qwen 32b和Qwen3-30B-A3B两个模型的VLLM推理部署,并在Auto-Coder中进行试用。现在开始在SCNet的DCU环境中,尝试vllm推理部署DeepSeek-Coder-V2-Lite-Instruct
DeepSeek-Coder-V2-Lite-Instruct是DeepSeek推出的开源代码大模型,采用混合专家(MoE)架构,总参数量16B,激活参数仅2.4B,在保持高性能的同时大幅降低计算成本。该模型支持338种编程语言,具备128K超长上下文处理能力,在HumanEval等代码生成基准测试中表现优异,性能比肩GPT-4 Turbo。
VLLM推理DeepSeek-Coder-V2-Lite-Instruct
在SCNet中国超算中心,开DCU双卡的资源,每个卡是64G显存。
在模型库:模型库找到DeepSeek-Coder-V2-Lite-Instruct,克隆至控制台,然后拿到模型路径
/public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct
推理
双卡推理,单卡会爆显存,或者只能达到42k的token,因为低于64k的token几乎无法使用,所以最终使用了双卡。
vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/ --trust-remote-code --tensor-parallel-size 2
端口映射
服务启动后,会在8000端口侦听:
将8000端口映射出去:

映射地址是:
# 映射地址
https://c-1998971694380531714.ksai.scnet.cn:58043/
# api调用地址
https://c-1998971694380531714.ksai.scnet.cn:58043/v1/
# 查看模型列表地址
https://c-1998971694380531714.ksai.scnet.cn:58043/v1/models
看看模型列表:
https://c-1998971694380531714.ksai.scnet.cn:58043/v1/models
对应模型DeepSeek-Coder-V2-Lite-Instruct
{"object":"list","data":[{"id":"DeepSeek-Coder-V2-Lite-Instruct","object":"model","created":1765713036,"owned_by":"vllm","root":"/public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/","parent":null,"max_model_len":163840,"permission":[{"id":"modelperm-a96c25f9e21b4bd48b1e9e72949e6f02","object":"model_permission","created":1765713036,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}
使用CherryStudio测试api调用,
测试成功!
记录一下模型调用参数:
调用base_url:https://c-1998971694380531714.ksai.scnet.cn:58043/v1/
token_key:hello
模型名字:DeepSeek-Coder-V2-Lite-Instruct
在Auto-Coder中调用写一个项目
帮我做一个类似kotti这个web框架的项目,使用前后端分离,后端用fastapi,前端选当前最流行的前端。
项目包括全面的测试。
吞吐速度感觉要比今年的模型慢,比如比qwen32b和Qwen3-Coder-30B-A3B-Instruct的那个慢一些。
============================ System Management Interface =============================
======================================================================================
DCU Temp AvgPwr Perf PwrCap VRAM% DCU% Mode
0 52.0C 141.0W manual 300.0W 83% 3.3% Normal
1 51.0C 124.0W manual 300.0W 83% 3.3% Normal
======================================================================================
dcu使用率有点低啊,显存占用基本正常。
输出都是类似这样的
<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["dbf096f2"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
I have saved the message ID for deletion. Please let me know if you need further assistance.conversation tokens: 54185 (conversation round: 294)
<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["8263d2b9", "2835c199"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 54382 (conversation round: 296)
I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 54562 (conversation round: 298)
<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["6c6ce3f8", "8b2831bc"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 54758 (conversation round: 300)
I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 54937 (conversation round: 302)
<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["ead44139", "fed64d9b"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 55131 (conversation round: 304)
I understand that the conversation is too long to list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 55306 (conversation round: 306)
<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["99555a8b", "6ef99e0c"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 55503 (conversation round: 308)
<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["7334cd1b", "7c34b013"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 55698 (conversation round: 310)
I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 55877 (conversation round: 312)
I understand that the conversation is too long to list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 56051 (conversation round: 314)
I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 56230 (conversation round: 316)
I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 56408 (conversation round: 318)
I understand that the conversation is too long to list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 56582 (conversation round: 320)
I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 56761 (conversation round: 322)
<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["4c020591", "f3f958fe"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 56957 (conversation round: 324)
<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["63b2a2fc"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
I have saved the message ID for deletion. Please let me know if you need further assistance.conversation tokens: 57147 (conversation round: 326)
I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 57328 (conversation round: 328)
I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 57510 (conversation round: 330)
I understand that the conversation is still too long to list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 57683 (conversation round: 332)
I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 57862 (conversation round: 334)
I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 58042 (conversation round: 336)
I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 58220 (conversation round: 338)
<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["149a50d8", "90ee905b"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 58414 (conversation round: 340)
I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 58594 (conversation round: 342)
<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["e2632a12", "d217c5d9"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 58792 (conversation round: 344)
I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 58969 (conversation round: 346)
I understand that the conversation is too long to list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 59144 (conversation round: 348)
I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 59322 (conversation round: 350)
I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 59500 (conversation round: 352)
<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["fd2488a8", "6966ec28"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 59695 (conversation round: 354)
<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["8c12499c"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
I have saved the message ID for deletion. Please let me know if you need further assistance.conversation tokens: 59885 (conversation round: 356)
大约两个都小时候,发现一个文件也没有创建,所以看来,这个模型还不够Auto-Coder的智力需求。
用transformers推理
因为刚开始vllm推理失败,所以尝试用transformers推理。
先安装modelscope
pip install modelscope
推理
from modelscope import AutoTokenizer, AutoModelForCausalLM
import torch
model_name = "/public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
总结
毕竟DeepSeek-Coder-V2-Lite-Instruct是上一代的模型,所以尽管它写代码的能力很强,但还是无法在Auto-Coder使用。也就是目前测试下来,Auto-Coder下Qwen3-30B-A3B可以使用,qwen 32b和DeepSeek-Coder-V2-Lite-Instruct都无法完成编程任务。
另外我修改了/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py 这个文件,加了K500SM_AI识别的这一句才能不报错:
elif "K500SM_AI" in device_name:
# return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_default.json"
return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_K100AI.json"
不管用default.json还是K100AI.json,都是可以的,关键需要有K500SM_AI
调试
报错ValueError: Unsurpport device name: K500SM_AI
INFO 12-13 23:14:52 [parallel_state.py:959] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0
(VllmWorkerProcess pid=2716) INFO 12-13 23:14:52 [parallel_state.py:959] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1
INFO 12-13 23:14:52 [model_runner.py:1118] Starting to load model /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/...
(VllmWorkerProcess pid=2716) INFO 12-13 23:14:52 [model_runner.py:1118] Starting to load model /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/...
ERROR 12-13 23:14:52 [engine.py:448] Unsurpport device name: K500SM_AI
ERROR 12-13 23:14:52 [engine.py:448] Traceback (most recent call last):
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 436, in run_mp_engine
ERROR 12-13 23:14:52 [engine.py:448] engine = MQLLMEngine.from_vllm_config(
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 128, in from_vllm_config
ERROR 12-13 23:14:52 [engine.py:448] return cls(
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 82, in __init__
ERROR 12-13 23:14:52 [engine.py:448] self.engine = LLMEngine(*args, **kwargs)
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 283, in __init__
ERROR 12-13 23:14:52 [engine.py:448] self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 286, in __init__
ERROR 12-13 23:14:52 [engine.py:448] super().__init__(*args, **kwargs)
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 12-13 23:14:52 [engine.py:448] self._init_executor()
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/executor/mp_distributed_executor.py", line 125, in _init_executor
ERROR 12-13 23:14:52 [engine.py:448] self._run_workers("load_model",
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers
ERROR 12-13 23:14:52 [engine.py:448] driver_worker_output = run_method(self.driver_worker, sent_method,
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 2506, in run_method
ERROR 12-13 23:14:52 [engine.py:448] return func(*args, **kwargs)
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 183, in load_model
ERROR 12-13 23:14:52 [engine.py:448] self.model_runner.load_model()
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1121, in load_model
ERROR 12-13 23:14:52 [engine.py:448] self.model = get_model(vllm_config=self.vllm_config)
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
ERROR 12-13 23:14:52 [engine.py:448] return loader.load_model(vllm_config=vllm_config)
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 454, in load_model
ERROR 12-13 23:14:52 [engine.py:448] model = _initialize_model(vllm_config=vllm_config)
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 133, in _initialize_model
ERROR 12-13 23:14:52 [engine.py:448] return model_class(vllm_config=vllm_config, prefix=prefix)
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 706, in __init__
ERROR 12-13 23:14:52 [engine.py:448] self.model = DeepseekV2Model(vllm_config=vllm_config,
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/compilation/decorators.py", line 151, in __init__
ERROR 12-13 23:14:52 [engine.py:448] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 635, in __init__
ERROR 12-13 23:14:52 [engine.py:448] self.start_layer, self.end_layer, self.layers = make_layers(
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 609, in make_layers
ERROR 12-13 23:14:52 [engine.py:448] [PPMissingLayer() for _ in range(start_layer)] + [
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 610, in <listcomp>
ERROR 12-13 23:14:52 [engine.py:448] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 637, in <lambda>
ERROR 12-13 23:14:52 [engine.py:448] lambda prefix: DeepseekV2DecoderLayer(
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 525, in __init__
ERROR 12-13 23:14:52 [engine.py:448] self.self_attn = attn_cls(
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 457, in __init__
ERROR 12-13 23:14:52 [engine.py:448] self.mla_attn = Attention(
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/attention/layer.py", line 130, in __init__
ERROR 12-13 23:14:52 [engine.py:448] self.impl = impl_cls(num_heads, head_size, scale, num_kv_heads,
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_mla.py", line 70, in __init__
ERROR 12-13 23:14:52 [engine.py:448] self.attn_configs = get_attention_mla_configs_json(self.num_heads, 1, self.kv_lora_rank + self.qk_rope_head_dim, self.kv_lora_rank, "fp16")
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py", line 54, in get_attention_mla_configs_json
ERROR 12-13 23:14:52 [engine.py:448] json_file_name = get_mla_config_file_name(QH, KVH, QKD, VD, cache_dtype)
ERROR 12-13 23:14:52 [engine.py:448] File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py", line 47, in get_mla_config_file_name
ERROR 12-13 23:14:52 [engine.py:448] raise ValueError(f"Unsurpport device name: {device_name}")
ERROR 12-13 23:14:52 [engine.py:448] ValueError: Unsurpport device name: K500SM_AI
Process SpawnProcess-1:
ERROR 12-13 23:14:52 [multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 2716 died, exit code: -15
INFO 12-13 23:14:52 [multiproc_worker_utils.py:124] Killing local vLLM worker processes
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 450, in run_mp_engine
raise e
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 436, in run_mp_engine
engine = MQLLMEngine.from_vllm_config(
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 128, in from_vllm_config
return cls(
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 82, in __init__
self.engine = LLMEngine(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 283, in __init__
self.model_executor = executor_class(vllm_config=vllm_config, )
File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 286, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 52, in __init__
self._init_executor()
File "/usr/local/lib/python3.10/dist-packages/vllm/executor/mp_distributed_executor.py", line 125, in _init_executor
self._run_workers("load_model",
File "/usr/local/lib/python3.10/dist-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers
driver_worker_output = run_method(self.driver_worker, sent_method,
File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 2506, in run_method
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 183, in load_model
self.model_runner.load_model()
File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1121, in load_model
self.model = get_model(vllm_config=self.vllm_config)
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
return loader.load_model(vllm_config=vllm_config)
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 454, in load_model
model = _initialize_model(vllm_config=vllm_config)
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 133, in _initialize_model
return model_class(vllm_config=vllm_config, prefix=prefix)
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 706, in __init__
self.model = DeepseekV2Model(vllm_config=vllm_config,
File "/usr/local/lib/python3.10/dist-packages/vllm/compilation/decorators.py", line 151, in __init__
old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 635, in __init__
self.start_layer, self.end_layer, self.layers = make_layers(
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 609, in make_layers
[PPMissingLayer() for _ in range(start_layer)] + [
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 610, in <listcomp>
maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 637, in <lambda>
lambda prefix: DeepseekV2DecoderLayer(
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 525, in __init__
self.self_attn = attn_cls(
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 457, in __init__
self.mla_attn = Attention(
File "/usr/local/lib/python3.10/dist-packages/vllm/attention/layer.py", line 130, in __init__
self.impl = impl_cls(num_heads, head_size, scale, num_kv_heads,
File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_mla.py", line 70, in __init__
self.attn_configs = get_attention_mla_configs_json(self.num_heads, 1, self.kv_lora_rank + self.qk_rope_head_dim, self.kv_lora_rank, "fp16")
File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py", line 54, in get_attention_mla_configs_json
json_file_name = get_mla_config_file_name(QH, KVH, QKD, VD, cache_dtype)
File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py", line 47, in get_mla_config_file_name
raise ValueError(f"Unsurpport device name: {device_name}")
ValueError: Unsurpport device name: K500SM_AI
Traceback (most recent call last):
File "/usr/local/bin/vllm", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/cli/main.py", line 51, in main
args.dispatch_function(args)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/cli/serve.py", line 27, in cmd
uvloop.run(run_server(args))
File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 82, in run
return loop.run_until_complete(wrapper())
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 61, in wrapper
return await main
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 1069, in run_server
async with build_async_engine_client(args) as engine_client:
File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 146, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 269, in build_async_engine_client_from_engine_args
raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.
root@notebook-1998971694380531714-ac7sc1ejvp-96619:~# /usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
尝试
vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/ --served-model-name DeepSeek-Coder-V2-Lite-Instruct --trust-remote-code --tensor-parallel-size 2 --device cuda
不行,加上这句试试
export VLLM_DISABLE_TRITON=1
不行
加上
--enable-reasoning --reasoning-parser deepseek_r1
不行
ai建议
# 假设你已经装好 vllm、torch (CPU or ROCm) 等
export VLLM_DISABLE_TRITON=1 # 或者直接使用 --disable_mla
vllm serve \
/public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct \
--host 0.0.0.0 \
--port 8000 \
--device auto \
--tensor-parallel-size 1 \
--max-model-len 8192 \
--gpu-memory-utilization 0.9 \
--trust-remote-code
修改源代码
File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py", line 47, in get_mla_config_file_name
raise ValueError(f"Unsurpport device name: {device_name}")
ValueError: Unsurpport device name: K500SM_AI
修改这个文件/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py,加入elif "K500SM_AI"这段:
def get_mla_config_file_name(QH: int, KVH: int, QKD: int, VD: int, cache_dtype: Optional[str]) -> str:
if cache_dtype == "default":
return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_default.json"
device_name = torch.cuda.get_device_name().replace(" ", "_")
if "K100_AI" in device_name:
return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_K100AI.json"
elif "BW" in device_name:
return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_BW.json"
elif "K500SM_AI" in device_name:
return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_K100AI.json"
else:
raise ValueError(f"Unsurpport device name: {device_name}")
问题解决。后来发现用default也是可以的:
elif "K500SM_AI" in device_name:
return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_default.json"
# return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_K100AI.json"
现在出现新的报错:
ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks.
(VllmWorkerProcess pid=2680) INFO 12-14 00:28:14 [loader.py:460] Loading weights took 94.00 seconds
INFO 12-14 00:28:14 [loader.py:460] Loading weights took 94.02 seconds
(VllmWorkerProcess pid=2680) INFO 12-14 00:28:14 [model_runner.py:1154] Model loading took 15.2695 GiB and 94.673958 seconds
INFO 12-14 00:28:14 [model_runner.py:1154] Model loading took 15.2695 GiB and 94.688603 seconds
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks.
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238] Traceback (most recent call last):
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238] File "/usr/local/lib/python3.10/dist-packages/vllm/executor/multiproc_worker_utils.py", line 232, in _run_worker_process
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238] output = run_method(worker, method, args, kwargs)
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238] File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 2506, in run_method
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238] return func(*args, **kwargs)
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
怀疑是机器的双卡坏了,明天换个机器看看。
继续测试,这回把那句话改成:
device_name = torch.cuda.get_device_name().replace(" ", "_")
if "K100_AI" in device_name:
return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_K100AI.json"
elif "BW" in device_name:
return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_BW.json"
elif "K500SM_AI" in device_name:
return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_default.json"
执行
vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/ --served-model-name DeepSeek-Coder-V2-Lite-Instruct --trust-remote-code --tensor-parallel-size 2
这样可以看到它的配置变成:
(VllmWorkerProcess pid=2833) WARNING 12-14 19:47:22 [fused_moe.py:959] Using default MoE config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=64,N=704,device_name=K500SM_AI.json
好了,启动了
端口映射出来:
https://c-1998971694380531714.ksai.scnet.cn:58043