SCNet: vLLM Inference Deployment of DeepSeek-Coder-V2-Lite-Instruct on Dual DCU Heterogeneous Accelerators

I have already tested vLLM inference deployments of Qwen 32B and Qwen3-30B-A3B and tried them in Auto-Coder. Now, in the SCNet DCU environment, let's try deploying DeepSeek-Coder-V2-Lite-Instruct with vLLM.

DeepSeek-Coder-V2-Lite-Instruct is an open-source code model released by DeepSeek. It uses a Mixture-of-Experts (MoE) architecture with 16B total parameters and only 2.4B activated parameters, which keeps compute cost low while maintaining strong performance. The model supports 338 programming languages, handles a 128K context window, and performs well on code-generation benchmarks such as HumanEval, approaching GPT-4 Turbo.

vLLM Inference with DeepSeek-Coder-V2-Lite-Instruct

On SCNet (China's supercomputing network), allocate a dual-DCU instance; each card has 64 GB of VRAM.

In the model library, find DeepSeek-Coder-V2-Lite-Instruct, clone it to the console, and note the model path:

/public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct

Inference

Use dual-card inference. On a single card the model either runs out of VRAM or can only reach about a 42K-token context, and anything below 64K tokens is barely usable, so I ended up using two cards.

vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/ --trust-remote-code  --tensor-parallel-size 2

Port Mapping

After the service starts, it listens on port 8000.

Map port 8000 to an external address.

The mapped addresses are:

# Mapped address
https://c-1998971694380531714.ksai.scnet.cn:58043/
# API base URL
https://c-1998971694380531714.ksai.scnet.cn:58043/v1/
# Model list endpoint
https://c-1998971694380531714.ksai.scnet.cn:58043/v1/models

Check the model list:

https://c-1998971694380531714.ksai.scnet.cn:58043/v1/models

It returns the model DeepSeek-Coder-V2-Lite-Instruct:

{"object":"list","data":[{"id":"DeepSeek-Coder-V2-Lite-Instruct","object":"model","created":1765713036,"owned_by":"vllm","root":"/public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/","parent":null,"max_model_len":163840,"permission":[{"id":"modelperm-a96c25f9e21b4bd48b1e9e72949e6f02","object":"model_permission","created":1765713036,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}

Test the API call with CherryStudio.

The test succeeds!

Record the model call parameters:

base_url: https://c-1998971694380531714.ksai.scnet.cn:58043/v1/

token_key:hello

Model name: DeepSeek-Coder-V2-Lite-Instruct
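
With these parameters, a minimal Python call against the endpoint looks roughly like the sketch below (it assumes the openai package is installed; the key value is arbitrary since the server was started without --api-key):

from openai import OpenAI

# Point the OpenAI-compatible client at the mapped vLLM endpoint recorded above
client = OpenAI(
    base_url="https://c-1998971694380531714.ksai.scnet.cn:58043/v1/",
    api_key="hello",  # vLLM does not check the key unless --api-key is passed
)

response = client.chat.completions.create(
    model="DeepSeek-Coder-V2-Lite-Instruct",
    messages=[{"role": "user", "content": "Write a quick sort algorithm in Python."}],
    max_tokens=512,
    temperature=0.2,
)
print(response.choices[0].message.content)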

Calling It from Auto-Coder to Build a Project

The prompt: build me a project similar to the Kotti web framework, with a separated frontend and backend, using FastAPI for the backend and the currently most popular framework for the frontend.

The project should include comprehensive tests.

The throughput feels slower than this year's models, for example slower than Qwen 32B and Qwen3-Coder-30B-A3B-Instruct.

============================ System Management Interface =============================
======================================================================================
DCU     Temp     AvgPwr     Perf     PwrCap     VRAM%      DCU%      Mode
0       52.0C    141.0W     manual   300.0W     83%        3.3%      Normal
1       51.0C    124.0W     manual   300.0W     83%        3.3%      Normal
======================================================================================

DCU utilization is rather low, while VRAM usage looks about right.
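
To turn the "feels slower" impression into a number, a rough single-request throughput check can be run against the same endpoint (a sketch assuming the openai package; vLLM reports token counts in the usage field of the response):

import time
from openai import OpenAI

client = OpenAI(
    base_url="https://c-1998971694380531714.ksai.scnet.cn:58043/v1/",
    api_key="hello",
)

start = time.time()
resp = client.chat.completions.create(
    model="DeepSeek-Coder-V2-Lite-Instruct",
    messages=[{"role": "user", "content": "Implement an LRU cache in Python with unit tests."}],
    max_tokens=1024,
)
elapsed = time.time() - start
# Decode speed for a single request; batched load would look different
print(f"{resp.usage.completion_tokens} tokens in {elapsed:.1f}s "
      f"({resp.usage.completion_tokens / elapsed:.1f} tokens/s)")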

The output keeps looking like this:

 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["dbf096f2"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message ID for deletion. Please let me know if you need further assistance.conversation tokens: 54185 (conversation round: 294)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["8263d2b9", "2835c199"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 54382 (conversation round: 296)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 54562 (conversation round: 298)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["6c6ce3f8", "8b2831bc"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 54758 (conversation round: 300)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 54937 (conversation round: 302)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["ead44139", "fed64d9b"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 55131 (conversation round: 304)
 I understand that the conversation is too long to list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 55306 (conversation round: 306)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["99555a8b", "6ef99e0c"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 55503 (conversation round: 308)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["7334cd1b", "7c34b013"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 55698 (conversation round: 310)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 55877 (conversation round: 312)
 I understand that the conversation is too long to list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 56051 (conversation round: 314)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 56230 (conversation round: 316)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 56408 (conversation round: 318)
 I understand that the conversation is too long to list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 56582 (conversation round: 320)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 56761 (conversation round: 322)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["4c020591", "f3f958fe"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 56957 (conversation round: 324)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["63b2a2fc"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message ID for deletion. Please let me know if you need further assistance.conversation tokens: 57147 (conversation round: 326)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 57328 (conversation round: 328)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 57510 (conversation round: 330)
 I understand that the conversation is still too long to list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 57683 (conversation round: 332)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 57862 (conversation round: 334)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 58042 (conversation round: 336)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 58220 (conversation round: 338)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["149a50d8", "90ee905b"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 58414 (conversation round: 340)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 58594 (conversation round: 342)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["e2632a12", "d217c5d9"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 58792 (conversation round: 344)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 58969 (conversation round: 346)
 I understand that the conversation is too long to list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 59144 (conversation round: 348)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 59322 (conversation round: 350)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 59500 (conversation round: 352)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["fd2488a8", "6966ec28"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 59695 (conversation round: 354)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["8c12499c"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message ID for deletion. Please let me know if you need further assistance.conversation tokens: 59885 (conversation round: 356)

After roughly two hours, not a single file had been created, so it seems this model does not meet Auto-Coder's intelligence requirements.

Inference with transformers

Because vLLM inference failed at first, I also tried inference with transformers.

First install modelscope:

pip install modelscope 

Inference:

from modelscope import AutoTokenizer, AutoModelForCausalLM
import torch

# Local path of the cloned model on SCNet
model_name = "/public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Load the weights in bfloat16 and move them to the DCU (exposed as a CUDA device)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
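
Since this is an instruct-tuned model, raw text completion may not give the best results; continuing from the code above, a sketch of the chat-template variant (it reuses the tokenizer and model objects already loaded):

# Chat-template variant (sketch), reusing tokenizer/model from the block above
messages = [{"role": "user", "content": "write a quick sort algorithm in python"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                          return_tensors="pt").to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256,
                         eos_token_id=tokenizer.eos_token_id)
# Strip the prompt tokens and print only the generated answer
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))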

Summary

After all, DeepSeek-Coder-V2-Lite-Instruct is a previous-generation model, so although its raw coding ability is strong, it still cannot be used with Auto-Coder. In my tests so far, Qwen3-30B-A3B works in Auto-Coder, while Qwen 32B and DeepSeek-Coder-V2-Lite-Instruct both fail to complete the programming task.

In addition, I modified /usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py and added a branch that recognizes K500SM_AI; only then did the error go away:

    elif "K500SM_AI" in device_name:
        # return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_default.json"
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_K100AI.json"

Either default.json or K100AI.json works here; the key is that the K500SM_AI branch exists.

Debugging

The error: ValueError: Unsurpport device name: K500SM_AI

INFO 12-13 23:14:52 [parallel_state.py:959] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0
(VllmWorkerProcess pid=2716) INFO 12-13 23:14:52 [parallel_state.py:959] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1
INFO 12-13 23:14:52 [model_runner.py:1118] Starting to load model /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/...
(VllmWorkerProcess pid=2716) INFO 12-13 23:14:52 [model_runner.py:1118] Starting to load model /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/...
ERROR 12-13 23:14:52 [engine.py:448] Unsurpport device name: K500SM_AI
ERROR 12-13 23:14:52 [engine.py:448] Traceback (most recent call last):
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 436, in run_mp_engine
ERROR 12-13 23:14:52 [engine.py:448]     engine = MQLLMEngine.from_vllm_config(
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 128, in from_vllm_config
ERROR 12-13 23:14:52 [engine.py:448]     return cls(
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 82, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.engine = LLMEngine(*args, **kwargs)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 283, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 286, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     super().__init__(*args, **kwargs)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self._init_executor()
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/mp_distributed_executor.py", line 125, in _init_executor
ERROR 12-13 23:14:52 [engine.py:448]     self._run_workers("load_model",
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers
ERROR 12-13 23:14:52 [engine.py:448]     driver_worker_output = run_method(self.driver_worker, sent_method,
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 2506, in run_method
ERROR 12-13 23:14:52 [engine.py:448]     return func(*args, **kwargs)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 183, in load_model
ERROR 12-13 23:14:52 [engine.py:448]     self.model_runner.load_model()
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1121, in load_model
ERROR 12-13 23:14:52 [engine.py:448]     self.model = get_model(vllm_config=self.vllm_config)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
ERROR 12-13 23:14:52 [engine.py:448]     return loader.load_model(vllm_config=vllm_config)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 454, in load_model
ERROR 12-13 23:14:52 [engine.py:448]     model = _initialize_model(vllm_config=vllm_config)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 133, in _initialize_model
ERROR 12-13 23:14:52 [engine.py:448]     return model_class(vllm_config=vllm_config, prefix=prefix)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 706, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.model = DeepseekV2Model(vllm_config=vllm_config,
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/compilation/decorators.py", line 151, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 635, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.start_layer, self.end_layer, self.layers = make_layers(
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 609, in make_layers
ERROR 12-13 23:14:52 [engine.py:448]     [PPMissingLayer() for _ in range(start_layer)] + [
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 610, in <listcomp>
ERROR 12-13 23:14:52 [engine.py:448]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 637, in <lambda>
ERROR 12-13 23:14:52 [engine.py:448]     lambda prefix: DeepseekV2DecoderLayer(
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 525, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.self_attn = attn_cls(
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 457, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.mla_attn = Attention(
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/layer.py", line 130, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.impl = impl_cls(num_heads, head_size, scale, num_kv_heads,
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_mla.py", line 70, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.attn_configs = get_attention_mla_configs_json(self.num_heads, 1, self.kv_lora_rank + self.qk_rope_head_dim, self.kv_lora_rank, "fp16")
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py", line 54, in get_attention_mla_configs_json
ERROR 12-13 23:14:52 [engine.py:448]     json_file_name = get_mla_config_file_name(QH, KVH, QKD, VD, cache_dtype)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py", line 47, in get_mla_config_file_name
ERROR 12-13 23:14:52 [engine.py:448]     raise ValueError(f"Unsurpport device name: {device_name}")
ERROR 12-13 23:14:52 [engine.py:448] ValueError: Unsurpport device name: K500SM_AI
Process SpawnProcess-1:
ERROR 12-13 23:14:52 [multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 2716 died, exit code: -15
INFO 12-13 23:14:52 [multiproc_worker_utils.py:124] Killing local vLLM worker processes
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 450, in run_mp_engine
    raise e
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 436, in run_mp_engine
    engine = MQLLMEngine.from_vllm_config(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 128, in from_vllm_config
    return cls(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 82, in __init__
    self.engine = LLMEngine(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 283, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 286, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 52, in __init__
    self._init_executor()
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/mp_distributed_executor.py", line 125, in _init_executor
    self._run_workers("load_model",
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers
    driver_worker_output = run_method(self.driver_worker, sent_method,
  File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 2506, in run_method
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 183, in load_model
    self.model_runner.load_model()
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1121, in load_model
    self.model = get_model(vllm_config=self.vllm_config)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
    return loader.load_model(vllm_config=vllm_config)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 454, in load_model
    model = _initialize_model(vllm_config=vllm_config)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 133, in _initialize_model
    return model_class(vllm_config=vllm_config, prefix=prefix)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 706, in __init__
    self.model = DeepseekV2Model(vllm_config=vllm_config,
  File "/usr/local/lib/python3.10/dist-packages/vllm/compilation/decorators.py", line 151, in __init__
    old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 635, in __init__
    self.start_layer, self.end_layer, self.layers = make_layers(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 609, in make_layers
    [PPMissingLayer() for _ in range(start_layer)] + [
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 610, in <listcomp>
    maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 637, in <lambda>
    lambda prefix: DeepseekV2DecoderLayer(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 525, in __init__
    self.self_attn = attn_cls(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 457, in __init__
    self.mla_attn = Attention(
  File "/usr/local/lib/python3.10/dist-packages/vllm/attention/layer.py", line 130, in __init__
    self.impl = impl_cls(num_heads, head_size, scale, num_kv_heads,
  File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_mla.py", line 70, in __init__
    self.attn_configs = get_attention_mla_configs_json(self.num_heads, 1, self.kv_lora_rank + self.qk_rope_head_dim, self.kv_lora_rank, "fp16")
  File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py", line 54, in get_attention_mla_configs_json
    json_file_name = get_mla_config_file_name(QH, KVH, QKD, VD, cache_dtype)
  File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py", line 47, in get_mla_config_file_name
    raise ValueError(f"Unsurpport device name: {device_name}")
ValueError: Unsurpport device name: K500SM_AI
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/cli/main.py", line 51, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/cli/serve.py", line 27, in cmd
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 1069, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 146, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 269, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.
root@notebook-1998971694380531714-ac7sc1ejvp-96619:~# /usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Attempts

vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/ --served-model-name  DeepSeek-Coder-V2-Lite-Instruct --trust-remote-code  --tensor-parallel-size 2  --device cuda

That doesn't work. Try adding this:

export VLLM_DISABLE_TRITON=1

Still doesn't work.

Add:

--enable-reasoning --reasoning-parser deepseek_r1 

Still doesn't work.

AI suggestion:

# Assuming vllm, torch (CPU or ROCm), etc. are already installed
export VLLM_DISABLE_TRITON=1   # or use --disable_mla directly
vllm serve \
    /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct \
    --host 0.0.0.0 \
    --port 8000 \
    --device auto \
    --tensor-parallel-size 1 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.9 \
    --trust-remote-code

Modifying the Source Code

File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py", line 47, in get_mla_config_file_name

raise ValueError(f"Unsurpport device name: {device_name}")

ValueError: Unsurpport device name: K500SM_AI

Modify the file /usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py and add the elif "K500SM_AI" branch:

def get_mla_config_file_name(QH: int, KVH: int, QKD: int, VD: int, cache_dtype: Optional[str]) -> str:
    if cache_dtype == "default":
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_default.json"
    
    device_name = torch.cuda.get_device_name().replace(" ", "_")
    if "K100_AI" in device_name:
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_K100AI.json"
    elif "BW" in device_name:
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_BW.json"
    elif "K500SM_AI" in device_name:
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_K100AI.json"
    else:
        raise ValueError(f"Unsurpport device name: {device_name}")

Problem solved. I later found that the default config also works:

    elif "K500SM_AI" in device_name:
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_default.json"
        # return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_K100AI.json"

Now a new error appears:

ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks.

(VllmWorkerProcess pid=2680) INFO 12-14 00:28:14 [loader.py:460] Loading weights took 94.00 seconds
INFO 12-14 00:28:14 [loader.py:460] Loading weights took 94.02 seconds
(VllmWorkerProcess pid=2680) INFO 12-14 00:28:14 [model_runner.py:1154] Model loading took 15.2695 GiB and 94.673958 seconds
INFO 12-14 00:28:14 [model_runner.py:1154] Model loading took 15.2695 GiB and 94.688603 seconds
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks.
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238] Traceback (most recent call last):
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/multiproc_worker_utils.py", line 232, in _run_worker_process
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238]     output = run_method(worker, method, args, kwargs)
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238]   File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 2506, in run_method
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238]     return func(*args, **kwargs)
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context

I suspect the two cards on this machine are faulty; I'll try a different machine tomorrow.

Continuing the tests, this time I changed that branch to:

    device_name = torch.cuda.get_device_name().replace(" ", "_")
    if "K100_AI" in device_name:
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_K100AI.json"
    elif "BW" in device_name:
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_BW.json"
    elif "K500SM_AI" in device_name:
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_default.json"

Run:

vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/ --served-model-name  DeepSeek-Coder-V2-Lite-Instruct --trust-remote-code  --tensor-parallel-size 2  

This way, you can see in the log which config it ends up using:

(VllmWorkerProcess pid=2833) WARNING 12-14 19:47:22 [fused_moe.py:959] Using default MoE config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=64,N=704,device_name=K500SM_AI.json
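
This warning only means vLLM falls back to a generic fused-MoE kernel config because there is no tuned file for the K500SM_AI device name; the server still works. In the same spirit as the triton_config.py patch, one untested idea would be to reuse a tuned config for a similar DCU, if one exists in that directory (purely a sketch; the E=64,N=704 values and the K100_AI source file name are assumptions to verify on the machine):

import shutil
from pathlib import Path

# fused_moe config directory mentioned in the warning above
cfg_dir = Path("/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/configs")
src = cfg_dir / "E=64,N=704,device_name=K100_AI.json"    # assumed tuned config for a similar DCU
dst = cfg_dir / "E=64,N=704,device_name=K500SM_AI.json"  # file name taken from the warning

if src.exists() and not dst.exists():
    shutil.copy(src, dst)
    print(f"copied {src.name} -> {dst.name}")
else:
    print("no tuned source config found (or target exists); keeping the default MoE config")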

Good, it's up and running.

Map the port out:

https://c-1998971694380531714.ksai.scnet.cn:58043