SCNet: vLLM Inference Deployment of DeepSeek-Coder-V2-Lite-Instruct on Dual DCU Heterogeneous Accelerators

I have already tested vLLM inference deployments of Qwen 32B and Qwen3-30B-A3B and tried them in Auto-Coder. Now, in the SCNet DCU environment, let's try deploying DeepSeek-Coder-V2-Lite-Instruct with vLLM.

DeepSeek-Coder-V2-Lite-Instruct is an open-source code model released by DeepSeek. It uses a Mixture-of-Experts (MoE) architecture with 16B total parameters and only 2.4B activated parameters, which keeps compute cost low while maintaining strong performance. The model supports 338 programming languages, handles a 128K context window, and performs well on code-generation benchmarks such as HumanEval, approaching GPT-4 Turbo.

vLLM Inference with DeepSeek-Coder-V2-Lite-Instruct

On SCNet (China's supercomputing network), allocate a dual-DCU instance; each card has 64 GB of VRAM.

In the model library, find DeepSeek-Coder-V2-Lite-Instruct, clone it to the console, and note the model path:

/public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct

Inference

Use dual-card inference. On a single card the model either runs out of VRAM or can only reach about a 42K-token context, and anything below 64K tokens is barely usable, so I ended up using two cards.

vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/ --trust-remote-code  --tensor-parallel-size 2

Port Mapping

After the service starts, it listens on port 8000.

Map port 8000 to an external address.

The mapped addresses are:

# Mapped address
https://c-1998971694380531714.ksai.scnet.cn:58043/
# API base URL
https://c-1998971694380531714.ksai.scnet.cn:58043/v1/
# Model list endpoint
https://c-1998971694380531714.ksai.scnet.cn:58043/v1/models

Check the model list:

https://c-1998971694380531714.ksai.scnet.cn:58043/v1/models

It returns the model DeepSeek-Coder-V2-Lite-Instruct:

{"object":"list","data":[{"id":"DeepSeek-Coder-V2-Lite-Instruct","object":"model","created":1765713036,"owned_by":"vllm","root":"/public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/","parent":null,"max_model_len":163840,"permission":[{"id":"modelperm-a96c25f9e21b4bd48b1e9e72949e6f02","object":"model_permission","created":1765713036,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}

Test the API call with CherryStudio.

The test succeeds!

Record the model call parameters:

base_url: https://c-1998971694380531714.ksai.scnet.cn:58043/v1/

token_key:hello

Model name: DeepSeek-Coder-V2-Lite-Instruct
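
With these parameters, a minimal Python call against the endpoint looks roughly like the sketch below (it assumes the openai package is installed; the key value is arbitrary since the server was started without --api-key):

from openai import OpenAI

# Point the OpenAI-compatible client at the mapped vLLM endpoint recorded above
client = OpenAI(
    base_url="https://c-1998971694380531714.ksai.scnet.cn:58043/v1/",
    api_key="hello",  # vLLM does not check the key unless --api-key is passed
)

response = client.chat.completions.create(
    model="DeepSeek-Coder-V2-Lite-Instruct",
    messages=[{"role": "user", "content": "Write a quick sort algorithm in Python."}],
    max_tokens=512,
    temperature=0.2,
)
print(response.choices[0].message.content)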

Calling It from Auto-Coder to Build a Project

The prompt: build me a project similar to the Kotti web framework, with a separated frontend and backend, using FastAPI for the backend and the currently most popular framework for the frontend.

The project should include comprehensive tests.

The throughput feels slower than this year's models, for example slower than Qwen 32B and Qwen3-Coder-30B-A3B-Instruct.

============================ System Management Interface =============================
======================================================================================
DCU     Temp     AvgPwr     Perf     PwrCap     VRAM%      DCU%      Mode
0       52.0C    141.0W     manual   300.0W     83%        3.3%      Normal
1       51.0C    124.0W     manual   300.0W     83%        3.3%      Normal
======================================================================================

DCU utilization is rather low, while VRAM usage looks about right.
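
To turn the "feels slower" impression into a number, a rough single-request throughput check can be run against the same endpoint (a sketch assuming the openai package; vLLM reports token counts in the usage field of the response):

import time
from openai import OpenAI

client = OpenAI(
    base_url="https://c-1998971694380531714.ksai.scnet.cn:58043/v1/",
    api_key="hello",
)

start = time.time()
resp = client.chat.completions.create(
    model="DeepSeek-Coder-V2-Lite-Instruct",
    messages=[{"role": "user", "content": "Implement an LRU cache in Python with unit tests."}],
    max_tokens=1024,
)
elapsed = time.time() - start
# Decode speed for a single request; batched load would look different
print(f"{resp.usage.completion_tokens} tokens in {elapsed:.1f}s "
      f"({resp.usage.completion_tokens / elapsed:.1f} tokens/s)")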

The output keeps looking like this:

 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["dbf096f2"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message ID for deletion. Please let me know if you need further assistance.conversation tokens: 54185 (conversation round: 294)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["8263d2b9", "2835c199"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 54382 (conversation round: 296)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 54562 (conversation round: 298)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["6c6ce3f8", "8b2831bc"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 54758 (conversation round: 300)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 54937 (conversation round: 302)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["ead44139", "fed64d9b"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 55131 (conversation round: 304)
 I understand that the conversation is too long to list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 55306 (conversation round: 306)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["99555a8b", "6ef99e0c"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 55503 (conversation round: 308)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["7334cd1b", "7c34b013"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 55698 (conversation round: 310)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 55877 (conversation round: 312)
 I understand that the conversation is too long to list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 56051 (conversation round: 314)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 56230 (conversation round: 316)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 56408 (conversation round: 318)
 I understand that the conversation is too long to list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 56582 (conversation round: 320)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 56761 (conversation round: 322)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["4c020591", "f3f958fe"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 56957 (conversation round: 324)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["63b2a2fc"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message ID for deletion. Please let me know if you need further assistance.conversation tokens: 57147 (conversation round: 326)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 57328 (conversation round: 328)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 57510 (conversation round: 330)
 I understand that the conversation is still too long to list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 57683 (conversation round: 332)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 57862 (conversation round: 334)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 58042 (conversation round: 336)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 58220 (conversation round: 338)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["149a50d8", "90ee905b"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 58414 (conversation round: 340)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 58594 (conversation round: 342)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["e2632a12", "d217c5d9"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 58792 (conversation round: 344)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 58969 (conversation round: 346)
 I understand that the conversation is too long to list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 59144 (conversation round: 348)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 59322 (conversation round: 350)
 I'm sorry, but due to the length of the conversation, I cannot list all message IDs manually. To save message IDs for deletion, please use the `conversation_message_ids_write` tool with the appropriate message IDs. If you have a specific task or question, please provide a summary or details so I can assist you more effectively.conversation tokens: 59500 (conversation round: 352)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["fd2488a8", "6966ec28"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message IDs for deletion. Please let me know if you need further assistance.conversation tokens: 59695 (conversation round: 354)
 <|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>conversation_message_ids_write
```json
{"message_ids": ["8c12499c"]}
```<|tool▁call▁end|><|tool▁calls▁end|>
<|tool▁outputs▁begin|><|tool▁output▁begin|>{"status": "success", "message": "Message IDs have been saved for deletion."}<|tool▁output▁end|><|tool▁outputs▁end|>
 I have saved the message ID for deletion. Please let me know if you need further assistance.conversation tokens: 59885 (conversation round: 356)

After roughly two hours, not a single file had been created, so it seems this model does not meet Auto-Coder's intelligence requirements.

Inference with transformers

Because vLLM inference failed at first, I also tried inference with transformers.

First install modelscope:

pip install modelscope 

Inference:

from modelscope import AutoTokenizer, AutoModelForCausalLM
import torch

# Local path of the cloned model on SCNet
model_name = "/public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Load the weights in bfloat16 and move them to the DCU (exposed as a CUDA device)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
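
Since this is an instruct-tuned model, raw text completion may not give the best results; continuing from the code above, a sketch of the chat-template variant (it reuses the tokenizer and model objects already loaded):

# Chat-template variant (sketch), reusing tokenizer/model from the block above
messages = [{"role": "user", "content": "write a quick sort algorithm in python"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                          return_tensors="pt").to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256,
                         eos_token_id=tokenizer.eos_token_id)
# Strip the prompt tokens and print only the generated answer
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))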

Summary

After all, DeepSeek-Coder-V2-Lite-Instruct is a previous-generation model, so although its raw coding ability is strong, it still cannot be used with Auto-Coder. In my tests so far, Qwen3-30B-A3B works in Auto-Coder, while Qwen 32B and DeepSeek-Coder-V2-Lite-Instruct both fail to complete the programming task.

In addition, I modified /usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py and added a branch that recognizes K500SM_AI; only then did the error go away:

    elif "K500SM_AI" in device_name:
        # return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_default.json"
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_K100AI.json"

Either default.json or K100AI.json works here; the key is that the K500SM_AI branch exists.

Debugging

The error: ValueError: Unsurpport device name: K500SM_AI

INFO 12-13 23:14:52 [parallel_state.py:959] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 0
(VllmWorkerProcess pid=2716) INFO 12-13 23:14:52 [parallel_state.py:959] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, TP rank 1
INFO 12-13 23:14:52 [model_runner.py:1118] Starting to load model /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/...
(VllmWorkerProcess pid=2716) INFO 12-13 23:14:52 [model_runner.py:1118] Starting to load model /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/...
ERROR 12-13 23:14:52 [engine.py:448] Unsurpport device name: K500SM_AI
ERROR 12-13 23:14:52 [engine.py:448] Traceback (most recent call last):
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 436, in run_mp_engine
ERROR 12-13 23:14:52 [engine.py:448]     engine = MQLLMEngine.from_vllm_config(
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 128, in from_vllm_config
ERROR 12-13 23:14:52 [engine.py:448]     return cls(
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 82, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.engine = LLMEngine(*args, **kwargs)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 283, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 286, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     super().__init__(*args, **kwargs)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self._init_executor()
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/mp_distributed_executor.py", line 125, in _init_executor
ERROR 12-13 23:14:52 [engine.py:448]     self._run_workers("load_model",
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers
ERROR 12-13 23:14:52 [engine.py:448]     driver_worker_output = run_method(self.driver_worker, sent_method,
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 2506, in run_method
ERROR 12-13 23:14:52 [engine.py:448]     return func(*args, **kwargs)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 183, in load_model
ERROR 12-13 23:14:52 [engine.py:448]     self.model_runner.load_model()
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1121, in load_model
ERROR 12-13 23:14:52 [engine.py:448]     self.model = get_model(vllm_config=self.vllm_config)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
ERROR 12-13 23:14:52 [engine.py:448]     return loader.load_model(vllm_config=vllm_config)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 454, in load_model
ERROR 12-13 23:14:52 [engine.py:448]     model = _initialize_model(vllm_config=vllm_config)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 133, in _initialize_model
ERROR 12-13 23:14:52 [engine.py:448]     return model_class(vllm_config=vllm_config, prefix=prefix)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 706, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.model = DeepseekV2Model(vllm_config=vllm_config,
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/compilation/decorators.py", line 151, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 635, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.start_layer, self.end_layer, self.layers = make_layers(
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 609, in make_layers
ERROR 12-13 23:14:52 [engine.py:448]     [PPMissingLayer() for _ in range(start_layer)] + [
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 610, in <listcomp>
ERROR 12-13 23:14:52 [engine.py:448]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 637, in <lambda>
ERROR 12-13 23:14:52 [engine.py:448]     lambda prefix: DeepseekV2DecoderLayer(
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 525, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.self_attn = attn_cls(
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 457, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.mla_attn = Attention(
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/layer.py", line 130, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.impl = impl_cls(num_heads, head_size, scale, num_kv_heads,
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_mla.py", line 70, in __init__
ERROR 12-13 23:14:52 [engine.py:448]     self.attn_configs = get_attention_mla_configs_json(self.num_heads, 1, self.kv_lora_rank + self.qk_rope_head_dim, self.kv_lora_rank, "fp16")
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py", line 54, in get_attention_mla_configs_json
ERROR 12-13 23:14:52 [engine.py:448]     json_file_name = get_mla_config_file_name(QH, KVH, QKD, VD, cache_dtype)
ERROR 12-13 23:14:52 [engine.py:448]   File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py", line 47, in get_mla_config_file_name
ERROR 12-13 23:14:52 [engine.py:448]     raise ValueError(f"Unsurpport device name: {device_name}")
ERROR 12-13 23:14:52 [engine.py:448] ValueError: Unsurpport device name: K500SM_AI
Process SpawnProcess-1:
ERROR 12-13 23:14:52 [multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 2716 died, exit code: -15
INFO 12-13 23:14:52 [multiproc_worker_utils.py:124] Killing local vLLM worker processes
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 450, in run_mp_engine
    raise e
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 436, in run_mp_engine
    engine = MQLLMEngine.from_vllm_config(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 128, in from_vllm_config
    return cls(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 82, in __init__
    self.engine = LLMEngine(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 283, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 286, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 52, in __init__
    self._init_executor()
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/mp_distributed_executor.py", line 125, in _init_executor
    self._run_workers("load_model",
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers
    driver_worker_output = run_method(self.driver_worker, sent_method,
  File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 2506, in run_method
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 183, in load_model
    self.model_runner.load_model()
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1121, in load_model
    self.model = get_model(vllm_config=self.vllm_config)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
    return loader.load_model(vllm_config=vllm_config)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 454, in load_model
    model = _initialize_model(vllm_config=vllm_config)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 133, in _initialize_model
    return model_class(vllm_config=vllm_config, prefix=prefix)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 706, in __init__
    self.model = DeepseekV2Model(vllm_config=vllm_config,
  File "/usr/local/lib/python3.10/dist-packages/vllm/compilation/decorators.py", line 151, in __init__
    old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 635, in __init__
    self.start_layer, self.end_layer, self.layers = make_layers(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 609, in make_layers
    [PPMissingLayer() for _ in range(start_layer)] + [
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 610, in <listcomp>
    maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 637, in <lambda>
    lambda prefix: DeepseekV2DecoderLayer(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 525, in __init__
    self.self_attn = attn_cls(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 457, in __init__
    self.mla_attn = Attention(
  File "/usr/local/lib/python3.10/dist-packages/vllm/attention/layer.py", line 130, in __init__
    self.impl = impl_cls(num_heads, head_size, scale, num_kv_heads,
  File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_mla.py", line 70, in __init__
    self.attn_configs = get_attention_mla_configs_json(self.num_heads, 1, self.kv_lora_rank + self.qk_rope_head_dim, self.kv_lora_rank, "fp16")
  File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py", line 54, in get_attention_mla_configs_json
    json_file_name = get_mla_config_file_name(QH, KVH, QKD, VD, cache_dtype)
  File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py", line 47, in get_mla_config_file_name
    raise ValueError(f"Unsurpport device name: {device_name}")
ValueError: Unsurpport device name: K500SM_AI
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/cli/main.py", line 51, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/cli/serve.py", line 27, in cmd
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 1069, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 146, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 269, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.
root@notebook-1998971694380531714-ac7sc1ejvp-96619:~# /usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Attempts

vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/ --served-model-name  DeepSeek-Coder-V2-Lite-Instruct --trust-remote-code  --tensor-parallel-size 2  --device cuda

That doesn't work. Try adding this:

export VLLM_DISABLE_TRITON=1

Still doesn't work.

Add:

--enable-reasoning --reasoning-parser deepseek_r1 

Still doesn't work.

AI suggestion:

# Assuming vllm, torch (CPU or ROCm), etc. are already installed
export VLLM_DISABLE_TRITON=1   # or use --disable_mla directly
vllm serve \
    /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct \
    --host 0.0.0.0 \
    --port 8000 \
    --device auto \
    --tensor-parallel-size 1 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.9 \
    --trust-remote-code

Modifying the Source Code

File "/usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py", line 47, in get_mla_config_file_name

raise ValueError(f"Unsurpport device name: {device_name}")

ValueError: Unsurpport device name: K500SM_AI

Modify the file /usr/local/lib/python3.10/dist-packages/vllm/attention/backends/triton_config.py and add the elif "K500SM_AI" branch:

def get_mla_config_file_name(QH: int, KVH: int, QKD: int, VD: int, cache_dtype: Optional[str]) -> str:
    if cache_dtype == "default":
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_default.json"
    
    device_name = torch.cuda.get_device_name().replace(" ", "_")
    if "K100_AI" in device_name:
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_K100AI.json"
    elif "BW" in device_name:
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_BW.json"
    elif "K500SM_AI" in device_name:
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_K100AI.json"
    else:
        raise ValueError(f"Unsurpport device name: {device_name}")

Problem solved. I later found that the default config also works:

    elif "K500SM_AI" in device_name:
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_default.json"
        # return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_K100AI.json"

Now a new error appears:

ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks.

(VllmWorkerProcess pid=2680) INFO 12-14 00:28:14 [loader.py:460] Loading weights took 94.00 seconds
INFO 12-14 00:28:14 [loader.py:460] Loading weights took 94.02 seconds
(VllmWorkerProcess pid=2680) INFO 12-14 00:28:14 [model_runner.py:1154] Model loading took 15.2695 GiB and 94.673958 seconds
INFO 12-14 00:28:14 [model_runner.py:1154] Model loading took 15.2695 GiB and 94.688603 seconds
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method determine_num_available_blocks.
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238] Traceback (most recent call last):
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/multiproc_worker_utils.py", line 232, in _run_worker_process
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238]     output = run_method(worker, method, args, kwargs)
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238]   File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 2506, in run_method
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238]     return func(*args, **kwargs)
(VllmWorkerProcess pid=2680) ERROR 12-14 00:28:26 [multiproc_worker_utils.py:238]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context

I suspect the two cards on this machine are faulty; I'll try a different machine tomorrow.

Continuing the tests, this time I changed that branch to:

    device_name = torch.cuda.get_device_name().replace(" ", "_")
    if "K100_AI" in device_name:
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_K100AI.json"
    elif "BW" in device_name:
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_BW.json"
    elif "K500SM_AI" in device_name:
        return f"QH={QH}_KVH={KVH}_QKD={QKD}_VD={VD}_{cache_dtype}_default.json"

Run:

vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/DeepSeek-Coder-V2-Lite-Instruct/main/DeepSeek-Coder-V2-Lite-Instruct/ --served-model-name  DeepSeek-Coder-V2-Lite-Instruct --trust-remote-code  --tensor-parallel-size 2  

This way, you can see in the log which config it ends up using:

(VllmWorkerProcess pid=2833) WARNING 12-14 19:47:22 [fused_moe.py:959] Using default MoE config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=64,N=704,device_name=K500SM_AI.json
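
This warning only means vLLM falls back to a generic fused-MoE kernel config because there is no tuned file for the K500SM_AI device name; the server still works. In the same spirit as the triton_config.py patch, one untested idea would be to reuse a tuned config for a similar DCU, if one exists in that directory (purely a sketch; the E=64,N=704 values and the K100_AI source file name are assumptions to verify on the machine):

import shutil
from pathlib import Path

# fused_moe config directory mentioned in the warning above
cfg_dir = Path("/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/configs")
src = cfg_dir / "E=64,N=704,device_name=K100_AI.json"    # assumed tuned config for a similar DCU
dst = cfg_dir / "E=64,N=704,device_name=K500SM_AI.json"  # file name taken from the warning

if src.exists() and not dst.exists():
    shutil.copy(src, dst)
    print(f"copied {src.name} -> {dst.name}")
else:
    print("no tuned source config found (or target exists); keeping the default MoE config")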

Good, it's up and running.

Map the port out:

https://c-1998971694380531714.ksai.scnet.cn:58043