This post documents the parameter tuning involved in deploying Qwen3-1.7B with vLLM. What can a 1.7B model actually do?
1. PC Configuration
Hardware:
- Memory: 40.0 GiB
- Processor: Intel® Core™ i7-9750H × 12
- Graphics: Intel® UHD Graphics 630 (CFL GT2)
- Graphics 1: Intel® UHD Graphics 630 (CFL GT2)
- Disk capacity: 512.1 GB

(The GNOME panel only lists the integrated GPU; the dedicated GeForce GTX 1660 Ti with 6 GiB of VRAM shows up under nvidia-smi below.)

Software:
- Firmware version: E16U5IMS.101
- OS name: Ubuntu 24.04.3 LTS
- OS type: 64-bit
- GNOME version: 46
- Windowing system: X11
- Kernel version: Linux 6.17.0-23-generic
2. Downloading the Model
```shell
(.venv) liu@shun:~/work $ hf download Qwen/Qwen3-1.7B --local-dir /data/huggingface/Qwen3-1.7B
...
Download complete: : 4.08GB [02:42, 25.1MB/s]
```
3. Environment Checks
```shell
liu@shun:/data/huggingface$ nvidia-smi  # verify the local NVIDIA driver is installed
Sat May 9 10:44:22 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09 Driver Version: 580.126.09 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1660 Ti Off | 00000000:01:00.0 On | N/A |
| N/A 48C P8 4W / 80W | 4202MiB / 6144MiB | 23% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 3561 G /usr/lib/xorg/Xorg 81MiB |
| 0 N/A N/A 138918 C VLLM::EngineCore 4100MiB |
+-----------------------------------------------------------------------------------------+
liu@shun:/data/huggingface$ docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi  # verify the GPU is reachable from inside Docker
Sat May 9 02:53:16 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09 Driver Version: 580.126.09 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1660 Ti Off | 00000000:01:00.0 On | N/A |
| N/A 47C P8 4W / 80W | 4202MiB / 6144MiB | 19% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
liu@shun:/data/huggingface$
```
- `nvidia-smi` verifies that the local NVIDIA driver is installed and working.
- `docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi` verifies that containers can reach the GPU.
- If the in-container check fails, install the NVIDIA Container Toolkit with `sudo apt install -y nvidia-container-toolkit` (the package comes from NVIDIA's apt repository, which may need to be added first), then register the runtime with `sudo nvidia-ctk runtime configure --runtime=docker` and restart Docker.
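When tuning GPU memory later, it helps to keep an eye on how much VRAM is actually in use. A minimal sketch, parsing the memory column out of the sample nvidia-smi row above; the `--query-gpu` flags at the end are standard nvidia-smi options for the live equivalent:

```shell
# Sample memory row copied from the nvidia-smi output above
LINE='| N/A   48C    P8     4W /  80W |    4202MiB /   6144MiB |   23%   Default |'
USED=$(echo "$LINE" | grep -o '[0-9]\+MiB' | head -1)
TOTAL=$(echo "$LINE" | grep -o '[0-9]\+MiB' | tail -1)
echo "GPU memory: $USED used of $TOTAL"
# Live equivalent while the server runs:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```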
4. Deployment and Tuning
```yaml
version: "3.9"
services:
  qwen_llm:
    image: vllm/vllm-openai:v0.12.0
    container_name: qwen_llm
    runtime: nvidia
    environment:
      PYTORCH_CUDA_ALLOC_CONF: "expandable_segments:True"
    ipc: host
    ports:
      - "8001:8000"
    volumes:
      - /data/huggingface/Qwen3-1.7B:/model:ro
    # Comments cannot go inside the folded scalar below -- YAML would pass
    # the "#" text to vLLM as literal arguments. Flags explained here:
    #   --served-model-name       the name clients use; kept identical to the
    #                             upstream model name to avoid ambiguity
    #   --gpu-memory-utilization  fraction of VRAM vLLM may claim; startup
    #                             failed at 0.5 and 0.6, and only succeeded
    #                             at 0.7
    #   --max-model-len           context length, coupled to the memory
    #                             budget; the startup error suggests a value
    #                             that fits (lowered here from 1024 to 800)
    #   --max-num-seqs            max concurrent sequences; on hardware this
    #                             modest, one is plenty
    command: >
      /model
      --served-model-name Qwen3-1.7B
      --gpu-memory-utilization 0.7
      --max-model-len 800
      --max-num-seqs 1
    restart: unless-stopped
```
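Why did the context length have to drop? A rough back-of-the-envelope for the KV cache, assuming Qwen3-1.7B's geometry (28 layers, 8 KV heads via GQA, head dim 128, fp16 cache) — these numbers are assumptions taken from the model's published config and should be verified against the local config.json:

```shell
# Assumed Qwen3-1.7B geometry -- verify against /model/config.json
LAYERS=28       # num_hidden_layers
KV_HEADS=8      # num_key_value_heads (GQA)
HEAD_DIM=128    # head_dim
BYTES=2         # fp16/bf16 cache element size

# K and V tensors, across all layers, per token
PER_TOKEN=$((2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES))
CTX=800
echo "KV cache per token: ${PER_TOKEN} bytes"
echo "KV cache for ${CTX} tokens: $((PER_TOKEN * CTX / 1048576)) MiB"
```

So a single 800-token sequence costs on the order of 90 MiB of cache on top of the ~3.4 GiB of fp16 weights, which is why the memory budget and context length have to be traded off together on a 6 GiB card.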
```shell
liu@shun:/data/huggingface$ docker compose up -d
liu@shun:/data/huggingface$ curl http://127.0.0.1:8001/v1/models  # check whether the model is actually serving
{"object":"list","data":[{"id":"Qwen3-1.7B","object":"model","created":1778295880,"owned_by":"vllm","root":"/model","parent":null,"max_model_len":800,"permission":[{"id":"modelperm-a07683e5e299e197","object":"model_permission","created":1778295880,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}
liu@shun:/data/huggingface$
```
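With `/v1/models` answering, a real chat request can go to the OpenAI-compatible `/v1/chat/completions` endpoint, using the name set by `--served-model-name` and the host port from the compose file. A sketch that builds and validates the payload first; the curl call itself is commented out because it needs the running server:

```shell
# Build the request body first so it can be inspected before sending
PAYLOAD='{
  "model": "Qwen3-1.7B",
  "messages": [{"role": "user", "content": "What can a 1.7B model do?"}],
  "max_tokens": 100
}'
echo "$PAYLOAD" | python3 -m json.tool    # fails loudly if the JSON is malformed

# Requires the server from the compose file above to be up:
# curl -s http://127.0.0.1:8001/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
```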