docker pull nvcr.io/nvidia/vllm:26.01-py3
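If the pull succeeds, the image should appear in the local image list. A quick check (assuming the 26.01-py3 tag pulled above):

docker images nvcr.io/nvidia/vllm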
docker run -it --gpus all -p 8000:8000 \
  nvcr.io/nvidia/vllm:${LATEST_VLLM_VERSION} \
  vllm serve "Qwen/Qwen2.5-Math-1.5B-Instruct"
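The ${LATEST_VLLM_VERSION} variable is not set by the image itself; one minimal way to define it, assuming you want the 26.01-py3 tag from the pull step above:

export LATEST_VLLM_VERSION=26.01-py3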
docker run -it --rm \
  --gpus all \
  -p 8000:8000 \
  -e HF_ENDPOINT=https://hf-mirror.com \
  nvcr.io/nvidia/vllm:26.01-py3 \
  vllm serve Qwen/Qwen3.5-9B-Instruct \
  --trust-remote-code \
  --host 0.0.0.0
docker run -it --rm \
  --gpus all \
  -p 8000:8000 \
  --ipc=host \
  -e HF_ENDPOINT=https://hf-mirror.com \
  nvcr.io/nvidia/vllm:26.01-py3 \
  vllm serve Qwen/Qwen2-7B-Instruct \
  --host 0.0.0.0
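Once the container logs show the server is up, it can be sanity-checked from the host. This sketch assumes vLLM's OpenAI-compatible server exposes its usual /health and /v1/models endpoints on port 8000:

# should return HTTP 200 once the engine is ready
curl -i http://localhost:8000/health
# lists the model(s) currently being served
curl http://localhost:8000/v1/models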
http://<server-IP>:8000/v1/chat/completions
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2-7B-Instruct",
    "messages": [
      {"role": "user", "content": "Hello! Who are you?"}
    ]
  }'
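For more controlled output you can add standard OpenAI chat-completion fields; a variant of the request above, assuming the server honors max_tokens and temperature (standard fields that vLLM's OpenAI-compatible server generally accepts):

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2-7B-Instruct",
    "messages": [
      {"role": "user", "content": "Hello! Who are you?"}
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'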
Addresses for accessing the vLLM model from other machines
1. Base access address (for front-ends / dashboards)
http://192.168.1.77:8000
2. OpenAI-compatible API address (works with any client software; reachability check below)
http://192.168.1.77:8000/v1
3. Chat completions endpoint (can be tested directly)
http://192.168.1.77:8000/v1/chat/completions
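A quick way to confirm another machine can actually reach the /v1 base above is to list the served models; a minimal check, assuming the server IP 192.168.1.77 used in this section:

curl http://192.168.1.77:8000/v1/models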
Test command from another machine (copy-paste ready)
curl http://192.168.1.77:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2-7B-Instruct",
"messages": [
{"role": "user", "content": "你好"}
]
}'
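If the client supports server-sent events, the same endpoint can stream tokens as they are generated; a sketch assuming the standard OpenAI "stream" flag, which vLLM's OpenAI-compatible server generally honors:

# -N disables curl's output buffering so streamed chunks print as they arrive
curl -N http://192.168.1.77:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2-7B-Instruct",
    "messages": [
      {"role": "user", "content": "Hello"}
    ],
    "stream": true
  }'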
Open the port in the Ubuntu firewall (must be run once)
sudo ufw allow 8000/tcp
sudo ufw reload
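To confirm the rule took effect, check the firewall status and then re-run the curl test from the other machine:

sudo ufw status verbose | grep 8000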