Large Model Deployment

taihexuelang, 2026-01-12 12:24

Large language model:

docker run -d --gpus all -v D:\ai\DeepSeek-R1-Distill-Qwen-1.5B:/models -p 8000:8000 --ipc=host docker.1panel.live/vllm/vllm-openai:latest /models --trust-remote-code --max-model-len 4096 --served-model-name qwen-1.5b --gpu-memory-utilization 0.7 --disable-log-requests

Embedding model:

docker run -d --gpus all -v D:\ai\Qwen3-VL-Embedding-2B:/models -p 8001:8000 --ipc=host docker.1panel.live/vllm/vllm-openai:latest /models --trust-remote-code --max-model-len 4096 --served-model-name Embedding-2B --gpu-memory-utilization 0.5 --disable-log-requests

curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d "{\"model\":\"qwen-1.5b\",\"prompt\":\"你好，你是谁？简单介绍一下自己\",\"max_tokens\":200,\"temperature\":0.7}"
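The same request can also be sent from Python, which avoids the quote escaping that the Windows command line requires. The following is a minimal sketch using only the standard library. It assumes the first container above is running and reachable on localhost:8000; the names `build_completion_request` and `complete` are made up for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"   # host port from the docker run command
MODEL_NAME = "qwen-1.5b"                # value passed to --served-model-name


def build_completion_request(prompt, max_tokens=200, temperature=0.7):
    """Build the same POST request that the curl command sends."""
    payload = {
        "model": MODEL_NAME,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        BASE_URL + "/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, timeout=60):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # Assumes the OpenAI-style response shape: {"choices": [{"text": ...}]}
    return body["choices"][0]["text"]
```

Calling `complete("你好，你是谁？简单介绍一下自己")` should return the generated text as a string once the server has finished loading the model.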

Because vLLM exposes an OpenAI-compatible API, LangChain must call the model remotely through the langchain_openai package.
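A rough sketch of such a call is shown below. It assumes `langchain-openai` is installed and the first container is listening on localhost:8000. The helper names `chat_model_kwargs` and `ask` are hypothetical, and the `"EMPTY"` API key is only a placeholder: the client requires some value, while the server started above was not given an `--api-key`.

```python
VLLM_BASE_URL = "http://localhost:8000/v1"   # host port from the docker run command
SERVED_MODEL_NAME = "qwen-1.5b"              # value passed to --served-model-name


def chat_model_kwargs(temperature=0.7, max_tokens=200):
    """Collect the ChatOpenAI settings that point it at the local vLLM server."""
    return {
        "model": SERVED_MODEL_NAME,
        "base_url": VLLM_BASE_URL,
        "api_key": "EMPTY",   # placeholder, assumes the server runs without --api-key
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def ask(question):
    """Send one question through LangChain and return the reply text."""
    # Imported here so the settings above can be inspected without LangChain installed.
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(**chat_model_kwargs())
    return llm.invoke(question).content
```

With the server running, `ask("你好，你是谁？")` returns the reply as a string. `ChatOpenAI` talks to the `/v1/chat/completions` endpoint, so this relies on the model shipping a chat template.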
