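Nginx load-balancing configuration for an 8-GPU server (4 services x 2 GPUs each), with notes on percentile latency

# Pool of four vLLM service instances on the same host, one per port
upstream vllm_backend {
    # Send each request to the backend with the fewest active connections
    least_conn;
    server 10.255.254.44:8000;
    server 10.255.254.44:8001;
    server 10.255.254.44:8002;
    server 10.255.254.44:8003;
}

server {
    # Public entry point for clients
    listen 0.0.0.0:2196;
    server_name _;

    # Forward API requests under /v1/ to the upstream pool
    location /v1/ {
        proxy_pass http://vllm_backend;
        proxy_set_header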