Environment
OS: CentOS 7
CPU: E5-2680 v4, 14 cores / 28 threads
Memory: DDR4-2133, 32 GB × 2
GPU: Tesla V100 32 GB [PG503] (water-cooled)
Driver: 535
CUDA: 12.2
Running the speed test
ollama run qwen3:14b --verbose
Prompt: 你好,介绍下内存,要2000字 ("Hello, give an introduction to memory, in 2,000 characters")
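The same measurement can be scripted instead of typed interactively. The sketch below assumes a local Ollama server on its default address (http://localhost:11434) and uses the `/api/generate` endpoint, whose non-streaming response reports token counts and durations in nanoseconds. The helper names `tokens_per_second`, `run_benchmark`, and `summarize` are illustrative and were not part of the original test.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(count: int, duration_ns: int) -> float:
    """Convert a token count and a nanosecond duration into tokens/s."""
    return count / (duration_ns / 1e9)


def run_benchmark(model: str, prompt: str) -> dict:
    """Send one non-streaming request and return the decoded JSON response."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(stats: dict) -> dict:
    """Derive the two rates that --verbose prints from the raw response fields."""
    return {
        "prompt_eval_rate": tokens_per_second(
            stats["prompt_eval_count"], stats["prompt_eval_duration"]
        ),
        "eval_rate": tokens_per_second(
            stats["eval_count"], stats["eval_duration"]
        ),
    }


# Example (requires a running Ollama server with the model pulled):
# result = run_benchmark("qwen3:14b", "你好,介绍下内存,要2000字")
# print(summarize(result))
```

Keeping the rate calculation in `summarize` separate from the network call makes it possible to check the arithmetic against saved numbers without a GPU at hand.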
Speed
total duration: 52.754181331s
load duration: 47.845498ms
prompt eval count: 2291 token(s)
prompt eval duration: 1.737233872s
prompt eval rate: 1318.76 tokens/s
eval count: 2556 token(s)
eval duration: 50.871698922s
eval rate: 50.24 tokens/s
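As a sanity check, the reported rates can be recomputed from the counts and durations, and the three stage durations can be summed against the total. The sketch below hard-codes the figures from the output above; `to_seconds` is a hypothetical helper that handles only the `s` and `ms` suffixes appearing in this run.

```python
def to_seconds(text: str) -> float:
    """Convert a duration such as '47.845498ms' or '52.754181331s' to seconds."""
    if text.endswith("ms"):
        return float(text[:-2]) / 1000.0
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unsupported duration: {text!r}")


# Figures copied from the verbose output above.
total = to_seconds("52.754181331s")
load = to_seconds("47.845498ms")
prompt_eval = to_seconds("1.737233872s")
generation = to_seconds("50.871698922s")

prompt_rate = 2291 / prompt_eval   # prompt tokens per second
eval_rate = 2556 / generation      # generated tokens per second
unaccounted = total - (load + prompt_eval + generation)

print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")
print(f"eval rate: {eval_rate:.2f} tokens/s")
print(f"time outside the three stages: {unaccounted * 1000:.1f} ms")
```

Both rates come out equal to the reported 1318.76 and 50.24 tokens/s. Generation accounts for roughly 96% of the total duration, and about 97 ms of the total is not attributed to any of the three stages.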
GPU
Thu Dec 4 23:21:30 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla PG503-216                Off | 00000000:04:00.0 Off |                    0 |
| N/A   36C    P0             246W / 300W |  15198MiB / 32768MiB |     92%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
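To log these readings throughout a run rather than copy a single snapshot, `nvidia-smi` can emit them as CSV through its `--query-gpu` option. The sketch below is one possible way to poll it, not what was done in this test; the field list, the `GpuSample` type, and the function names are illustrative choices, and the sample line is the snapshot above rewritten in the assumed CSV form.

```python
import subprocess
from dataclasses import dataclass

# Fields requested from nvidia-smi, in the order GpuSample expects them.
FIELDS = "utilization.gpu,memory.used,memory.total,power.draw,temperature.gpu"


@dataclass
class GpuSample:
    utilization_pct: float
    memory_used_mib: float
    memory_total_mib: float
    power_w: float
    temperature_c: float

    @property
    def memory_used_fraction(self) -> float:
        return self.memory_used_mib / self.memory_total_mib


def parse_sample(csv_line: str) -> GpuSample:
    """Parse one line produced with --format=csv,noheader,nounits."""
    values = [float(part.strip()) for part in csv_line.split(",")]
    return GpuSample(*values)


def query_gpu(index: int = 0) -> GpuSample:
    """Run nvidia-smi once for the given GPU index and parse its output."""
    output = subprocess.run(
        [
            "nvidia-smi",
            f"--id={index}",
            f"--query-gpu={FIELDS}",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_sample(output.strip().splitlines()[0])


# Snapshot values from the table above, in the assumed CSV layout.
sample = parse_sample("92, 15198, 32768, 246.00, 36")
print(f"VRAM in use: {sample.memory_used_fraction:.1%}")
print(f"power draw: {sample.power_w / 300:.0%} of the 300 W cap")
```

For the snapshot above this gives about 46% of VRAM in use and a power draw at 82% of the 300 W cap while the GPU is 92% utilized. Calling `query_gpu()` in a loop during generation would show whether those figures hold for the whole run.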