Author: 吴业亮
Blog: wuyeliang.blog.csdn.net
I. Prepare the Environment
1. Remove any existing NVIDIA driver
apt-get remove --purge 'nvidia*'
2. Disable the stock open-source nouveau driver
vi /etc/modprobe.d/blacklist-nouveau.conf
Add the following lines (append them if the file already exists) to blacklist the nouveau driver:
blacklist nouveau
options nouveau modeset=0
After saving, apply the change by rebuilding the initramfs:
update-initramfs -u
Run sudo reboot. After the reboot, nouveau should be disabled; the following command should produce no output:
lsmod | grep nouveau
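As a convenience, the steps above can be combined into a minimal sketch, assuming a root shell (note that the heredoc overwrites any existing blacklist-nouveau.conf; append by hand if you keep other entries there):
cat <<EOF > /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
update-initramfs -u
reboot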
3. Install gcc and build tools
apt install gcc g++ make -y
II. Install the Driver
1. Identify the GPU model
Find out your NVIDIA GPU model so you can download the matching driver version:
lspci | grep -i vga
root@ubuntu:~# lspci | grep -i vga
03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller (rev 04)
af:00.0 VGA compatible controller: NVIDIA Corporation TU104GL [Quadro RTX 4000] (rev a1)
Download the driver from the NVIDIA website: https://www.nvidia.cn/Download/index.aspx?lang=cn#
For example:
# wget https://us.download.nvidia.com/XFree86/Linux-x86_64/550.120/NVIDIA-Linux-x86_64-550.120.run
2. Run the installer
Change into the directory where you downloaded the driver and run the command below, substituting your own file name:
sudo ./NVIDIA-Linux-x86_64-550.120.run
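For an unattended install, the .run installer also accepts --silent and --dkms (DKMS rebuilds the kernel module automatically after kernel upgrades); a sketch, using the file name above (verify the flags against ./NVIDIA-Linux-x86_64-550.120.run --help first):
chmod +x NVIDIA-Linux-x86_64-550.120.run
sudo ./NVIDIA-Linux-x86_64-550.120.run --silent --dkms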
Verify the installation:
root@ubuntu:~# nvidia-smi
Mon Oct 14 08:55:46 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Quadro RTX 4000 Off | 00000000:AF:00.0 Off | N/A |
| 30% 27C P8 9W / 125W | 97MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 56201 C python3 94MiB |
+-----------------------------------------------------------------------------------------+
To uninstall the driver:
# sh ./NVIDIA-Linux-x86_64-550.120.run --uninstall
III. Install KubeSphere and K8s
sudo apt-get update
sudo apt install socat conntrack ebtables ipset jq -y
export KKZONE=cn
Latest version:
curl -sfL https://get-kk.kubesphere.io | sh -
Specific version:
curl -sfL https://get-kk.kubesphere.io | VERSION=v3.1.10 sh -
Note: the available KubeKey versions are listed at
https://github.com/kubesphere/kubekey/releases/
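You can also query the downloaded kk binary directly; the --show-supported-k8s flag is documented for KubeKey, but verify it against ./kk --help on your version:
./kk version
./kk version --show-supported-k8s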
Start installing KubeSphere and K8s:
./kk create cluster --with-kubernetes v1.28.12 --with-kubesphere v3.4.1
./kk create cluster --with-kubernetes v1.30.10 --with-kubesphere v4.1.3
Or, to install Kubernetes only:
./kk create cluster --with-kubernetes v1.30.10
......
Installation is complete.
Please check the result using the command:
kubectl get pod -A
Note: KubeSphere v4.1.3 has been tested with Kubernetes v1.21 through v1.30; compatibility with untested versions is not guaranteed.
Check available K8s versions:
https://github.com/kubernetes/kubernetes/releases
Check available KubeSphere versions:
https://github.com/kubesphere/kubesphere/releases
IV. Install gpu-operator
1. Prerequisites
Check whether Node Feature Discovery (NFD) is already running in the cluster:
# kubectl get nodes -o json | jq '.items[].metadata.labels | keys | any(startswith("feature.node.kubernetes.io"))'
If the command above prints true, NFD is already running in the cluster, and you must disable the bundled NFD deployment when installing the Operator (see the sketch below).
Note: K8s clusters deployed with KubeSphere do not install or configure NFD by default.
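If NFD does turn out to be running already, the install command in step 4 below needs one extra flag; a sketch, using the documented gpu-operator chart value nfd.enabled:
helm install --wait --generate-name \
  -n gpu-operator ./gpu-operator-v25.3.1.tgz \
  --set driver.enabled=false \
  --set nfd.enabled=false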
2. Set up a proxy
The all_proxy variable makes the proxy apply to requests over every protocol (HTTP, HTTPS, FTP, and so on):
export all_proxy=http://192.168.1.119:7890
export NO_PROXY=localhost,127.0.0.1,.svc,.cluster.local,192.168.1.247,lb.kubesphere.local
Test it:
root@ubuntu:~# curl google.com
root@ubuntu:~# curl http://google.com
root@ubuntu:~# echo $?
0
Set up the Docker proxy:
root@ubuntu:~# systemctl status docker | grep system | grep enabled
Loaded: loaded (/etc/systemd/system/docker.service; enabled; vendor preset: enabled)
mkdir -p /etc/systemd/system/docker.service.d
cat <<EOF > /etc/systemd/system/docker.service.d/https-proxy.conf
[Service]
Environment="HTTPS_PROXY=http://192.168.1.119:7890/"
Environment="NO_PROXY=localhost,127.0.0.0/8,docker-registry.somecorporation.com,172.16.0.0/16,192.168.1.0/24"
EOF
systemctl daemon-reload
systemctl restart docker
Test:
docker search mysql
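To confirm the drop-in actually took effect, you can also inspect the environment systemd passes to the service (a standard systemd command):
systemctl show --property=Environment docker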
3. Install the NVIDIA GPU Operator
Add the NVIDIA Helm repository:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
4. Install the GPU Operator
Download the gpu-operator chart for offline installation:
helm fetch nvidia/gpu-operator --version v25.3.1
root@ubuntu:~# kubectl create namespace gpu-operator
namespace/gpu-operator created
Install the GPU Operator with the default configuration, but disable automatic driver installation, since the driver was already installed on the host in section II:
root@ubuntu:~# helm install --wait --generate-name -n gpu-operator ./gpu-operator-v25.3.1.tgz --set driver.enabled=false
NAME: gpu-operator-v25-1750221680
LAST DEPLOYED: Wed Jun 18 04:41:20 2025
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
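To see what else the chart can tune, dump its default values; a sketch, assuming the chart archive downloaded above:
helm show values ./gpu-operator-v25.3.1.tgz > gpu-operator-values.yaml
grep -n 'enabled' gpu-operator-values.yaml | head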
Verify:
# kubectl get pods -n gpu-operator
NAME READY STATUS RESTARTS AGE
gpu-feature-discovery-czdf5 1/1 Running 0 15m
gpu-feature-discovery-q9qlm 1/1 Running 0 15m
gpu-operator-67c68ddccf-x29pm 1/1 Running 0 15m
gpu-operator-node-feature-discovery-gc-57457b6d8f-zjqhr 1/1 Running 0 15m
gpu-operator-node-feature-discovery-master-5fb74ff754-fzbzm 1/1 Running 0 15m
gpu-operator-node-feature-discovery-worker-68459 1/1 Running 0 15m
gpu-operator-node-feature-discovery-worker-74ps5 1/1 Running 0 15m
gpu-operator-node-feature-discovery-worker-dpmg9 1/1 Running 0 15m
gpu-operator-node-feature-discovery-worker-jvk4t 1/1 Running 0 15m
gpu-operator-node-feature-discovery-worker-k5kwq 1/1 Running 0 15m
gpu-operator-node-feature-discovery-worker-ll4bk 1/1 Running 0 15m
gpu-operator-node-feature-discovery-worker-p4q5q 1/1 Running 0 15m
gpu-operator-node-feature-discovery-worker-rmk99 1/1 Running 0 15m
nvidia-container-toolkit-daemonset-9zcnj 1/1 Running 0 15m
nvidia-container-toolkit-daemonset-kcz9g 1/1 Running 0 15m
nvidia-cuda-validator-l8vjb 0/1 Completed 0 14m
nvidia-cuda-validator-svn2p 0/1 Completed 0 13m
nvidia-dcgm-exporter-9lq4c 1/1 Running 0 15m
nvidia-dcgm-exporter-qhmkg 1/1 Running 0 15m
nvidia-device-plugin-daemonset-7rvfm 1/1 Running 0 15m
nvidia-device-plugin-daemonset-86gx2 1/1 Running 0 15m
nvidia-operator-validator-csr2z 1/1 Running 0 15m
nvidia-operator-validator-svlc4 1/1 Running 0 15m
V. GPU Validation Tests
1. Test example 1: validate CUDA
With the GPU Operator installed correctly, use a CUDA base image to test whether K8s can create a Pod that uses GPU resources.
Create the manifest file cuda-ubuntu.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: cuda-ubuntu2204
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-ubuntu2204
      image: "nvcr.io/nvidia/cuda:12.4.0-base-ubuntu22.04"
      resources:
        limits:
          nvidia.com/gpu: 1
      command: ["nvidia-smi"]
Create the resource:
kubectl apply -f cuda-ubuntu.yaml
Check the created resources. The output shows the Pod was scheduled to the ksp-gpu-worker-2 node (GPU model: Tesla P100-PCIE-16GB):
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cuda-ubuntu2204 0/1 Completed 0 73s 10.233.99.15 ksp-gpu-worker-2 <none> <none>
ollama-79688d46b8-vxmhg 1/1 Running 0 47m 10.233.72.17 ksp-gpu-worker-1 <none> <none>
Check the Pod log:
kubectl logs pod/cuda-ubuntu2204
On success, the output looks like this:
# kubectl logs pod/cuda-ubuntu2204
Mon Jul 8 11:10:59 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla P100-PCIE-16GB Off | 00000000:00:10.0 Off | 0 |
| N/A 40C P0 26W / 250W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Clean up the test resources:
kubectl delete -f cuda-ubuntu.yaml
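You can also confirm how many GPUs each node advertises to the scheduler; a sketch using standard kubectl output options (the backslash-escaped dots are needed because the resource name itself contains dots):
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
Or, more simply:
kubectl describe nodes | grep -i 'nvidia.com/gpu'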
2. Test example 2: official GPU application sample
Run a simple CUDA sample that adds two vectors.
Create the manifest file cuda-vectoradd.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
      resources:
        limits:
          nvidia.com/gpu: 1
Create the Pod:
# kubectl apply -f cuda-vectoradd.yaml
Check the Pod's result. Once started, the Pod runs the vectorAdd command and then exits.
# kubectl logs pod/cuda-vectoradd
On success, the output looks like this:
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
Clean up the test resources:
kubectl delete -f cuda-vectoradd.yaml
VI. Deploy Ollama with KubeSphere
The validation tests above prove that Pods using GPUs can be created on the K8s cluster. Next, to match a real-world need, we use KubeSphere to deploy Ollama, a large-model management tool, on the K8s cluster.
1. Create the deployment manifest
This example is a simple test, so it uses hostPath storage; for real use, replace it with a StorageClass or another kind of persistent storage (see the PVC sketch after the manifest).
Create the manifest deploy-ollama.yaml:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: ollama
  namespace: default
  labels:
    app: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      volumes:
        - name: ollama-models
          hostPath:
            path: /data/openebs/local/ollama
            type: ''
        - name: host-time
          hostPath:
            path: /etc/localtime
            type: ''
      containers:
        - name: ollama
          image: 'ollama/ollama:latest'
          ports:
            - name: http-11434
              containerPort: 11434
              protocol: TCP
          resources:
            limits:
              nvidia.com/gpu: '1'
            requests:
              nvidia.com/gpu: '1'
          volumeMounts:
            - name: ollama-models
              mountPath: /root/.ollama
            - name: host-time
              readOnly: true
              mountPath: /etc/localtime
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
---
kind: Service
apiVersion: v1
metadata:
  name: ollama
  namespace: default
  labels:
    app: ollama
spec:
  ports:
    - name: http-11434
      protocol: TCP
      port: 11434
      targetPort: 11434
      nodePort: 31434
  selector:
    app: ollama
  type: NodePort
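As noted above, hostPath is only suitable for testing. A sketch of a PVC-based alternative follows; the StorageClass name local is a placeholder, so check what your cluster actually provides with kubectl get sc:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ollama-models
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local
  resources:
    requests:
      storage: 20Gi
In the Deployment above, the ollama-models volume would then become:
      volumes:
        - name: ollama-models
          persistentVolumeClaim:
            claimName: ollama-models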
Special note: the KubeSphere management console also supports configuring GPU resources for Deployments and other workloads graphically; interested readers can explore this on their own.
2. Deploy the Ollama service
Create Ollama:
kubectl apply -f deploy-ollama.yaml
Check the Pod. The output shows the Pod was scheduled to the ksp-gpu-worker-1 node (GPU model: Tesla M40 24GB):
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ollama-79688d46b8-vxmhg 1/1 Running 0 12s 10.233.72.17 ksp-gpu-worker-1 <none> <none>
Check the container log:
kubectl logs ollama-79688d46b8-vxmhg
2024/07/08 18:24:27 routes.go:1064: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-07-08T18:24:27.829+08:00 level=INFO source=images.go:730 msg="total blobs: 5"
time=2024-07-08T18:24:27.829+08:00 level=INFO source=images.go:737 msg="total unused blobs removed: 0"
time=2024-07-08T18:24:27.830+08:00 level=INFO source=routes.go:1111 msg="Listening on [::]:11434 (version 0.1.48)"
time=2024-07-08T18:24:27.830+08:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2414166698/runners
time=2024-07-08T18:24:32.454+08:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60101]"
time=2024-07-08T18:24:32.567+08:00 level=INFO source=types.go:98 msg="inference compute" id=GPU-9e48dc13-f8f1-c6bb-860f-c82c96df22a4 library=cuda compute=5.2 driver=12.4 name="Tesla M40 24GB" total="22.4 GiB" available="22.3 GiB"
3. Pull the large model for Ollama to serve
Pull the model with Ollama.
To save time, this example uses Alibaba's open-source qwen2 1.5b, a small model, as the test model.
kubectl exec -it ollama-79688d46b8-vxmhg -- ollama pull qwen2:1.5b
On success, the output looks like this:
# kubectl exec -it ollama-79688d46b8-vxmhg -- ollama pull qwen2:1.5b
pulling manifest
pulling 405b56374e02... 100% ▕█████████████████████████████████████████████████████▏ 934 MB
pulling 62fbfd9ed093... 100% ▕█████████████████████████████████████████████████████▏ 182 B
pulling c156170b718e... 100% ▕█████████████████████████████████████████████████████▏ 11 KB
pulling f02dd72bb242... 100% ▕█████████████████████████████████████████████████████▏ 59 B
pulling c9f5e9ffbc5f... 100% ▕█████████████████████████████████████████████████████▏ 485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
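Besides pulling, you can smoke-test the model interactively; ollama run opens a chat REPL inside the container (type /bye to exit):
kubectl exec -it ollama-79688d46b8-vxmhg -- ollama run qwen2:1.5b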
Inspect the model files. On the ksp-gpu-worker-1 node, run the following command:
$ ls -R /data/openebs/local/ollama/
/data/openebs/local/ollama/:
id_ed25519 id_ed25519.pub models
/data/openebs/local/ollama/models:
blobs manifests
/data/openebs/local/ollama/models/blobs:
sha256-405b56374e02b21122ae1469db646be0617c02928fd78e246723ebbb98dbca3e
sha256-62fbfd9ed093d6e5ac83190c86eec5369317919f4b149598d2dbb38900e9faef
sha256-c156170b718ec29139d3653d40ed1986fd92fb7e0959b5c71f3c48f62e6636f4
sha256-c9f5e9ffbc5f14febb85d242942bd3d674a8e4c762aaab034ec88d6ba839b596
sha256-f02dd72bb2423204352eabc5637b44d79d17f109fdb510a7c51455892aa2d216
/data/openebs/local/ollama/models/manifests:
registry.ollama.ai
/data/openebs/local/ollama/models/manifests/registry.ollama.ai:
library
/data/openebs/local/ollama/models/manifests/registry.ollama.ai/library:
qwen2
/data/openebs/local/ollama/models/manifests/registry.ollama.ai/library/qwen2:
1.5b
4. Test the model
Call the API to test it:
curl http://192.168.9.91:31434/api/chat -d '{
"model": "qwen2:1.5b",
"messages": [
{ "role": "user", "content": "用20个字,介绍你自己" }
]
}'
Test result:
$ curl http://192.168.9.91:31434/api/chat -d '{
"model": "qwen2:1.5b",
"messages": [
{ "role": "user", "content": "用20个字,介绍你自己" }
]
}'
{"model":"qwen2:1.5b","created_at":"2024-07-08T09:54:48.011798927Z","message":{"role":"assistant","content":"我"},"done":false}
{"model":"qwen2:1.5b","created_at":"2024-07-08T09:54:48.035291669Z","message":{"role":"assistant","content":"是一个"},"done":false}
{"model":"qwen2:1.5b","created_at":"2024-07-08T09:54:48.06360233Z","message":{"role":"assistant","content":"人工智能"},"done":false}
{"model":"qwen2:1.5b","created_at":"2024-07-08T09:54:48.092411266Z","message":{"role":"assistant","content":"助手"},"done":false}
{"model":"qwen2:1.5b","created_at":"2024-07-08T09:54:48.12016935Z","message":{"role":"assistant","content":","},"done":false}
{"model":"qwen2:1.5b","created_at":"2024-07-08T09:54:48.144921623Z","message":{"role":"assistant","content":"专注于"},"done":false}
{"model":"qwen2:1.5b","created_at":"2024-07-08T09:54:48.169803961Z","message":{"role":"assistant","content":"提供"},"done":false}
{"model":"qwen2:1.5b","created_at":"2024-07-08T09:54:48.194796364Z","message":{"role":"assistant","content":"信息"},"done":false}
{"model":"qwen2:1.5b","created_at":"2024-07-08T09:54:48.21978104Z","message":{"role":"assistant","content":"和"},"done":false}
{"model":"qwen2:1.5b","created_at":"2024-07-08T09:54:48.244976103Z","message":{"role":"assistant","content":"帮助"},"done":false}
{"model":"qwen2:1.5b","created_at":"2024-07-08T09:54:48.270233992Z","message":{"role":"assistant","content":"。"},"done":false}
{"model":"qwen2:1.5b","created_at":"2024-07-08T09:54:48.29548561Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":454377627,"load_duration":1535754,"prompt_eval_duration":36172000,"eval_count":12,"eval_duration":287565000}
5. Check GPU allocation
Check the GPU resources already allocated on the worker node:
$ kubectl describe node ksp-gpu-worker-1 | grep "Allocated resources" -A 9
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 487m (13%) 2 (55%)
memory 315115520 (2%) 800Mi (5%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
nvidia.com/gpu 1 1
Physical GPU usage while Ollama is running:
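To observe physical GPU usage while Ollama serves requests, run nvidia-smi on the worker node, or inside the Pod itself; the NVIDIA container toolkit typically makes the binary available in GPU containers:
# On the ksp-gpu-worker-1 node:
nvidia-smi
# Or from inside the Pod:
kubectl exec -it ollama-79688d46b8-vxmhg -- nvidia-smi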