How to share GPUs in a Kubernetes environment

With the rapid development of AI and large models, sharing GPU resources in the cloud has become a necessity: it lowers hardware cost, improves resource utilization, and satisfies the demand of model training and inference for large-scale parallel computing.

With Kubernetes' built-in resource scheduling, GPUs can only be scheduled by the number of whole cards. However, what deep learning and similar workloads mostly consume is GPU memory, so scheduling by whole cards wastes a lot of GPU capacity.

There are currently two approaches to GPU sharing. One splits a physical GPU into multiple virtual GPUs (vGPUs) and schedules by vGPU count; the other schedules by GPU memory.

This article describes how to install the Kubernetes components needed to schedule GPU resources by GPU memory.
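
To make the difference concrete, here is what the two request styles look like in a container spec. The nvidia.com/gpu resource name is what the standard NVIDIA device plugin exposes for whole-card scheduling; aliyun.com/gpu-mem is the memory-based resource exposed by the gpushare components installed later in this article.

    # Whole-card scheduling: the container is given one entire GPU
    resources:
      limits:
        nvidia.com/gpu: 1

    # Memory-based scheduling: the container is given 3 GiB of GPU memory
    resources:
      limits:
        aliyun.com/gpu-mem: 3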

System information

  • OS: CentOS Stream 8

  • Kernel: 4.18.0-490.el8.x86_64

  • Driver: NVIDIA-Linux-x86_64-470.182.03

  • Docker: 20.10.24

  • Kubernetes: 1.24.0

1. Driver installation

Download and install the driver from the NVIDIA website: https://www.nvidia.com/Download/index.aspx?lang=en-us
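
After the driver is installed, a quick check on the host (independent of any container) confirms that the GPU is visible:

    nvidia-smi
    # The output should list the driver version and the GPU model;
    # if the command is missing or fails, the driver is not installed correctly.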

2. Docker installation

Install Docker (or another container runtime) yourself. If you use a different runtime, follow the NVIDIA guide for the configuration in step 3: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installation-guide

Note: Docker, containerd, and Podman are officially supported, but this article has only been verified with Docker; if you use another runtime, watch out for differences.

3. Install the NVIDIA Container Toolkit

  1. Set up the repository and GPG key

    distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo \
      | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

  2. Install the toolkit

    sudo dnf clean expire-cache --refresh
    sudo dnf install -y nvidia-container-toolkit

  3. Configure Docker to register the NVIDIA container runtime

    sudo nvidia-ctk runtime configure --runtime=docker

  4. Edit /etc/docker/daemon.json and set nvidia as the default container runtime (required)

    {
      "default-runtime": "nvidia",
      "runtimes": {
        "nvidia": {
          "path": "/usr/bin/nvidia-container-runtime",
          "runtimeArgs": []
        }
      }
    }

  5. Restart Docker and verify that the setup works

    sudo systemctl restart docker
    sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

If output like the following is returned, the configuration works:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
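
Besides running a CUDA container, you can also confirm that nvidia really is the default runtime (so that a plain docker run without --runtime also gets GPU support):

    sudo docker info | grep -i 'default runtime'
    # Expected output: Default Runtime: nvidia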

4. Install the Kubernetes GPU share scheduler extender

  1. First, apply the following YAML manifests to deploy the scheduler extender.

    rbac.yaml


    ---
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: gpushare-schd-extender
    rules:
      - apiGroups:
          - ""
        resources:
          - nodes
        verbs:
          - get
          - list
          - watch
      - apiGroups:
          - ""
        resources:
          - events
        verbs:
          - create
          - patch
      - apiGroups:
          - ""
        resources:
          - pods
        verbs:
          - update
          - patch
          - get
          - list
          - watch
      - apiGroups:
          - ""
        resources:
          - bindings
          - pods/binding
        verbs:
          - create
      - apiGroups:
          - ""
        resources:
          - configmaps
        verbs:
          - get
          - list
          - watch
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: gpushare-schd-extender
      namespace: kube-system
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: gpushare-schd-extender
      namespace: kube-system
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: gpushare-schd-extender
    subjects:
      - kind: ServiceAccount
        name: gpushare-schd-extender
        namespace: kube-system

    deployment.yaml


    kind: Deployment
    apiVersion: apps/v1
    metadata:
      name: gpushare-schd-extender
      namespace: kube-system
    spec:
      replicas: 1
      strategy:
        type: Recreate
      selector:
        matchLabels:
          app: gpushare
          component: gpushare-schd-extender
      template:
        metadata:
          labels:
            app: gpushare
            component: gpushare-schd-extender
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
        spec:
          hostNetwork: true
          tolerations:
            - effect: NoSchedule
              operator: Exists
              key: node-role.kubernetes.io/master
            - effect: NoSchedule
              key: node-role.kubernetes.io/control-plane
              operator: Exists
            - effect: NoSchedule
              operator: Exists
              key: node.cloudprovider.kubernetes.io/uninitialized
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          serviceAccount: gpushare-schd-extender
          containers:
            - name: gpushare-schd-extender
              image: registry.cn-hangzhou.aliyuncs.com/acs/k8s-gpushare-schd-extender:1.11-d170d8a
              env:
                - name: LOG_LEVEL
                  value: debug
                - name: PORT
                  value: "12345"

    service.yaml


    apiVersion: v1
    kind: Service
    metadata:
      name: gpushare-schd-extender
      namespace: kube-system
      labels:
        app: gpushare
        component: gpushare-schd-extender
    spec:
      type: NodePort
      ports:
        - port: 12345
          name: http
          targetPort: 12345
          nodePort: 32766
      selector:
        # select the gpushare-schd-extender pods
        app: gpushare
        component: gpushare-schd-extender
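
After saving the three manifests, apply them and check that the extender pod is running. The file names below are only examples; use whatever names you saved the manifests under.

    kubectl apply -f rbac.yaml -f deployment.yaml -f service.yaml
    # the pod labels come from the Deployment template above
    kubectl get pods -n kube-system -l app=gpushare,component=gpushare-schd-extender -o wide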

  2. Add a scheduler policy configuration file under /etc/kubernetes

    # scheduler-policy-config.yaml

    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    clientConnection:
      kubeconfig: /etc/kubernetes/scheduler.conf
    extenders:
      # For some reason the extender cannot be reached through its Service DNS name here;
      # a NodePort address has to be used instead

Note: in the extenders list, do not use the Service address http://gpushare-schd-extender.kube-system:12345 as the extender URL; replace it with the {nodeIP}:{NodePort of the gpushare-schd-extender Service} of your own deployment, otherwise the scheduler will not be able to reach the extender.

Query the NodePort with:

    kubectl get service gpushare-schd-extender -n kube-system -o jsonpath='{.spec.ports[?(@.name=="http")].nodePort}'
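
For reference, a complete extenders entry might look like the sketch below. The field layout follows the example in the Aliyun gpushare-scheduler-extender install guide referenced at the end of this article; 192.168.0.71:32766 is only a placeholder for {nodeIP}:{NodePort} and must be replaced with values from your cluster.

    extenders:
      - urlPrefix: "http://192.168.0.71:32766/gpushare-scheduler"  # {nodeIP}:{NodePort}
        filterVerb: filter
        bindVerb: bind
        enableHTTPS: false
        nodeCacheCapable: true
        managedResources:
          - name: aliyun.com/gpu-mem
            ignoredByScheduler: false
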
  3. Modify the Kubernetes scheduler configuration /etc/kubernetes/manifests/kube-scheduler.yaml

    1. Add the following flag to the command section:

       - --config=/etc/kubernetes/scheduler-policy-config.yaml

    2. Mount the policy config file into the scheduler pod.

       Add under volumeMounts:

       - mountPath: /etc/kubernetes/scheduler-policy-config.yaml
         name: scheduler-policy-config
         readOnly: true

       Add under volumes:

       - hostPath:
           path: /etc/kubernetes/scheduler-policy-config.yaml
           type: FileOrCreate
         name: scheduler-policy-config

Note: be very careful when editing this file; a mistake here can lead to very confusing errors.

An example is shown below:
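
Below is a minimal sketch of the affected parts of a kubeadm-generated kube-scheduler.yaml. The pre-existing flags and the kubeconfig volume are kubeadm defaults and may differ in your cluster; unrelated fields are omitted.

    spec:
      containers:
      - command:
        - kube-scheduler
        - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
        - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
        - --bind-address=127.0.0.1
        - --kubeconfig=/etc/kubernetes/scheduler.conf
        - --leader-elect=true
        - --config=/etc/kubernetes/scheduler-policy-config.yaml    # added
        volumeMounts:
        - mountPath: /etc/kubernetes/scheduler.conf
          name: kubeconfig
          readOnly: true
        - mountPath: /etc/kubernetes/scheduler-policy-config.yaml  # added
          name: scheduler-policy-config
          readOnly: true
      volumes:
      - hostPath:
          path: /etc/kubernetes/scheduler.conf
          type: FileOrCreate
        name: kubeconfig
      - hostPath:                                                   # added
          path: /etc/kubernetes/scheduler-policy-config.yaml
          type: FileOrCreate
        name: scheduler-policy-config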

  4. Configure RBAC and install the device plugin

    kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-rbac.yaml
    kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml

5. Label the GPU nodes

kubectl label node <target_node> gpushare=true
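
Once the node is labeled, the device plugin DaemonSet should schedule a pod onto it and the node should start advertising GPU memory as the aliyun.com/gpu-mem extended resource. A quick check:

    kubectl get pods -n kube-system -o wide | grep gpushare-device
    kubectl describe node <target_node> | grep aliyun.com/gpu-mem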

6. Install the kubectl GPU plugin

cd /usr/bin/
wget https://github.com/AliyunContainerService/gpushare-device-plugin/releases/download/v0.3.0/kubectl-inspect-gpushare
chmod u+x /usr/bin/kubectl-inspect-gpushare

7. Verification

  1. Query GPU usage in the cluster with kubectl

    kubectl inspect gpushare

    NAME                                 IPADDRESS      GPU0(Allocated/Total)  GPU Memory(GiB)
    cn-shanghai.i-uf61h64dz1tmlob9hmtb   192.168.0.71   6/15                   6/15
    cn-shanghai.i-uf61h64dz1tmlob9hmtc   192.168.0.70   3/15                   3/15

    Allocated/Total GPU Memory In Cluster:
    9/30 (30%)

  2. Create a workload that requests GPU memory and observe how it is scheduled

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: binpack-1
      labels:
        app: binpack-1
    spec:
      replicas: 1
      selector: # define how the deployment finds the pods it manages
        matchLabels:
          app: binpack-1
      template: # define the pod specification
        metadata:
          labels:
            app: binpack-1
        spec:
          tolerations:
            - effect: NoSchedule
              key: cloudClusterNo
              operator: Exists
          containers:
            - name: binpack-1
              image: cheyang/gpu-player:v2
              resources:
                limits:
                  # unit: GiB
                  aliyun.com/gpu-mem: 3
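
After applying the manifest (binpack-1.yaml is just an example file name), the pod should be scheduled onto a node with enough free GPU memory, and the allocation should show up in kubectl inspect gpushare:

    kubectl apply -f binpack-1.yaml
    kubectl get pods -l app=binpack-1 -o wide
    # the node that received the pod should now report 3 more GiB allocated
    kubectl inspect gpushare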

8. Troubleshooting

If a component does not come up successfully during installation, check the logs of the corresponding pod:

kubectl get po -n kube-system -o=wide | grep gpushare-device
kubectl logs -n kube-system <pod_name>
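
If the device plugin pods look healthy, the scheduler extender's own logs and the events on a pending pod are the next places to check:

    kubectl logs -n kube-system deploy/gpushare-schd-extender
    kubectl describe pod <pending_pod_name>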

References:

NVIDIA Container Toolkit installation guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker

Aliyun gpushare scheduler extender installation guide: https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/docs/install.md
