Headless server + Vulkan + Docker issues

This note addresses Vulkan failing to detect the GPU inside a Docker container, where vulkaninfo reports llvmpipe, i.e. CPU-only software rendering.

References:

I. First issue

https://github.com/NVIDIA/nvidia-container-toolkit/issues/1472

nvidia-ctk cdi generate 2>nvidia-ctk_cdi_generate.log > nvidia.yaml

II. Second issue

https://github.com/NVIDIA/nvidia-container-toolkit/issues/191#issuecomment-2022154630

I recently was encountering similar problems (see my reply to issue 16 where I list details of my environment) for a headless (i.e. no wayland, no x11) vulkan application running in my organization's internal (openstack powered) cloud. Based on the snippets above, I assume some of you, like me, are not using a GUI either. Here is what I learned:

  • --runtime=nvidia should be combined with --gpus all in order for the vulkan ICDs to be mounted from the host with recent versions of the nvidia-container-toolkit. Without it, you can see from docker inspect -f '{{ .HostConfig.Runtime }}' that runc is being used instead of the nvidia runtime. For some reason, this isn't needed for nvidia-smi to work, possibly due to hooks(?)
  • For headless, we should use EGL instead of GLX implementation since there is no X11
  • The stack deployed by the vulkan-tools package in Ubuntu 22.04 only recognizes the deprecated VK_ICD_FILENAMES environment variable when setting paths to an ICD
  • The path to the needed ICD (/usr/share/glvnd/egl_vendor.d/10_nvidia.json) isn't automatically found by this version's vulkan loader
  • XDG_RUNTIME_DIR and DISPLAY errors can be ignored because we are not using X11
  • The non-default graphics capability is needed
  • You do not need the cuda or the (abandoned?) vulkan nvidia container images. You can get this functionality out of the base Ubuntu 22.04 docker image (and likely others) by installing the (equivalent) vulkan-tools package and setting environment variables at container launch needed by nvidia-container-toolkit (NVIDIA_DRIVER_CAPABILITIES) and the vulkan loader (VK_ICD_FILENAMES).
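Two of the points in the list above (the runtime check and the graphics capability) can be sketched as small shell helpers. This is only a minimal sketch and not part of the quoted comment: the function names caps_include_graphics and runtime_is_nvidia are hypothetical, and treating the value all as including graphics is an assumption based on the documented NVIDIA_DRIVER_CAPABILITIES values.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of any toolkit.

# Succeeds if a comma-separated NVIDIA_DRIVER_CAPABILITIES value
# contains "graphics" or "all" (assumed to imply graphics).
caps_include_graphics() {
    case ",$1," in
        *,graphics,*|*,all,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if the runtime name passed in is "nvidia". The name is expected
# to come from: docker inspect -f '{{ .HostConfig.Runtime }}' <container>
runtime_is_nvidia() {
    [ "$1" = "nvidia" ]
}
```

On a host with Docker available, something like runtime_is_nvidia "$(docker inspect -f '{{ .HostConfig.Runtime }}' vulkan)" would check the container named vulkan from the example that follows, and caps_include_graphics "$NVIDIA_DRIVER_CAPABILITIES" could be run inside the container.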

Example Dockerfile using a headless, third-party sample program that will render a ppm file on the gpu using vulkan as a non-root, limited user (luser). The use of a non-root user is a preference; not required. If you are quick, you can see /home/luser/bin/renderheadless running on the gpu via nvidia-smi on the host:

$ cat Dockerfile
FROM ubuntu:22.04 AS vulkan-sample-dev

ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y gcc g++ make cmake libvulkan-dev libglm-dev curl unzip && apt-get clean

RUN useradd luser
USER luser
WORKDIR /home/luser
RUN curl -L -o master.zip https://github.com/SaschaWillems/Vulkan/archive/refs/heads/master.zip && unzip master.zip && rm master.zip
RUN cmake -DUSE_HEADLESS=ON Vulkan-master && \
    make renderheadless

FROM ubuntu:22.04 AS vulkan-sample-run

ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y vulkan-tools && apt-get clean

ENV VK_ICD_FILENAMES=/usr/share/glvnd/egl_vendor.d/10_nvidia.json 
ENV NVIDIA_DRIVER_CAPABILITIES=graphics

RUN useradd luser
COPY --chown=luser:luser --from=vulkan-sample-dev /home/luser/bin/renderheadless /home/luser/bin/renderheadless
COPY --chown=luser:luser --from=vulkan-sample-dev /home/luser/Vulkan-master/shaders/glsl/renderheadless/ /home/luser/Vulkan-master/shaders/glsl/renderheadless/
USER luser
WORKDIR /home/luser
CMD [ "/home/luser/bin/renderheadless" ]

$ docker build --target vulkan-sample-run -t localhost:vulkan-sample-run .
...
$ docker run --runtime=nvidia --gpus all --name vulkan localhost:vulkan-sample-run
Running headless rendering example
GPU: NVIDIA GeForce GTX 1650 with Max-Q Design
Framebuffer image saved to headless.ppm
Finished. Press enter to terminate...
...
$ docker cp vulkan:/home/luser/headless.ppm .
...
$ eog headless.ppm
...
$ docker run --runtime=nvidia --gpus all --rm localhost:vulkan-sample-run vulkaninfo --summary | grep deviceName
...
	deviceName         = NVIDIA GeForce GTX 1650 with Max-Q Design

III. CUDA image

Reference:

https://blog.csdn.net/FL1623863129/article/details/132275060

IV. Commands

1. Run on the host

Generate the NVIDIA Container Device Interface (CDI) configuration file:

nvidia-ctk cdi generate 2>nvidia-ctk_cdi_generate.log > nvidia.yaml
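
Because the command sends stderr to a log file, a failed generation can silently leave nvidia.yaml empty. A rough sanity check might look like the sketch below. The helper name looks_like_cdi_spec is hypothetical, and the check assumes the generated file carries the top-level cdiVersion and kind keys that a CDI spec requires, written at column 0.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file is non-empty and contains the
# top-level cdiVersion and kind keys (assumed to start at column 0).
looks_like_cdi_spec() {
    [ -s "$1" ] &&
    grep -q '^cdiVersion:' "$1" &&
    grep -q '^kind:' "$1"
}
```

For example, looks_like_cdi_spec nvidia.yaml || cat nvidia-ctk_cdi_generate.log prints the captured errors when the spec looks incomplete.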

2. Pull the image

docker pull nvcr.io/nvidia/cuda:12.4.0-runtime-ubuntu22.04

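3. Start the container

If the host has only one GPU, change the flag to --gpus all and delete every --device line. Note that the --mac-address value in the command contains 1g, which is not a valid hexadecimal octet, so Docker is expected to reject it as written; substitute a valid MAC address or drop the flag.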

docker run --name=airsim-env \
      --hostname=airsim-env \
      --mac-address=1g:ad:9e:c5:a2:b8 \
      --network=bridge \
      --runtime=nvidia \
      --shm-size=200g \
      --gpus '"device=3"' \
      --device /dev/nvidia3 \
      --device /dev/nvidiactl \
      --device /dev/nvidia-uvm \
      --device /dev/nvidia-uvm-tools \
      --ipc=host \
      --pid=host \
      --privileged \
      -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics,display \
      -e VK_ICD_FILENAMES=/usr/share/glvnd/egl_vendor.d/10_nvidia.json \
      -e XDG_RUNTIME_DIR=/tmp \
      -t \
      nvcr.io/nvidia/cuda:12.4.0-runtime-ubuntu22.04 \
      bash

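Explanation:

--gpus is used together with --runtime=nvidia so that the Vulkan ICD files are mounted from the host

-e NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics,display enables the graphics capability, which is not part of the default set

-e VK_ICD_FILENAMES=/usr/share/glvnd/egl_vendor.d/10_nvidia.json tells the Vulkan loader which ICD file to use

-e XDG_RUNTIME_DIR=/tmp sets the runtime directory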
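
If the ICD path set through VK_ICD_FILENAMES does not exist inside the container, the NVIDIA driver will not be picked up, so it can help to check the path before running vulkaninfo. The following is a minimal sketch: icd_files_exist is a hypothetical helper name, and it assumes the Linux convention that VK_ICD_FILENAMES is a colon-separated list of paths.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if every path in a colon-separated
# VK_ICD_FILENAMES value is a readable file. The body runs in a subshell
# so the IFS and globbing changes do not leak into the caller.
icd_files_exist() (
    [ -n "$1" ] || exit 1
    IFS=:
    set -f
    for f in $1; do
        [ -r "$f" ] || exit 1
    done
    exit 0
)
```

Inside the container this could be called as icd_files_exist "$VK_ICD_FILENAMES" || echo "ICD file missing; check --runtime=nvidia and the graphics capability".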

4. Enter the container

docker exec -it airsim-env bash

5. Install dependencies

Install Vulkan and the graphics dependencies:

apt update
apt install -y \
      vulkan-tools \
      libgl1-mesa-glx \
      libgl1-mesa-dri \
      libegl1 \
      libxext6 \
      xvfb

6. Verify

Verify that Vulkan detects the GPU:

vulkaninfo --summary | grep -E "(deviceName|driverName)"

If the output lists the NVIDIA GPU rather than llvmpipe, the GPU has been detected successfully.
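
The same verification can be scripted so that it fails when only the CPU rasterizer is present. This is a minimal sketch under the assumption that the summary prints one deviceName line per device, as in the sample output quoted earlier. The helper name has_hardware_gpu is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `vulkaninfo --summary` output on stdin and
# succeeds only if at least one deviceName line is not llvmpipe.
has_hardware_gpu() {
    grep 'deviceName' | grep -v -i 'llvmpipe' | grep -q .
}
```

Inside the container, vulkaninfo --summary 2>/dev/null | has_hardware_gpu && echo "GPU detected" would then report success only when a non-llvmpipe device is listed.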
