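0 GPU driver
My operating system is Ubuntu 24.04. First remove any previously installed NVIDIA driver packages:
```bash
# Quote the pattern so the shell does not expand it against files in the current directory
sudo apt-get remove --purge 'nvidia*'
```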
Install the build dependencies:
```bash
sudo apt-get update
sudo apt-get install g++ gcc make
```
Make sure `lsmod | grep nouveau` prints nothing, meaning the open-source nouveau driver is not loaded.
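If lsmod does list nouveau, the usual fix is to blacklist the module, rebuild the initramfs and reboot. Below is a minimal sketch. The function names nouveau_loaded and write_nouveau_blacklist are hypothetical helpers. The file name blacklist-nouveau.conf and its two lines are the common convention, not something this guide prescribes.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the nouveau check above (names are illustrative).

# Reads `lsmod` output on stdin and succeeds only if a module named exactly
# "nouveau" appears in the first column.
nouveau_loaded() {
  awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Writes the conventional blacklist file. The directory is a parameter
# (default /etc/modprobe.d) so the function can be tried in a scratch dir.
write_nouveau_blacklist() {
  local dir="${1:-/etc/modprobe.d}"
  printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$dir/blacklist-nouveau.conf"
}
```

On the target machine, check whether `lsmod | nouveau_loaded` succeeds. If it does, run write_nouveau_blacklist from a root shell (for example after `sudo -i`), then run `sudo update-initramfs -u` and reboot before installing the driver.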
Install the latest driver by following NVIDIA's official instructions. Use the network installation method, i.e. Network Repository Enabled (amd64):
● Distribution: ubuntu2404
● Network Repository Enabled (amd64)
● Selecting a Branch or a Specific Driver version: 590
● Proprietary Kernel Modules
● Then carry out the Post-installation Actions
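After the post-installation steps and a reboot, it is worth checking that the installed driver really belongs to branch 590. Below is a minimal sketch. It assumes that the version string reported by nvidia-smi has the form <branch>.<minor>.<patch>. The names driver_branch and check_driver_branch are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare the installed driver's branch with the one
# selected on NVIDIA's download page (590 in this guide).

# Prints the part of a driver version string before the first dot.
driver_branch() {
  printf '%s\n' "${1%%.*}"
}

# Succeeds if version string $2 belongs to branch $1.
check_driver_branch() {
  local expected="$1" version="$2"
  [ "$(driver_branch "$version")" = "$expected" ]
}
```

On the machine itself, pass in the real version:

```bash
check_driver_branch 590 "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" && echo "driver branch OK"
```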
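1 Install Docker and Compose
If the system still has the Docker bundled with an earlier Ubuntu release (the docker.io package):
- Back up the configuration (strongly recommended)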
```bash
sudo cp -a /etc/docker/daemon.json /root/daemon.json.bak
```
- Stop Docker, then uninstall Ubuntu's bundled docker.io and Compose 1.x (use remove; only the docker.io package itself is purged)
```bash
# Stop the socket first so it cannot re-activate the service
sudo systemctl stop docker.socket docker.service
sudo systemctl disable --now docker.socket
```
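Confirm:
```bash
systemctl is-active docker.service docker.socket
systemctl is-enabled docker.socket
```
Continue with the uninstall once the first command prints inactive for both units and the second prints disabled: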
```bash
sudo apt remove -y docker.io docker-compose python3-compose
sudo apt purge -y docker.io
```
- Add the Docker CE repository (Aliyun mirror, keyring method)
Note: the Aliyun mirror here only speeds up installing the packages. It is not a registry mirror for docker pull.
```bash
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
```
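To see what the echo command above actually writes, the line can be rebuilt by hand. The sketch below assumes amd64 and Ubuntu 24.04, whose VERSION_CODENAME is noble. The names docker_repo_line and docker_repo_matches are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that reproduce the docker.list line written above.

# Prints the repository line for a given architecture and Ubuntu codename.
docker_repo_line() {
  local arch="$1" codename="$2"
  printf 'deb [arch=%s signed-by=/etc/apt/keyrings/docker.gpg] https://mirrors.aliyun.com/docker-ce/linux/ubuntu %s stable\n' \
    "$arch" "$codename"
}

# Succeeds if the given list file contains exactly the expected line.
docker_repo_matches() {
  local file="$1" arch="$2" codename="$3"
  [ "$(cat "$file")" = "$(docker_repo_line "$arch" "$codename")" ]
}
```

On the machine, check the written file like this:

```bash
docker_repo_matches /etc/apt/sources.list.d/docker.list "$(dpkg --print-architecture)" "$(. /etc/os-release && echo "$VERSION_CODENAME")"
```

After the apt update, `apt-cache policy docker-ce` should list mirrors.aliyun.com as a source.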
- Install Docker CE and the Compose plugin
```bash
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
```
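- Start and verify (confirm that Docker still uses your data-root)
```bash
sudo systemctl enable --now docker
# Prefix the docker commands below with sudo if your user is not in the docker group
docker --version
docker compose version
docker info | grep 'Docker Root Dir'
docker ps -a
docker images
```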
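You should see Docker Root Dir: followed by the path you configured.
If it shows a different path, first check whether your configured path still exists. If necessary, restore the backup:
```bash
sudo cp -a /root/daemon.json.bak /etc/docker/daemon.json
sudo systemctl restart docker
```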
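The comparison above can be scripted: read data-root from daemon.json and compare it with what the running daemon reports. The sketch below assumes python3 is available, as it normally is on Ubuntu 24.04. The names daemon_data_root and data_root_matches are hypothetical helpers. The template docker info --format '{{.DockerRootDir}}' is a standard docker info field.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the "data-root" value from a daemon.json file,
# or an empty line if the key is absent.
daemon_data_root() {
  python3 -c 'import json, sys; print(json.load(open(sys.argv[1])).get("data-root", ""))' \
    "${1:-/etc/docker/daemon.json}"
}

# Succeeds if the configured data-root equals the root dir reported by the daemon ($2).
data_root_matches() {
  local file="$1" reported="$2"
  [ "$(daemon_data_root "$file")" = "$reported" ]
}
```

On the machine, run the check below. Prefix docker with sudo if needed.

```bash
data_root_matches /etc/docker/daemon.json "$(docker info --format '{{.DockerRootDir}}')" && echo "data-root OK"
```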
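One more note: Docker's daemon configuration lives in /etc/docker/daemon.json, which you can edit with vim. For example, the data-root field sets where Docker keeps all of its data, such as stored images, and it can be customized. To configure registry mirrors that speed up docker pull, add:
```json
"registry-mirrors": [
  "https://docker.m.daocloud.io",
  "https://ccr.ccs.tencentyun.com",
  "https://cr.console.aliyun.com"
]
```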
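Editing the file by hand is easy to get wrong: a missing comma stops Docker from starting. Below is a sketch of a helper that merges mirrors into an existing daemon.json while keeping other keys such as data-root. It assumes python3 is available. The name add_registry_mirrors is a hypothetical helper, and it rewrites the file with two-space indentation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append registry mirrors to daemon.json, skipping any
# already present and leaving all other keys untouched.
# Usage: add_registry_mirrors FILE URL [URL...]
add_registry_mirrors() {
  local file="$1"; shift
  python3 - "$file" "$@" <<'PY'
import json, sys
path, mirrors = sys.argv[1], sys.argv[2:]
try:
    with open(path) as f:
        cfg = json.load(f)
except FileNotFoundError:
    cfg = {}
merged = cfg.get("registry-mirrors", [])
for m in mirrors:
    if m not in merged:
        merged.append(m)
cfg["registry-mirrors"] = merged
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
    f.write("\n")
PY
}
```

Run it as root, for example from a `sudo -i` shell, then restart Docker:

```bash
add_registry_mirrors /etc/docker/daemon.json https://docker.m.daocloud.io
systemctl restart docker
```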
A complete example:
```json
{
  "data-root": "/mnt/local_ssd_550/dockers",
  "default-runtime": "nvidia",
  "default-shm-size": "64G",
  "registry-mirrors": [
    "https://docker.m.daocloud.io",
    "https://ccr.ccs.tencentyun.com",
    "https://cr.console.aliyun.com"
  ],
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
```
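dockerd refuses to start when daemon.json is not valid JSON, so it helps to check the file before every `systemctl restart docker`. Below is a minimal sketch using Python's built-in json.tool module. The name validate_daemon_json is a hypothetical helper. Newer Docker Engine releases also offer `dockerd --validate --config-file /etc/docker/daemon.json`, which additionally checks Docker-specific options; whether it is available depends on your dockerd version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file parses as JSON, print the parser's
# error and fail otherwise. It checks syntax only, not Docker option names.
validate_daemon_json() {
  local file="${1:-/etc/docker/daemon.json}"
  if python3 -m json.tool "$file" > /dev/null; then
    echo "OK: $file is valid JSON"
  else
    echo "ERROR: fix $file before restarting docker" >&2
    return 1
  fi
}
```

Typical use: `validate_daemon_json && sudo systemctl restart docker`.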
- To upgrade later
```bash
sudo apt update
sudo apt install --only-upgrade -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo systemctl restart docker
docker --version
docker compose version
```
2 Install NVIDIA Container Toolkit
- Install the prerequisites
```bash
sudo apt-get update && sudo apt-get install -y --no-install-recommends curl gnupg2
```
- Add the GPG key
```bash
curl -fsSL https://mirrors.ustc.edu.cn/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
```
- Generate the repository list (replacing nvidia.github.io with mirrors.ustc.edu.cn)
```bash
curl -s -L https://mirrors.ustc.edu.cn/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://nvidia.github.io#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://mirrors.ustc.edu.cn#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```
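To see what the sed expression above does, it can be applied to a single line. The sketch below assumes the upstream list contains lines of the form deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /. The name to_ustc_line is a hypothetical wrapper around exactly the same expression.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the same sed expression used above: it adds the
# signed-by option and swaps the host from nvidia.github.io to mirrors.ustc.edu.cn.
to_ustc_line() {
  sed 's#deb https://nvidia.github.io#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://mirrors.ustc.edu.cn#g'
}

# Example input line; $(ARCH) stays literal because apt expands it itself.
printf '%s\n' 'deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /' | to_ustc_line
# prints: deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://mirrors.ustc.edu.cn/libnvidia-container/stable/deb/$(ARCH) /
```

On the machine, `grep -v '^#' /etc/apt/sources.list.d/nvidia-container-toolkit.list` should show only lines pointing at mirrors.ustc.edu.cn.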
- Update the package lists
```bash
sudo apt-get update
```
- Install NVIDIA Container Toolkit (pinned version)
```bash
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.18.2-1
sudo apt-get install -y \
  nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
  nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
  libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
  libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
```
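All four packages must carry the same version, so the list can be generated from the single variable. Below is a sketch. The name toolkit_pkgs is a hypothetical helper. To see which versions the mirror offers before choosing one, `apt-cache madison nvidia-container-toolkit` lists them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one "package=version" spec per line for the four
# toolkit packages installed above, all pinned to the same version.
toolkit_pkgs() {
  local version="$1" pkg
  for pkg in nvidia-container-toolkit nvidia-container-toolkit-base \
             libnvidia-container-tools libnvidia-container1; do
    printf '%s=%s\n' "$pkg" "$version"
  done
}
```

The install command then becomes `sudo apt-get install -y $(toolkit_pkgs "$NVIDIA_CONTAINER_TOOLKIT_VERSION")`. The substitution is left unquoted on purpose so that each spec becomes a separate argument.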
To install the latest version instead:
```bash
sudo apt-get install -y nvidia-container-toolkit
```
Then restart Docker:
```bash
sudo systemctl restart docker
```
- Configure Docker
```bash
# Registers the nvidia runtime in /etc/docker/daemon.json
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
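nvidia-ctk runtime configure registers the runtime under runtimes.nvidia in /etc/docker/daemon.json, in the same shape as the complete example earlier. Below is a sketch that checks for that entry. It assumes python3 is available. The name has_nvidia_runtime is a hypothetical helper. Once the restart has picked up the change, the Runtimes line of `docker info` should also list nvidia.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if daemon.json defines runtimes.nvidia with a
# non-empty "path", fail otherwise.
has_nvidia_runtime() {
  python3 - "${1:-/etc/docker/daemon.json}" <<'PY'
import json, sys
with open(sys.argv[1]) as f:
    cfg = json.load(f)
runtime = cfg.get("runtimes", {}).get("nvidia", {})
sys.exit(0 if runtime.get("path") else 1)
PY
}
```

Typical use: `has_nvidia_runtime && sudo docker info | grep -i runtimes`.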
- Verify GPU support
```bash
docker run -it --rm --gpus all ubuntu nvidia-smi
```