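Requirements
Deploy the NVIDIA driver and the CUDA Toolkit on Ubuntu 24, with the ability to install any chosen version.
Approach
Download the files, install the software, and verify that it works.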
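Methods
Method 1
Install the NVIDIA driver, install CUDA, then test.
Pros: NVIDIA vouches for the stability of the driver, so it is reliable over the long term.
Cons: CUDA will not be the newest release and may lag by one major version. Two packages must be installed separately. Occasionally CUDA cannot find the NVIDIA driver and needs manual repair.
Consider this method if driver stability matters most, you do not need the newest CUDA, and a wide range of CUDA versions is acceptable.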
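Method 2
Install CUDA and let its installer install the NVIDIA driver as well, then test.
Pros: a single installation delivers everything, and you can use the newest CUDA or a specific CUDA version.
Cons: the NVIDIA driver bundled with CUDA may not be a certified stable release. It works, but long-term stability is not guaranteed.
Choose this method if you want the newest CUDA or must use one specific version.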
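Method 3
Network (online) installation. It needs good connectivity to both domestic and overseas networks; in return it is simple to perform, stable, and supports cross-platform development well.
Because of network restrictions in mainland China, this article does not cover this method.
See the following documents: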
bash
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#
https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/index.html#
Procedure
Detect the GPU
Copy and run the following commands to create the script, then run it to collect GPU information.
bash
# The quoted 'EOF' keeps $(...) and $VAR from being expanded while the file is written
cat > check_GPU.sh << 'EOF'
#!/bin/bash
# Log file path
LOG_FILE="gpu_detection_$(date +%Y%m%d_%H%M%S).log"
# Create the log file and set its permissions
touch "$LOG_FILE"
chmod 644 "$LOG_FILE"
# Write a timestamp to the log
echo "Detection time: $(date)" >> "$LOG_FILE"
echo "==================================================" >> "$LOG_FILE"
# Detect the GPU vendor and model
echo "Detecting GPU vendor and model..." >> "$LOG_FILE"
GPU_VENDOR=$(lspci | grep -i "vga\|3d\|display" | awk -F': ' '{print $2}' | head -1)
echo "GPU vendor and model: $GPU_VENDOR" >> "$LOG_FILE"
# Count the GPU devices
echo "Counting GPU devices..." >> "$LOG_FILE"
GPU_COUNT=$(lspci | grep -ci "vga\|3d\|display")
echo "Number of GPU devices: $GPU_COUNT" >> "$LOG_FILE"
# Check for nvidia-smi
echo "Checking for nvidia-smi..." >> "$LOG_FILE"
if command -v nvidia-smi &>/dev/null; then
    NVIDIA_SMI_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -1)
    echo "nvidia-smi is installed, driver version: $NVIDIA_SMI_VERSION" >> "$LOG_FILE"
    # Highest CUDA version the driver supports; the header line ends with "|",
    # so the version is the second-to-last field
    CUDA_VERSION=$(nvidia-smi | grep "CUDA Version" | awk '{print $(NF-1)}')
    echo "CUDA version: $CUDA_VERSION" >> "$LOG_FILE"
else
    echo "nvidia-smi is not installed" >> "$LOG_FILE"
    # Fall back to nvcc to detect CUDA
    echo "Trying to detect CUDA through nvcc..." >> "$LOG_FILE"
    if command -v nvcc &>/dev/null; then
        # Field 6 of the release line looks like V12.9.86; strip the leading "V"
        CUDA_VERSION=$(nvcc --version | grep release | awk '{print $6}' | cut -c2-)
        echo "CUDA is installed, version: $CUDA_VERSION" >> "$LOG_FILE"
    else
        echo "CUDA is not installed" >> "$LOG_FILE"
    fi
fi
echo "==================================================" >> "$LOG_FILE"
echo "Detection finished, results saved to $LOG_FILE"
# Show the log file contents
cat "$LOG_FILE"
EOF
bash check_GPU.sh
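Download the software
How to choose suitable versions
Note: choose the versions first, then download. If you are unsure, download several major versions; for the minor version, the latest is fine.
Open https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#id7 in a browser. The compatibility table there lets you pick a CUDA version from the driver version, or a driver version from the CUDA version.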
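The driver-versus-CUDA check can also be scripted. The following is a minimal sketch, not an official tool: the helper names `version_ge` and `driver_ok_for_cuda_12_8` are made up for illustration, and the only threshold used is the one the CUDA 12.8 runfile installer reports later in this article (a driver of at least 570.00). It assumes GNU `sort -V` is available, as it is on Ubuntu.

```bash
#!/usr/bin/env bash
# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Uses version sort: if B sorts first (or equals A), then A >= B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum driver for CUDA 12.8, taken from the installer warning quoted in this article.
MIN_DRIVER_CUDA_12_8="570.00"

# driver_ok_for_cuda_12_8 DRIVER_VERSION: print a verdict, return 0 if the driver is new enough.
driver_ok_for_cuda_12_8() {
    if version_ge "$1" "$MIN_DRIVER_CUDA_12_8"; then
        echo "driver $1 satisfies CUDA 12.8 (>= $MIN_DRIVER_CUDA_12_8)"
    else
        echo "driver $1 is too old for CUDA 12.8 (< $MIN_DRIVER_CUDA_12_8)"
        return 1
    fi
}
```

For other CUDA releases, take the minimum driver version from the release-notes table rather than from this sketch.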
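Download the NVIDIA driver
Open https://www.nvidia.cn/drivers/lookup/ in a browser, enter the GPU model name, select Linux 64-bit, and search for the drivers that match the GPU.

A certified driver is recommended because it is stable and reliable.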

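Download CUDA
Open https://developer.nvidia.com/cuda-toolkit in a browser and select the options that match your system.

Then download the file from the resulting address with a download tool or the browser.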

Download the test files
Open https://github.com/NVIDIA/cuda-samples/tags in a browser and pick the archive that matches your CUDA version.
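If CUDA is already installed, the matching samples tag can be derived from the `nvcc --version` output. This is a sketch under one assumption that is not stated in the article: that the tags in the cuda-samples repository are named `v<major>.<minor>` (for example `v12.8`). Check the tags page before relying on it. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# cuda_release_from_nvcc: read `nvcc --version` output on stdin and print major.minor,
# e.g. "12.9" from the line "Cuda compilation tools, release 12.9, V12.9.86".
cuda_release_from_nvcc() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# samples_tag_for MAJOR.MINOR: print the assumed cuda-samples tag name.
samples_tag_for() {
    echo "v$1"
}
```

Typical use: `samples_tag_for "$(nvcc --version | cuda_release_from_nvcc)"`.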

Upload and set permissions
Upload the downloaded files to the server.
Make the .sh and .run files executable:
bash
chmod +x *.sh
chmod +x *.run
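Extract the files
bash
unzip '*.zip'    # quoted so that unzip expands the pattern itself and handles several archives
for f in *.tar.gz; do tar -zxf "$f"; done    # tar accepts one archive per invocation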
Installation
Install the base packages
bash
apt update -y && apt install -y gcc g++ make cmake
Method 1
Install the NVIDIA driver
bash
./NVIDIA-Linux-x86_64-570.172.08.run

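Accept the defaults and answer yes to every prompt. When the installation finishes, be sure to run reboot to restart the system. After the reboot, check the result: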
bash
root@testserver1 Fri Jul 18 [03:32:10] : ~# nvidia-smi
Fri Jul 18 03:32:14 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.172.08             Driver Version: 570.172.08     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro RTX 5000                Off |   00000000:2D:00.0 Off |                  Off |
| 33%   51C    P0             58W /  230W |       0MiB /  16384MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
The header shows CUDA Version: 12.8. This is the highest CUDA version the installed driver supports, so any 12.8.x toolkit release, such as 12.8.1, can be installed.
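The rule above can be checked mechanically. The sketch below reads `nvidia-smi` output from stdin, extracts the CUDA Version field, and compares a toolkit version against it. The function names are hypothetical, and the comparison follows this article's rule of thumb (toolkit major.minor must not exceed the value shown by the driver); it assumes GNU `sort -V`.

```bash
#!/usr/bin/env bash
# max_cuda_from_smi: read nvidia-smi output on stdin, print the "CUDA Version" value.
max_cuda_from_smi() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# toolkit_fits_driver TOOLKIT_VERSION MAX_CUDA: succeed when the toolkit's
# major.minor is not newer than the maximum reported by the driver.
toolkit_fits_driver() {
    local toolkit_mm max="$2"
    toolkit_mm=$(printf '%s\n' "$1" | cut -d. -f1,2)
    [ "$(printf '%s\n%s\n' "$toolkit_mm" "$max" | sort -V | tail -n 1)" = "$max" ]
}
```

Typical use: `toolkit_fits_driver 12.8.1 "$(nvidia-smi | max_cuda_from_smi)" && echo OK`.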
Install CUDA
bash
./cuda_12.8.1_570.124.06_linux.run
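In the installer menu, uncheck the Driver component.
Accept the defaults for everything else. When the installation finishes, the following summary is shown: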
bash
root@testserver1 Fri Jul 18 [05:00:56] : /opt/nvidia# ./cuda_12.8.1_570.124.06_linux.run
============ Summary ============

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-12.8/

Please make sure that
 -   PATH includes /usr/local/cuda-12.8/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.8/lib64, or, add /usr/local/cuda-12.8/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.8/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 570.00 is required for CUDA 12.8 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
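The summary mentions a non-interactive form of the installer (`--silent --driver`). The sketch below only builds the command line for each install mode so it can be reviewed before running. The `--silent` and `--driver` flags appear in the installer output above; `--toolkit` is taken from NVIDIA's runfile documentation rather than from this article, so confirm it with the run file's `--help` output first. The function name and the mode names are hypothetical.

```bash
#!/usr/bin/env bash
# runfile_install_cmd RUNFILE MODE: print (do not execute) a silent install command.
# MODE is one of: toolkit (Method 1, driver already installed),
#                 driver  (driver only),
#                 all     (Method 2, driver plus toolkit).
runfile_install_cmd() {
    local runfile="$1" mode="$2"
    case "$mode" in
        toolkit) echo "sh $runfile --silent --toolkit" ;;
        driver)  echo "sh $runfile --silent --driver" ;;
        all)     echo "sh $runfile --silent --driver --toolkit" ;;
        *)       echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

Printing the command instead of running it keeps the sketch safe to try on any machine; run the printed command as root once it looks right.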
Configure the environment variables
bash
echo 'export PATH=/usr/local/cuda-12.8/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source /root/.bashrc
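Running the `echo ... >> ~/.bashrc` commands more than once leaves duplicate lines in the file. A small helper can make the step repeatable. This is a sketch, and `append_once` is a hypothetical name, not part of any NVIDIA tooling.

```bash
#!/usr/bin/env bash
# append_once LINE FILE: append LINE to FILE only if an identical line is not already present.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -x matches whole lines, -F treats LINE as a fixed string rather than a regex
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Typical use: `append_once 'export PATH=/usr/local/cuda-12.8/bin:$PATH' ~/.bashrc`, with single quotes so that `$PATH` is written literally and expanded only when the shell starts.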
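After installation, check with nvcc --version. The sample below was captured on a CUDA 12.9 installation; with 12.8.1 the release line reads 12.8. For example: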
bash
root@testserver1 Fri Jul 18 [07:49:03] : /opt/nvidia
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Tue_May_27_02:21:03_PDT_2025
Cuda compilation tools, release 12.9, V12.9.86
Build cuda_12.9.r12.9/compiler.36037853_0
A reboot of the operating system is recommended after installation.
Method 2
One-step installation
bash
./cuda_12.9.1_575.57.08_linux.run

Keep the default selection, with every component checked. When the installation finishes, the following summary is shown:
bash
root@testserver1 Fri Jul 18 [07:13:29] : /opt/nvidia# ./cuda_12.9.1_575.57.08_linux.run
============ Summary ============

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-12.9/

Please make sure that
 -   PATH includes /usr/local/cuda-12.9/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.9/lib64, or, add /usr/local/cuda-12.9/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.9/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log
Configure the environment variables
First comment out any old CUDA path settings in ~/.bashrc to avoid conflicting configuration.
bash
echo 'export PATH=/usr/local/cuda-12.9/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.9/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source /root/.bashrc
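Commenting out the old entries by hand is easy to forget. The sketch below does it with `sed`, keeping a backup copy. It assumes the old entries have the same shape as the `export PATH=/usr/local/cuda-...` and `export LD_LIBRARY_PATH=/usr/local/cuda-...` lines used in this article; the function name is hypothetical. Run it before appending the new lines, otherwise the new entries are commented out too.

```bash
#!/usr/bin/env bash
# comment_out_cuda_paths FILE: prefix "# " to every export of PATH or LD_LIBRARY_PATH
# that starts with /usr/local/cuda-. A backup is kept in FILE.bak.
comment_out_cuda_paths() {
    local file="$1"
    cp -- "$file" "$file.bak"
    # Lines that are already commented start with "#", so they do not match again.
    sed -E '/^[[:space:]]*export[[:space:]]+(PATH|LD_LIBRARY_PATH)=\/usr\/local\/cuda-/ s/^/# /' \
        "$file.bak" > "$file"
}
```

Typical use: `comment_out_cuda_paths ~/.bashrc`, followed by the two `echo` commands for the new version.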
To be safe, reboot the system.
Testing
Basic test
Both commands should return output.
bash
nvidia-smi
nvcc --version
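Before running the two commands, it can help to confirm that they are on the PATH at all, which separates "not installed" from "environment variables not loaded". A minimal sketch with a hypothetical helper name:

```bash
#!/usr/bin/env bash
# missing_cmds CMD...: print one "missing: CMD" line per command not found on PATH.
# Returns 0 when every command is available, 1 otherwise.
missing_cmds() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

Typical use: `missing_cmds nvidia-smi nvcc && nvidia-smi && nvcc --version`. If only `nvcc` is reported missing, re-check the PATH line in ~/.bashrc.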
Further test
Build the sample
bash
root@testserver1 Fri Jul 18 [05:38:17] : cuda-samples-12.8/Samples/1_Utilities/deviceQuery# cmake ./
root@testserver1 Fri Jul 18 [05:38:17] : cuda-samples-12.8/Samples/1_Utilities/deviceQuery# make
Run ./deviceQuery. Output like the following appears; Result = PASS means the device environment check passed.
bash
root@testserver1 Fri Jul 18 [05:41:53] : /opt/nvidia/cuda-samples-12.8/Samples/1_Utilities/deviceQuery# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro RTX 5000"
  CUDA Driver Version / Runtime Version          12.8 / 12.8
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 15928 MBytes (16701652992 bytes)
  (048) Multiprocessors, (064) CUDA Cores/MP:    3072 CUDA Cores
  GPU Max Clock rate:                            1815 MHz (1.81 GHz)
  Memory Clock rate:                             7001 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        65536 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 45 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.8, CUDA Runtime Version = 12.8, NumDevs = 1
Result = PASS
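For unattended checks, the PASS marker can be tested instead of read by eye. A minimal sketch, with a hypothetical function name, that reads the deviceQuery output from stdin:

```bash
#!/usr/bin/env bash
# devicequery_passed: read deviceQuery output on stdin,
# succeed only if it contains the line marker "Result = PASS".
devicequery_passed() {
    grep -q 'Result = PASS'
}
```

Typical use: `./deviceQuery | devicequery_passed && echo "GPU environment OK"`.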
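Uninstall
After uninstalling, remember to comment out the environment variables configured in /root/.bashrc, and be sure to reboot the system.
Uninstall the NVIDIA driver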
bash
/usr/bin/nvidia-uninstall
Uninstall CUDA
bash
/usr/local/cuda-12.8/bin/cuda-uninstaller
A successful uninstall prints the following:
bash
root@testserver1 Fri Jul 18 [06:10:34] : /opt/nvidia
# /usr/local/cuda-12.8/bin/cuda-uninstaller
Successfully uninstalled