Software installation: installing the NVIDIA driver and CUDA Toolkit on Ubuntu 24

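Requirements

Deploy the NVIDIA driver and the CUDA Toolkit on Ubuntu 24. The procedure applies to any version of either package.

Approach

Download the files, install the software, then test that it works.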

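Methods

Method 1

Install the NVIDIA driver, then install CUDA, then test.

Pros: NVIDIA commits to the stability of the driver, so the setup stays stable and reliable over the long term.

Cons: the CUDA version is not the latest and may lag by one major version. Two packages have to be installed separately. Occasionally CUDA cannot find the NVIDIA driver and needs a manual fix.

Consider this method if driver stability is a hard requirement, you do not need the newest CUDA, and a broad range of CUDA versions is acceptable.

Method 2

Install CUDA and let its installer install the bundled NVIDIA driver, then test.

Pros: a single installation provides everything, and you can use either the latest CUDA or a specific CUDA version.

Cons: the NVIDIA driver bundled with CUDA may not be a certified stable release. It works, but long-term stability is not guaranteed.

Choose this method if you need the latest CUDA or must pin a specific version.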

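Method 3

Install over the network. This needs a good connection both inside and outside China. It is simple to carry out, stable, and supports cross-platform development well.

Because of network restrictions in mainland China, this article does not cover this method. Refer to the following documents:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#
https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/index.html#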

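Procedure

Detect the GPU

Copy and run the following command to create check_GPU.sh, then run the script with bash check_GPU.sh to collect the GPU information.
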
cat > check_GPU.sh << 'EOF'
#!/bin/bash

# Log file path (the quoted 'EOF' above keeps $(...) and $VAR literal until the script runs)
LOG_FILE="gpu_detection_$(date +%Y%m%d_%H%M%S).log"

# Create the log file and set its permissions
touch "$LOG_FILE"
chmod 644 "$LOG_FILE"

# Write a timestamp to the log
echo "Detection time: $(date)" >> "$LOG_FILE"
echo "==================================================" >> "$LOG_FILE"

# Detect the GPU vendor and model
echo "Detecting GPU vendor and model..." >> "$LOG_FILE"
GPU_VENDOR=$(lspci | grep -i "vga\|3d\|display" | awk -F': ' '{print $2}' | head -1)
echo "GPU vendor and model: $GPU_VENDOR" >> "$LOG_FILE"

# Count the GPU devices
echo "Counting GPU devices..." >> "$LOG_FILE"
GPU_COUNT=$(lspci | grep -i "vga\|3d\|display" | wc -l)
echo "Number of GPU devices: $GPU_COUNT" >> "$LOG_FILE"

# Check for nvidia-smi
echo "Checking for nvidia-smi..." >> "$LOG_FILE"
if command -v nvidia-smi &>/dev/null; then
    NVIDIA_SMI_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -1)
    echo "nvidia-smi is installed, driver version: $NVIDIA_SMI_VERSION" >> "$LOG_FILE"

    # Highest CUDA version supported by the driver, taken from the nvidia-smi header
    CUDA_VERSION=$(nvidia-smi | sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p')
    echo "CUDA version: $CUDA_VERSION" >> "$LOG_FILE"
else
    echo "nvidia-smi is not installed" >> "$LOG_FILE"

    # Fall back to nvcc to detect CUDA
    echo "Trying to detect CUDA through nvcc..." >> "$LOG_FILE"
    if command -v nvcc &>/dev/null; then
        # Field 6 of the "release" line looks like V12.9.86; strip the leading V
        CUDA_VERSION=$(nvcc --version | grep release | awk '{print $6}' | cut -c2-)
        echo "CUDA is installed, version: $CUDA_VERSION" >> "$LOG_FILE"
    else
        echo "CUDA is not installed" >> "$LOG_FILE"
    fi
fi

echo "==================================================" >> "$LOG_FILE"
echo "Detection finished, results saved to $LOG_FILE"

# Print the log file
cat "$LOG_FILE"
EOF
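
The script counts every VGA, 3D, and display controller, so integrated or BMC graphics are included in the total. As a minimal sketch of a narrower check, the hypothetical helper below (count_nvidia_gpus is not part of the original script) counts only lines whose device class is a display class and whose text mentions NVIDIA. It assumes the usual lspci line layout of address, class, then vendor and model.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count NVIDIA display devices in lspci-style text on stdin.
# Assumption: NVIDIA devices carry the string "NVIDIA" on their lspci line.
count_nvidia_gpus() {
    # Match the device class, not bare "3d", so a bus address like 3d:00.0 is not counted.
    grep -iE 'VGA compatible controller|3D controller|Display controller' \
        | grep -ic 'nvidia' || true   # grep -c prints 0 but exits 1 when nothing matches
}
```

On a real machine it would be called as lspci | count_nvidia_gpus.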

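Download the software

How to choose suitable versions

Note: choose the versions first, then download. If unsure, download several major versions. Within a major version, the latest minor release is a reasonable choice.

Open https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#id7 in a browser. As shown in the figure below, you can choose the CUDA version from the driver version, or the driver version from the CUDA version.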

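Download the NVIDIA driver

Open https://www.nvidia.cn/drivers/lookup/ in a browser, enter the GPU model name, select Linux 64-bit, and search for the drivers that match the GPU.

A certified driver is recommended because it is stable and reliable.

Download CUDA

Open https://developer.nvidia.com/cuda-toolkit in a browser.

Select the configuration options.

Download the file from the resulting address with a download tool or a browser.

Download the test files

Open https://github.com/NVIDIA/cuda-samples/tags in a browser. As shown in the figure below, pick the archive that matches your CUDA version.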

Upload and set permissions

Upload the downloaded files to the server.
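
The installers are large, so it can be worth confirming that an upload arrived intact before running it. The helper below is a minimal sketch, not part of the original procedure: sha256_matches is a hypothetical name, and the expected digest is assumed to be one you computed with sha256sum on the machine where the file was downloaded.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed when FILE's SHA-256 digest equals EXPECTED (hex, any case).
sha256_matches() {
    local file="$1" expected="$2" actual
    actual=$(sha256sum "$file" | awk '{print $1}')
    # Compare case-insensitively; an unreadable file yields an empty digest and fails.
    [ -n "$actual" ] && [ "${actual,,}" = "${expected,,}" ]
}
```

On the server, a call would look like sha256_matches cuda_12.8.1_570.124.06_linux.run "<digest from the download machine>" && echo OK.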

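Make the .sh and .run files executable:

chmod +x *.sh
chmod +x *.run

Extract the archives. A loop is used so that several archives in the same directory are each extracted:

for f in *.zip; do unzip "$f"; done
for f in *.tar.gz; do tar -zxf "$f"; done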

Installation

Install the prerequisite packages

apt update -y && apt install -y gcc g++ make cmake

Method 1

Install the NVIDIA driver

./NVIDIA-Linux-x86_64-570.172.08.run

Confirm every prompt and answer yes throughout. When the installation finishes, be sure to reboot the system. After the reboot, check the result:

root@testserver1 Fri Jul 18 [03:32:10] : ~# nvidia-smi
Fri Jul 18 03:32:14 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.172.08             Driver Version: 570.172.08     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro RTX 5000                Off |   00000000:2D:00.0 Off |                  Off |
| 33%   51C    P0             58W /  230W |       0MiB /  16384MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

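The header shows CUDA Version: 12.8. This is the highest CUDA version that the installed driver supports, so install a CUDA Toolkit no newer than 12.8, for example any 12.8.x release such as 12.8.1.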
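
The rule above can be sketched as a small check. The helper name toolkit_fits_driver is hypothetical, and the comparison deliberately follows the article's simple rule (toolkit major.minor must not exceed the version shown by nvidia-smi); it does not model CUDA's minor-version compatibility.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed when the toolkit's major.minor does not exceed the
# "CUDA Version" reported by nvidia-smi. Arguments look like "12.8" or "12.8.1".
toolkit_fits_driver() {
    local max="$1" toolkit="$2"
    local max_major max_minor tk_major tk_minor _
    IFS=. read -r max_major max_minor _ <<< "$max"
    IFS=. read -r tk_major tk_minor _ <<< "$toolkit"
    # Older major version always fits; same major needs minor <= the driver's minor.
    (( tk_major < max_major )) ||
        (( tk_major == max_major && ${tk_minor:-0} <= ${max_minor:-0} ))
}
```

For the machine shown above, toolkit_fits_driver 12.8 12.8.1 succeeds and toolkit_fits_driver 12.8 12.9.1 fails.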

Install CUDA

./cuda_12.8.1_570.124.06_linux.run

In the installer menu, deselect the driver and confirm everything else. When the installation finishes, the following summary is shown:

root@testserver1 Fri Jul 18 [05:00:56] : /opt/nvidia# ./cuda_12.8.1_570.124.06_linux.run
===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-12.8/

Please make sure that
 -   PATH includes /usr/local/cuda-12.8/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.8/lib64, or, add /usr/local/cuda-12.8/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.8/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 570.00 is required for CUDA 12.8 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
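
The warning in the summary names a minimum driver version (570.00 for CUDA 12.8). A minimal sketch of checking an installed driver against such a minimum is shown below. The helper name driver_meets_min is hypothetical, and it relies on the version ordering of GNU sort -V, which is available on Ubuntu.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed when INSTALLED >= REQUIRED for dotted version strings.
driver_meets_min() {
    local installed="$1" required="$2"
    # After a version sort, the smaller version comes first; it must be REQUIRED.
    [ "$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]
}
```

On a machine with the driver installed, the first argument could come from the same query the detection script uses: driver_meets_min "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)" 570.00.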

Configure environment variables

echo 'export PATH=/usr/local/cuda-12.8/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
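
Running the echo commands more than once appends duplicate lines to ~/.bashrc. As a sketch of one way to avoid that, the hypothetical helper below (append_once is not part of the original procedure) appends a line only when an identical line is not already in the file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append LINE to FILE only if an identical line is not present.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -x matches whole lines, -F treats LINE as a fixed string rather than a regex.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

It would be used as append_once 'export PATH=/usr/local/cuda-12.8/bin:$PATH' ~/.bashrc, with the single quotes keeping $PATH unexpanded in the file.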

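After installation, check with nvcc --version. The sample below was captured on a CUDA 12.9 installation; with the 12.8.1 runfile above, the release line reads 12.8 instead.
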
root@testserver1 Fri Jul 18 [07:49:03] : /opt/nvidia
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Tue_May_27_02:21:03_PDT_2025
Cuda compilation tools, release 12.9, V12.9.86
Build cuda_12.9.r12.9/compiler.36037853_0

Rebooting the operating system after installation is recommended.

Method 2

One-step installation

./cuda_12.9.1_575.57.08_linux.run
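
The runfile names used in this article follow the pattern cuda_<toolkit version>_<bundled driver version>_linux.run, so the name alone shows which driver Method 2 will install. The sketch below splits such a name; parse_cuda_runfile is a hypothetical helper, and it assumes the x86_64 naming seen here (names with a different suffix are rejected).

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a runfile name of the form cuda_<toolkit>_<driver>_linux.run.
# Prints "<toolkit> <driver>"; returns 1 when the name does not match that form.
parse_cuda_runfile() {
    local name="${1##*/}"   # drop any leading directory such as ./
    if [[ "$name" =~ ^cuda_([0-9.]+)_([0-9.]+)_linux\.run$ ]]; then
        printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        return 1
    fi
}
```

For the file above, parse_cuda_runfile ./cuda_12.9.1_575.57.08_linux.run prints 12.9.1 575.57.08.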

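As shown in the figure, keep the default selection with everything checked. When the installation finishes, the following summary is shown:
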
root@testserver1 Fri Jul 18 [07:13:29] : /opt/nvidia# ./cuda_12.9.1_575.57.08_linux.run
===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-12.9/

Please make sure that
 -   PATH includes /usr/local/cuda-12.9/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.9/lib64, or, add /usr/local/cuda-12.9/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.9/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log

Configure environment variables

First comment out any old CUDA path entries in ~/.bashrc to avoid conflicting settings.

echo 'export PATH=/usr/local/cuda-12.9/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.9/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
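
Commenting out the old entries can also be scripted. The sketch below is one possible approach, not the author's: comment_out_cuda_exports is a hypothetical helper that prefixes "# " to uncommented export PATH and export LD_LIBRARY_PATH lines mentioning /usr/local/cuda, and keeps the original file as a .bak copy. It should be run before the new entries are appended, since it matches every CUDA version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: comment out active CUDA path exports in FILE (backup in FILE.bak).
comment_out_cuda_exports() {
    local file="$1"
    # Lines already starting with '#' do not match, so a second run changes nothing.
    sed -i.bak -E \
        '/^[[:space:]]*export[[:space:]]+(PATH|LD_LIBRARY_PATH)=.*\/usr\/local\/cuda/ s/^/# /' \
        "$file"
}
```

A typical call is comment_out_cuda_exports ~/.bashrc; other lines in the file, such as aliases, are left untouched.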

To be safe, reboot the system.

Testing

Basic test

Both commands only need to return output.

nvidia-smi
nvcc --version
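
When nvcc is reported as not found, the usual cause is that the PATH entry has not taken effect in the current shell. As a small sketch, the hypothetical helper below (require_cmds is not part of the original procedure) reports which commands are missing from PATH before you run them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "missing: <name>" for each command absent from PATH.
# Succeeds only when every listed command is found.
require_cmds() {
    local cmd status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            status=1
        fi
    done
    return "$status"
}
```

For this article the call would be require_cmds nvidia-smi nvcc && echo "both tools found".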

Extended test

Build the sample

root@testserver1 Fri Jul 18 [05:38:17] : cuda-samples-12.8/Samples/1_Utilities/deviceQuery# cmake ./
root@testserver1 Fri Jul 18 [05:38:17] : cuda-samples-12.8/Samples/1_Utilities/deviceQuery# make

Run ./deviceQuery. Output like the following appears, and Result = PASS means the device environment check passed.

root@testserver1 Fri Jul 18 [05:41:53] : /opt/nvidia/cuda-samples-12.8/Samples/1_Utilities/deviceQuery# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro RTX 5000"
  CUDA Driver Version / Runtime Version          12.8 / 12.8
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 15928 MBytes (16701652992 bytes)
  (048) Multiprocessors, (064) CUDA Cores/MP:    3072 CUDA Cores
  GPU Max Clock rate:                            1815 MHz (1.81 GHz)
  Memory Clock rate:                             7001 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        65536 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 45 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.8, CUDA Runtime Version = 12.8, NumDevs = 1
Result = PASS
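
For unattended checks, the two lines that matter in this output are NumDevs and Result. The sketch below reads deviceQuery output from stdin and reduces it to a one-line summary; device_query_summary is a hypothetical helper, and it assumes the output format shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise deviceQuery output read from stdin.
# Prints "devices=<n> result=<word>" and succeeds only when the result is PASS.
device_query_summary() {
    local out devs result
    out=$(cat)
    devs=$(sed -n 's/.*NumDevs = \([0-9][0-9]*\).*/\1/p' <<< "$out" | head -n1)
    result=$(sed -n 's/.*Result = \([A-Z][A-Z]*\).*/\1/p' <<< "$out" | head -n1)
    echo "devices=${devs:-unknown} result=${result:-unknown}"
    [ "$result" = "PASS" ]
}
```

In the build directory it would be called as ./deviceQuery | device_query_summary, and its exit status can gate later steps in a script.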

Uninstalling

After uninstalling, remember to comment out the environment variables that were added to ~/.bashrc, and be sure to reboot the system.

Uninstall the NVIDIA driver

/usr/bin/nvidia-uninstall

Uninstall CUDA

/usr/local/cuda-12.8/bin/cuda-uninstaller

A successful uninstall prints the following:

root@testserver1 Fri Jul 18 [06:10:34] : /opt/nvidia

# /usr/local/cuda-12.8/bin/cuda-uninstaller

 Successfully uninstalled