Upgrading CUDA from 11.4 to 12.2: installation README

Step 1: Completely remove the old CUDA and driver (essential, avoids conflicts)

1. Uninstall the old driver and CUDA (installed via apt)

sudo apt-get --purge remove "*nvidia*" "*cuda*" "*cudnn*" -y

2. Manually delete driver leftovers (specific to the 470 driver)

sudo rm -rf /usr/lib/x86_64-linux-gnu/libnvidia*
sudo rm -rf /usr/lib/x86_64-linux-gnu/libcuda*
sudo rm -rf /usr/local/nvidia*

3. Delete the CUDA 11.4 directory

sudo rm -rf /usr/local/cuda-11.4
sudo rm -rf /usr/local/cuda # remove the symlink

4. Clean up dependencies and the package cache

sudo apt autoremove -y
sudo apt clean
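Because the `rm -rf` commands in this step are irreversible, it can help to list what they would match before running them. The following is a minimal sketch, not part of the original procedure: the function name `list_cuda_leftovers` and its optional root argument are hypothetical, and the patterns are simply the paths used above.

```shell
# Dry-run helper: print every existing path that the rm -rf commands in
# Step 1 would match. The optional first argument is a hypothetical root
# prefix (default "/") so the function can be tried on a scratch directory.
list_cuda_leftovers() (
  root="${1:-/}"
  root="${root%/}"   # drop a trailing slash so the joins below stay clean
  for pattern in \
    "usr/lib/x86_64-linux-gnu/libnvidia*" \
    "usr/lib/x86_64-linux-gnu/libcuda*" \
    "usr/local/nvidia*" \
    "usr/local/cuda-11.4" \
    "usr/local/cuda"; do
    # Left unquoted on purpose so the shell expands the glob.
    # A pattern with no match stays literal and is skipped by the test below.
    for path in $root/$pattern; do
      [ -e "$path" ] || [ -L "$path" ] || continue
      printf '%s\n' "$path"
    done
  done
)
```

Running `list_cuda_leftovers` with no argument inspects the real filesystem. An empty result after the cleanup suggests these particular paths are gone, though it says nothing about packages still registered with apt.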

Step 2: Disable the nouveau driver (required when upgrading from the 470 driver)

1. Write the blacklist file

sudo tee /etc/modprobe.d/blacklist-nouveau.conf << EOF
blacklist nouveau
options nouveau modeset=0
EOF

2. Apply the configuration (key point: rebuild the initramfs)

sudo update-initramfs -u

3. Verify that nouveau is disabled (no output means success)

lsmod | grep nouveau
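The two checks in this step (the blacklist file has both directives, and the module is not loaded) can be scripted. This is a sketch under assumptions: `nouveau_blacklisted` and `module_listed` are hypothetical helper names, and the second one reads `lsmod` output from stdin rather than calling `lsmod` itself.

```shell
# Returns 0 if the given file contains both nouveau blacklist directives,
# each on its own line.
nouveau_blacklisted() {
  grep -Eq '^[[:space:]]*blacklist[[:space:]]+nouveau[[:space:]]*$' "$1" &&
    grep -Eq '^[[:space:]]*options[[:space:]]+nouveau[[:space:]]+modeset=0[[:space:]]*$' "$1"
}

# Reads `lsmod` output on stdin; returns 0 if the named module is listed
# in the first column.
module_listed() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

A possible invocation is `nouveau_blacklisted /etc/modprobe.d/blacklist-nouveau.conf && ! lsmod | module_listed nouveau`. If nouveau was already loaded when the blacklist was written, it generally stays loaded until the next reboot, so the second check may only pass after restarting.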

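If the uninstall is incomplete, the new version will not install, so everything has to be cleaned out first. This is what the installer reported after the steps above: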

(base) root@ubuntu:/data/ghf# sudo ./cuda_12.2.0_535.54.03_linux.run --silent --driver --toolkit

Existing package manager installation of the driver found.

It is strongly recommended that you remove this before continuing.

Override this check by passing --override-driver-check

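This error occurs because NVIDIA driver packages installed through apt are still present on the system (package-manager leftovers remained even after the cleanup above). The .run installer detects them and refuses to upgrade the driver.

Continue cleaning up the leftovers:

Step 1: Find and remove all apt-installed NVIDIA packages

# 1. List all NVIDIA-related apt packages (to find the leftovers)
dpkg -l | grep nvidia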

(base) root@ubuntu:/data/ghf# dpkg -l | grep nvidia
ii libnvidia-cfg1-470:amd64 470.129.06-0ubuntu0.20.04.1 amd64 NVIDIA binary OpenGL/GLX configuration library
ii libnvidia-common-470 470.129.06-0ubuntu0.20.04.1 all Shared files used by the NVIDIA libraries
rc libnvidia-compute-440:amd64 450.119.03-0ubuntu0.20.04.1 amd64 Transitional package for libnvidia-compute-450
ii libnvidia-compute-470:amd64 470.129.06-0ubuntu0.20.04.1 amd64 NVIDIA libcompute package
ii libnvidia-decode-470:amd64 470.129.06-0ubuntu0.20.04.1 amd64 NVIDIA Video Decoding runtime libraries
ii libnvidia-encode-470:amd64 470.129.06-0ubuntu0.20.04.1 amd64 NVENC Video Encoding runtime library
ii libnvidia-extra-470:amd64 470.129.06-0ubuntu0.20.04.1 amd64 Extra libraries for the NVIDIA driver
ii libnvidia-fbc1-470:amd64 470.129.06-0ubuntu0.20.04.1 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-gl-470:amd64 470.129.06-0ubuntu0.20.04.1 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii libnvidia-ifr1-470:amd64 470.129.06-0ubuntu0.20.04.1 amd64 NVIDIA OpenGL-based Inband Frame Readback runtime library
ii nvidia-compute-utils-470 470.129.06-0ubuntu0.20.04.1 amd64 NVIDIA compute utilities
ii nvidia-cuda-dev 9.1.85-3ubuntu1 amd64 NVIDIA CUDA development files
ii nvidia-cuda-doc 9.1.85-3ubuntu1 all NVIDIA CUDA and OpenCL documentation
ii nvidia-cuda-gdb 9.1.85-3ubuntu1 amd64 NVIDIA CUDA Debugger (GDB)
ii nvidia-cuda-toolkit 9.1.85-3ubuntu1 amd64 NVIDIA CUDA development toolkit
ii nvidia-dkms-470 470.129.06-0ubuntu0.20.04.1 amd64 NVIDIA DKMS package
ii nvidia-driver-440 450.119.03-0ubuntu0.20.04.1 amd64 Transitional package for nvidia-driver-450
ii nvidia-driver-450 460.91.03-0ubuntu0.20.04.1 amd64 Transitional package for nvidia-driver-460
ii nvidia-driver-460 470.129.06-0ubuntu0.20.04.1 amd64 Transitional package for nvidia-driver-470
ii nvidia-driver-470 470.129.06-0ubuntu0.20.04.1 amd64 NVIDIA driver metapackage
ii nvidia-kernel-common-470 470.129.06-0ubuntu0.20.04.1 amd64 Shared files used with the kernel module
ii nvidia-kernel-source-470 470.129.06-0ubuntu0.20.04.1 amd64 NVIDIA kernel source package
ii nvidia-opencl-dev:amd64 9.1.85-3ubuntu1 amd64 NVIDIA OpenCL development files
ii nvidia-prime 0.8.16~0.18.04.1 all Tools to enable NVIDIA's Prime
ii nvidia-profiler 9.1.85-3ubuntu1 amd64 NVIDIA Profiler for CUDA and OpenCL
ii nvidia-settings 470.57.01-0ubuntu0.20.04.1 amd64 Tool for configuring the NVIDIA graphics driver
ii nvidia-utils-470 470.129.06-0ubuntu0.20.04.1 amd64 NVIDIA driver support binaries
ii nvidia-visual-profiler 9.1.85-3ubuntu1 amd64 NVIDIA Visual Profiler for CUDA and OpenCL
ii screen-resolution-extra 0.18build1 all Extension for the nvidia-settings control panel
ii xserver-xorg-video-nvidia-470 470.129.06-0ubuntu0.20.04.1 amd64 NVIDIA binary Xorg driver
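The first column of this listing is the dpkg status: `ii` means installed, and `rc` means removed with configuration files left behind. Rather than copying package names by hand, the listing can be split by status. A minimal sketch, assuming the hypothetical helper name `pkgs_by_status` and `dpkg -l` output on stdin:

```shell
# Reads `dpkg -l` output on stdin and prints the package name (column 2)
# of every line that mentions "nvidia" and whose status column equals the
# requested status, e.g. "ii" (installed) or "rc" (config files remain).
pkgs_by_status() {
  awk -v want="$1" '$1 == want && /nvidia/ { print $2 }'
}
```

For example, `dpkg -l | pkgs_by_status ii` prints the names that still need `apt-get --purge remove`, and `dpkg -l | pkgs_by_status rc` prints the ones that only need `dpkg --purge`. As with the `grep nvidia` above, the match is on the whole line, so packages such as screen-resolution-extra are included because of their description.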

Completely uninstall all NVIDIA/CUDA apt packages (by exact name)

Step 1: Uninstall the NVIDIA 470 driver packages (essential)

# Remove all 470-series NVIDIA packages in one go
sudo apt-get --purge remove \
    libnvidia-cfg1-470:amd64 \
    libnvidia-common-470 \
    libnvidia-compute-470:amd64 \
    libnvidia-decode-470:amd64 \
    libnvidia-encode-470:amd64 \
    libnvidia-extra-470:amd64 \
    libnvidia-fbc1-470:amd64 \
    libnvidia-gl-470:amd64 \
    libnvidia-ifr1-470:amd64 \
    nvidia-compute-utils-470 \
    nvidia-dkms-470 \
    nvidia-driver-470 \
    nvidia-kernel-common-470 \
    nvidia-kernel-source-470 \
    nvidia-utils-470 \
    xserver-xorg-video-nvidia-470 -y

Step 2: Uninstall the transitional packages and the old CUDA 9.1 packages

# Remove the 440/450/460 transitional packages and the CUDA 9.1 packages
sudo apt-get --purge remove \
    libnvidia-compute-440:amd64 \
    nvidia-driver-440 \
    nvidia-driver-450 \
    nvidia-driver-460 \
    nvidia-cuda-dev \
    nvidia-cuda-doc \
    nvidia-cuda-gdb \
    nvidia-cuda-toolkit \
    nvidia-opencl-dev:amd64 \
    nvidia-profiler \
    nvidia-visual-profiler -y

Step 3: Uninstall the remaining NVIDIA helper packages

sudo apt-get --purge remove nvidia-prime nvidia-settings screen-resolution-extra -y

Step 4: Clean up leftover dependencies and configuration files

# Remove automatically installed dependencies that are no longer needed
sudo apt-get autoremove -y

# Clean the package cache
sudo apt-get autoclean -y

# Force-purge any remaining dpkg configuration files
sudo dpkg --purge $(dpkg -l | grep 'nvidia' | awk '{print $2}') 2>/dev/null

After the packages are gone, check whether the kernel modules are still loaded:

(base) root@ubuntu:/data/ghf# lsmod | grep nvidia
nvidia_uvm 1036288 8
nvidia_drm 57344 4
nvidia_modeset 1200128 2 nvidia_drm
nvidia 35340288 292 nvidia_uvm,nvidia_modeset
drm_kms_helper 184320 4 cirrus,nvidia_drm
drm 491520 10 drm_kms_helper,nvidia,cirrus,nvidia_drm

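The kernel modules are still loaded!

# Kill all CUDA/PyTorch-related processes
sudo pkill -9 nvidia-smi
sudo pkill -9 nvcc

# Be careful about killing python here.
# First list all Python processes (confirm which PIDs should be terminated)
ps aux | grep python

# Only once that looks safe, kill them all
sudo pkill -9 python

The output of ps aux | grep python was: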

systemd+ 3517887 0.0 0.0 26468 20692 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3517889 0.0 0.0 26468 20676 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3517896 0.0 0.0 26468 20580 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3517934 0.0 0.0 26468 20656 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3517945 0.0 0.0 26468 20644 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3517948 0.0 0.0 26468 20696 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3517954 0.0 0.0 26468 20652 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3517960 0.0 0.0 26468 20672 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3518010 0.0 0.0 26468 20744 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3518011 0.0 0.0 26468 20664 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3518019 0.0 0.0 26468 20656 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3518020 0.0 0.0 26468 20600 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3518021 0.0 0.0 26468 20704 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3518022 0.0 0.0 26468 20676 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3518024 0.0 0.0 26468 20664 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3518025 0.0 0.0 26468 20660 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3518033 0.0 0.0 26468 20620 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3518034 0.0 0.0 26468 20576 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3518035 0.0 0.0 26468 20656 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
systemd+ 3518038 0.0 0.0 26468 20684 ? S Mar17 0:04 python3 -u /app/src/pool/worker.py
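A long listing like this is easier to judge when identical command lines are collapsed into counts. The sketch below is an illustration only: `summarize_procs` is a hypothetical name, and it assumes the usual `ps aux` layout in which the command starts at the eleventh column.

```shell
# Reads `ps aux` output on stdin and prints "count user command" for each
# distinct (user, command line) pair, most frequent first.
summarize_procs() {
  awk 'NF >= 11 && $2 ~ /^[0-9]+$/ {      # numeric PID check skips the header row
         cmd = $11
         for (i = 12; i <= NF; i++) cmd = cmd " " $i
         count[$1 " " cmd]++
       }
       END { for (k in count) printf "%d %s\n", count[k], k }' |
    sort -rn
}
```

Used as `ps aux | grep python | summarize_procs`, it would reduce the output above to a single line for the `worker.py` processes, which makes it clearer that they all belong to one service.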

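These python processes turned out to be workers started by the fastgpt docker compose stack running on this server, so the stack should be shut down cleanly rather than killed: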
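One way to confirm that a process belongs to a Docker container is to look at its cgroup path, which for Docker-managed processes normally embeds the 64-character container ID. This is a sketch under that assumption; `container_id_from_cgroup` is a hypothetical helper, and the exact cgroup path layout varies with the cgroup version and driver.

```shell
# Reads the contents of /proc/<PID>/cgroup on stdin and prints the first
# 64-character lowercase hex string found. For a Docker-managed process
# this is normally the container ID; for other processes nothing is printed.
container_id_from_cgroup() {
  grep -oE '[0-9a-f]{64}' | head -n 1
}
```

For instance, `container_id_from_cgroup < /proc/3517887/cgroup` could be run for one of the PIDs listed above, and the resulting ID looked up with `docker ps --no-trunc` to find which container, and therefore which compose project, owns the process.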

# Change into the fastgpt directory, then run:
docker compose down

1. Kill all python/python3 processes (covers all versions)

sudo pkill -9 python
sudo pkill -9 python3
sudo pkill -9 ipykernel_launcher
sudo pkill -9 jupyter-lab

# 2. Verify that everything was killed (no output means success)
ps aux | grep -E "python|ipykernel|jupyter" | grep -v grep


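Unload the NVIDIA kernel modules (key step)

The remaining system Python processes do not use the GPU, so the modules can be unloaded directly:

# Unload the driver kernel modules in order
sudo rmmod nvidia_uvm
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia

# Verify that the modules were unloaded (no output means success)
lsmod | grep nvidia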
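The order matters because a module cannot be removed while another loaded module depends on it, and the last column of `lsmod` lists those dependents. The sketch below derives a valid order from `lsmod` output on stdin. `nvidia_unload_order` is a hypothetical helper that only considers dependencies between modules, not userspace processes holding a module open.

```shell
# Reads `lsmod` output on stdin and prints the nvidia* modules in an order
# in which each module appears only after every nvidia* module that uses it.
# Exits non-zero if no such order exists (a dependency cycle).
nvidia_unload_order() {
  awk '
    $1 ~ /^nvidia/ { order[++n] = $1; users[$1] = $4 }
    END {
      left = n
      while (left > 0) {
        progressed = 0
        for (i = 1; i <= n; i++) {
          m = order[i]
          if (!(m in users)) continue          # already printed
          blocked = 0
          k = split(users[m], u, ",")
          for (j = 1; j <= k; j++)
            if (u[j] in users) blocked = 1     # a dependent is still loaded
          if (!blocked) { print m; delete users[m]; left--; progressed = 1 }
        }
        if (!progressed) exit 1
      }
    }'
}
```

With `lsmod | nvidia_unload_order`, a module such as `nvidia` that is used by `nvidia_uvm` and `nvidia_modeset` is printed after both of them. Even in the right order, `rmmod` still fails if a process is holding a module open.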

(base) root@ubuntu:/data/fastgpt# sudo rmmod nvidia_drm
rmmod: ERROR: Module nvidia_drm is in use
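"In use" means the kernel's reference count for the module is above zero (the third column of `lsmod`). The count can also be read from sysfs, which is handy for polling after stopping services. A minimal sketch: `module_refcnt` and its optional sysfs root argument are hypothetical, and the original write-up did not identify what was holding the module.

```shell
# Prints the kernel reference count for the named module, read from sysfs.
# The optional second argument is a hypothetical sysfs root (default /sys)
# that exists only so the function can be tried against a fake tree.
module_refcnt() (
  file="${2:-/sys}/module/$1/refcnt"
  [ -r "$file" ] || { echo "no refcnt for module $1 (not loaded?)" >&2; exit 1; }
  cat "$file"
)
```

`module_refcnt nvidia_drm` printing `0` would indicate that `rmmod nvidia_drm` can succeed. If the count stays above zero, something such as a display server may still hold the device, and `sudo fuser -v /dev/nvidia* /dev/dri/*` may help locate it. A reboot, as done below, sidesteps the search entirely.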

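The server must be rebooted at this point:

reboot

After the reboot, run the commands below again. This time rmmod reports that the modules are not currently loaded, and CUDA 12 can then be installed normally.

# Unload the driver kernel modules in order
sudo rmmod nvidia_uvm
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia

# Verify that the modules were unloaded (no output means success)
lsmod | grep nvidia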

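Install CUDA 12.2

# Change to the directory containing the installer
cd /data/ghf

# Install the CUDA 12.2 toolkit and the bundled 535.54.03 driver non-interactively
sudo ./cuda_12.2.0_535.54.03_linux.run --silent --driver --toolkit

# A reboot is required after the installation finishes (essential)
sudo reboot

Success!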
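To confirm the result after the reboot, the versions reported by `nvcc --version` and `nvidia-smi` can be extracted and compared against 12.2 and 535.54.03. This is a sketch, not part of the original write-up: `cuda_release` and `driver_version` are hypothetical helpers that parse the usual banner text of those tools from stdin.

```shell
# Reads `nvcc --version` output on stdin and prints the toolkit release,
# i.e. the number between "release " and the following comma.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Reads `nvidia-smi` output on stdin and prints the driver version taken
# from the "Driver Version:" field of the banner.
driver_version() {
  sed -n 's/.*Driver Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

Assuming the runfile used its default toolkit location, `nvcc` lives in `/usr/local/cuda-12.2/bin`, so a check might look like `export PATH=/usr/local/cuda-12.2/bin:$PATH` followed by `nvcc --version | cuda_release` and `nvidia-smi | driver_version`.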
