Building a JetPack-compatible PyTorch from source on a Jetson device: the full process and fixes for the errors along the way

On a Jetson device you cannot install PyTorch with a plain pip install torch; you have to download a pre-built whl file from NVIDIA and pip install that. There is an even more direct index of whl files matched to each JetPack release, but unfortunately it only covers up to JetPack 6.1, which means that on anything newer you must fetch the PyTorch source and compile it yourself. So, on to the main topic.

First, grab the source:

bash
# 1. Clone the source (with submodules)
git clone --recursive https://github.com/pytorch/pytorch.git

Without a VPN this will take a very long time.

Then check the compatibility table on the official site; for my JetPack 6.2 that means torch 2.7.0a0+7c8ec84dab:

bash
cd pytorch
# 2. Check out the commit matching your JetPack (2.7.0a0+7c8ec84dab here)
git checkout 7c8ec84dab
git submodule sync
git submodule update --init --recursive --jobs 0

Then install the system dependencies (apt):

bash
sudo apt update
sudo apt install -y \
    build-essential \
    cmake \
    ninja-build \
    git \
    libopenblas-dev \
    libjpeg-dev \
    libpng-dev \
    python3-dev \
    python3-pip \
    python3-setuptools \
    python3-wheel \
    libprotobuf-dev \
    protobuf-compiler \
    zlib1g-dev

Install the Python dependencies (pip):

bash
pip install --upgrade pip
pip install -r requirements.txt
pip install typing_extensions sympy filelock networkx jinja2

Remove any old torch installations:

bash
pip uninstall -y torch torchvision torchaudio

Set the build environment variables:

bash
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu:$CUDA_HOME/lib64:$LD_LIBRARY_PATH

export USE_CUDA=1
export USE_CUDNN=1
export USE_MKLDNN=0
export USE_NCCL=0

export TORCH_CUDA_ARCH_LIST="8.7"
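TORCH_CUDA_ARCH_LIST="8.7" targets only the AGX Orin's GPU (compute capability 8.7), which keeps the build smaller and faster. As a toy sketch (my own illustration, not PyTorch's actual code), the arch list expands into the NVCC -gencode flags that show up later in the CMake log:

```python
# Toy illustration of how an arch list such as "8.7" becomes the flags
# printed in the CMake log ("-gencode;arch=compute_87,code=sm_87").
# The ";" separators are CMake list syntax, not part of nvcc itself.
def arch_to_gencode(arch_list: str) -> list[str]:
    flags = []
    for arch in arch_list.split(";"):
        num = arch.replace(".", "")  # "8.7" -> "87"
        flags.append(f"-gencode;arch=compute_{num},code=sm_{num}")
    return flags

print(arch_to_gencode("8.7"))  # ['-gencode;arch=compute_87,code=sm_87']
```

Listing several architectures (e.g. "8.0;8.7") would emit one flag per entry, at the cost of a much longer compile.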

Start the build:

bash
python3 setup.py bdist_wheel

As expected, it errors out:

bash
CMake Error at third_party/protobuf/cmake/CMakeLists.txt:2 (cmake_minimum_required):
  Compatibility with CMake < 3.5 has been removed from CMake.

  Update the VERSION argument <min> value.  Or, use the <min>...<max> syntax
  to tell CMake that the project requires at least <min> but has been updated
  to work with policies introduced by <max> or earlier.

  Or, add -DCMAKE_POLICY_VERSION_MINIMUM=3.5 to try configuring anyway.


-- Configuring incomplete, errors occurred!

The system CMake is too old, so install a newer one:

bash
pip install --upgrade cmake
# Check the version; it should now report cmake 3.28.x or newer
cmake --version

Then clean and rebuild:

bash
python3 setup.py clean
rm -rf build
python3 setup.py bdist_wheel

It's worth checking memory first:

bash
free -h

If it shows less than 8 GB you need to add swap, otherwise the build will OOM. Compilation can take 4 to 6 hours and is extremely RAM-hungry:

bash
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

After rebuilding, it may still fail:

bash
-- Found Python: /usr/bin/python3 (found version "3.10.12") found components: Interpreter
CMake Warning at cmake/public/cuda.cmake:140 (message):
  Failed to compute shorthash for libnvrtc.so
Call Stack (most recent call first):
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:865 (include)


-- Found nvtx3: /home/baiyi/pytorch/third_party/NVTX/c/include
-- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH) 
CMake Warning at cmake/public/cuda.cmake:218 (message):
  Cannot find cuDNN library.  Turning the option off
Call Stack (most recent call first):
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:865 (include)


-- Could NOT find CUSPARSELT (missing: CUSPARSELT_LIBRARY_PATH CUSPARSELT_INCLUDE_PATH) 
CMake Warning at cmake/public/cuda.cmake:243 (message):
  Cannot find cuSPARSELt library.  Turning the option off
Call Stack (most recent call first):
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:865 (include)


-- Could NOT find CUDSS (missing: CUDSS_LIBRARY_PATH CUDSS_INCLUDE_PATH) 
CMake Warning at cmake/public/cuda.cmake:259 (message):
  Cannot find CUDSS library.  Turning the option off
Call Stack (most recent call first):
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:865 (include)


-- USE_CUFILE is set to 0. Compiling without cuFile support
-- Added CUDA NVCC flags for: -gencode;arch=compute_87,code=sm_87
CMake Warning at cmake/Dependencies.cmake:95 (message):
  Not compiling with XPU.  Could NOT find SYCL.Suppress this warning with
  -DUSE_XPU=OFF.
Call Stack (most recent call first):
  CMakeLists.txt:865 (include)


-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
CMake Error at third_party/protobuf/cmake/CMakeLists.txt:2 (cmake_minimum_required):
  Compatibility with CMake < 3.5 has been removed from CMake.

  Update the VERSION argument <min> value.  Or, use the <min>...<max> syntax
  to tell CMake that the project requires at least <min> but has been updated
  to work with policies introduced by <max> or earlier.

  Or, add -DCMAKE_POLICY_VERSION_MINIMUM=3.5 to try configuring anyway.


-- Configuring incomplete, errors occurred!

The pip CMake was upgraded, but the PyTorch build was still invoking the system's old CMake (3.22).

Untangling which cmake actually gets called took a very, very long time, so I'll skip the intermediate flailing and give the final solution directly:
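Before the cleanup that follows, it helps to see which cmake the shell will actually resolve. A quick sketch (the exact paths will differ on your system):

```shell
# List every cmake on PATH in resolution order; the first line wins:
which -a cmake || true

# Show PATH entries in order: ~/.local/bin must come before /usr/bin
# for pip's CMake to shadow the apt-installed one.
printf '%s\n' "$PATH" | tr ':' '\n'

# bash caches resolved command paths, so clear the cache after
# removing or reinstalling cmake:
hash -r
```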

Remove every CMake executable:
bash
sudo rm -f /usr/local/bin/cmake
sudo rm -f /usr/local/bin/ctest
sudo rm -f /usr/local/bin/cpack
sudo rm -f /usr/local/bin/cmake-gui
sudo rm -f /usr/local/bin/cmake3
Remove all remnants of the pip-installed CMake (run the uninstall repeatedly until pip reports nothing is left):
bash
pip uninstall -y cmake
pip uninstall -y cmake
pip uninstall -y cmake

Then delete the leftover directories:

bash
rm -rf ~/.local/lib/python3.10/site-packages/cmake
rm -rf ~/.local/share/cmake*
rm -rf ~/.local/bin/cmake
rm -rf ~/.local/bin/ctest
rm -rf ~/.local/bin/cpack
Clear every CMake environment variable:
bash
unset CMAKE_ROOT
unset CMAKE_COMMAND
unset CMAKE_PREFIX_PATH
unset CMAKE_MODULE_PATH
Clean the PyTorch build cache:
bash
cd ~/pytorch  # skip this if you are already inside the pytorch directory
python3 setup.py clean
rm -rf build
rm -rf CMakeCache.txt
rm -rf CMakeFiles
Reinstall a clean CMake 3.28.3:
bash
pip install cmake==3.28.3
which cmake
cmake --version

The result must look like this:

bash
/home/xxx/.local/bin/cmake
cmake version 3.28.3
Rebuild PyTorch:
bash
python3 setup.py bdist_wheel

If you see log output like the following, the real compilation has started:

bash
-- The C compiler identification is GNU 11.x
-- The CXX compiler identification is GNU 11.x
-- Found CUDA: /usr/local/cuda (found version "12.6")
-- Found cuDNN
-- Configuring done
-- Generating done

This stage takes a very long time; 4 to 6 hours is normal. Leave the machine alone and make sure power is not interrupted, or you will have to start from scratch.

Sure enough, a few hours in there was an error:

bash
FAILED: [code=137] caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim128_bf16_sm80.cu.o
... ...
... ...
Killed
[2958/6210] Building CUDA object caffe2/CMakeFiles/torch_cu...rs/cuda/flash_attn/kernels/flash_bwd_hdim256_bf16_sm80.cu.o
ninja: build stopped: subcommand failed.

This is a classic out-of-memory failure: the compile job was killed by the system (SIGKILL, exit code 137). The Jetson AGX Orin ran out of memory while compiling PyTorch's CUDA kernels, and the Linux OOM killer terminated the compiler.
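The "code=137" in that log follows the shell convention of 128 + signal number: SIGKILL is signal 9, so 128 + 9 = 137, the signature of the OOM killer. A small self-contained demo of the convention (it kills a throwaway shell on purpose, not the build):

```python
import subprocess

# The shell reports a child killed by signal N as exit status 128 + N.
# Here an inner shell kills itself with SIGKILL (signal 9) and the
# outer shell echoes the status it observed: 128 + 9 = 137.
result = subprocess.run(
    ["sh", "-c", "sh -c 'kill -KILL $$'; echo exit=$?"],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # exit=137
```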

I had looked at swap before compiling and, seeing 14 GB, assumed it would be enough; it still hit OOM, so the only option was to grow it again, to 24 GB.

First turn off the old swap:

bash
sudo swapoff -a

Create a 24 GB swapfile:

bash
sudo fallocate -l 24G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Once created, free -h should show the new swap size.
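Note that swap enabled with swapon alone does not survive a reboot. If you want it permanent, /etc/fstab needs an entry like the following (a config fragment, assuming the /swapfile path used above):

```
/swapfile none swap sw 0 0
```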

Clean the build cache (to avoid re-triggering the OOM):

bash
python3 setup.py clean
rm -rf build
rm -rf CMakeCache.txt

Limit ninja's parallelism, then rebuild:

bash
export MAX_JOBS=4
python3 setup.py bdist_wheel
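MAX_JOBS caps how many compile jobs ninja runs in parallel, and peak memory scales with it, since a single CUDA kernel compilation can take several gigabytes. As a rough heuristic (my own rule of thumb, not an official formula), you can size it from physical RAM:

```python
import os

# Rough heuristic: allow about one compile job per 4 GB of physical RAM,
# capped at the CPU count. gb_per_job is an assumed figure; tune to taste.
def suggest_max_jobs(gb_per_job: int = 4) -> int:
    total_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    total_gb = total_bytes / (1024 ** 3)
    return max(1, min(os.cpu_count() or 1, int(total_gb // gb_per_job)))

print(suggest_max_jobs())
```

On a 64 GB AGX Orin this suggests the CPU count; on tighter boards it falls back toward 1, trading build time for not getting killed.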

When the last few lines of output look like this, the build succeeded:

bash
... ...
... ...
adding 'torchgen/static_runtime/config.py'
adding 'torchgen/static_runtime/gen_static_runtime_ops.py'
adding 'torchgen/static_runtime/generator.py'
adding 'torch-2.7.0a0+git7c8ec84.dist-info/METADATA'
adding 'torch-2.7.0a0+git7c8ec84.dist-info/WHEEL'
adding 'torch-2.7.0a0+git7c8ec84.dist-info/entry_points.txt'
adding 'torch-2.7.0a0+git7c8ec84.dist-info/top_level.txt'
adding 'torch-2.7.0a0+git7c8ec84.dist-info/RECORD'
removing build/bdist.linux-aarch64/wheel

The generated whl file is now under the dist/ directory; install it with pip install.
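A small convenience sketch (my own helper, not part of the build) for picking out the newest wheel under dist/ once several builds have accumulated:

```python
from pathlib import Path

# Return the most recently built torch wheel in dist/, or None if nothing
# has been built yet (the path and filename pattern assume the build above).
def newest_wheel(dist_dir: str = "dist"):
    wheels = sorted(
        Path(dist_dir).glob("torch-*.whl"),
        key=lambda p: p.stat().st_mtime,
    )
    return wheels[-1] if wheels else None

wheel = newest_wheel()
if wheel is not None:
    print(f"pip3 install {wheel}")
```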

After installing, run a quick verification:

python
python3 - << 'EOF'
import torch
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
print("cuDNN version:", torch.backends.cudnn.version())
print("GPU:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "None")
EOF

Next, continue with building torchvision:

bash
git clone https://github.com/pytorch/vision.git  # clone this outside the pytorch directory, not inside it
cd vision

Set the environment variables (consistent with the PyTorch build):

bash
export CUDA_HOME=/usr/local/cuda
export TORCH_CUDA_ARCH_LIST="8.7"
export FORCE_CUDA=1

Install its dependencies (torchvision needs pillow, numpy, and so on):

bash
pip install numpy pillow scipy

Start building torchvision:

bash
python3 setup.py bdist_wheel

It may fail with:

bash
In file included from /home/baiyi/vision/torchvision/csrc/io/image/cpu/decode_gif.cpp:3:
/home/baiyi/vision/torchvision/csrc/io/image/cpu/../common.h:4:10: fatal error: torch/headeronly/util/Exception.h: No such file or directory
    4 | #include <torch/headeronly/util/Exception.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
... ...
... ...
RuntimeError: Error compiling objects for extension

This happens because the torchvision checkout is too new while the PyTorch we built is a nightly (2.7.0a0): the two C++ APIs are out of step, since the latest changes on torchvision main expect headers this PyTorch snapshot does not yet ship.

Switch to a release branch compatible with the PyTorch 2.7 nightly:

bash
git fetch --all
git checkout release/0.20

Clean the old build:

bash
python3 setup.py clean
rm -rf build

Rebuild:

bash
python3 setup.py bdist_wheel

When it finishes, the wheel is again under dist/; install it with pip install.

Finally, verify that torchvision was built with CUDA support:

python
python3 - << 'EOF'
import torchvision
import torch
print("torchvision:", torchvision.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
# Exercise a CUDA-compiled torchvision op to confirm the C++/CUDA extension loads:
if torch.cuda.is_available():
    boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0]], device="cuda")
    scores = torch.tensor([1.0], device="cuda")
    print("GPU nms:", torchvision.ops.nms(boxes, scores, 0.5))
EOF