Building a JetPack-compatible PyTorch from source on a Jetson device: the full process and fixes for the errors along the way

On a Jetson device you cannot install PyTorch with a plain pip install torch; you have to download a pre-built whl file from NVIDIA and pip install that. There is an even more direct index of whl files matched to each JetPack release, but unfortunately it only covers up to JetPack 6.1, which means that on anything newer you must fetch the PyTorch source and compile it yourself. So, on to the main topic.

First, grab the source:

bash
# 1. Clone the source (with submodules)
git clone --recursive https://github.com/pytorch/pytorch.git

Without a VPN this will take a very long time.

Then check the compatibility table on the official site; for my JetPack 6.2 that means torch 2.7.0a0+7c8ec84dab:

bash
cd pytorch
# 2. Check out the commit matching your JetPack (2.7.0a0+7c8ec84dab here)
git checkout 7c8ec84dab
git submodule sync
git submodule update --init --recursive --jobs 0

Then install the system dependencies (apt):

bash
sudo apt update
sudo apt install -y \
    build-essential \
    cmake \
    ninja-build \
    git \
    libopenblas-dev \
    libjpeg-dev \
    libpng-dev \
    python3-dev \
    python3-pip \
    python3-setuptools \
    python3-wheel \
    libprotobuf-dev \
    protobuf-compiler \
    zlib1g-dev

Install the Python dependencies (pip):

bash
pip install --upgrade pip
pip install -r requirements.txt
pip install typing_extensions sympy filelock networkx jinja2

Remove any old torch installations:

bash
pip uninstall -y torch torchvision torchaudio

Set the build environment variables:

bash
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu:$CUDA_HOME/lib64:$LD_LIBRARY_PATH

export USE_CUDA=1
export USE_CUDNN=1
export USE_MKLDNN=0
export USE_NCCL=0

export TORCH_CUDA_ARCH_LIST="8.7"
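TORCH_CUDA_ARCH_LIST="8.7" targets only the AGX Orin's GPU (compute capability 8.7), which keeps the build smaller and faster. As a toy sketch (my own illustration, not PyTorch's actual code), the arch list expands into the NVCC -gencode flags that show up later in the CMake log:

```python
# Toy illustration of how an arch list such as "8.7" becomes the flags
# printed in the CMake log ("-gencode;arch=compute_87,code=sm_87").
# The ";" separators are CMake list syntax, not part of nvcc itself.
def arch_to_gencode(arch_list: str) -> list[str]:
    flags = []
    for arch in arch_list.split(";"):
        num = arch.replace(".", "")  # "8.7" -> "87"
        flags.append(f"-gencode;arch=compute_{num},code=sm_{num}")
    return flags

print(arch_to_gencode("8.7"))  # ['-gencode;arch=compute_87,code=sm_87']
```

Listing several architectures (e.g. "8.0;8.7") would emit one flag per entry, at the cost of a much longer compile.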

Start the build:

bash
python3 setup.py bdist_wheel

As expected, it errors out:

bash
CMake Error at third_party/protobuf/cmake/CMakeLists.txt:2 (cmake_minimum_required):
  Compatibility with CMake < 3.5 has been removed from CMake.

  Update the VERSION argument <min> value.  Or, use the <min>...<max> syntax
  to tell CMake that the project requires at least <min> but has been updated
  to work with policies introduced by <max> or earlier.

  Or, add -DCMAKE_POLICY_VERSION_MINIMUM=3.5 to try configuring anyway.


-- Configuring incomplete, errors occurred!

The system CMake is too old, so install a newer one:

bash
pip install --upgrade cmake
# Check the version; it should now report cmake 3.28.x or newer
cmake --version

Then clean and rebuild:

bash
python3 setup.py clean
rm -rf build
python3 setup.py bdist_wheel

It's worth checking memory first:

bash
free -h

If it shows less than 8 GB you need to add swap, otherwise the build will OOM. Compilation can take 4 to 6 hours and is extremely RAM-hungry:

bash
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

After rebuilding, it may still fail:

bash
-- Found Python: /usr/bin/python3 (found version "3.10.12") found components: Interpreter
CMake Warning at cmake/public/cuda.cmake:140 (message):
  Failed to compute shorthash for libnvrtc.so
Call Stack (most recent call first):
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:865 (include)


-- Found nvtx3: /home/baiyi/pytorch/third_party/NVTX/c/include
-- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH) 
CMake Warning at cmake/public/cuda.cmake:218 (message):
  Cannot find cuDNN library.  Turning the option off
Call Stack (most recent call first):
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:865 (include)


-- Could NOT find CUSPARSELT (missing: CUSPARSELT_LIBRARY_PATH CUSPARSELT_INCLUDE_PATH) 
CMake Warning at cmake/public/cuda.cmake:243 (message):
  Cannot find cuSPARSELt library.  Turning the option off
Call Stack (most recent call first):
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:865 (include)


-- Could NOT find CUDSS (missing: CUDSS_LIBRARY_PATH CUDSS_INCLUDE_PATH) 
CMake Warning at cmake/public/cuda.cmake:259 (message):
  Cannot find CUDSS library.  Turning the option off
Call Stack (most recent call first):
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:865 (include)


-- USE_CUFILE is set to 0. Compiling without cuFile support
-- Added CUDA NVCC flags for: -gencode;arch=compute_87,code=sm_87
CMake Warning at cmake/Dependencies.cmake:95 (message):
  Not compiling with XPU.  Could NOT find SYCL.Suppress this warning with
  -DUSE_XPU=OFF.
Call Stack (most recent call first):
  CMakeLists.txt:865 (include)


-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
CMake Error at third_party/protobuf/cmake/CMakeLists.txt:2 (cmake_minimum_required):
  Compatibility with CMake < 3.5 has been removed from CMake.

  Update the VERSION argument <min> value.  Or, use the <min>...<max> syntax
  to tell CMake that the project requires at least <min> but has been updated
  to work with policies introduced by <max> or earlier.

  Or, add -DCMAKE_POLICY_VERSION_MINIMUM=3.5 to try configuring anyway.


-- Configuring incomplete, errors occurred!

The pip CMake was upgraded, but the PyTorch build was still invoking the system's old CMake (3.22).

Untangling which cmake actually gets called took a very, very long time, so I'll skip the intermediate flailing and give the final solution directly:
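Before the cleanup that follows, it helps to see which cmake the shell will actually resolve. A quick sketch (the exact paths will differ on your system):

```shell
# List every cmake on PATH in resolution order; the first line wins:
which -a cmake || true

# Show PATH entries in order: ~/.local/bin must come before /usr/bin
# for pip's CMake to shadow the apt-installed one.
printf '%s\n' "$PATH" | tr ':' '\n'

# bash caches resolved command paths, so clear the cache after
# removing or reinstalling cmake:
hash -r
```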

Remove every CMake executable:
bash
sudo rm -f /usr/local/bin/cmake
sudo rm -f /usr/local/bin/ctest
sudo rm -f /usr/local/bin/cpack
sudo rm -f /usr/local/bin/cmake-gui
sudo rm -f /usr/local/bin/cmake3
Remove all remnants of the pip-installed CMake (run the uninstall repeatedly until pip reports nothing is left):
bash
pip uninstall -y cmake
pip uninstall -y cmake
pip uninstall -y cmake

Then delete the leftover directories:

bash
rm -rf ~/.local/lib/python3.10/site-packages/cmake
rm -rf ~/.local/share/cmake*
rm -rf ~/.local/bin/cmake
rm -rf ~/.local/bin/ctest
rm -rf ~/.local/bin/cpack
Clear every CMake environment variable:
bash
unset CMAKE_ROOT
unset CMAKE_COMMAND
unset CMAKE_PREFIX_PATH
unset CMAKE_MODULE_PATH
Clean the PyTorch build cache:
bash
cd ~/pytorch  # skip this if you are already inside the pytorch directory
python3 setup.py clean
rm -rf build
rm -rf CMakeCache.txt
rm -rf CMakeFiles
Reinstall a clean CMake 3.28.3:
bash
pip install cmake==3.28.3
which cmake
cmake --version

The result must look like this:

bash
/home/xxx/.local/bin/cmake
cmake version 3.28.3
Rebuild PyTorch:
bash
python3 setup.py bdist_wheel

If you see log output like the following, the real compilation has started:

bash
-- The C compiler identification is GNU 11.x
-- The CXX compiler identification is GNU 11.x
-- Found CUDA: /usr/local/cuda (found version "12.6")
-- Found cuDNN
-- Configuring done
-- Generating done

This stage takes a very long time; 4 to 6 hours is normal. Leave the machine alone and make sure power is not interrupted, or you will have to start from scratch.

Sure enough, a few hours in there was an error:

bash
FAILED: [code=137] caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim128_bf16_sm80.cu.o
... ...
... ...
Killed
[2958/6210] Building CUDA object caffe2/CMakeFiles/torch_cu...rs/cuda/flash_attn/kernels/flash_bwd_hdim256_bf16_sm80.cu.o
ninja: build stopped: subcommand failed.

This is a classic out-of-memory failure: the compile job was killed by the system (SIGKILL, exit code 137). The Jetson AGX Orin ran out of memory while compiling PyTorch's CUDA kernels, and the Linux OOM killer terminated the compiler.
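The "code=137" in that log follows the shell convention of 128 + signal number: SIGKILL is signal 9, so 128 + 9 = 137, the signature of the OOM killer. A small self-contained demo of the convention (it kills a throwaway shell on purpose, not the build):

```python
import subprocess

# The shell reports a child killed by signal N as exit status 128 + N.
# Here an inner shell kills itself with SIGKILL (signal 9) and the
# outer shell echoes the status it observed: 128 + 9 = 137.
result = subprocess.run(
    ["sh", "-c", "sh -c 'kill -KILL $$'; echo exit=$?"],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # exit=137
```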

I had looked at swap before compiling and, seeing 14 GB, assumed it would be enough; it still hit OOM, so the only option was to grow it again, to 24 GB.

First turn off the old swap:

bash
sudo swapoff -a

Create a 24 GB swapfile:

bash
sudo fallocate -l 24G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Once created, free -h should show the new swap size.
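Note that swap enabled with swapon alone does not survive a reboot. If you want it permanent, /etc/fstab needs an entry like the following (a config fragment, assuming the /swapfile path used above):

```
/swapfile none swap sw 0 0
```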

Clean the build cache (to avoid re-triggering the OOM):

bash
python3 setup.py clean
rm -rf build
rm -rf CMakeCache.txt

Limit ninja's parallelism, then rebuild:

bash
export MAX_JOBS=4
python3 setup.py bdist_wheel
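MAX_JOBS caps how many compile jobs ninja runs in parallel, and peak memory scales with it, since a single CUDA kernel compilation can take several gigabytes. As a rough heuristic (my own rule of thumb, not an official formula), you can size it from physical RAM:

```python
import os

# Rough heuristic: allow about one compile job per 4 GB of physical RAM,
# capped at the CPU count. gb_per_job is an assumed figure; tune to taste.
def suggest_max_jobs(gb_per_job: int = 4) -> int:
    total_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    total_gb = total_bytes / (1024 ** 3)
    return max(1, min(os.cpu_count() or 1, int(total_gb // gb_per_job)))

print(suggest_max_jobs())
```

On a 64 GB AGX Orin this suggests the CPU count; on tighter boards it falls back toward 1, trading build time for not getting killed.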

When the last few lines of output look like this, the build succeeded:

bash
... ...
... ...
adding 'torchgen/static_runtime/config.py'
adding 'torchgen/static_runtime/gen_static_runtime_ops.py'
adding 'torchgen/static_runtime/generator.py'
adding 'torch-2.7.0a0+git7c8ec84.dist-info/METADATA'
adding 'torch-2.7.0a0+git7c8ec84.dist-info/WHEEL'
adding 'torch-2.7.0a0+git7c8ec84.dist-info/entry_points.txt'
adding 'torch-2.7.0a0+git7c8ec84.dist-info/top_level.txt'
adding 'torch-2.7.0a0+git7c8ec84.dist-info/RECORD'
removing build/bdist.linux-aarch64/wheel

The generated whl file is now under the dist/ directory; install it with pip install.
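A small convenience sketch (my own helper, not part of the build) for picking out the newest wheel under dist/ once several builds have accumulated:

```python
from pathlib import Path

# Return the most recently built torch wheel in dist/, or None if nothing
# has been built yet (the path and filename pattern assume the build above).
def newest_wheel(dist_dir: str = "dist"):
    wheels = sorted(
        Path(dist_dir).glob("torch-*.whl"),
        key=lambda p: p.stat().st_mtime,
    )
    return wheels[-1] if wheels else None

wheel = newest_wheel()
if wheel is not None:
    print(f"pip3 install {wheel}")
```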

After installing, run a quick verification:

python
python3 - << 'EOF'
import torch
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
print("cuDNN version:", torch.backends.cudnn.version())
print("GPU:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "None")
EOF

Next, continue with building torchvision:

bash
git clone https://github.com/pytorch/vision.git  # clone this outside the pytorch directory, not inside it
cd vision

Set the environment variables (consistent with the PyTorch build):

bash
export CUDA_HOME=/usr/local/cuda
export TORCH_CUDA_ARCH_LIST="8.7"
export FORCE_CUDA=1

Install its dependencies (torchvision needs pillow, numpy, and so on):

bash
pip install numpy pillow scipy

Start building torchvision:

bash
python3 setup.py bdist_wheel

It may fail with:

bash
In file included from /home/baiyi/vision/torchvision/csrc/io/image/cpu/decode_gif.cpp:3:
/home/baiyi/vision/torchvision/csrc/io/image/cpu/../common.h:4:10: fatal error: torch/headeronly/util/Exception.h: No such file or directory
    4 | #include <torch/headeronly/util/Exception.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
... ...
... ...
RuntimeError: Error compiling objects for extension

This happens because the torchvision checkout is too new while the PyTorch we built is a nightly (2.7.0a0): the two C++ APIs are out of step, since the latest changes on torchvision main expect headers this PyTorch snapshot does not yet ship.

Switch to a release branch compatible with the PyTorch 2.7 nightly:

bash
git fetch --all
git checkout release/0.20

Clean the old build:

bash
python3 setup.py clean
rm -rf build

Rebuild:

bash
python3 setup.py bdist_wheel

When it finishes, the wheel is again under dist/; install it with pip install.

Finally, verify that torchvision was built with CUDA support:

python
python3 - << 'EOF'
import torchvision
import torch
print("torchvision:", torchvision.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
# Exercise a CUDA-compiled torchvision op to confirm the C++/CUDA extension loads:
if torch.cuda.is_available():
    boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0]], device="cuda")
    scores = torch.tensor([1.0], device="cuda")
    print("GPU nms:", torchvision.ops.nms(boxes, scores, 0.5))
EOF