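MLFF Environment Installation Guide: MACE / Egret / AIMNet2 / SO3LR

This guide documents the full process of installing four machine-learning force field tools (MACE, Egret, AIMNet2, and SO3LR) under /public/software/MLFF for two classes of compute nodes: RTX 4090 (cn01) and RTX 5090 (cn10).

Each tool gets its own conda environment, with one set per GPU model, for a total of 8 environments.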

1. Environment Overview

Environment	Python	Core framework	CUDA version	Path
4090-MACE	3.11	PyTorch 2.6.0+cu124	12.4	`envs/4090/mace`
4090-Egret	3.11	PyTorch 2.6.0+cu124	12.4	`envs/4090/egret`
4090-AIMNet2	3.11	PyTorch 2.6.0+cu124	12.4	`envs/4090/aimnet2`
4090-SO3LR	3.12	JAX 0.5.3	12	`envs/4090/so3lr`
5090-MACE	3.11	PyTorch 2.11.0+cu128	12.8	`envs/5090/mace`
5090-Egret	3.11	PyTorch 2.11.0+cu128	12.8	`envs/5090/egret`
5090-AIMNet2	3.11	PyTorch 2.11.0+cu128	12.8	`envs/5090/aimnet2`
5090-SO3LR	3.12	JAX 0.5.3	12	`envs/5090/so3lr`

Notes:

SO3LR uses Python 3.12 because its dependencies require python >= 3.12.

AIMNet2 does not work on the 4090 nodes (PyTorch 2.6); it runs only on the 5090 nodes (PyTorch 2.11). See pitfall 9.8 for details.

Compatibility matrix

Tool	cn01 (RTX 4090)	cn10 (RTX 5090)	Notes
MACE	✅	✅	Runs with both CUDA builds
Egret	✅	✅	Runs with both CUDA builds
AIMNet2	❌	✅	Requires PyTorch ≥ 2.8 (`list[torch.Tensor]` syntax in `torch.library.custom_op`)
SO3LR	✅	✅	JAX 0.5.3 runs on both GPU types

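2. Packages to Download in Advance

Because access to GitHub is unreliable, it is strongly recommended to download the following packages ahead of time and place them in the package/ directory.

2.1 Model files

MACE models (5)

File	Size	Download URL
MACE-OFF23_small.model	7.0 MB	https://github.com/ACEsuit/mace-off/tree/main/mace_off23
MACE-OFF23_medium.model	17.5 MB	same as above
MACE-OFF23b_medium.model	17.5 MB	same as above
MACE-OFF23_large.model	52.9 MB	same as above
MACE-OFF24_medium.model	17.5 MB	https://github.com/ACEsuit/mace-off/tree/main/mace_off24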

Egret models (5)

File	Size	Download URL
EGRET_1.model	22.3 MB	https://github.com/rowansci/egret-public/tree/master/compiled_models
EGRET_1E.model	23.2 MB	same as above
EGRET_1M.model	9.0 MB	same as above
EGRET_1S.model	3.7 MB	same as above
EGRET_1T.model	22.3 MB	same as above

AIMNet2 models (5)

Directory	Download URL
aimnet2-wb97m-d3	https://github.com/kabylda/aimnet2
aimnet2-2025	same as above
aimnet2-pd	same as above
aimnet2-rxn	same as above
aimnet2-nse	same as above

Source archives

File	Download URL	Notes
Egret source archive	https://github.com/rowansci/egret-public/archive/refs/heads/master.zip	includes the compiled_models directory
MACE-OFF source archive	https://github.com/ACEsuit/mace-off/archive/refs/heads/main.zip	includes the model files
MACE source archive	https://github.com/ACEsuit/mace/archive/refs/heads/main.zip	backup

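2.2 SO3LR and its dependencies (the most critical part: everything must be fetched via git clone or zip download)

SO3LR itself, its 5 direct dependencies, and comms (a nested dependency of glp) all come from GitHub. Because the network is very unreliable, they must be prepared in advance.

Repository	Download URL	Branch/tag	Notes
so3lr	https://github.com/general-molecular-simulations/so3lr/archive/refs/heads/main.zip	main	SO3LR main package
mlff	https://github.com/kabylda/mlff (git clone required)	v1.0_dev_import	dependency of so3lr
jraph	https://github.com/kabylda/jraph (git clone required)	master	dependency of so3lr
jax-md	https://github.com/kabylda/jax-md/archive/refs/heads/main.zip	main	dependency of so3lr
glp	https://github.com/kabylda/glp (git clone required)	electrostatics_neighbourlist	dependency of so3lr
comms	https://github.com/sirmarcel/comms (git clone required)	main	sub-dependency of glp
jax-pme	https://github.com/fbruenig/jax-pme (git clone required)	main	dependency of so3lr

Important: in this setup mlff, jraph, glp, and comms were not obtained as zip downloads and had to be fetched with git clone. If the cluster has no network access, clone them on a machine that does, then archive and copy them over.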
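Before starting the SO3LR installation, it can help to confirm that every checkout is actually in place. The following is a minimal sketch, not part of the original scripts: the helper name `missing_deps` is hypothetical, and the directory names assume the so3lr-deps/ layout used later in this guide (including `jax-md-main`, the name produced by unpacking the zip).

```python
from pathlib import Path

# Directory names follow the so3lr-deps/ layout used in this guide (assumption).
REQUIRED_DEPS = ["mlff", "jraph", "jax-md-main", "glp", "comms", "jax-pme"]


def missing_deps(deps_root: Path) -> list[str]:
    """Return the names of dependency checkouts that are absent or empty."""
    missing = []
    for name in REQUIRED_DEPS:
        path = deps_root / name
        # A directory that exists but is empty usually means a failed clone.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


absent = missing_deps(Path("/public/software/MLFF/so3lr-deps"))
print("missing:", absent if absent else "none")
```

An empty directory is treated as missing because an interrupted clone can leave one behind.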

3. Directory Layout

/public/software/MLFF/
├── envs/                     # conda environments (8)
│   ├── 4090/
│   │   ├── mace/             # PyTorch 2.6.0+cu124, python 3.11
│   │   ├── egret/            # PyTorch 2.6.0+cu124, python 3.11
│   │   ├── aimnet2/          # PyTorch 2.6.0+cu124, python 3.11 (⚠️ cannot run AIMNet2)
│   │   └── so3lr/            # JAX 0.5.3, python 3.12
│   └── 5090/
│       ├── mace/             # PyTorch 2.11.0+cu128, python 3.11
│       ├── egret/            # PyTorch 2.11.0+cu128, python 3.11
│       ├── aimnet2/          # PyTorch 2.11.0+cu128, python 3.11
│       └── so3lr/            # JAX 0.5.3, python 3.12
├── models/                   # pretrained model files
│   ├── mace/                 # 5 MACE-OFF models
│   ├── egret/                # 5 EGRET models
│   ├── aimnet2/              # 5 AIMNet2 models (as subdirectories)
│   └── so3lr/                # SO3LR models (if any)
├── example/                  # test scripts (see Section 7)
│   ├── mace/                 # test_mace.py, test_mace_all.py
│   ├── egret/                # test_egret.py, test_egret_all.py
│   ├── aimnet2/              # test_aimnet2.py, test_aimnet2_all.py
│   ├── so3lr/                # test_so3lr.py
│   └── run_all_tests.sh      # script that runs all tests
├── so3lr-main/               # SO3LR source (editable install, do not delete!)
├── so3lr-deps/               # SO3LR dependency sources (editable install, do not delete!)
│   ├── mlff/
│   ├── jraph/
│   ├── jax-md-main/
│   ├── glp/
│   ├── comms/
│   └── jax-pme/
└── package/                  # installers / source archives (can be moved away)

4. Installation Steps

4.0 Initialization

bash

# Create the directory layout
mkdir -p /public/software/MLFF/{envs/4090,envs/5090,models/{mace,egret,aimnet2,so3lr},example/{mace,egret,aimnet2,so3lr},logs}

# Initialize mamba
source /public/software/mamba/etc/profile.d/conda.sh

4.1 Common conda package list

Every environment installs these base packages (python=3.11 for MACE/Egret/AIMNet2, python=3.12 for SO3LR):

bash

COMMON_PKGS="python=3.11 pip numpy scipy pandas h5py matplotlib seaborn tqdm pyyaml networkx requests ipython jupyterlab ase mdanalysis mdtraj rdkit parmed pdbfixer openff-toolkit openmm"

Pitfall: installing ambertools and gromacs from conda-forge conflicts with openff-toolkit over the icu/qt6/libboost dependencies, so they cannot be installed together. The system already provides /public/software/amber24 and /public/software/gromacs-2026-beta; reference those through modulefiles instead.
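The per-tool commands in the following sections repeat the same package list and differ only in the environment path and Python version. As a rough sketch of that pattern (the helper `build_create_command` is hypothetical and not part of the original installation scripts), the eight `mamba create` invocations could be generated like this:

```python
from itertools import product

ENV_ROOT = "/public/software/MLFF/envs"
# Same packages as COMMON_PKGS, minus the python pin (chosen per tool below).
BASE_PKGS = ("pip numpy scipy pandas h5py matplotlib seaborn tqdm pyyaml networkx "
             "requests ipython jupyterlab ase mdanalysis mdtraj rdkit parmed "
             "pdbfixer openff-toolkit openmm").split()
PYTHON_VERSION = {"mace": "3.11", "egret": "3.11", "aimnet2": "3.11", "so3lr": "3.12"}


def build_create_command(gpu: str, tool: str) -> list[str]:
    """Return the argv for creating one environment (illustrative helper)."""
    prefix = f"{ENV_ROOT}/{gpu}/{tool}"
    return ["mamba", "create", "-y", "-p", prefix, "-c", "conda-forge",
            f"python={PYTHON_VERSION[tool]}", *BASE_PKGS]


# One command per (GPU type, tool) pair: 2 x 4 = 8 environments.
commands = {(gpu, tool): build_create_command(gpu, tool)
            for gpu, tool in product(("4090", "5090"), PYTHON_VERSION)}

for cmd in commands.values():
    print(" ".join(cmd))
```

The script only prints the commands; running them is left to the operator so that failures can be handled one environment at a time.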

4.2 Installing 4090-MACE

bash

source /public/software/mamba/etc/profile.d/conda.sh

# Step 1: create the conda environment
mamba create -y -p /public/software/MLFF/envs/4090/mace -c conda-forge \
    python=3.11 pip numpy scipy pandas h5py matplotlib seaborn tqdm pyyaml \
    networkx requests ipython jupyterlab ase mdanalysis mdtraj rdkit parmed \
    pdbfixer openff-toolkit openmm

# Step 2: install the CUDA 12.4 build of PyTorch
# ⚠️ Key pitfall: --force-reinstall is required to make sure the CUDA build is installed; otherwise pip may install the CPU build!
conda activate /public/software/MLFF/envs/4090/mace
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
    --index-url https://download.pytorch.org/whl/cu124 --force-reinstall

# Step 3: install mace-torch and the equivariance libraries
pip install mace-torch cuequivariance-torch cuequivariance-ops-torch-cu12

Pitfall: without --force-reinstall, pip may pick the CPU build of PyTorch (torch.version.cuda = None) because conda-forge packages such as numpy are already present. On a GPU node, torch.cuda.is_available() then returns False. Force a reinstall of the CUDA build.

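4.3 Installing 4090-Egret

The steps are identical to MACE; only the environment path differs. The Egret models are loaded through the MACE calculator (see the test script in 7.5.2), which is why mace-torch is installed here as well:

bash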

source /public/software/mamba/etc/profile.d/conda.sh

mamba create -y -p /public/software/MLFF/envs/4090/egret -c conda-forge \
    python=3.11 pip numpy scipy pandas h5py matplotlib seaborn tqdm pyyaml \
    networkx requests ipython jupyterlab ase mdanalysis mdtraj rdkit parmed \
    pdbfixer openff-toolkit openmm

conda activate /public/software/MLFF/envs/4090/egret
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
    --index-url https://download.pytorch.org/whl/cu124 --force-reinstall
pip install mace-torch cuequivariance-torch cuequivariance-ops-torch-cu12

Download the Egret models (if the network is available):

bash

cd /public/software/MLFF/models/egret
wget https://github.com/rowansci/egret-public/raw/master/compiled_models/EGRET_1.model
wget https://github.com/rowansci/egret-public/raw/master/compiled_models/EGRET_1E.model
wget https://github.com/rowansci/egret-public/raw/master/compiled_models/EGRET_1M.model
wget https://github.com/rowansci/egret-public/raw/master/compiled_models/EGRET_1S.model
wget https://github.com/rowansci/egret-public/raw/master/compiled_models/EGRET_1T.model

Or copy them from the extracted source archive:

bash

cp /public/software/MLFF/egret-public-master/compiled_models/*.model \
    /public/software/MLFF/models/egret/

4.4 Installing 4090-AIMNet2

bash

source /public/software/mamba/etc/profile.d/conda.sh

mamba create -y -p /public/software/MLFF/envs/4090/aimnet2 -c conda-forge \
    python=3.11 pip numpy scipy pandas h5py matplotlib seaborn tqdm pyyaml \
    networkx requests ipython jupyterlab ase mdanalysis mdtraj rdkit parmed \
    pdbfixer openff-toolkit openmm

conda activate /public/software/MLFF/envs/4090/aimnet2
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
    --index-url https://download.pytorch.org/whl/cu124 --force-reinstall
pip install "aimnet[ase]" huggingface_hub safetensors

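⚠️ Severe limitation: AIMNet2 uses the list[torch.Tensor] return-type syntax with torch.library.custom_op, which was only introduced in PyTorch 2.8+. Under PyTorch 2.6 (the highest version available for 4090/cu124 in this setup), import aimnet fails immediately with a ValueError. AIMNet2 can therefore only be used on the 5090 nodes (PyTorch 2.11+cu128). The 4090 environment is created, but it cannot actually run AIMNet2 calculations.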
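A wrapper script may want to decide up front whether an environment's PyTorch is new enough, rather than waiting for the import to fail. The sketch below is illustrative only: the function names are made up here, and the 2.8 threshold is the one stated in this guide rather than something read from the aimnet package metadata.

```python
import re

MIN_TORCH = (2, 8)  # threshold stated in this guide for AIMNet2 (assumption)


def torch_release(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.6.0+cu124'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def aimnet2_supported(version: str) -> bool:
    """Compare as integer tuples so that 2.11 sorts after 2.8."""
    return torch_release(version) >= MIN_TORCH


for v in ("2.6.0+cu124", "2.11.0+cu128"):
    print(v, "->", "OK" if aimnet2_supported(v) else "too old for AIMNet2")
```

Comparing integer tuples matters here: as plain strings, "2.11" would sort before "2.8" and the 5090 environment would be rejected by mistake.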

4.5 Installing 4090-SO3LR (the most complex!)

SO3LR requires Python 3.12 and JAX, and it has several git dependencies.

Step 1: Prepare the SO3LR source and dependencies

bash

# Extract the so3lr source (if the zip archive is already available)
cd /public/software/MLFF
unzip package/so3lr-main.zip

# Prepare the dependency directory
mkdir -p so3lr-deps
cd so3lr-deps

# The following need network access or must be prepared in advance:
git clone --branch v1.0_dev_import --depth 1 https://github.com/kabylda/mlff.git
git clone --branch master --depth 1 https://github.com/kabylda/jraph.git
# The zip unpacks to jax-md-main/; when cloning instead, pass that name as the target directory:
unzip ../package/jax-md-main.zip    # or: git clone --branch main --depth 1 https://github.com/kabylda/jax-md.git jax-md-main
git clone --branch electrostatics_neighbourlist --depth 1 https://github.com/kabylda/glp.git
git clone --branch main --depth 1 https://github.com/sirmarcel/comms.git  # sub-dependency of glp!
git clone --branch main --depth 1 https://github.com/fbruenig/jax-pme.git

Step 2: Point the git dependencies at local paths (critical!)

Pitfall: in SO3LR's pyproject.toml, every dependency is declared in the form git = "https://...". pip install -e . then tries to git clone these repositories and fails when GitHub is unreachable. They must be changed to local paths.

Edit /public/software/MLFF/so3lr-main/pyproject.toml:

toml

# Replace the following entries
mlff = { git = "https://github.com/kabylda/mlff.git", branch = "v1.0_dev_import", optional = false }
jraph = { git = "https://github.com/kabylda/jraph.git", branch = "master", optional = false }
jax-md = { git = "https://github.com/kabylda/jax-md.git", branch = "main" , optional = false }
glp = { git = "https://github.com/kabylda/glp.git", branch = "electrostatics_neighbourlist", optional = false }
jax-pme = { git = "https://github.com/fbruenig/jax-pme", branch = "main" , optional = false }

# with
mlff = { path = "/public/software/MLFF/so3lr-deps/mlff", develop = true }
jraph = { path = "/public/software/MLFF/so3lr-deps/jraph", develop = true }
jax-md = { path = "/public/software/MLFF/so3lr-deps/jax-md-main", develop = true }
glp = { path = "/public/software/MLFF/so3lr-deps/glp", develop = true }
jax-pme = { path = "/public/software/MLFF/so3lr-deps/jax-pme", develop = true }

Make the same change in /public/software/MLFF/so3lr-deps/mlff/pyproject.toml (mlff declares the same git dependencies):

toml

# Replace the git entries with local paths
jraph = { path = "/public/software/MLFF/so3lr-deps/jraph", develop = true }
jax-md = { path = "/public/software/MLFF/so3lr-deps/jax-md-main", develop = true }
glp = { path = "/public/software/MLFF/so3lr-deps/glp", develop = true }
jax-pme = { path = "/public/software/MLFF/so3lr-deps/jax-pme", develop = true }

Likewise edit /public/software/MLFF/so3lr-deps/glp/pyproject.toml (glp depends on comms):

toml

# Replace
comms = { git = "https://github.com/sirmarcel/comms.git", branch = "main" }
# with
comms = { path = "/public/software/MLFF/so3lr-deps/comms", develop = true }
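Editing three pyproject.toml files by hand is easy to get wrong, so the substitution can also be scripted. The following is a minimal sketch under stated assumptions: each git dependency sits on a single line in the form shown above, the checkout directories match this guide's layout, and the function name `localize` is hypothetical. It works on text only and does not parse TOML.

```python
import re

DEPS_ROOT = "/public/software/MLFF/so3lr-deps"
# Dependency name -> local checkout directory (layout assumed from this guide).
LOCAL_DIRS = {"mlff": "mlff", "jraph": "jraph", "jax-md": "jax-md-main",
              "glp": "glp", "jax-pme": "jax-pme", "comms": "comms"}

# Matches single-line entries such as: name = { git = "https://...", branch = "..." }
GIT_DEP = re.compile(r'^(?P<name>[A-Za-z0-9_-]+)\s*=\s*\{\s*git\s*=.*\}\s*$')


def localize(toml_text: str) -> str:
    """Replace known git dependencies with local path dependencies."""
    out = []
    for line in toml_text.splitlines():
        m = GIT_DEP.match(line)
        if m and m.group("name") in LOCAL_DIRS:
            name = m.group("name")
            path = f"{DEPS_ROOT}/{LOCAL_DIRS[name]}"
            line = f'{name} = {{ path = "{path}", develop = true }}'
        out.append(line)  # all other lines pass through unchanged
    return "\n".join(out) + "\n"


sample = ('glp = { git = "https://github.com/kabylda/glp.git", '
          'branch = "electrostatics_neighbourlist", optional = false }\n')
print(localize(sample), end="")
```

To apply it to a file, read the text with `Path.read_text()`, pass it through `localize`, and write the result back after keeping a backup copy of the original.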

Step 3: Create the conda environment and install

bash

source /public/software/mamba/etc/profile.d/conda.sh

# ⚠️ Key pitfall: mamba sometimes crashes with a plugin error; set CONDA_NO_PLUGINS=true
CONDA_NO_PLUGINS=true mamba create -y -p /public/software/MLFF/envs/4090/so3lr \
    -c conda-forge python=3.12 pip numpy scipy pandas h5py matplotlib seaborn \
    tqdm pyyaml networkx requests ipython jupyterlab ase mdanalysis mdtraj \
    rdkit parmed pdbfixer openff-toolkit openmm

conda activate /public/software/MLFF/envs/4090/so3lr

# Install JAX (install jax[cuda12] first; installing so3lr then downgrades it to a compatible version)
pip install -U "jax[cuda12]"

# Install SO3LR (editable install from the local source)
cd /public/software/MLFF/so3lr-main
pip install -e .

Pitfalls:

jax[cuda12] installs JAX 0.10.1, but so3lr's dependencies then downgrade it to JAX 0.5.3. This is expected.

After installation, the JAX CUDA plugin version must match the jaxlib version. Once so3lr has downgraded jax/jaxlib to 0.5.3, jax-cuda12-plugin and jax-cuda12-pjrt must be downgraded by hand:
bash
pip install jax-cuda12-plugin==0.5.3 jax-cuda12-pjrt==0.5.3
Otherwise the GPU nodes report: AttributeError: module 'jaxlib.xla_client' has no attribute 'register_custom_type_handler'
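A version mismatch of this kind can be detected from package metadata alone, without importing JAX or touching a GPU. The sketch below assumes, as in this setup, that all four distributions are expected to carry the same version string; the helper names are illustrative.

```python
from __future__ import annotations

from importlib import metadata

JAX_PACKAGES = ("jax", "jaxlib", "jax-cuda12-plugin", "jax-cuda12-pjrt")


def installed_versions(names=JAX_PACKAGES) -> dict[str, str | None]:
    """Look up installed distribution versions; None means not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def mismatched(versions: dict[str, str | None]) -> list[str]:
    """Return the packages whose version differs from the installed jaxlib."""
    target = versions.get("jaxlib")
    return [name for name, ver in versions.items() if ver != target]


versions = installed_versions()
print(versions)
bad = mismatched(versions)
print("mismatched vs jaxlib:", bad if bad else "none")
```

Run inside the so3lr environment, an empty mismatch list indicates that the manual plugin downgrade took effect.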

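4.6 Installing the 5090 environments

The 5090 environments mirror the 4090 ones; the only difference is that PyTorch uses the cu128 build. The pip commands below do not pin a version; in this installation they resolved to PyTorch 2.11.0+cu128.

5090-MACE (full commands)

bash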

source /public/software/mamba/etc/profile.d/conda.sh

mamba create -y -p /public/software/MLFF/envs/5090/mace -c conda-forge \
    python=3.11 pip numpy scipy pandas h5py matplotlib seaborn tqdm pyyaml \
    networkx requests ipython jupyterlab ase mdanalysis mdtraj rdkit parmed \
    pdbfixer openff-toolkit openmm

conda activate /public/software/MLFF/envs/5090/mace
pip install torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/cu128 --force-reinstall
pip install mace-torch cuequivariance-torch cuequivariance-ops-torch-cu12

5090-Egret (full commands)

bash

source /public/software/mamba/etc/profile.d/conda.sh

mamba create -y -p /public/software/MLFF/envs/5090/egret -c conda-forge \
    python=3.11 pip numpy scipy pandas h5py matplotlib seaborn tqdm pyyaml \
    networkx requests ipython jupyterlab ase mdanalysis mdtraj rdkit parmed \
    pdbfixer openff-toolkit openmm

conda activate /public/software/MLFF/envs/5090/egret
pip install torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/cu128 --force-reinstall
pip install mace-torch cuequivariance-torch cuequivariance-ops-torch-cu12

5090-AIMNet2 (full commands; ⚠️ only the 5090 nodes can run AIMNet2)

bash

source /public/software/mamba/etc/profile.d/conda.sh

mamba create -y -p /public/software/MLFF/envs/5090/aimnet2 -c conda-forge \
    python=3.11 pip numpy scipy pandas h5py matplotlib seaborn tqdm pyyaml \
    networkx requests ipython jupyterlab ase mdanalysis mdtraj rdkit parmed \
    pdbfixer openff-toolkit openmm

conda activate /public/software/MLFF/envs/5090/aimnet2
pip install torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/cu128 --force-reinstall
pip install "aimnet[ase]" huggingface_hub safetensors

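Note: AIMNet2 needs the safetensors package to load models, but the aimnet pip package does not declare it as a dependency, so it must be installed manually.

5090-SO3LR (identical to 4090; only the environment path differs)

bash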

source /public/software/mamba/etc/profile.d/conda.sh

CONDA_NO_PLUGINS=true mamba create -y -p /public/software/MLFF/envs/5090/so3lr \
    -c conda-forge python=3.12 pip numpy scipy pandas h5py matplotlib seaborn \
    tqdm pyyaml networkx requests ipython jupyterlab ase mdanalysis mdtraj \
    rdkit parmed pdbfixer openff-toolkit openmm

conda activate /public/software/MLFF/envs/5090/so3lr
pip install -U "jax[cuda12]"
cd /public/software/MLFF/so3lr-main
pip install -e .
pip install jax-cuda12-plugin==0.5.3 jax-cuda12-pjrt==0.5.3

5. Deploying Model Files

bash

# Egret models (copied from the extracted source archive)
mkdir -p /public/software/MLFF/models/egret
cp /public/software/MLFF/egret-public-master/compiled_models/*.model \
    /public/software/MLFF/models/egret/

# MACE-OFF models
mkdir -p /public/software/MLFF/models/mace
cp /public/software/MLFF/mace-off-main/mace_off23/*.model \
    /public/software/MLFF/models/mace/
cp /public/software/MLFF/mace-off-main/mace_off24/*.model \
    /public/software/MLFF/models/mace/

# AIMNet2 models (copied from the package directory)
mkdir -p /public/software/MLFF/models/aimnet2
cp -r /public/software/MLFF/package/aimnet2/* \
    /public/software/MLFF/models/aimnet2/

6. Modulefile Configuration

Create 8 modulefiles under /public/software/modules/modulefiles/MLFF/.

6.1 Example: 4090-mace

tcl

#%Module1.0
proc ModulesHelp { } {
    puts stderr "MLFF 4090 MACE environment"
}
module-whatis "MLFF 4090 MACE"

set root /public/software/MLFF/envs/4090/mace

prepend-path PATH $root/bin
prepend-path LD_LIBRARY_PATH $root/lib
prepend-path PYTHONPATH $root/lib/python3.11/site-packages

setenv CONDA_PREFIX $root
setenv MLFF_ENV 4090-mace
setenv MLFF_MODEL_DIR /public/software/MLFF/models

setenv CUDA_HOME /public/software/cuda-12.5.0
prepend-path PATH /public/software/cuda-12.5.0/bin
prepend-path LD_LIBRARY_PATH /public/software/cuda-12.5.0/lib64

setenv OPENMM_DEFAULT_PLATFORM CPU

6.2 Example: 4090-so3lr (Python 3.12, plus JAX-specific settings)

tcl

#%Module1.0
proc ModulesHelp { } {
    puts stderr "MLFF 4090 SO3LR JAX environment"
}
module-whatis "MLFF 4090 SO3LR"

set root /public/software/MLFF/envs/4090/so3lr

prepend-path PATH $root/bin
prepend-path LD_LIBRARY_PATH $root/lib
prepend-path PYTHONPATH $root/lib/python3.12/site-packages

setenv CONDA_PREFIX $root
setenv MLFF_ENV 4090-so3lr
setenv MLFF_MODEL_DIR /public/software/MLFF/models

setenv CUDA_HOME /public/software/cuda-12.5.0
prepend-path PATH /public/software/cuda-12.5.0/bin
prepend-path LD_LIBRARY_PATH /public/software/cuda-12.5.0/lib64

setenv XLA_PYTHON_CLIENT_PREALLOCATE false
setenv OPENMM_DEFAULT_PLATFORM CPU

6.3 Differences for the 5090 environments

In the 5090 modulefiles, CUDA_HOME changes to /public/software/cuda-13.1:

tcl

setenv CUDA_HOME /public/software/cuda-13.1
prepend-path PATH /public/software/cuda-13.1/bin
prepend-path LD_LIBRARY_PATH /public/software/cuda-13.1/lib64
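Since the eight modulefiles differ only in GPU type, tool name, Python version, CUDA path, and the extra JAX setting, they can be rendered from one template. This is a sketch rather than the procedure actually used: the `render` helper is hypothetical, the `ModulesHelp` proc is omitted for brevity, and the `module-whatis` text is simplified compared with the examples above.

```python
CUDA_HOME = {"4090": "/public/software/cuda-12.5.0",
             "5090": "/public/software/cuda-13.1"}
PY_VER = {"mace": "3.11", "egret": "3.11", "aimnet2": "3.11", "so3lr": "3.12"}

TEMPLATE = """#%Module1.0
module-whatis "MLFF {gpu} {tool}"

set root /public/software/MLFF/envs/{gpu}/{tool}

prepend-path PATH $root/bin
prepend-path LD_LIBRARY_PATH $root/lib
prepend-path PYTHONPATH $root/lib/python{py}/site-packages

setenv CONDA_PREFIX $root
setenv MLFF_ENV {gpu}-{tool}
setenv MLFF_MODEL_DIR /public/software/MLFF/models

setenv CUDA_HOME {cuda}
prepend-path PATH {cuda}/bin
prepend-path LD_LIBRARY_PATH {cuda}/lib64
{extra}
setenv OPENMM_DEFAULT_PLATFORM CPU
"""


def render(gpu: str, tool: str) -> str:
    """Render one modulefile; only so3lr gets the JAX preallocation setting."""
    extra = "\nsetenv XLA_PYTHON_CLIENT_PREALLOCATE false" if tool == "so3lr" else ""
    return TEMPLATE.format(gpu=gpu, tool=tool, py=PY_VER[tool],
                           cuda=CUDA_HOME[gpu], extra=extra)


print(render("5090", "so3lr"))
```

Writing each result to `/public/software/modules/modulefiles/MLFF/<gpu>-<tool>` would reproduce the naming used by the `module load` commands in Section 10.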

7. Verifying the Installation and Functional Tests

7.1 Basic checks on the login node (no GPU)

bash

# Check the Python version
/public/software/MLFF/envs/4090/mace/bin/python --version
# Expected output: Python 3.11.x

# Check the PyTorch version (note: CUDA availability is False on the login node, which is normal)
/public/software/MLFF/envs/4090/mace/bin/python -c \
    "import torch; print('torch:', torch.__version__, 'cuda_ver:', torch.version.cuda)"
# Expected output: torch: 2.6.0+cu124 cuda_ver: 12.4
# ⚠️ If cuda_ver is None, the CPU build was installed; reinstall with --force-reinstall!
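The version string printed above already carries the build tag, so a log-scanning script can flag a CPU build without importing torch. The helper names below are illustrative. A version string with no suffix at all is ambiguous, so `torch.version.cuda` remains the authoritative check.

```python
from __future__ import annotations


def local_build_tag(torch_version: str) -> str | None:
    """Return the part after '+' in a version string, e.g. 'cu124', or None."""
    _, sep, tag = torch_version.partition("+")
    return tag if sep else None


def has_cuda_suffix(torch_version: str) -> bool:
    """True only when the build tag starts with 'cu' (cu124, cu128, ...)."""
    tag = local_build_tag(torch_version)
    return tag is not None and tag.startswith("cu")


for v in ("2.6.0+cu124", "2.6.0+cpu", "2.6.0"):
    print(f"{v!r}: cuda suffix = {has_cuda_suffix(v)}")
```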

7.2 Quick check on the GPU compute nodes

bash

# Switch to the 4090 node
export SSHPASS='your_password' && sshpass -e ssh -o StrictHostKeyChecking=no root@cn01

source /public/software/mamba/etc/profile.d/conda.sh
conda activate /public/software/MLFF/envs/4090/mace

python -c "import torch; print('CUDA:', torch.cuda.is_available(), torch.cuda.get_device_name(0))"
# Expected output: CUDA: True NVIDIA GeForce RTX 4090

# Switch to the 5090 node
export SSHPASS='your_password' && sshpass -e ssh -o StrictHostKeyChecking=no root@cn10

# conda must be initialized again in the new shell
source /public/software/mamba/etc/profile.d/conda.sh
conda activate /public/software/MLFF/envs/5090/mace
python -c "import torch; print('CUDA:', torch.cuda.is_available(), torch.cuda.get_device_name(0))"
# Expected output: CUDA: True NVIDIA GeForce RTX 5090

7.3 SO3LR / JAX check

bash

# On cn01
conda activate /public/software/MLFF/envs/4090/so3lr
python -c "import jax; print('jax:', jax.__version__); print(jax.devices())"
# Expected output: jax: 0.5.3  [CudaDevice(id=0), CudaDevice(id=1)]

# On cn10
conda activate /public/software/MLFF/envs/5090/so3lr
python -c "import jax; print('jax:', jax.__version__); print(jax.devices())"
# Expected output: jax: 0.5.3  [CudaDevice(id=0), CudaDevice(id=1), CudaDevice(id=2), CudaDevice(id=3)]

7.4 Import check for each tool

Run each command in the matching environment.

bash

# MACE
python -c "from mace.calculators import mace_off; print('MACE calculator OK')"

# Egret (loaded through the MACE calculator with local model files, hence the same import)
python -c "from mace.calculators import mace_off; print('Egret calculator OK')"

# AIMNet2 (5090 environment only)
python -c "from aimnet.calculators import AIMNet2ASE; print('AIMNet2 OK')"

# SO3LR
python -c "import so3lr; print('SO3LR OK')"

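7.5 Single-point energy test scripts for all models

The following scripts live under /public/software/MLFF/example/. They run a single-point energy calculation on benzene (C6H6, 12 atoms) with every model to verify that the installation is correct and that GPU acceleration works.

7.5.1 MACE all-models test `example/mace/test_mace_all.py`

python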

#!/usr/bin/env python3
import time
import numpy as np
import torch
from ase.build import molecule
from mace.calculators import mace_off

MODELS = [
    "/public/software/MLFF/models/mace/MACE-OFF23_small.model",
    "/public/software/MLFF/models/mace/MACE-OFF23_medium.model",
    "/public/software/MLFF/models/mace/MACE-OFF23b_medium.model",
    "/public/software/MLFF/models/mace/MACE-OFF23_large.model",
    "/public/software/MLFF/models/mace/MACE-OFF24_medium.model",
]

print("=" * 50)
print("MACE All-Models Test (Benzene)")
print("=" * 50)
print(f"PyTorch: {torch.__version__}")
print(f"CUDA: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
atoms = molecule("C6H6")
print(f"Benzene: {len(atoms)} atoms")

t_total_start = time.time()
for model_path in MODELS:
    name = model_path.rsplit("/", 1)[-1].replace(".model", "")
    try:
        t0 = time.time()
        calc = mace_off(model=model_path, default_dtype="float64")
        atoms.calc = calc
        e = atoms.get_potential_energy()
        f = atoms.get_forces()
        t1 = time.time()
        elapsed = t1 - t0
        print(f"  {name}: E={e:.4f} eV  MaxF={np.max(np.abs(f)):.4f}  "
              f"RMSF={np.sqrt(np.mean(f**2)):.4f}  Time={elapsed:.2f}s  OK")
    except Exception as exc:
        print(f"  {name}: FAILED - {exc}")

t_total = time.time() - t_total_start
print("=" * 50)
print(f"MACE all-models test COMPLETE! Total time: {t_total:.2f}s")

7.5.2 Egret all-models test `example/egret/test_egret_all.py`

python

#!/usr/bin/env python3
import time
import numpy as np
import torch
from ase.build import molecule
from mace.calculators import mace_off

MODELS = [
    "/public/software/MLFF/models/egret/EGRET_1.model",
    "/public/software/MLFF/models/egret/EGRET_1E.model",
    "/public/software/MLFF/models/egret/EGRET_1M.model",
    "/public/software/MLFF/models/egret/EGRET_1S.model",
    "/public/software/MLFF/models/egret/EGRET_1T.model",
]

print("=" * 50)
print("Egret All-Models Single-Point Energy Test (Benzene)")
print("=" * 50)
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
atoms = molecule("C6H6")
print(f"Molecule: benzene, {len(atoms)} atoms")

results = {}
t_total_start = time.time()
for model_path in MODELS:
    model_name = model_path.rsplit("/", 1)[-1].replace(".model", "")
    try:
        t0 = time.time()
        calc = mace_off(model=model_path, default_dtype="float64")
        atoms.calc = calc
        energy = atoms.get_potential_energy()
        forces = atoms.get_forces()
        t1 = time.time()
        elapsed = t1 - t0
        max_force = np.max(np.abs(forces))
        rms_force = np.sqrt(np.mean(forces**2))
        results[model_name] = {"energy": energy, "max_force": max_force,
                               "rms_force": rms_force, "time": elapsed, "status": "OK"}
        print(f"  {model_name}: Energy={energy:.6f} eV, MaxF={max_force:.6f}, "
              f"RMSF={rms_force:.6f}, Time={elapsed:.2f}s")
    except Exception as exc:
        results[model_name] = {"status": f"FAILED: {exc}", "time": 0}
        print(f"  {model_name}: FAILED - {exc}")

t_total = time.time() - t_total_start
print()
print("=" * 50)
passed = sum(1 for r in results.values() if r["status"] == "OK")
print(f"Egret: {passed}/{len(MODELS)} models passed")
for name, r in results.items():
    if r["status"] == "OK":
        print(f"  {name}: {r['energy']:.6f} eV ({r['time']:.2f}s)")
    else:
        print(f"  {name}: {r['status']}")
print("=" * 50)
print(f"Egret all-models test COMPLETE! Total time: {t_total:.2f}s")

7.5.3 AIMNet2 all-models test `example/aimnet2/test_aimnet2_all.py`

python

#!/usr/bin/env python3
import time
import numpy as np
import torch

MODELS = [
    "/public/software/MLFF/models/aimnet2/aimnet2-wb97m-d3",
    "/public/software/MLFF/models/aimnet2/aimnet2-2025",
    "/public/software/MLFF/models/aimnet2/aimnet2-pd",
    "/public/software/MLFF/models/aimnet2/aimnet2-rxn",
    "/public/software/MLFF/models/aimnet2/aimnet2-nse",
]

print("=" * 50)
print("AIMNet2 All-Models Single-Point Energy Test (Benzene)")
print("=" * 50)
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

try:
    from aimnet.calculators import AIMNet2ASE, AIMNet2Calculator
    CAN_USE_AIMNET = True
    print("AIMNet2: import OK")
except ValueError:
    CAN_USE_AIMNET = False
    print("AIMNet2: import FAILED (PyTorch 2.6 incompatible)")
    print("AIMNet2 requires PyTorch >= 2.8.")
    print("=" * 50)
    print("AIMNet2 all-models test SKIPPED (incompatible)")
    import sys
    sys.exit(0)

from ase.build import molecule
atoms = molecule("C6H6")
print(f"Molecule: benzene, {len(atoms)} atoms")

results = {}
t_total_start = time.time()
for model_dir in MODELS:
    model_name = model_dir.rsplit("/", 1)[-1]
    try:
        t0 = time.time()
        base_calc = AIMNet2Calculator(model_dir)
        charge = 0
        mult = 1
        if model_name == "aimnet2-nse":
            from ase import Atoms
            atom_nums = atoms.get_atomic_numbers()
            atom_pos = atoms.get_positions()
            mol = Atoms(numbers=atom_nums, positions=atom_pos)
            mol.info["mult"] = 1
            calc = AIMNet2ASE(base_calc, charge=charge, mult=mult)
            mol.calc = calc
            energy = mol.get_potential_energy()
            forces = mol.get_forces()
        else:
            calc = AIMNet2ASE(base_calc, charge=charge, mult=mult)
            atoms.calc = calc
            energy = atoms.get_potential_energy()
            forces = atoms.get_forces()
        t1 = time.time()
        elapsed = t1 - t0
        max_force = np.max(np.abs(forces))
        rms_force = np.sqrt(np.mean(forces**2))
        results[model_name] = {"energy": energy, "max_force": max_force,
                               "rms_force": rms_force, "time": elapsed, "status": "OK"}
        print(f"  {model_name}: Energy={energy:.6f} eV, MaxF={max_force:.6f}, "
              f"RMSF={rms_force:.6f}, Time={elapsed:.2f}s")
    except Exception as exc:
        results[model_name] = {"status": f"FAILED: {exc}", "time": 0}
        print(f"  {model_name}: FAILED - {exc}")

t_total = time.time() - t_total_start
print()
print("=" * 50)
passed = sum(1 for r in results.values() if r["status"] == "OK")
print(f"AIMNet2: {passed}/{len(MODELS)} models passed")
for name, r in results.items():
    if r["status"] == "OK":
        print(f"  {name}: {r['energy']:.6f} eV ({r['time']:.2f}s)")
    else:
        print(f"  {name}: {r['status']}")
print("=" * 50)
print(f"AIMNet2 all-models test COMPLETE! Total time: {t_total:.2f}s")

7.5.4 SO3LR test `example/so3lr/test_so3lr.py`

python

#!/usr/bin/env python3
import time
import numpy as np
import jax
from ase.build import molecule
from so3lr.ase_utils import make_ase_calculator

print("=" * 50)
print("SO3LR Single-Point Energy Test (Benzene)")
print("=" * 50)
print(f"JAX: {jax.__version__}")
print(f"JAX devices: {jax.devices()}")
atoms = molecule("C6H6")
print(f"Molecule: benzene, {len(atoms)} atoms")

t0 = time.time()
calc = make_ase_calculator(dtype=np.float32)
atoms.calc = calc
energy = atoms.get_potential_energy()
forces = atoms.get_forces()
t1 = time.time()
elapsed = t1 - t0

print(f"Energy: {energy:.6f} eV")
print(f"Max force component: {np.max(np.abs(forces)): .6f} eV/Ang")
print(f"Force RMS: {np.sqrt(np.mean(forces**2)): .6f} eV/Ang")
print(f"Time: {elapsed:.2f}s")
print("=" * 50)
print("SO3LR test PASSED!")

7.5.5 Batch script that runs everything `example/run_all_tests.sh`

bash

#!/bin/bash
# Run all MLFF tests on compute nodes
# Usage: bash run_all_tests.sh [4090|5090]

set -e

GPU_TYPE="${1:-4090}"

source /public/software/mamba/etc/profile.d/conda.sh

BASE="/public/software/MLFF"
EXAMPLE="$BASE/example"

if [ "$GPU_TYPE" = "4090" ]; then
    ENV_BASE="$BASE/envs/4090"
elif [ "$GPU_TYPE" = "5090" ]; then
    ENV_BASE="$BASE/envs/5090"
else
    echo "Usage: bash run_all_tests.sh [4090|5090]"
    exit 1
fi

echo "=========================================="
echo "Running MLFF tests on $GPU_TYPE"
echo "=========================================="

echo ""
echo "=== 1. MACE ==="
conda activate "$ENV_BASE/mace"
python "$EXAMPLE/mace/test_mace_all.py"

echo ""
echo "=== 2. Egret ==="
conda activate "$ENV_BASE/egret"
python "$EXAMPLE/egret/test_egret_all.py"

echo ""
echo "=== 3. AIMNet2 ==="
conda activate "$ENV_BASE/aimnet2"
python "$EXAMPLE/aimnet2/test_aimnet2_all.py"

echo ""
echo "=== 4. SO3LR ==="
conda activate "$ENV_BASE/so3lr"
python "$EXAMPLE/so3lr/test_so3lr.py"

echo ""
echo "=========================================="
echo "All $GPU_TYPE tests completed!"
echo "=========================================="

7.5.6 Remote execution

From the management node, run the tests on the compute nodes over ssh:

bash

# cn01 (4090) all-models test
export SSHPASS='your_password'
sshpass -e ssh -o StrictHostKeyChecking=no root@cn01 \
    'source /public/software/mamba/etc/profile.d/conda.sh && \
     conda activate /public/software/MLFF/envs/4090/mace && \
     python /public/software/MLFF/example/mace/test_mace_all.py && \
     conda activate /public/software/MLFF/envs/4090/egret && \
     python /public/software/MLFF/example/egret/test_egret_all.py && \
     conda activate /public/software/MLFF/envs/4090/so3lr && \
     python /public/software/MLFF/example/so3lr/test_so3lr.py'

# cn10 (5090) all-models test (including AIMNet2)
sshpass -e ssh -o StrictHostKeyChecking=no root@cn10 \
    'source /public/software/mamba/etc/profile.d/conda.sh && \
     conda activate /public/software/MLFF/envs/5090/mace && \
     python /public/software/MLFF/example/mace/test_mace_all.py && \
     conda activate /public/software/MLFF/envs/5090/egret && \
     python /public/software/MLFF/example/egret/test_egret_all.py && \
     conda activate /public/software/MLFF/envs/5090/aimnet2 && \
     python /public/software/MLFF/example/aimnet2/test_aimnet2_all.py && \
     conda activate /public/software/MLFF/envs/5090/so3lr && \
     python /public/software/MLFF/example/so3lr/test_so3lr.py'

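8. Test Results (Measured Data)

The results below were measured on the two compute nodes for single-point energy calculations on benzene (C6H6, 12 atoms), run on GPU and including timings.

8.1 cn01 - RTX 4090 (2 GPUs, PyTorch 2.6.0+cu124, JAX 0.5.3)

MACE (5/5 passed, 2.92s total)

Model	Energy (eV)	Max force (eV/Å)	RMS force (eV/Å)	Time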
MACE-OFF23_small	-6324.1157	0.2277	0.1135	1.21s
MACE-OFF23_medium	-6324.1147	0.2304	0.1131	0.43s
MACE-OFF23b_medium	-6324.1098	0.2313	0.1138	0.87s
MACE-OFF23_large	-6324.1117	0.2310	0.1128	0.24s
MACE-OFF24_medium	-6324.1068	0.2280	0.1125	0.17s

Egret (5/5 passed, 5.44s total)

Model	Energy (eV)	Max force (eV/Å)	RMS force (eV/Å)	Time
EGRET_1	-6324.103547	0.229138	0.112314	1.68s
EGRET_1E	-6324.110436	0.228968	0.111664	0.99s
EGRET_1M	-6324.110599	0.230466	0.113030	0.92s
EGRET_1S	-6324.107964	0.232138	0.114035	0.95s
EGRET_1T	-6324.118287	0.228113	0.113092	0.90s

AIMNet2 (⏭️ skipped)

AIMNet2 is skipped automatically on the 4090 node because it is incompatible with PyTorch 2.6 (PyTorch ≥ 2.8 is required).

SO3LR (1/1 passed, 15.56s)

Model	Energy (eV)	Max force (eV/Å)	RMS force (eV/Å)	Time	GPUs
SO3LR	-20.774759	0.321396	0.138711	15.56s	2

8.2 cn10 - RTX 5090 (4 GPUs, PyTorch 2.11.0+cu128, JAX 0.5.3)

MACE (5/5 passed, 2.91s total)

Model	Energy (eV)	Max force (eV/Å)	RMS force (eV/Å)	Time
MACE-OFF23_small	-6324.1157	0.2277	0.1135	1.20s
MACE-OFF23_medium	-6324.1147	0.2304	0.1131	0.38s
MACE-OFF23b_medium	-6324.1098	0.2313	0.1138	0.97s
MACE-OFF23_large	-6324.1117	0.2310	0.1128	0.21s
MACE-OFF24_medium	-6324.1068	0.2280	0.1125	0.15s

Egret (5/5 passed, 4.70s total)

Model	Energy (eV)	Max force (eV/Å)	RMS force (eV/Å)	Time
EGRET_1	-6324.103547	0.229138	0.112314	1.73s
EGRET_1E	-6324.110436	0.228968	0.111664	0.95s
EGRET_1M	-6324.110599	0.230466	0.113030	0.78s
EGRET_1S	-6324.107964	0.232138	0.114035	0.54s
EGRET_1T	-6324.118287	0.228113	0.113092	0.70s

AIMNet2 (5/5 passed, 5.10s total)

Model	Energy (eV)	Max force (eV/Å)	RMS force (eV/Å)	Time
aimnet2-wb97m-d3	-6324.128165	0.248503	0.118311	3.33s
aimnet2-2025	-6316.588580	0.172851	0.095963	1.30s
aimnet2-pd	-6316.680410	0.173732	0.088716	0.18s
aimnet2-rxn	-1.675890	0.235570	0.111718	0.20s
aimnet2-nse	-6324.155015	0.220431	0.116969	0.08s

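Note: aimnet2-rxn reports -1.68 eV, far from the other models. This is because aimnet2-rxn is a reaction model that outputs a reaction energy rather than an absolute energy, which is expected behavior. aimnet2-2025 and aimnet2-pd give about -6316 eV instead of the roughly -6324 eV of the other models because they use a different DFT reference level. Absolute energies should therefore only be compared within one model.

SO3LR (1/1 passed, 12.30s)

Model	Energy (eV)	Max force (eV/Å)	RMS force (eV/Å)	Time	GPUs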
SO3LR	-20.776371	0.320263	0.137987	12.30s	4

8.3 Performance comparison

4090 vs 5090 timings

Tool	4090 total	5090 total	Speedup
MACE (5 models)	2.92s	2.91s	~1.0x
Egret (5 models)	5.44s	4.70s	1.16x
AIMNet2 (5 models)	⏭️ skip	5.10s	N/A
SO3LR	15.56s (2 GPU)	12.30s (4 GPU)	1.27x
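The speedup column is simply the ratio of the two totals. As a quick way to recompute it from the measured times reported in 8.1 and 8.2 (the variable names are illustrative):

```python
# tool: (total seconds on cn01 / RTX 4090, total seconds on cn10 / RTX 5090)
totals = {
    "MACE": (2.92, 2.91),
    "Egret": (5.44, 4.70),
    "SO3LR": (15.56, 12.30),
}

# Speedup = 4090 time divided by 5090 time; values above 1 mean the 5090 was faster.
speedup = {tool: t4090 / t5090 for tool, (t4090, t5090) in totals.items()}

for tool, ratio in speedup.items():
    print(f"{tool}: {ratio:.2f}x")
```

AIMNet2 is left out because it has no 4090 measurement to compare against.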

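Analysis

MACE / Egret: the 4090 and 5090 are nearly on par. Benzene is so small (only 12 atoms) that the calculation finishes before the GPU is fully utilized; most of the time goes to model loading (slower on first use) and Python interpreter overhead.
AIMNet2: runs only on the 5090 (needs PyTorch ≥ 2.8). The first model load is slow (3.33s); subsequent inference is very fast (0.08s).
SO3LR: the 5090 node (4 GPUs) is about 27% faster than the 4090 node (2 GPUs). This may reflect the larger GPU count, but because each timing covers calculator setup as well as the calculation itself for a 12-atom system, this single measurement cannot separate multi-GPU effects from per-GPU speed.
Energy differences between tools: MACE, Egret, and AIMNet2-wb97m-d3 agree at around -6324 eV, which suggests that these models share a similar DFT reference level. SO3LR uses an entirely different framework and model architecture; its energy of -20.78 eV cannot be compared directly.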

9. Pitfalls and Caveats

9.1 The PyTorch CPU-build trap ⚠️⚠️⚠️

The most serious pitfall! When pip installs PyTorch into an environment that already contains conda-forge packages such as numpy, it may pick the CPU wheel (without the +cu124 suffix), leaving torch.version.cuda = None.

Fix: always pass --force-reinstall when installing PyTorch:

bash

pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
    --index-url https://download.pytorch.org/whl/cu124 --force-reinstall

Verify after installing:

bash

python -c "import torch; print(torch.version.cuda)"
# Must print "12.4" (or "12.8"), not None
9.2 Mamba plugin errors

The mamba command occasionally fails with "An unexpected error has occurred. Conda has prepared the above report", pointing to a plugin problem.

Fix: set CONDA_NO_PLUGINS=true:

bash

CONDA_NO_PLUGINS=true mamba create -y -p /path/to/env ...

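9.3 Network problems with SO3LR's git dependencies ⚠️⚠️

SO3LR pulls 5 dependencies from GitHub via git clone, and in addition:

mlff has nested dependencies on jraph/jax-md/glp/jax-pme (the same ones so3lr uses)
glp has a nested dependency on comms

These packages must be prepared in advance, and the git dependencies in each pyproject.toml must be replaced with local paths.

Files that need editing:

/public/software/MLFF/so3lr-main/pyproject.toml (5 dependencies)
/public/software/MLFF/so3lr-deps/mlff/pyproject.toml (4 dependencies)
/public/software/MLFF/so3lr-deps/glp/pyproject.toml (1 dependency: comms)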

9.4 JAX version compatibility for SO3LR

so3lr automatically downgrades JAX from 0.10.x to 0.5.3, but jax-cuda12-plugin and jax-cuda12-pjrt are not downgraded with it, which leaves the versions mismatched:

RuntimeWarning: JAX plugin jax_cuda12_plugin version 0.10.1 is installed,
but it is not compatible with the installed jaxlib version 0.5.3

On the GPU nodes this leads to a core dump.

Fix: downgrade the plugins by hand:

bash

pip install jax-cuda12-plugin==0.5.3 jax-cuda12-pjrt==0.5.3

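9.5 The SO3LR editable install cannot be moved

SO3LR and its dependencies are installed with pip install -e . (editable install), so Python loads the code directly from the source directories at run time. The so3lr-main/ and so3lr-deps/ directories must never be moved or deleted!

9.6 ambertools / gromacs conda installation conflict

The conda-forge ambertools package conflicts with openff-toolkit over the icu / qt6-main / libboost versions, so the two cannot be installed in the same environment.

Fix: do not install ambertools or gromacs in the conda environments; reference the existing /public/software/amber24 and /public/software/gromacs-2026-beta through system modulefiles.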

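9.7 Version conflict between cuequivariance and PyTorch's CUDA libraries

The cuequivariance-ops-torch-cu12 package installed alongside mace-torch requires nvidia-cublas-cu12 >= 12.5.0, while PyTorch cu124 ships nvidia-cublas-cu12 12.4.x.

Practical impact: none observed. This is only a conflict in pip's declared versions; in these tests cuequivariance ran normally under PyTorch cu124. The warning "Error while loading libcue_ops.so: undefined symbol: cublasGemmGroupedBatchedEx" that appears in the logs can be safely ignored.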

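9.8 Hard incompatibility between AIMNet2 and older PyTorch ⚠️⚠️

AIMNet2 uses the list[torch.Tensor] return-type syntax in the torch.library.custom_op decorator. This syntax was only introduced in PyTorch 2.8+, while the RTX 4090 environments can use at most PyTorch 2.6.0+cu124.

Symptom:

python

ValueError: Unsupported type annotation list[torch.Tensor]

Fix: use AIMNet2 only on the 5090 nodes (PyTorch 2.11+cu128). The test script in the 4090 environment detects the failure and skips gracefully.

9.9 AIMNet2 is missing the safetensors dependency

The aimnet pip package does not declare safetensors as a dependency, yet it is needed when loading models. Without it, loading fails with ModuleNotFoundError: No module named 'safetensors'.

Fix:

bash

pip install safetensors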

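9.10 The login node has no GPU

The login node has no GPU, so the following are normal there:

torch.cuda.is_available() returns False
JAX reports a cuInit(0) failed error

GPU functionality must therefore be tested on the cn01/cn10 compute nodes.

9.11 Prefer packages that are already available locally

Wheels and source archives already downloaded to the package/ directory should be installed with pip install /path/to/package.whl rather than downloaded again. This avoids network problems and keeps versions consistent.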

10. Usage

bash

# Load an environment (via module)
module load MLFF/4090-mace      # on cn01 (4090)
module load MLFF/5090-so3lr     # on cn10 (5090)

# Or use conda activate directly
conda activate /public/software/MLFF/envs/4090/mace
conda activate /public/software/MLFF/envs/5090/so3lr

# Run a MACE calculation (Python example)
python -c "
from mace.calculators import mace_off
from ase.build import molecule
atoms = molecule('C6H6')
calc = mace_off(model='/public/software/MLFF/models/mace/MACE-OFF23_medium.model',
                default_dtype='float64')
atoms.calc = calc
print('Energy:', atoms.get_potential_energy(), 'eV')
"

# Run an SO3LR calculation (Python example)
python -c "
from so3lr.ase_utils import make_ase_calculator
from ase.build import molecule
import numpy as np
atoms = molecule('C6H6')
calc = make_ase_calculator(dtype=np.float32)
atoms.calc = calc
print('Energy:', atoms.get_potential_energy(), 'eV')
"

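11. File Inventory and Items That Must Not Be Moved

Directory	Movable?	Notes
`envs/`	❌ No	core conda environments (8 environments, about 30-50 GB)
`models/`	❌ No	pretrained model files
`so3lr-main/`	❌ No	SO3LR editable-install source
`so3lr-deps/`	❌ No	SO3LR editable-install dependency sources
`example/`	❌ Not recommended	test scripts and the batch test script
`package/`	✅ Can be moved as a whole	original installers; can be backed up elsewhere once installation is complete
`egret-public-master/`	✅ Movable	models already copied to models/; the source can be moved to package/
`mace-off-main/`	✅ Movable	models already copied to models/; the source can be moved to package/

Appendix: GPU Node Hardware

Node	GPU model	GPU count	CUDA driver	System CUDA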
cn01	NVIDIA GeForce RTX 4090	2	12.4	`/public/software/cuda-12.5.0`
cn10	NVIDIA GeForce RTX 5090	4	12.8	`/public/software/cuda-13.1`
