System used: SCNet DCU, version dcu25.04
Conclusion first: the cupy package could not be installed, so the upgrade failed.
First, confirm the operating system:
lsb_release -a
The output is:
Distributor ID: Ubuntu
Description: Ubuntu 22.04.5 LTS
Release: 22.04
Codename: jammy
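The same check can be scripted so that later steps do not run on the wrong release. A minimal sketch, assuming the release information lives in /etc/os-release (standard on Ubuntu); the function name is_ubuntu_2204 is made up for illustration:

```shell
# Hypothetical helper: succeed only if the given os-release file
# describes Ubuntu 22.04. Defaults to /etc/os-release.
# The body is a subshell so its variables stay private.
is_ubuntu_2204() (
    file="${1:-/etc/os-release}"
    [ -r "$file" ] || exit 1
    # ID=ubuntu and VERSION_ID="22.04" are the fields of interest.
    id=$(sed -n 's/^ID=//p' "$file" | tr -d '"')
    version=$(sed -n 's/^VERSION_ID=//p' "$file" | tr -d '"')
    [ "$id" = "ubuntu" ] && [ "$version" = "22.04" ]
)

if is_ubuntu_2204; then
    echo "OS check passed: Ubuntu 22.04"
else
    echo "OS check failed: expected Ubuntu 22.04" >&2
fi
```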
Next, find the DTK-25.04.2 packages for Ubuntu 22.04.
The system (DTK) package is here:
https://download.sourcefind.cn:65024/1/main/DTK-25.04.2/Ubuntu22.04
Ecosystem packages:
https://download.sourcefind.cn:65024/4/main/
The packages are under DAS1.7 (not sure what that name means).
# torch 2.5.1
https://download.sourcefind.cn:65024/directlink/4/pytorch/DAS1.7/torch-2.5.1+das.opt1.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl
# torch 2.7.1
https://download.sourcefind.cn:65024/directlink/4/pytorch/DAS1.7/torch-2.7.1+das.opt1.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl
lmslim
https://download.sourcefind.cn:65024/directlink/4/lmslim/DAS1.7/lmslim-0.3.1+das.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl
vllm
https://download.sourcefind.cn:65024/directlink/4/vllm/DAS1.7/vllm-0.9.2+das.opt1.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl
lightop
https://download.sourcefind.cn:65024/directlink/4/lightop/DAS1.7/lightop-0.6.0+das.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl
transformer_engine
This one does not work:
https://download.sourcefind.cn:65024/directlink/4/transformer_engine/DAS1.7/transformer_engine-2.5.0+das.opt1.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl
Use this instead:
pip install transformer -U
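Every wheel name above carries the tags that decide whether pip will accept it: cp310 is the CPython 3.10 tag, and dtk25042 presumably corresponds to DTK 25.04.2. A minimal sketch that compares a wheel's Python tag with the running interpreter before downloading anything. The function names are hypothetical, and python3 is assumed to be a CPython on PATH:

```shell
# Hypothetical helper: print the Python tag (e.g. cp310) of a wheel file name.
# Wheel names end in -<python tag>-<abi tag>-<platform tag>.whl.
wheel_python_tag() {
    base="${1##*/}"        # drop any directory or URL prefix
    base="${base%.whl}"    # drop the extension
    # The Python tag is the third field from the end.
    printf '%s\n' "$base" | awk -F- '{ print $(NF-2) }'
}

# Print the tag of the running interpreter, e.g. cp310 for CPython 3.10.
# Assumption: python3 is CPython.
current_python_tag() {
    python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
}

wheel="torch-2.5.1+das.opt1.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl"
if [ "$(wheel_python_tag "$wheel")" = "$(current_python_tag)" ]; then
    echo "$wheel matches this Python"
else
    echo "$wheel needs $(wheel_python_tag "$wheel"), found $(current_python_tag)" >&2
fi
```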
cupy, the tough nut to crack:
export CUPY_INSTALL_USE_HIP=1
export ROCM_HOME=/opt/rocm
export HCC_AMDGPU_TARGET=gfx906
pip install cupy
Install hipCUB:
git clone https://github.com/ROCmSoftwarePlatform/hipCUB.git
cd hipCUB
mkdir build && cd build
cmake ..
make -j$(nproc)
sudo make install
cmake .. -DCMAKE_CXX_COMPILER=/opt/dtk/bin/hipcc # explicitly specify the compiler
make -j
Not sure whether this means it installed correctly:
-- Up-to-date: /opt/rocm/include/
-- Up-to-date: /opt/rocm/include//hipcub
-- Installing: /opt/rocm/include//hipcub/hipcub_version.hpp
-- Installing: /opt/rocm/lib/cmake/hipcub/hipcub-targets.cmake
-- Installing: /opt/rocm/lib/cmake/hipcub/hipcub-config.cmake
-- Installing: /opt/rocm/lib/cmake/hipcub/hipcub-config-version.cmake
-- Installing: /opt/rocm/share/doc/hipcub/LICENSE.txt
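The log shows the files landing under /opt/rocm. One way to check the result is to look for the main header under each prefix that a later build might search. A minimal sketch, assuming hipcub/hipcub.hpp is the header that matters and that /opt/rocm and /opt/dtk are the two candidate prefixes on this machine; has_hipcub is a made-up helper name:

```shell
# Hypothetical helper: succeed if the hipCUB main header exists under <prefix>.
has_hipcub() {
    [ -f "$1/include/hipcub/hipcub.hpp" ]
}

# Assumption: these are the only two prefixes worth checking here.
for prefix in /opt/rocm /opt/dtk; do
    if has_hipcub "$prefix"; then
        echo "hipCUB headers found under $prefix"
    else
        echo "hipCUB headers missing under $prefix"
    fi
done
```

If the headers exist under one prefix while ROCM_HOME points at the other, a build that only searches ROCM_HOME may not see them.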
dcu24.04
First install hipCUB:
git clone https://github.com/ROCmSoftwarePlatform/hipCUB.git
cd hipCUB
mkdir build && cd build
cmake .. -DCMAKE_CXX_COMPILER=/opt/dtk/bin/hipcc # explicitly specify the compiler
make -j$(nproc)
make install
Install cupy:
export CUPY_INSTALL_USE_HIP=1
export ROCM_HOME=/opt/dtk
# export HCC_AMDGPU_TARGET=gfx906
pip install cupy
If that does not work, install cupy version 12.3.
Set: export HCC_AMDGPU_TARGET=gfx942
Install the related libraries, then install vllm:
wget https://download.sourcefind.cn:65024/directlink/4/pytorch/DAS1.7/torch-2.5.1+das.opt1.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl
pip install torch-2.5.1+das.opt1.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl
wget https://download.sourcefind.cn:65024/directlink/4/lightop/DAS1.7/lightop-0.6.0+das.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl
pip install lightop-0.6.0+das.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl
wget https://download.sourcefind.cn:65024/directlink/4/vllm/DAS1.7/vllm-0.9.2+das.opt1.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl
pip install vllm-0.9.2+das.opt1.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl
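The three wget and pip install pairs above follow one pattern, so they can be driven from a single helper. A minimal sketch that only prints the commands unless DRY_RUN=0 is set. DRY_RUN, BASE_URL, run, and install_wheel are names introduced here for illustration, and the order torch, lightop, vllm is the one used above:

```shell
BASE_URL="https://download.sourcefind.cn:65024/directlink/4"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = really run them

# run: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# install_wheel <directory on the server> <wheel file name>
install_wheel() {
    run wget "$BASE_URL/$1/DAS1.7/$2" || return 1
    run pip install "$2"
}

# Stop at the first failure so vllm is not installed on top of a broken torch.
install_wheel pytorch "torch-2.5.1+das.opt1.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl" &&
install_wheel lightop "lightop-0.6.0+das.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl" &&
install_wheel vllm "vllm-0.9.2+das.opt1.dtk25042-cp310-cp310-manylinux_2_28_x86_64.whl"
```

Running it once in the default dry-run mode makes it easy to compare the printed URLs with the links listed earlier before anything is downloaded.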
In the end, the upgrade still did not succeed.
Debugging
The error reported was: Exception: Please install hipCUB and retry
raise Exception('Please install hipCUB and retry')
Exception: Please install hipCUB and retry
Tried building and installing hipCUB from source.
The configure step failed with this error:
-- System architecture is x86_64
CMake Error at cmake/VerifyCompiler.cmake:39 (message):
On ROCm platform 'hipcc' or HIP-aware Clang must be used as C++ compiler.
Call Stack (most recent call first):
CMakeLists.txt:124 (include)
-- Configuring incomplete, errors occurred!
make: *** No targets specified and no makefile found. Stop.
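The CMake message means the configure step picked a C++ compiler other than hipcc, and the make error follows only because no Makefile was generated. The hipCUB commands in these notes pass /opt/dtk/bin/hipcc explicitly. A minimal sketch that looks for hipcc in a few candidate directories before configuring; the candidate list and the find_hipcc name are assumptions for illustration:

```shell
# find_hipcc <dir>...: print the first executable hipcc found, fail if none.
find_hipcc() {
    for dir in "$@"; do
        if [ -x "$dir/hipcc" ]; then
            printf '%s\n' "$dir/hipcc"
            return 0
        fi
    done
    return 1
}

# Assumption: hipcc lives in one of these two directories on this machine.
if hipcc_path="$(find_hipcc /opt/dtk/bin /opt/rocm/bin)"; then
    # Print the configure command to run from the hipCUB build directory.
    echo "cmake .. -DCMAKE_CXX_COMPILER=$hipcc_path"
else
    echo "hipcc not found; check that the DTK environment is installed" >&2
fi
```

When switching compilers in a build directory that was already configured, it is safer to start from an empty build directory, since CMake caches the compiler it found the first time.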