Running the llama2-7b or DeepSeek-R1:1.5B model with PaddleNLP on a CPU with AVX instruction support (80% complete)

Source: 🚣‍♂️ Running the llama2-7b model with PaddleNLP on a CPU (with AVX instruction support) 🚣 --- PaddleNLP documentation

Running the llama2-7b model with PaddleNLP on a CPU (with AVX instruction support) 🚣

PaddleNLP has deeply adapted and optimized the llama model family for CPUs that support AVX instructions. This document describes how to run high-performance inference of llama-family models with PaddleNLP on such CPUs.

Hardware check:

Chip                              GCC version   cmake version
Intel(R) Xeon(R) Platinum 8463B   9.4.0         >=3.18

Note: to verify that your machine supports AVX instructions, run the following command and check whether it prints anything:

lscpu | grep -o -P '(?<!\w)(avx\w*)'

# Expected output looks like this:
avx
avx2
avx512f
avx512dq
avx512ifma
avx512cd
avx512bw
avx512vl
avx_vnni
avx512_bf16
avx512vbmi
avx512_vbmi2
avx512_vnni
avx512_bitalg
avx512_vpopcntdq
avx512_fp16
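The same check can be done programmatically. Below is a minimal Python sketch of my own (not part of PaddleNLP) that parses /proc/cpuinfo, so it is Linux-only:

```python
import re

def avx_flags(cpuinfo_text):
    """Return the sorted AVX-family flags found in /proc/cpuinfo text."""
    m = re.search(r"^flags\s*:\s*(.+)$", cpuinfo_text, re.MULTILINE)
    if not m:
        return []
    return sorted(f for f in m.group(1).split() if f.startswith("avx"))

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            # An empty list means the CPU (or VM) does not advertise AVX,
            # and the AVX inference path below will not work.
            print(avx_flags(fh.read()))
    except FileNotFoundError:
        print("no /proc/cpuinfo (not Linux?)")
```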

Environment setup:

1 Install numactl

apt-get update
apt-get install numactl

2 Install Paddle

2.1 Build from source:

git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle && mkdir build && cd build

cmake .. -DPY_VERSION=3.8 -DWITH_GPU=OFF

make -j128
pip install -U python/dist/paddlepaddle-0.0.0-cp38-cp38-linux_x86_64.whl

2.2 Install with pip:

python -m pip install --pre paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/

2.3 Verify the installation:

python -c "import paddle; paddle.version.show()"
python -c "import paddle; paddle.utils.run_check()"

3 Clone the PaddleNLP repository and install its dependencies

# PaddleNLP is a natural language processing and large language model (LLM) development library built on PaddlePaddle ("飞桨"). It hosts PaddlePaddle implementations of many large models, including the llama family. To get the most out of PaddleNLP, clone the whole repository.
pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html

4 Install the third-party libraries and paddlenlp_ops

# The PaddleNLP repository ships dedicated fused operators that keep inference cost as low as possible
git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP/csrc/cpu
sh setup.sh

5 If the third-party libraries fail to build

# If oneccl fails to build, rebuild with gcc between 8.2 and 9.4
cd csrc/cpu/xFasterTransformer/3rdparty/
sh prepare_oneccl.sh

# If xFasterTransformer fails to build, rebuild with gcc 9.2 or newer
cd csrc/cpu/xFasterTransformer/build/
make -j24

# For more commands and environment variables, see csrc/cpu/setup.sh

High-performance CPU inference

PaddleNLP also provides high-performance CPU inference based on intel/xFasterTransformer. It currently supports FP16, BF16, and INT8 precision, plus a mixed mode that runs Prefill in FP16 and Decode in INT8.

Reference for high-performance inference on non-HBM machines:

1 Determine OMP_NUM_THREADS

OMP_NUM_THREADS=$(lscpu | grep "Core(s) per socket" | awk -F ':' '{print $2}')

2 Dynamic-graph inference

cd ../../llm/
# 2. Dynamic-graph inference: reference command for high-performance AVX dynamic-graph inference
OMP_NUM_THREADS=$(lscpu | grep "Core(s) per socket" | awk -F ':' '{print $2}') numactl -N 0  -m 0 python ./predict/predictor.py --model_name_or_path meta-llama/Llama-2-7b-chat --inference_model --dtype float32 --avx_mode --avx_type "fp16_int8" --device "cpu"

3 Static-graph inference

# step 1: export the static graph
python ./predict/export_model.py --model_name_or_path meta-llama/Llama-2-7b-chat --inference_model --output_path ./inference --dtype float32 --avx_mode --avx_type "fp16_int8" --device "cpu"
# step 2: static-graph inference
OMP_NUM_THREADS=$(lscpu | grep "Core(s) per socket" | awk -F ':' '{print $2}') numactl -N 0  -m 0 python ./predict/predictor.py --model_name_or_path ./inference --inference_model --dtype "float32" --mode "static" --device "cpu" --avx_mode

Reference for high-performance inference on HBM machines:

1 Confirm the hardware and OMP_NUM_THREADS

# In theory, an HBM machine achieves a 1.3x-1.9x next-token latency speedup over a non-HBM machine
# Confirm that the machine has HBM
lscpu
# CPU-less nodes such as node2 and node3 below indicate HBM support
NUMA node0 CPU(s):                  0-31,64-95
NUMA node1 CPU(s):                  32-63,96-127
NUMA node2 CPU(s):
NUMA node3 CPU(s):

# Determine OMP_NUM_THREADS
lscpu | grep "Socket(s)" | awk -F ':' '{print $2}'
OMP_NUM_THREADS=$(lscpu | grep "Core(s) per socket" | awk -F ':' '{print $2}')
2 Dynamic-graph inference

cd ../../llm/
# Reference command for high-performance AVX dynamic-graph inference
FIRST_TOKEN_WEIGHT_LOCATION=0 NEXT_TOKEN_WEIGHT_LOCATION=2 OMP_NUM_THREADS=$(lscpu | grep "Core(s) per socket" | awk -F ':' '{print $2}') numactl -N 0  -m 0 python ./predict/predictor.py --model_name_or_path meta-llama/Llama-2-7b-chat --inference_model --dtype float32 --avx_mode --avx_type "fp16_int8" --device "cpu"
Note: FIRST_TOKEN_WEIGHT_LOCATION and NEXT_TOKEN_WEIGHT_LOCATION place the first_token weights on numa0 and the next_token weights on numa2 (the HBM cache node).
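To decide which node id to pass as NEXT_TOKEN_WEIGHT_LOCATION, you need to know which NUMA nodes are the CPU-less HBM cache nodes. Here is a small sketch of my own (the parsing logic is an assumption based on the lscpu output shape shown above, not part of PaddleNLP) that lists nodes with no CPUs attached:

```python
import re

def hbm_candidate_nodes(lscpu_text):
    """Return NUMA node ids whose 'NUMA nodeN CPU(s):' line lists no CPUs.

    On HBM-as-cache machines these CPU-less nodes are the HBM memory
    nodes (node2/node3 in the lscpu output above).
    """
    nodes = []
    for line in lscpu_text.splitlines():
        m = re.match(r"\s*NUMA node(\d+) CPU\(s\):\s*(\S*)", line)
        if m and not m.group(2):
            nodes.append(int(m.group(1)))
    return nodes
```

With the lscpu output shown above, this returns [2, 3], matching the note that numa2 is an HBM cache node.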
3 Static-graph inference

# Reference commands for high-performance static-graph inference
# step 1: export the static graph
python ./predict/export_model.py --model_name_or_path meta-llama/Llama-2-7b-chat --inference_model --output_path ./inference --dtype float32 --avx_mode --avx_type "fp16_int8" --device "cpu"
# step 2: static-graph inference
FIRST_TOKEN_WEIGHT_LOCATION=0 NEXT_TOKEN_WEIGHT_LOCATION=2 OMP_NUM_THREADS=$(lscpu | grep "Core(s) per socket" | awk -F ':' '{print $2}') numactl -N 0  -m 0 python ./predict/predictor.py --model_name_or_path ./inference --inference_model --dtype "float32" --mode "static" --device "cpu" --avx_mode

Hands-on walkthrough

Installation

Install system packages

sudo apt update
sudo apt install numactl

Check whether the CPU supports AVX

lscpu | grep -o -P '(?<!\w)(avx\w*)'

Install PaddlePaddle

pip install --pre paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/

Verify that PaddlePaddle installed correctly

python -c "import paddle; paddle.version.show()"
python -c "import paddle; paddle.utils.run_check()"

Install the PaddleNLP library

pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html

Download the PaddleNLP source and install the fused operators

git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP/csrc/cpu
sh setup.sh

The build fails:
Successfully installed intel-cmplr-lib-ur-2024.2.1 intel-openmp-2024.2.1 mkl-include-2024.0.0 mkl-static-2024.0.0 tbb-2021.13.1
CMake Error at CMakeLists.txt:129 (find_package):
  By not providing "FindoneCCL.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "oneCCL", but
  CMake did not find one.

  Could not find a package configuration file provided by "oneCCL" with any
  of the following names:

    oneCCLConfig.cmake
    oneccl-config.cmake

  Add the installation prefix of "oneCCL" to CMAKE_PREFIX_PATH or set
  "oneCCL_DIR" to a directory containing one of the above files.  If "oneCCL"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Configuring incomplete, errors occurred!
make: *** No targets specified and no makefile found.  Stop.

Go to the oneccl subdirectory and try rebuilding:

(py312) skywalk@DESKTOP-9C5AU01:~/github/PaddleNLP/csrc/cpu$ cd xFasterTransformer/3rdparty/
(py312) skywalk@DESKTOP-9C5AU01:~/github/PaddleNLP/csrc/cpu/xFasterTransformer/3rdparty$ sh prepare_oneccl.sh

Still failing. The docs say gcc 8-9 works best, but this machine has gcc 13.3, which is on the new side, so I am setting this aside for now.

Current status: the local build fails; on 星河社区 (AI Studio) the GitHub connection is too slow and the build fails; the Kaggle build fails as well.
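If the system compiler is too new, one possible workaround, purely a sketch that I have not verified here (it assumes gcc-9/g++-9 are installable from your distro's repositories, and that the CMake configure inside setup.sh picks up the standard CC/CXX environment variables on a fresh build directory), is to build with an older GCC without changing the system default:

```shell
# Sketch: build with gcc-9 without touching the system-wide default.
# Package names are Ubuntu-specific and may differ elsewhere.
sudo apt install gcc-9 g++-9
export CC=/usr/bin/gcc-9
export CXX=/usr/bin/g++-9
cd PaddleNLP/csrc/cpu && sh setup.sh
```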

Installing the fused operators, take two

First add Intel's apt repository for Ubuntu

# Add Intel's GPG key and apt repository
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
echo "deb https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update

# Install the full development packages (includes oneCCL)
sudo apt install intel-oneapi-ccl intel-oneapi-ccl-devel intel-oneapi-runtime-dnnl

Then run the build again

cd PaddleNLP/csrc/cpu && oneCCL_DIR=/opt/intel/oneapi/ccl/latest/lib/cmake/oneCCL sh setup.sh

Inference

Go to the PaddleNLP/llm directory and run:

python ./predict/predictor.py --model_name_or_path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --inference_model --dtype float32 --avx_mode --avx_type "fp16_int8" --device "cpu"

Summary

More pitfalls than expected; it is still not running end to end.

Debugging

Error: This system does not support NUMA policy

OMP_NUM_THREADS=$(lscpu | grep "Core(s) per socket" | awk -F ':' '{print $2}') numactl -N 0 -m 0 python ./predict/predictor.py --model_name_or_path meta-llama/Llama-2-7b-chat --inference_model --dtype float32 --avx_mode --avx_type "fp16_int8" --device "cpu"

numactl: This system does not support NUMA policy

In that case, drop the numactl prefix and run the command directly.
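A tiny helper of my own (a sketch, not part of PaddleNLP) that builds the launch command and only prepends the NUMA binding when the system actually supports it, so the same script works on machines without NUMA (such as some WSL setups):

```python
import shutil
import subprocess

def numa_available():
    """True when numactl exists and the kernel accepts NUMA policy calls."""
    if shutil.which("numactl") is None:
        return False
    # 'numactl -s' fails with "This system does not support NUMA policy"
    # on kernels built without NUMA support.
    return subprocess.run(["numactl", "-s"], capture_output=True).returncode == 0

def launch_cmd(predictor_args, numa_node=0):
    """Build the predictor command, with NUMA binding only when usable."""
    cmd = ["python", "./predict/predictor.py"] + predictor_args
    if numa_available():
        cmd = ["numactl", "-N", str(numa_node), "-m", str(numa_node)] + cmd
    return cmd
```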

Error: ModuleNotFoundError: No module named 'paddlenlp_ops'

from paddlenlp_ops import (

ModuleNotFoundError: No module named 'paddlenlp_ops'

So building paddlenlp_ops really is unavoidable!

Building paddlenlp_ops on Kaggle fails

cd xFasterTransformer/3rdparty/

!cd PaddleNLP/csrc/cpu/xFasterTransformer/3rdparty && sh prepare_oneccl.sh

One last try; if this fails I am giving up. Building oneccl on its own succeeds, but building paddlenlp_ops still fails:
-- MKL directory already exists. Skipping installation.
CMake Error at CMakeLists.txt:129 (find_package):
  By not providing "FindoneCCL.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "oneCCL", but
  CMake did not find one.

  Could not find a package configuration file provided by "oneCCL" with any
  of the following names:

    oneCCLConfig.cmake
    oneccl-config.cmake

  Add the installation prefix of "oneCCL" to CMAKE_PREFIX_PATH or set
  "oneCCL_DIR" to a directory containing one of the above files.  If "oneCCL"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Configuring incomplete, errors occurred!
make: *** No targets specified and no makefile found.  Stop.

On Kaggle I am out of ideas... giving up.

Local build error

-- MKL directory already exists. Skipping installation.
CMake Error at CMakeLists.txt:129 (find_package):
  By not providing "FindoneCCL.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "oneCCL", but
  CMake did not find one.

  Could not find a package configuration file provided by "oneCCL" with any
  of the following names:

    oneCCLConfig.cmake
    oneccl-config.cmake

  Add the installation prefix of "oneCCL" to CMAKE_PREFIX_PATH or set
  "oneCCL_DIR" to a directory containing one of the above files.  If "oneCCL"
  provides a separate development package or SDK, be sure it has been
  installed.

-- Configuring incomplete, errors occurred!

Try a plain pip install:

pip install oneccl

Same error as before.

Try installing this:

sudo apt install libdnnl3

Trying a new approach

# Add Intel's GPG key and apt repository
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
echo "deb https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update

# Install the full development packages (includes oneCCL)
sudo apt install intel-oneapi-ccl intel-oneapi-ccl-devel

The download is very slow on my machine, and Kaggle is not much faster:

12% [4 intel-oneapi-mpi-2021.14 7797 kB/45.6 MB 17%] 23.0 kB/s 1h 14min 51s

Kaggle has finished installing, so the ops can be built now:

!cd PaddleNLP/csrc/cpu && oneCCL_DIR=/opt/intel/oneapi/ccl/latest/ sh setup.sh

The build emits warnings like this:
warnings.warn(warning_message)
/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See Why you shouldn't invoke setup.py directly for details.
        ********************************************************************************

!!
  self.initialize_options()
/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!

Kaggle ultimately fails with /usr/bin/ld: cannot find -l:libxfastertransformer.so: No such file or directory
/usr/bin/ld: cannot find /kaggle/working/PaddleNLP/csrc/cpu/build/paddlenlp_ops/lib.linux-x86_64-cpython-310/avx_weight_only.o: No such file or directory
/usr/bin/ld: cannot find /kaggle/working/PaddleNLP/csrc/cpu/build/paddlenlp_ops/lib.linux-x86_64-cpython-310/stop_generation_multi_ends.o: No such file or directory
/usr/bin/ld: cannot find -l:libxfastertransformer.so: No such file or directory
/usr/bin/ld: cannot find -l:libxft_comm_helper.so: No such file or directory
collect2: error: ld returned 1 exit status
error: command '/usr/bin/x86_64-linux-gnu-g++' failed with exit code 1

The root cause is here:

-- Using src='https://github.com/google/sentencepiece/releases/download/v0.1.99/sentencepiece-0.1.99.tar.gz'
/kaggle/working/PaddleNLP/csrc/cpu/xFasterTransformer/src/comm_helper/comm_helper.cpp:17:10: fatal error: oneapi/ccl.hpp: No such file or directory
   17 | #include "oneapi/ccl.hpp"
      |          ^~~~~~~~~~~~~~~~

In other words, the oneAPI headers cannot be found.

Found it: the oneCCL_DIR path set earlier was wrong.

# Standard Intel oneAPI path (Linux)
export oneCCL_DIR=/opt/intel/oneapi/ccl/latest/lib/cmake/ccl

# Custom install path
export oneCCL_DIR=/your/custom/path/lib/cmake/ccl

# Pass the variable to CMake
cmake -DoneCCL_DIR=$oneCCL_DIR ..
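Rather than guessing the directory, you can search for the config file itself. A small helper of my own (a sketch; the /opt/intel/oneapi root is an assumption based on the default installer prefix used earlier):

```python
import os

def find_cmake_config(root, name="oneCCLConfig.cmake"):
    """Walk `root` and return every directory that contains `name`."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        if name in files:
            hits.append(dirpath)
    return hits

if __name__ == "__main__":
    # /opt/intel/oneapi is the default installer prefix; adjust if needed.
    for d in find_cmake_config("/opt/intel/oneapi"):
        print(d)  # pass one of these directories as oneCCL_DIR
```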

The command should be:

!cd PaddleNLP/csrc/cpu && oneCCL_DIR=/opt/intel/oneapi/ccl/latest/lib/cmake/oneCCL sh setup.sh

Still failing; this also needs to be installed:

sudo apt install intel-oneapi-runtime-dnnl

Kaggle error: Your notebook tried to allocate more memory than is available. It has restarted. (giving up)

Nothing to be done about this one; it is simply over the memory limit.

Giving up on Kaggle.

Local build error: status_string: "Failure when receiving data from the peer"

-- Using src='https://github.com/oneapi-src/oneDNN/releases/download/v0.21/mklml_lnx_2019.0.5.20190502.tgz'
Cloning into 'oneccl'...
CMake Error at /home/skywalk/github/PaddleNLP/csrc/cpu/xFasterTransformer/build/xdnn_lib-prefix/src/xdnn_lib-stamp/download-xdnn_lib.cmake:170 (message):
  Each download failed!
  error: downloading 'https://github.com/intel/xFasterTransformer/releases/download/IntrinsicGemm/xdnn_v1.5.2.tar.gz' failed
  status_code: 56
  status_string: "Failure when receiving data from the peer"

CMake Error at /home/skywalk/github/PaddleNLP/csrc/cpu/xFasterTransformer/build/examples/cpp/cmdline-prefix/src/cmdline-stamp/download-cmdline.cmake:170 (message):
  Each download failed!
  error: downloading 'https://github.com/tanakh/cmdline/archive/refs/heads/master.zip' failed
  status_code: 56
  status_string: "Failure when receiving data from the peer"

Probably just GitHub acting up.

Shelving this for now.

Some additional packages that may be needed:

sudo apt install libdnnl-dev

sudo apt install intel-oneapi-mkl

sudo apt install libmkl-vml-avx libmkl-dev intel-oneapi-runtime-mkl

While installing intel-mkl (the math library), a prompt appeared:

Intel Math Kernel Library (Intel MKL)

Intel MKL's Single Dynamic Library (SDL) is installed on your machine. This shared object can be used as an alternative to both libblas.so.3 and liblapack.so.3, so that packages built against BLAS/LAPACK can directly use MKL without rebuild.

However, MKL is non-free software, and in particular its source code is not publicly available. By using MKL as the default BLAS/LAPACK implementation, you might be violating the licensing terms of copyleft software that would become dynamically linked against it. Please verify that the licensing terms of the program(s) that you intend to use with MKL are compatible with the MKL licensing terms. For the case of software under the GNU General Public License, you may want to read this FAQ:

https://www.gnu.org/licenses/gpl-faq.html#GPLIncompatibleLibs

If you don't know what MKL is, or unwilling to set it as default, just choose the preset value or simply type Enter.

Use libmkl_rt.so as the default alternative to BLAS/LAPACK?  <Yes>  <No>

So this library ships under its own (non-free) license terms?
