使用BigDL-LLM优化大语言模型

BigDL-LLM是一个用于在英特尔XPU上使用INT4/FP4/INT8/FP8运行LLM(大型语言模型)的库,具有非常低的延迟(适用于任何PyTorch模型)。

在英特尔CPU上安装BigDL-LLM

安装

bash 复制代码
pip install --pre --upgrade bigdl-llm[all]

运行模型

python 复制代码
#load Hugging Face Transformers model with INT4 optimizations
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)

#run the optimized model on CPU
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer.encode(input_str, ...)
output_ids = model.generate(input_ids, ...)
output = tokenizer.batch_decode(output_ids)

在英特尔GPU上安装BigDL-LLM

安装

bash 复制代码
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu

运行模型

python 复制代码
#load Hugging Face Transformers model with INT4 optimizations
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)

#run the optimized model on Intel GPU
model = model.to('xpu')

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer.encode(input_str, ...)
input_ids = input_ids.to('xpu')
output_ids = model.generate(input_ids, ...)
output = tokenizer.batch_decode(output_ids.cpu())

使用BigDL-LLM优化Baichuan2

CPU优化方案

使用conda安装

bash 复制代码
conda create -n llm python=3.9
conda activate llm

pip install bigdl-llm[all] # install bigdl-llm with 'all' option
pip install transformers_stream_generator

运行

python 复制代码
import torch

from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True, use_cache=True)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Generate predicted tokens
with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, max_new_tokens=args.n_predict)
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)

GPU方案

使用conda安装

bash 复制代码
conda create -n llm python=3.9
conda activate llm
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install transformers_stream_generator

配置OneAPI环境变量

windows:

bash 复制代码
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"

Linux:

bash 复制代码
source /opt/intel/oneapi/setvars.sh

为了在Arc上获得最佳性能,建议配置几个环境变量

bash 复制代码
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

运行

python 复制代码
import torch
import intel_extension_for_pytorch as ipex

from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True, use_cache=True)
model = model.to('xpu')

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    
# Generate predicted tokens
with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    input_ids = input_ids.to('xpu')
    # ipex model needs a warmup, then inference time can be accurate
    output = model.generate(input_ids, max_new_tokens=args.n_predict)

    # start inference
    output = model.generate(input_ids, max_new_tokens=args.n_predict)
    torch.xpu.synchronize()
    output = output.cpu()
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)
相关推荐
鱼满满记8 小时前
1.6K+ Star!GenAIScript:一个可自动化的GenAI脚本环境
人工智能·ai·github
manfulshark9 小时前
OPENAI官方prompt文档解析
ai·prompt
阿_旭9 小时前
基于YOLO11/v10/v8/v5深度学习的维修工具检测识别系统设计与实现【python源码+Pyqt5界面+数据集+训练代码】
人工智能·python·深度学习·qt·ai
NETFARMER运营坛10 小时前
如何优化 B2B 转化率?这些步骤你不可不知
大数据·安全·阿里云·ai·ai写作
AI原吾20 小时前
探索 Python 图像处理的瑞士军刀:Pillow 库
图像处理·python·ai·pillow
探索云原生21 小时前
GPU 环境搭建指南:如何在裸机、Docker、K8s 等环境中使用 GPU
ai·云原生·kubernetes·go·gpu
AI原吾21 小时前
`psdparse`:解锁Photoshop PSD文件的Python密钥
python·ui·ai·photoshop·psdparse
HuggingAI1 天前
stable diffusion 大模型
人工智能·ai·stable diffusion·ai绘画
DogDaoDao1 天前
深度学习常用开源数据集介绍【持续更新】
图像处理·人工智能·深度学习·ai·数据集
卡洛驰1 天前
交叉熵损失函数详解
人工智能·深度学习·算法·机器学习·ai·分类·概率论