Optimizing Large Language Models with BigDL-LLM

BigDL-LLM is a library for running LLMs (large language models) on Intel XPUs with INT4/FP4/INT8/FP8 precision at very low latency (it works with any PyTorch model).
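
Because the optimizations are applied at the PyTorch layer level, they are not limited to Hugging Face checkpoints. Below is a minimal sketch (not from the original) using the library's general-purpose optimize_model API on a toy PyTorch module; the two-layer model and its sizes are purely illustrative.

python
import torch
from torch import nn
from bigdl.llm import optimize_model

# Replace the linear layers of an arbitrary PyTorch model with
# low-bit (here sym_int4) kernels; the toy model is illustrative only.
toy = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
toy = optimize_model(toy, low_bit='sym_int4')

with torch.inference_mode():
    out = toy(torch.randn(1, 128))
print(out.shape)  # torch.Size([1, 128])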

Installing BigDL-LLM on an Intel CPU

Installation

bash
pip install --pre --upgrade bigdl-llm[all]

Running a Model

python
# Load a Hugging Face Transformers model with INT4 optimizations
# (model_path points to a local checkpoint directory or a Hugging Face model id)
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)

# Run the optimized model on the CPU
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer.encode(input_str, ...)
output_ids = model.generate(input_ids, ...)
output = tokenizer.batch_decode(output_ids)
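
Converting full-precision weights to INT4 happens at load time, which can be slow for large checkpoints. The sketch below assumes the save_low_bit/load_low_bit methods of bigdl.llm.transformers; the save directory name is illustrative.

python
from bigdl.llm.transformers import AutoModelForCausalLM

# One-time conversion: load in INT4, then persist the converted weights.
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
model.save_low_bit('./model-int4')

# Later runs reload the low-bit weights directly, skipping conversion.
model = AutoModelForCausalLM.load_low_bit('./model-int4')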

Installing BigDL-LLM on an Intel GPU

Installation

bash
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu

Running a Model

python
# Load a Hugging Face Transformers model with INT4 optimizations
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device with PyTorch
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)

# Run the optimized model on an Intel GPU
model = model.to('xpu')

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer.encode(input_str, ...)
input_ids = input_ids.to('xpu')
output_ids = model.generate(input_ids, ...)
output = tokenizer.batch_decode(output_ids.cpu())
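
If model.to('xpu') fails, the GPU driver or oneAPI environment is usually the culprit. A quick sanity check (a sketch, not from the original):

python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401, registers the 'xpu' backend

print(torch.xpu.is_available())  # True when an Intel GPU is usable
print(torch.xpu.device_count())  # number of visible XPU devices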

Optimizing Baichuan2 with BigDL-LLM

CPU Optimization

Install with conda

bash
conda create -n llm python=3.9
conda activate llm

pip install bigdl-llm[all] # install bigdl-llm with 'all' option
pip install transformers_stream_generator

Run

python
import torch

from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

# Load the model with INT4 optimizations
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True, use_cache=True)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Generate predicted tokens
with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, max_new_tokens=args.n_predict)
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)
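
The snippet above references model_path, prompt, and args.n_predict without defining them. A self-contained version is sketched below; the argparse flags and the default prompt are illustrative, not part of the original.

python
import argparse
import torch
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

parser = argparse.ArgumentParser()
parser.add_argument('--model-path', required=True, help='path to the Baichuan2 checkpoint')
parser.add_argument('--prompt', default='What is AI?')
parser.add_argument('--n-predict', type=int, default=32)
args = parser.parse_args()

# Load the model with INT4 optimizations, plus its tokenizer
model = AutoModelForCausalLM.from_pretrained(args.model_path, load_in_4bit=True,
                                             trust_remote_code=True, use_cache=True)
tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)

with torch.inference_mode():
    input_ids = tokenizer.encode(args.prompt, return_tensors="pt")
    output = model.generate(input_ids, max_new_tokens=args.n_predict)
    print(tokenizer.decode(output[0], skip_special_tokens=True))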

GPU Optimization

Install with conda

bash
conda create -n llm python=3.9
conda activate llm
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install transformers_stream_generator

Configure oneAPI Environment Variables

Windows:

bash
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"

Linux:

bash
source /opt/intel/oneapi/setvars.sh

For best performance on Intel Arc GPUs, it is recommended to set a few environment variables:

bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

Run

python
import torch
import intel_extension_for_pytorch as ipex

from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

# Load the model with INT4 optimizations and move it to the Intel GPU
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True, use_cache=True)
model = model.to('xpu')

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Generate predicted tokens
with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    input_ids = input_ids.to('xpu')
    # The IPEX model needs one warmup run before inference time can be measured accurately
    output = model.generate(input_ids, max_new_tokens=args.n_predict)

    # start inference
    output = model.generate(input_ids, max_new_tokens=args.n_predict)
    torch.xpu.synchronize()
    output = output.cpu()
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)
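
To see why the warmup matters, time the second generate call explicitly. A minimal sketch, assuming the variables from the script above are still in scope:

python
import time

with torch.inference_mode():
    model.generate(input_ids, max_new_tokens=args.n_predict)  # warmup run
    torch.xpu.synchronize()            # wait until the warmup has finished
    start = time.perf_counter()
    output = model.generate(input_ids, max_new_tokens=args.n_predict)
    torch.xpu.synchronize()            # wait until generation has finished
    print(f"Inference time: {time.perf_counter() - start:.2f} s")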