使用BigDL-LLM优化大语言模型

BigDL-LLM是一个用于在英特尔XPU上使用INT4/FP4/INT8/FP8运行LLM(大型语言模型)的库,具有非常低的延迟(适用于任何PyTorch模型)。

在英特尔CPU上安装BigDL-LLM

安装

bash 复制代码
pip install --pre --upgrade bigdl-llm[all]

运行模型

python 复制代码
#load Hugging Face Transformers model with INT4 optimizations
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)

#run the optimized model on CPU
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer.encode(input_str, ...)
output_ids = model.generate(input_ids, ...)
output = tokenizer.batch_decode(output_ids)

在英特尔GPU上安装BigDL-LLM

安装

bash 复制代码
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu

运行模型

python 复制代码
#load Hugging Face Transformers model with INT4 optimizations
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)

#run the optimized model on Intel GPU
model = model.to('xpu')

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer.encode(input_str, ...)
input_ids = input_ids.to('xpu')
output_ids = model.generate(input_ids, ...)
output = tokenizer.batch_decode(output_ids.cpu())

使用BigDL-LLM优化Baichuan2

CPU优化方案

使用conda安装

bash 复制代码
conda create -n llm python=3.9
conda activate llm

pip install bigdl-llm[all] # install bigdl-llm with 'all' option
pip install transformers_stream_generator

运行

python 复制代码
import torch

from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True, use_cache=True)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Generate predicted tokens
with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, max_new_tokens=args.n_predict)
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)

GPU方案

使用conda安装

bash 复制代码
conda create -n llm python=3.9
conda activate llm
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install transformers_stream_generator

配置OneAPI环境变量

windows:

bash 复制代码
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"

Linux:

bash 复制代码
source /opt/intel/oneapi/setvars.sh

为了在Arc上获得最佳性能,建议配置几个环境变量

bash 复制代码
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

运行

python 复制代码
import torch
import intel_extension_for_pytorch as ipex

from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True, use_cache=True)
model = model.to('xpu')

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    
# Generate predicted tokens
with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    input_ids = input_ids.to('xpu')
    # ipex model needs a warmup, then inference time can be accurate
    output = model.generate(input_ids, max_new_tokens=args.n_predict)

    # start inference
    output = model.generate(input_ids, max_new_tokens=args.n_predict)
    torch.xpu.synchronize()
    output = output.cpu()
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)
相关推荐
程序员泠零澪回家种桔子2 小时前
RAG中的Embedding技术
人工智能·后端·ai·embedding
leikooo3 小时前
Spring AI 工具调用回调与流式前端展示的完整落地方案
java·spring·ai·ai编程
imbackneverdie3 小时前
如何通过读文献寻找科研思路?
人工智能·ai·自然语言处理·aigc·ai写作·ai读文献
海绵宝宝de派小星3 小时前
如何开始学习AI?学习路径建议
ai
一叶飘零_sweeeet4 小时前
大模型和机器学习
ai
图生生4 小时前
AI溶图技术+光影适配:实现产品场景图的高质量合成
人工智能·ai
布谷鸟科技cookoo5 小时前
布谷鸟科技走进小鹏汽车,解构远程驾驶全栈解决方案
人工智能·科技·ai·自动驾驶·边缘计算·远程驾驶
lynnlovemin5 小时前
SpringBoot+SSE构建AI实时流式对话系统:原理剖析与代码实战
人工智能·spring boot·后端·ai·sse
打工的小王6 小时前
Langchain4j(二)RAG知识库
java·后端·ai·语言模型
1张驰咨询17 小时前
从“实验室奇迹”到“工业艺术品”:我如何用六西格玛培训,为AI产品注入“确定性基因”?
人工智能·ai·六西格玛·六西格玛培训·六西格玛咨询·六西格玛培训公司·六西格玛咨询机构