使用BigDL-LLM优化大语言模型

BigDL-LLM是一个用于在英特尔XPU上使用INT4/FP4/INT8/FP8运行LLM(大型语言模型)的库,具有非常低的延迟(适用于任何PyTorch模型)。

在英特尔CPU上安装BigDL-LLM

安装

bash 复制代码
pip install --pre --upgrade bigdl-llm[all]

运行模型

python 复制代码
#load Hugging Face Transformers model with INT4 optimizations
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)

#run the optimized model on CPU
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer.encode(input_str, ...)
output_ids = model.generate(input_ids, ...)
output = tokenizer.batch_decode(output_ids)

在英特尔GPU上安装BigDL-LLM

安装

bash 复制代码
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu

运行模型

python 复制代码
#load Hugging Face Transformers model with INT4 optimizations
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)

#run the optimized model on Intel GPU
model = model.to('xpu')

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer.encode(input_str, ...)
input_ids = input_ids.to('xpu')
output_ids = model.generate(input_ids, ...)
output = tokenizer.batch_decode(output_ids.cpu())

使用BigDL-LLM优化Baichuan2

CPU优化方案

使用conda安装

bash 复制代码
conda create -n llm python=3.9
conda activate llm

pip install bigdl-llm[all] # install bigdl-llm with 'all' option
pip install transformers_stream_generator

运行

python 复制代码
import torch

from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True, use_cache=True)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Generate predicted tokens
with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, max_new_tokens=args.n_predict)
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)

GPU方案

使用conda安装

bash 复制代码
conda create -n llm python=3.9
conda activate llm
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install transformers_stream_generator

配置OneAPI环境变量

windows:

bash 复制代码
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"

Linux:

bash 复制代码
source /opt/intel/oneapi/setvars.sh

为了在Arc上获得最佳性能,建议配置几个环境变量

bash 复制代码
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

运行

python 复制代码
import torch
import intel_extension_for_pytorch as ipex

from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True, use_cache=True)
model = model.to('xpu')

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    
# Generate predicted tokens
with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    input_ids = input_ids.to('xpu')
    # ipex model needs a warmup, then inference time can be accurate
    output = model.generate(input_ids, max_new_tokens=args.n_predict)

    # start inference
    output = model.generate(input_ids, max_new_tokens=args.n_predict)
    torch.xpu.synchronize()
    output = output.cpu()
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)
相关推荐
海豚调度1 小时前
Linux 基金会报告解读:开源 AI 重塑经济格局,有人失业,有人涨薪!
大数据·人工智能·ai·开源
令狐少侠20111 小时前
ai之RAG本地知识库--基于OCR和文本解析器的新一代RAG引擎:RAGFlow 认识和源码剖析
人工智能·ai
小屁妞6 小时前
Spring AI Alibaba智能测试用例生成
ai·测试用例生成·ai生成测试用例
tonngw8 小时前
DeepSeek 帮助自己的工作
ai
张彦峰ZYF21 小时前
从检索到生成:RAG 如何重构大模型的知识边界?
人工智能·ai·aigc
难受啊马飞2.01 天前
如何判断 AI 将优先自动化哪些任务?
运维·人工智能·ai·语言模型·程序员·大模型·大模型学习
运器1231 天前
【一起来学AI大模型】算法核心:数组/哈希表/树/排序/动态规划(LeetCode精练)
开发语言·人工智能·python·算法·ai·散列表·ai编程
虾条_花吹雪1 天前
2、Connecting to Kafka
分布式·ai·kafka
DeepSeek大模型官方教程1 天前
NLP之文本纠错开源大模型:兼看语音大模型总结
大数据·人工智能·ai·自然语言处理·大模型·产品经理·大模型学习
程序员鱼皮1 天前
Cursor 1.2重磅更新,这个痛点终于被解决了!
ai·程序员·编程·agent·软件开发