使用BigDL-LLM优化大语言模型

BigDL-LLM是一个用于在英特尔XPU上使用INT4/FP4/INT8/FP8运行LLM(大型语言模型)的库,具有非常低的延迟(适用于任何PyTorch模型)。

在英特尔CPU上安装BigDL-LLM

安装

bash 复制代码
pip install --pre --upgrade bigdl-llm[all]

运行模型

python 复制代码
#load Hugging Face Transformers model with INT4 optimizations
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)

#run the optimized model on CPU
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer.encode(input_str, ...)
output_ids = model.generate(input_ids, ...)
output = tokenizer.batch_decode(output_ids)

在英特尔GPU上安装BigDL-LLM

安装

bash 复制代码
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu

运行模型

python 复制代码
#load Hugging Face Transformers model with INT4 optimizations
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)

#run the optimized model on Intel GPU
model = model.to('xpu')

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer.encode(input_str, ...)
input_ids = input_ids.to('xpu')
output_ids = model.generate(input_ids, ...)
output = tokenizer.batch_decode(output_ids.cpu())

使用BigDL-LLM优化Baichuan2

CPU优化方案

使用conda安装

bash 复制代码
conda create -n llm python=3.9
conda activate llm

pip install bigdl-llm[all] # install bigdl-llm with 'all' option
pip install transformers_stream_generator

运行

python 复制代码
import torch

from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True, use_cache=True)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Generate predicted tokens
with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, max_new_tokens=args.n_predict)
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)

GPU方案

使用conda安装

bash 复制代码
conda create -n llm python=3.9
conda activate llm
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install transformers_stream_generator

配置OneAPI环境变量

windows:

bash 复制代码
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"

Linux:

bash 复制代码
source /opt/intel/oneapi/setvars.sh

为了在Arc上获得最佳性能,建议配置几个环境变量

bash 复制代码
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

运行

python 复制代码
import torch
import intel_extension_for_pytorch as ipex

from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True, use_cache=True)
model = model.to('xpu')

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    
# Generate predicted tokens
with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    input_ids = input_ids.to('xpu')
    # ipex model needs a warmup, then inference time can be accurate
    output = model.generate(input_ids, max_new_tokens=args.n_predict)

    # start inference
    output = model.generate(input_ids, max_new_tokens=args.n_predict)
    torch.xpu.synchronize()
    output = output.cpu()
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)
相关推荐
带刺的坐椅14 小时前
迈向 MCP 集群化:Solon AI (支持 Java8+)在解决 MCP 服务可扩展性上的探索与实践
java·ai·llm·solon·mcp
zhengfei61114 小时前
AI渗透工具——基于大型模型的自主渗透测试智能体鸾鸟(LuaN1ao)
安全·ai·开源
junlaii15 小时前
Windows 安装 claude code 教程
windows·ai
Elastic 中国社区官方博客16 小时前
Elasticsearch:圣诞晚餐 BBQ - 图像识别
大数据·数据库·elasticsearch·搜索引擎·ai·全文检索
CoderJia程序员甲16 小时前
GitHub 热榜项目 - 日榜(2025-12-24)
ai·开源·llm·github
罗政1 天前
DeepSeek、通义千问、智谱、Kimi 的 API Key 获取指南
ai
zhuzihuaile1 天前
Langchain-Chatchat + Ollama + QWen3 + 搭建知识库 + AI-Win
人工智能·python·ai·langchain
空白诗1 天前
昇腾 NPU 落地 Llama3-8B:模型获取到数学解题推理的全流程实战
人工智能·ai·语言模型·npu
CoderJia程序员甲1 天前
GitHub 热榜项目 - 日榜(2025-12-25)
git·ai·开源·llm·github
FIT2CLOUD飞致云1 天前
学习笔记丨MaxKB WPS合同审核助手的设计与实现
ai·开源·工作流·智能体·maxkb