Accelerating llama.cpp inference with an Intel integrated GPU (XPU)

This post shows how to have llama.cpp use an Intel integrated GPU (XPU) to speed up inference.

Drivers and dependencies

Install the Intel oneAPI Base Toolkit, and make sure the graphics driver supports SYCL and oneAPI.

# Install the dpcpp-cpp-rt, mkl-dpcpp, and onednn libraries:
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
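
After running the pip command above, it can help to confirm that the pinned versions are the ones actually installed. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `check_packages` and the `EXPECTED` table are illustrative and not part of any Intel tooling.

```python
from importlib import metadata

# Versions pinned in the pip command above.
EXPECTED = {
    "dpcpp-cpp-rt": "2024.0.2",
    "mkl-dpcpp": "2024.0.0",
    "onednn": "2024.0.0",
}


def check_packages(expected, get_version=metadata.version):
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    problems = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            problems[name] = (installed, wanted)
    return problems


def report(expected=EXPECTED):
    """Print one line per missing or mismatched package."""
    problems = check_packages(expected)
    for name, (installed, wanted) in problems.items():
        print(f"{name}: installed={installed}, expected={wanted}")
    if not problems:
        print("All pinned packages are installed at the expected versions.")
```

Calling `report()` in the same Python environment prints any package that is missing or at a different version. This only checks the Python packages; it says nothing about whether the graphics driver itself exposes a SYCL device.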

Reinstalling llama.cpp

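If llama.cpp (here, the llama-cpp-python binding) is already installed, add --force-reinstall to the pip command so that it is rebuilt. Also pass -DLLAMA_SYCL=ON through CMAKE_ARGS to enable support for the Intel integrated GPU.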

-DLLAMA_AVX: enables AVX instruction-set acceleration on the CPU.

-DLLAMA_SYCL=ON: enables support for the Intel integrated GPU.

$env:CMAKE_ARGS = "-DLLAMA_SYCL=ON -DLLAMA_AVX=on"
pip install --force-reinstall --no-cache-dir llama-cpp-python   # If the install fails, try pinning a different version, e.g. 0.2.23:
pip install --force-reinstall --no-cache-dir llama-cpp-python==0.2.23
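
The commands above use PowerShell syntax (`$env:CMAKE_ARGS`). As a rough, shell-independent equivalent, the sketch below builds the same environment variable and pip command from Python. The helper name `build_install` is hypothetical, and the actual `subprocess.run` call is left commented out so that nothing is reinstalled by accident.

```python
import os
import subprocess
import sys


def build_install(version=None, cmake_flags=("-DLLAMA_SYCL=ON", "-DLLAMA_AVX=on")):
    """Return (env, command) for rebuilding llama-cpp-python with the given CMake flags."""
    env = dict(os.environ)
    env["CMAKE_ARGS"] = " ".join(cmake_flags)  # same value as $env:CMAKE_ARGS above
    package = "llama-cpp-python" if version is None else f"llama-cpp-python=={version}"
    command = [
        sys.executable, "-m", "pip", "install",
        "--force-reinstall", "--no-cache-dir", package,
    ]
    return env, command


def show_install(version=None):
    """Print what would be run, without running it."""
    env, command = build_install(version)
    print("CMAKE_ARGS =", env["CMAKE_ARGS"])
    print(" ".join(command))
    # To actually perform the rebuild, uncomment the next line:
    # subprocess.run(command, env=env, check=True)
```

For example, `show_install("0.2.23")` prints the command for the pinned fallback version mentioned above.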

Verification

Load the model with n_gpu_layers > 0 and call it:

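>>> from llama_cpp import Llama
>>> # Initialize the model, specifying the model path and the number of layers offloaded to the GPU
>>> llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048, n_threads=8, n_gpu_layers=32)
>>> print(llm("who are you!", max_tokens=50)["choices"][0]["text"])
>>> # The prompt below asks, in Chinese, for an overview of the history of artificial intelligence
>>> print(llm("请介绍一下人工智能的发展历史。", max_tokens=256)["choices"][0]["text"])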

Runtime comparison: CPU vs XPU

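The llama model was started with n_gpu_layers set to 0 (CPU only) and to 32 (XPU offload), and both runs were given the same prompt, "解释一下llama_cpp" ("Explain llama_cpp"). The XPU run produced a longer output yet finished sooner: 178 s versus 207 s, roughly 14% less time. The log below is from the 178 s run.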
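
A comparison like this could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names (`time_completion`, `make_llama`, `compare`) are hypothetical, `max_tokens=500` is inferred from the "500 runs" in the timing log rather than stated by the post, and only the generation call is timed, not model loading.

```python
import time


def time_completion(make_llm, prompt, n_gpu_layers, max_tokens=500):
    """Build a model with the given n_gpu_layers, run one prompt, and time the call."""
    llm = make_llm(n_gpu_layers)
    start = time.perf_counter()
    result = llm(prompt, max_tokens=max_tokens)
    elapsed = time.perf_counter() - start
    return elapsed, result["choices"][0]["text"]


def make_llama(n_gpu_layers):
    """Create a Llama instance with the same settings as the verification step."""
    # Imported lazily so the timing helpers can be tried without llama-cpp-python.
    from llama_cpp import Llama
    return Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048,
                 n_threads=8, n_gpu_layers=n_gpu_layers)


def compare(make_llm, prompt, layer_settings=(0, 32), max_tokens=500):
    """Time the same prompt once per n_gpu_layers setting."""
    results = {}
    for n in layer_settings:
        elapsed, text = time_completion(make_llm, prompt, n, max_tokens)
        results[n] = (elapsed, len(text))
        print(f"n_gpu_layers={n}: {elapsed:.2f} s, {len(text)} characters")
    return results


# Example (requires llama-cpp-python and the model file):
# compare(make_llama, "解释一下llama_cpp")
```

Because each setting is run only once and the two outputs can differ in length, comparing tokens per second is usually more informative than comparing total wall-clock time.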

User: 解释一下llama_cpp ("Explain llama_cpp")

llama_print_timings:        load time =   10713.40 ms
llama_print_timings:      sample time =     297.97 ms /   500 runs   (    0.60 ms per token,  1678.00 tokens per second)
llama_print_timings: prompt eval time =   10713.22 ms /    67 tokens (  159.90 ms per token,     6.25 tokens per second)
llama_print_timings:        eval time =  163259.65 ms /   499 runs   (  327.17 ms per token,     3.06 tokens per second)
llama_print_timings:       total time =  178246.50 ms

Assistant (took 178.25 s):
  Certainly! llama_cpp is a programming language that is designed to be easy to use and understand, while also being powerful enough to build complex applications. It is based on the C++ programming language, but has several features that make it more accessible to beginners and non-experts.
Here are some key features of llama_cpp:
1. Syntax: LLama_cpp has a simplified syntax compared to C++, which makes it easier to read and write code. For example, in LLama_cpp, you don't need to use parentheses for function calls, and you can omit the semicolon at the end of statements.
2. Type Inference: LLama_cpp has type inference, which means that you don't need to explicitly specify the types of variables or function arguments. The compiler can infer the types based on the context.
3. Functional Programming: LLama_cpp supports functional programming concepts such as higher-order functions, closures, and immutable data structures. This makes it easier to write pure functions that are easy to reason about and test.     
4. Object-Oriented Programming: LLama_cpp also supports object-oriented programming (OOP) concepts such as classes, objects, inheritance, and polymorphism.     
5. Concise Code: LLama_cpp aims to be concise and efficient, allowing you to write code that is easy to read and understand while still being compact and efficient.
6. Cross-Platform: LLama_cpp is designed to be cross-platform, meaning it can run on multiple operating systems without modification. This makes it easier to write code that can be used on different platforms.
7. Extensive Standard Library: LLama_cpp has an extensive standard library that includes support for common data structures and algorithms, as well as a range of other useful functions.
8. Supportive Community: LLama_cpp has a growing community of developers and users who are actively contributing to the language and its ecosystem. This means there are many resources available online, including tutorials, documentation, and forums where you can ask questions and get help.
Overall, llama_cpp is a language that is designed to be easy to learn and use, while still being powerful enough to build complex applications. Its simplified syntax and type inference make it accessible to beginners, while its support for 
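
The `llama_print_timings` lines in the log above can be parsed to compare runs by throughput instead of total time. The sketch below assumes the log format is exactly as shown in this post; the names `TIMING_RE`, `parse_timings`, and `tokens_per_second` are illustrative, and `SAMPLE` is copied from the log above.

```python
import re

# Matches lines such as:
# llama_print_timings:        eval time =  163259.65 ms /   499 runs   (...)
TIMING_RE = re.compile(
    r"llama_print_timings:\s*(?P<name>[a-z ]+?) time =\s*(?P<ms>[\d.]+) ms"
    r"(?: /\s*(?P<count>\d+) (?:runs|tokens))?"
)

SAMPLE = """\
llama_print_timings:        load time =   10713.40 ms
llama_print_timings:      sample time =     297.97 ms /   500 runs   (    0.60 ms per token,  1678.00 tokens per second)
llama_print_timings: prompt eval time =   10713.22 ms /    67 tokens (  159.90 ms per token,     6.25 tokens per second)
llama_print_timings:        eval time =  163259.65 ms /   499 runs   (  327.17 ms per token,     3.06 tokens per second)
llama_print_timings:       total time =  178246.50 ms
"""


def parse_timings(log_text):
    """Return {stage: {"ms": float, "count": int or None}} for each timing line."""
    stages = {}
    for match in TIMING_RE.finditer(log_text):
        count = match.group("count")
        stages[match.group("name").strip()] = {
            "ms": float(match.group("ms")),
            "count": int(count) if count else None,
        }
    return stages


def tokens_per_second(stage):
    """Throughput for a stage that reports a token or run count."""
    return stage["count"] / (stage["ms"] / 1000.0)


timings = parse_timings(SAMPLE)
print(f"generation: {tokens_per_second(timings['eval']):.2f} tokens/s")
print(f"prompt eval: {tokens_per_second(timings['prompt eval']):.2f} tokens/s")
```

Running the same parser on the log from the n_gpu_layers=0 run would give a per-token comparison that does not depend on how long each answer happened to be.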