llama.cpp 利用intel集成显卡xpu加速推理

用 llama.cpp 调用 Intel 的集成显卡 XPU 来提升推理效率.

驱动及依赖库

安装 Intel oneAPI Base Toolkit,确保显卡驱动支持 SYCL 和 oneAPI。

复制代码
#安装 dpcpp-cpp-rt、mkl-dpcpp、onednn 等库:
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0

重新安装 llama.cpp

如果已经安装过llama.cpp, 则要增加--force-reinstall 重新安装 。另外 增加-DLLAMA_SYCL=ON, 以打开对intel 集成显卡的支持。

-DLLAMA_AVX: 为启用CPU的AVX指令集加速

-DLLAMA_SYCL=ON: intel 集成显卡的支持。

复制代码
$env:CMAKE_ARGS = "-DLLAMA_SYCL=ON -DLLAMA_AVX=on"
pip install --force-reinstall  --no-cache-dir llama-cpp-python   #如果安装失败,可更换版本,eg. 0.2.23
pip install --force-reinstall  --no-cache-dir llama-cpp-python==0.2.23

验证

配置n_gpu_layers > 0调用模型

复制代码
>>> from llama_cpp import Llama
# 初始化模型,指定模型路径和GPU层数
>>> llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048,n_threads=8, n_gpu_layers=32)
>>> print(llm("who are you!", max_tokens=50)["choices"][0]["text"])  
>>> print(llm("请介绍一下人工智能的发展历史。", max_tokens=256)["choices"][0]["text"])  

运行时间对比cpu vx xpu

n_gpu_layers分别配置0和32, 来启动llama模型,请求相同的问题"解释一下llama_cpp", xpu方案在输出内容更长的情况下,耗时更短。178s vs 207s .

复制代码
用户:解释一下llama_cpp

llama_print_timings:        load time =   10713.40 ms
llama_print_timings:      sample time =     297.97 ms /   500 runs   (    0.60 ms per token,  1678.00 tokens per second)
llama_print_timings: prompt eval time =   10713.22 ms /    67 tokens (  159.90 ms per token,     6.25 tokens per second)
llama_print_timings:        eval time =  163259.65 ms /   499 runs   (  327.17 ms per token,     3.06 tokens per second)
llama_print_timings:       total time =  178246.50 ms

助手(耗时178.25s):
  Certainly! llama_cpp is a programming language that is designed to be easy to use and understand, while also being powerful enough to build complex applications. It is based on the C++ programming language, but has several features that make it more accessible to beginners and non-experts.
Here are some key features of llama_cpp:
1. Syntax: LLama_cpp has a simplified syntax compared to C++, which makes it easier to read and write code. For example, in LLama_cpp, you don't need to use parentheses for function calls, and you can omit the semicolon at the end of statements.
2. Type Inference: LLama_cpp has type inference, which means that you don't need to explicitly specify the types of variables or function arguments. The compiler can infer the types based on the context.
3. Functional Programming: LLama_cpp supports functional programming concepts such as higher-order functions, closures, and immutable data structures. This makes it easier to write pure functions that are easy to reason about and test.     
4. Object-Oriented Programming: LLama_cpp also supports object-oriented programming (OOP) concepts such as classes, objects, inheritance, and polymorphism.     
5. Concise Code: LLama_cpp aims to be concise and efficient, allowing you to write code that is easy to read and understand while still being compact and efficient.
6. Cross-Platform: LLama_cpp is designed to be cross-platform, meaning it can run on multiple operating systems without modification. This makes it easier to write code that can be used on different platforms.
7. Extensive Standard Library: LLama_cpp has an extensive standard library that includes support for common data structures and algorithms, as well as a range of other useful functions.
8. Supportive Community: LLama_cpp has a growing community of developers and users who are actively contributing to the language and its ecosystem. This means there are many resources available online, including tutorials, documentation, and forums where you can ask questions and get help.
Overall, llama_cpp is a language that is designed to be easy to learn and use, while still being powerful enough to build complex applications. Its simplified syntax and type inference make it accessible to beginners, while its support for 
相关推荐
数据科学作家2 小时前
学数据分析必囤!数据分析必看!清华社9本书覆盖Stata/SPSS/Python全阶段学习路径
人工智能·python·机器学习·数据分析·统计·stata·spss
HXQ_晴天4 小时前
CASToR 生成的文件进行转换
python
CV缝合救星4 小时前
【Arxiv 2025 预发行论文】重磅突破!STAR-DSSA 模块横空出世:显著性+拓扑双重加持,小目标、大场景统统拿下!
人工智能·深度学习·计算机视觉·目标跟踪·即插即用模块
java1234_小锋5 小时前
Scikit-learn Python机器学习 - 特征预处理 - 标准化 (Standardization):StandardScaler
python·机器学习·scikit-learn
Python×CATIA工业智造5 小时前
Python带状态生成器完全指南:从基础到高并发系统设计
python·pycharm
向qian看_-_5 小时前
Linux 使用pip报错(error: externally-managed-environment )解决方案
linux·python·pip
Nicole-----5 小时前
Python - Union联合类型注解
开发语言·python
TDengine (老段)6 小时前
从 ETL 到 Agentic AI:工业数据管理变革与 TDengine IDMP 的治理之道
数据库·数据仓库·人工智能·物联网·时序数据库·etl·tdengine
蓝桉8026 小时前
如何进行神经网络的模型训练(视频代码中的知识点记录)
人工智能·深度学习·神经网络
星期天要睡觉7 小时前
深度学习——数据增强(Data Augmentation)
人工智能·深度学习