Notes on some pitfalls I ran into while installing llama (llama-cpp-python) and getting it to use CUDA

  1. Failed building wheel for llama-cpp-python
    The main error output is:
CMake configuration failed
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
error: failed-wheel-build-for-install

× Failed to build installable wheels for some pyproject.toml based projects
╰─> llama-cpp-python

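Download vs_buildtools.exe from the Microsoft Visual C++ Build Tools page. During installation, select the C++ build tools workload (Desktop development with C++) and check the following components: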

MSVC v143 - VS 2022 C++ x64/x86 build tools
Windows 10 SDK
C++ CMake tools for Windows

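Add the MSVC compiler directory to the Path environment variable. It looks like D:\Softwares\Coding\Microsoft Visual Studio\18\BuildTools\VC\Tools\MSVC\14.50.35717\bin\Hostx64\x64 (the exact path depends on your install location and MSVC version).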
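To confirm that the Path change took effect, a small check like the one below can be run from a newly opened terminal. It is only a sketch: the tool name `cl` (the MSVC compiler executable) and the helper names `find_build_tools` and `missing_tools` are assumptions for illustration and are not part of the original notes.

```python
import shutil

# Executables to look for. "cl" is the MSVC compiler that lives in the
# Hostx64\x64 directory added to Path above (assumed name, adjust as needed).
REQUIRED_TOOLS = ("cl",)


def find_build_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(found):
    """Return the sorted names of tools that could not be resolved."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_build_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'NOT FOUND'}")
    if missing_tools(found):
        print("Some tools are missing; re-check the Path entry and reopen the terminal.")
```

If `cl` shows up as NOT FOUND, the Path entry probably points at the wrong MSVC version folder, or the terminal was opened before the variable was changed.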

Restart the computer, then install:

pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

Success:

Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.3.16-cp314-cp314-win_amd64.whl size=6949886 sha256=59c056b0bac981ed372fe67362b1bbb12b16197a527a493ef10542c095cdde94
  Stored in directory: c:\users\administrator\appdata\local\pip\cache\wheels\2b\c2\dc\f5dfca72f8099585613317227bf9b9d2884789802d70d1a79e
Successfully built llama-cpp-python

Reference: https://blog.csdn.net/nk1610099/article/details/141504899

  2. llama does not use the GPU
    With Llama(verbose=True) you can see where the model is running. Lines such as load_tensors: layer 0 assigned to device CPU, is_swa = 0 or llama_kv_cache_unified: layer 0: dev = CPU mean it is running on the CPU.
    If pip install llama-cpp-python fails with:
No CUDA toolset found.

you need the following files:

CUDA 13.1.props
CUDA 13.1.targets
CUDA 13.1.Version.props
CUDA 13.1.xml
Nvda.Build.CudaTasks.v13.1.dll

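Copy them into a directory like D:\Softwares\Coding\Microsoft Visual Studio\18\BuildTools\MSBuild\Microsoft\VC\v180\BuildCustomizations\. They usually ship with the CUDA Toolkit under its extras\visual_studio_integration\MSBuildExtensions directory.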
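The copy step can also be scripted. The sketch below assumes you pass in both directories yourself: `src_dir` is wherever the CUDA MSBuild files live on your machine, and `dst_dir` is your BuildCustomizations folder. The function name `copy_cuda_msbuild_files` is hypothetical, and writing into the Visual Studio directory will most likely require an elevated (administrator) prompt.

```python
import shutil
from pathlib import Path

# Glob patterns covering the five files listed above, independent of the
# CUDA version number embedded in their names.
CUDA_MSBUILD_PATTERNS = (
    "CUDA *.props",
    "CUDA *.targets",
    "CUDA *.xml",
    "Nvda.Build.CudaTasks.*.dll",
)


def copy_cuda_msbuild_files(src_dir, dst_dir):
    """Copy the CUDA MSBuild integration files from src_dir into dst_dir.

    Returns the list of copied file names. Raises FileNotFoundError if the
    destination directory does not exist, so a mistyped path fails loudly.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    if not dst.is_dir():
        raise FileNotFoundError(f"BuildCustomizations directory not found: {dst}")
    copied = []
    for pattern in CUDA_MSBUILD_PATTERNS:
        for source_file in sorted(src.glob(pattern)):
            shutil.copy2(source_file, dst / source_file.name)
            copied.append(source_file.name)
    return copied
```

An empty return value means nothing matched in `src_dir`, which usually indicates the source path is wrong rather than that the copy succeeded.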

If you then get this error:

fatal error C1189: #error:  -- unsupported Microsoft Visual Studio version! Only the versions between 2019 and 2022 (inclusive) are supported! 

run the following (the $env: syntax is PowerShell):

pip uninstall llama-cpp-python -y
$env:CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=100 -DCMAKE_CUDA_FLAGS='-allow-unsupported-compiler'"
$env:FORCE_CMAKE="1"
pip install llama-cpp-python --no-cache-dir

This uninstalls the existing package and rebuilds it from source with CUDA enabled.
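For anyone driving the install from Python instead of PowerShell, the same settings can be assembled as an environment dictionary. This is a sketch under assumptions: the helper names (`cuda_arch_from_capability`, `build_cmake_args`, `build_env`) are made up for illustration, and only the flag values come from the commands above. One observation, untested here: the commands above pass 100 for CMAKE_CUDA_ARCHITECTURES, while the GPU in these notes reports compute capability 12.0, which in CMake's notation would be 120. The first helper only performs that conversion.

```python
import os
import sys


def cuda_arch_from_capability(capability):
    """Turn a compute capability string such as '12.0' into a CMake value ('120')."""
    major, minor = capability.strip().split(".")
    return f"{int(major)}{int(minor)}"


def build_cmake_args(arch, allow_unsupported_compiler=True):
    """Build the CMAKE_ARGS string with the same flags as the PowerShell commands."""
    args = ["-DGGML_CUDA=on", f"-DCMAKE_CUDA_ARCHITECTURES={arch}"]
    if allow_unsupported_compiler:
        # Lets nvcc accept a Visual Studio version newer than it officially supports.
        args.append("-DCMAKE_CUDA_FLAGS='-allow-unsupported-compiler'")
    return " ".join(args)


def build_env(arch, base_env=None):
    """Return a copy of the environment with CMAKE_ARGS and FORCE_CMAKE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CMAKE_ARGS"] = build_cmake_args(arch)
    env["FORCE_CMAKE"] = "1"
    return env


# Same pip invocation as above, bound to the interpreter running this script.
PIP_COMMAND = [sys.executable, "-m", "pip", "install", "llama-cpp-python", "--no-cache-dir"]
```

The install itself would then be `subprocess.run(PIP_COMMAND, env=build_env("100"), check=True)`, which is left out of the block so that importing it never triggers a half-hour build by accident.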

After about half an hour, the output is:

Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.3.16-cp314-cp314-win_amd64.whl size=75756635 sha256=3c101ad48d3b16daa2aec9c4a74fda828830912dd0c88918363c098ea5e1b5a5
  Stored in directory: C:\Users\Administrator\AppData\Local\Temp\pip-ephem-wheel-cache-80fslz66\wheels\2b\c2\dc\f5dfca72f8099585613317227bf9b9d2884789802d70d1a79e
Successfully built llama-cpp-python
Installing collected packages: llama-cpp-python
Successfully installed llama-cpp-python-0.3.16

When running code afterwards, the output includes:

ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5080 Laptop GPU, compute capability 12.0, VMM: yes

This shows the GPU is now being used. Throughput rose from 12.38 to 70.64 tokens per second, roughly a 5.7x speedup.
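Scanning a long verbose log by eye gets tedious, so the two log shapes quoted earlier in this item (the load_tensors and llama_kv_cache_unified lines) can be checked with a small parser. This is a minimal sketch that assumes the log text has already been saved or pasted into a string. The function names are hypothetical, and the regular expressions only cover those two line formats, which may change between llama.cpp versions.

```python
import re

# "load_tensors: layer 0 assigned to device CPU, is_swa = 0"
_ASSIGNED = re.compile(r"layer\s+(\d+)\s+assigned to device\s+(\w+)")
# "llama_kv_cache_unified: layer 0: dev = CPU"
_KV_DEV = re.compile(r"layer\s+(\d+):\s+dev\s*=\s*(\w+)")


def devices_by_layer(log_text):
    """Return {layer_index: device_name} for every matching log line."""
    devices = {}
    for line in log_text.splitlines():
        match = _ASSIGNED.search(line) or _KV_DEV.search(line)
        if match:
            devices[int(match.group(1))] = match.group(2)
    return devices


def runs_on_gpu(log_text):
    """True if at least one layer is placed on a device other than the CPU."""
    return any(dev.upper() != "CPU" for dev in devices_by_layer(log_text).values())


if __name__ == "__main__":
    sample = (
        "load_tensors: layer 0 assigned to device CPU, is_swa = 0\n"
        "llama_kv_cache_unified: layer 0: dev = CPU\n"
    )
    print(devices_by_layer(sample))
    print(runs_on_gpu(sample))
```

With the CPU-only sample lines the check comes back false. After a working CUDA build the device name in those lines should be something other than CPU, and the check flips to true.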
