llama.cpp部署(windows)

一、下载源码和模型

下载源码和模型
复制代码
# 下载源码
git clone https://github.com/ggerganov/llama.cpp.git

# 下载llama-7b模型
git clone https://www.modelscope.cn/skyline2006/llama-7b.git
查看cmake版本:
复制代码
D:\pyworkspace\llama_cpp\llama.cpp\build>cmake --version
cmake version 3.22.0-rc2

CMake suite maintained and supported by Kitware (kitware.com/cmake).

二、开始build

复制代码
# 进入llama.cpp目录
mkdir build
cd build
cmake ..

build信息

复制代码
D:\pyworkspace\llama_cpp\llama.cpp\build>cmake ..
-- Building for: Visual Studio 16 2019
-- Selecting Windows SDK version 10.0.18362.0 to target Windows 10.0.22631.
-- The C compiler identification is MSVC 19.29.30137.0
-- The CXX compiler identification is MSVC 19.29.30137.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: D:/Git/Git/cmd/git.exe (found version "2.29.2.windows.2")
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: AMD64
-- CMAKE_GENERATOR_PLATFORM:
-- x86 detected
-- Performing Test HAS_AVX_1
-- Performing Test HAS_AVX_1 - Success
-- Performing Test HAS_AVX2_1
-- Performing Test HAS_AVX2_1 - Success
-- Performing Test HAS_FMA_1
-- Performing Test HAS_FMA_1 - Success
-- Performing Test HAS_AVX512_1
-- Performing Test HAS_AVX512_1 - Failed
-- Performing Test HAS_AVX512_2
-- Performing Test HAS_AVX512_2 - Failed
-- Configuring done
-- Generating done
-- Build files have been written to: D:/pyworkspace/llama_cpp/llama.cpp/build

本地使用Realease会出现报错,修改为Debug进行build,这里会使用到visual studio进行build

复制代码
cmake --build . --config Debug

build信息

复制代码
D:\pyworkspace\llama_cpp\llama.cpp\build>cmake --build . --config Debug
用于 .NET Framework 的 Microsoft (R) 生成引擎版本 16.11.2+f32259642
版权所有(C) Microsoft Corporation。保留所有权利。

  Checking Build System
  Generating build details from Git
  -- Found Git: D:/Git/Git/cmd/git.exe (found version "2.29.2.windows.2")
  Building Custom Rule D:/pyworkspace/llama_cpp/llama.cpp/common/CMakeLists.txt
  build-info.cpp
  build_info.vcxproj -> D:\pyworkspace\llama_cpp\llama.cpp\build\common\build_info.dir\Debug\build_info.lib
  Building Custom Rule D:/pyworkspace/llama_cpp/llama.cpp/CMakeLists.txt
  ggml.c

在我本地D:\pyworkspace\llama_cpp\llama.cpp\build\bin\Debug目录下面产生了quantize.exe和main.exe等

三、量化和推理

安装相关python依赖

复制代码
python -m pip install -r requirements.txt

将下载好的llama-7b模型放入models目录下,并执行命令,会在llama-7b目录下面产生ggml-model-f16.gguf文件

复制代码
python convert.py models/llama-7b/

对产生的文件进行量化

复制代码
D:\pyworkspace\llama_cpp\llama.cpp\build\bin\Debug\quantize.exe ./models/llama-7b/ggml-model-f16.gguf ./models/llama-7b/ggml-model-q4_0.gguf q4_0

进行推理

复制代码
D:\pyworkspace\llama_cpp\llama.cpp\build\bin\Debug\main.exe -m ./models/llama-7b/ggml-model-q4_0.gguf -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
相关推荐
c7692 小时前
【文献笔记】Automatic Chain of Thought Prompting in Large Language Models
人工智能·笔记·语言模型·论文笔记
DeepSeek-大模型系统教程8 小时前
推荐 7 个本周 yyds 的 GitHub 项目。
人工智能·ai·语言模型·大模型·github·ai大模型·大模型学习
try2find13 小时前
安装llama-cpp-python踩坑记
开发语言·python·llama
静心问道15 小时前
STEP-BACK PROMPTING:退一步:通过抽象在大型语言模型中唤起推理能力
人工智能·语言模型·大模型
西西弗Sisyphus19 小时前
LLaMA-Factory 单卡后训练微调Qwen3完整脚本
微调·llama·llama-factory·后训练
顾道长生'19 小时前
(Arxiv-2024)自回归模型优于扩散:Llama用于可扩展的图像生成
计算机视觉·数据挖掘·llama·自回归模型·多模态生成与理解
MO2T19 小时前
使用 Flask 构建基于 Dify 的企业资金投向与客户分类评估系统
后端·python·语言模型·flask
静心问道21 小时前
APE:大语言模型具有人类水平的提示工程能力
人工智能·算法·语言模型·大模型
香宝的最强后援XD1 天前
Cursor无限邮箱续费方法
语言模型·chatgpt·文心一言
静心问道1 天前
SELF-INSTRUCT:使用自生成指令对齐语言模型
人工智能·语言模型·大模型