1. Installing llama-cpp-python fails
```bash
$ CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python -i https://pypi.tuna.tsinghua.edu.cn/simple
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting llama-cpp-python
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/de/6d/4a20e676bdf7d9d3523be3a081bf327af958f9bdfe2a564f5cf485faeaec/llama_cpp_python-0.3.9.tar.gz (67.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 67.9/67.9 MB 5.3 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in /home/wuwenliang/anaconda3/envs/llmtuner/lib/python3.10/site-packages (from llama-cpp-python) (4.13.2)
Requirement already satisfied: numpy>=1.20.0 in /home/wuwenliang/anaconda3/envs/llmtuner/lib/python3.10/site-packages (from llama-cpp-python) (1.26.4)
Collecting diskcache>=5.6.1 (from llama-cpp-python)
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/3f/27/4570e78fc0bf5ea0ca45eb1de3818a23787af9b390c0b0a0033a1b8236f9/diskcache-5.6.3-py3-none-any.whl (45 kB)
Requirement already satisfied: jinja2>=2.11.3 in /home/wuwenliang/anaconda3/envs/llmtuner/lib/python3.10/site-packages (from llama-cpp-python) (3.1.6)
Requirement already satisfied: MarkupSafe>=2.0 in /home/wuwenliang/anaconda3/envs/llmtuner/lib/python3.10/site-packages (from jinja2>=2.11.3->llama-cpp-python) (3.0.2)
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [29 lines of output]
*** scikit-build-core 0.11.5 using CMake 3.22.1 (wheel)
*** Configuring CMake...
loading initial cache file /tmp/tmp01d6kko6/build/CMakeInit.txt
-- The C compiler identification is GNU 11.2.0
-- The CXX compiler identification is GNU 11.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1")
CMake Error at vendor/llama.cpp/CMakeLists.txt:108 (message):
LLAMA_CUBLAS is deprecated and will be removed in the future.
Use GGML_CUDA instead
Call Stack (most recent call first):
vendor/llama.cpp/CMakeLists.txt:113 (llama_option_depr)
-- Configuring incomplete, errors occurred!
See also "/tmp/tmp01d6kko6/build/CMakeFiles/CMakeOutput.log".
*** CMake configuration failed
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Failed to build installable wheels for some pyproject.toml based projects (llama-cpp-python)
```

The installation fails. The error says that the LLAMA_CUBLAS option has been deprecated and GGML_CUDA must be used instead, so the CMake arguments in the install command need to change.

Full install command (with the CUDA compiler path passed explicitly):
```bash
CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc" pip install llam
The installation still fails:
```bash
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
*** CMake build failed
error: subprocess-exited-with-error
× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /llmtuner/bin/python /llmtuner/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmp2wtlptdn
cwd: /tmp/pip-install-jm38bxck/llama-cpp-python_69bdce92ceff4d4db8aec6e07d4f05e3
Building wheel for llama-cpp-python (pyproject.toml) ... error
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Failed to build installable wheels for some pyproject.toml based projects (llama-cpp-python)
```

2. Installing from source
```bash
git clone --recursive https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python
CMAKE_ARGS="-DGGML_CUDA=on" pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple
```

This still fails:
```bash
$ CMAKE_ARGS="-DGGML_CUDA=on" pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing /home/software/llama-cpp-python
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in /llmtuner/lib/python3.10/site-packages (from llama_cpp_python==0.3.9) (4.13.2)
Requirement already satisfied: numpy>=1.20.0 in /llmtuner/lib/python3.10/site-packages (from llama_cpp_python==0.3.9) (1.26.4)
Collecting diskcache>=5.6.1 (from llama_cpp_python==0.3.9)
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/3f/27/4570e78fc0bf5ea0ca45eb1de3818a23787af9b390c0b0a0033a1b8236f9/diskcache-5.6.3-py3-none-any.whl (45 kB)
Requirement already satisfied: jinja2>=2.11.3 in /llmtuner/lib/python3.10/site-packages (from llama_cpp_python==0.3.9) (3.1.6)
Requirement already satisfied: MarkupSafe>=2.0 in /llmtuner/lib/python3.10/site-packages (from jinja2>=2.11.3->llama_cpp_python==0.3.9) (3.0.2)
Building wheels for collected packages: llama_cpp_python
Building wheel for llama_cpp_python (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for llama_cpp_python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [64 lines of output]
*** scikit-build-core 0.11.5 using CMake 3.22.1 (wheel)
*** Configuring CMake...
loading initial cache file /tmp/tmpwhq61gdi/build/CMakeInit.txt
-- The C compiler identification is GNU 11.2.0
-- The CXX compiler identification is GNU 11.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.4.131")
-- CUDA Toolkit found
-- Using CUDA architectures: 50-virtual;61-virtual;70-virtual;75-virtual;80-virtual;86-real;89-real
CMake Error at /usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:726 (message):
Compiling the CUDA compiler identification source file
"CMakeCUDACompilerId.cu" failed.
Compiler: CMAKE_CUDA_COMPILER-NOTFOUND
Build flags:
Id flags: -v
The output was:
No such file or directory
Call Stack (most recent call first):
/usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:6 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
/usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:48 (__determine_compiler_id_test)
/usr/share/cmake-3.22/Modules/CMakeDetermineCUDACompiler.cmake:298 (CMAKE_DETERMINE_COMPILER_ID)
vendor/llama.cpp/ggml/src/ggml-cuda/CMakeLists.txt:43 (enable_language)
-- Configuring incomplete, errors occurred!
See also "/tmp/tmpwhq61gdi/build/CMakeFiles/CMakeOutput.log".
See also "/tmp/tmpwhq61gdi/build/CMakeFiles/CMakeError.log".
*** CMake configuration failed
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama_cpp_python
Failed to build llama_cpp_python
ERROR: Failed to build installable wheels for some pyproject.toml based projects (llama_cpp_python)
```

3. Make sure the CUDA environment is configured correctly
Run the following command to check whether CUDA is installed correctly:

```bash
nvcc --version
```

If this fails with `nvcc: command not found`, the CUDA environment is not configured correctly. You need to:

- install the CUDA Toolkit (CUDA 12.x or 11.x recommended), and
- make sure `nvcc` is on the `PATH`: `export PATH=/usr/local/cuda/bin:$PATH` (a quick Python-side check is sketched below).
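As a sanity check before retrying the build, the same lookup can be done from Python (a minimal sketch, not part of the original steps; it only reports where `nvcc` resolves from):

```python
import shutil
import subprocess

# Locate nvcc on the current PATH; None matches the
# "command not found" failure shown below.
nvcc = shutil.which("nvcc")
if nvcc is None:
    print("nvcc not found on PATH - add /usr/local/cuda/bin to PATH")
else:
    print(f"nvcc found at {nvcc}")
    # Equivalent to running `nvcc --version` in the shell.
    print(subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout)
```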
```bash
$ nvcc -version
bash: nvcc: command not found
```

1. Confirm whether the CUDA Toolkit is installed

Check whether CUDA is installed at the default location:

```bash
ls /usr/local/cuda
```

If the directory does not exist, the CUDA Toolkit is not installed and must be installed first.
```bash
$ ls /usr/local/cuda
bin compute-sanitizer DOCS EULA.txt extras gds gds-12.4 include lib64 libnvvp nsight-compute-2024.1.1 nsightee_plugins nsight-systems-2023.4.4 nvml nvvm README share src targets tools version.json
```

Judging from the directory listing, the CUDA Toolkit is installed (probably version 12.4), but `nvcc` is still unavailable. Possible causes:

- the `nvcc` path has not been added to `PATH`
- the CUDA environment variables are not configured correctly
- some components were missing from the installation
Solution 1: manually add `nvcc` to `PATH`:

```bash
export PATH=/usr/local/cuda/bin:$PATH
```

Then verify with `nvcc --version`.
To make this permanent, add the following to `~/.bashrc` or `~/.zshrc`:

```bash
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

Then reload the shell configuration:

```bash
source ~/.bashrc
```
```bash
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
```

4. Reinstall llama-cpp-python from the command line
With `nvcc` now available, run:

```bash
CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc" pip install llama-cpp-python
```
This still fails:
```bash
FAILED: vendor/llama.cpp/tools/mtmd/llama-llava-clip-quantize-cli
: && /usr/bin/g++ -pthread -B /home/anaconda3/envs/llmtuner/compiler_compat -O3 -DNDEBUG vendor/llama.cpp/tools/mtmd/CMakeFiles/llava.dir/llava.cpp.o vendor/llama.cpp/tools/mtmd/CMakeFiles/llava.dir/clip.cpp.o vendor/llama.cpp/tools/mtmd/CMakeFiles/llama-llava-clip-quantize-cli.dir/clip-quantize-cli.cpp.o -o vendor/llama.cpp/tools/mtmd/llama-llava-clip-quantize-cli -Wl,-rpath,/tmp/tmptjbwss02/build/bin: vendor/llama.cpp/common/libcommon.a bin/libllama.so bin/libggml.so bin/libggml-cpu.so bin/libggml-cuda.so bin/libggml-base.so && :
/home/anaconda3/envs/llmtuner/compiler_compat/ld: warning: libgomp.so.1, needed by bin/libggml-cpu.so, not found (try using -rpath or -rpath-link)
/home/anaconda3/envs/llmtuner/compiler_compat/ld: warning: libcuda.so.1, needed by bin/libggml-cuda.so, not found (try using -rpath or -rpath-link)
/home/anaconda3/envs/llmtuner/compiler_compat/ld: warning: libdl.so.2, needed by /usr/local/cuda/lib64/libcudart.so.12, not found (try using -rpath or -rpath-link)
/home/anaconda3/envs/llmtuner/compiler_compat/ld: warning: libpthread.so.0, needed by /usr/local/cuda/lib64/libcudart.so.12, not found (try using -rpath or -rpath-link)
/home/anaconda3/envs/llmtuner/compiler_compat/ld: warning: librt.so.1, needed by /usr/local/cuda/lib64/libcudart.so.12, not found (try using -rpath or -rpath-link)
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemCreate'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cpu.so: undefined reference to `GOMP_barrier@GOMP_1.0'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemAddressReserve'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemUnmap'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cpu.so: undefined reference to `GOMP_parallel@GOMP_4.0'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemSetAccess'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuDeviceGet'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cpu.so: undefined reference to `omp_get_thread_num@OMP_1.0'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemAddressFree'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuGetErrorString'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cpu.so: undefined reference to `GOMP_single_start@GOMP_1.0'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuDeviceGetAttribute'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemMap'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemRelease'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cpu.so: undefined reference to `omp_get_num_threads@OMP_1.0'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemGetAllocationGranularity'
collect2: error: ld returned 1 exit status
[165/165] : && /usr/bin/g++ -pthread -B /home/anaconda3/envs/llmtuner/compiler_compat -O3 -DNDEBUG vendor/llama.cpp/tools/mtmd/CMakeFiles/mtmd.dir/mtmd.cpp.o vendor/llama.cpp/tools/mtmd/CMakeFiles/mtmd.dir/clip.cpp.o vendor/llama.cpp/tools/mtmd/CMakeFiles/llama-mtmd-cli.dir/mtmd-cli.cpp.o -o vendor/llama.cpp/tools/mtmd/llama-mtmd-cli -Wl,-rpath,/tmp/tmptjbwss02/build/bin: vendor/llama.cpp/common/libcommon.a bin/libllama.so bin/libggml.so bin/libggml-cpu.so bin/libggml-cuda.so bin/libggml-base.so && :
FAILED: vendor/llama.cpp/tools/mtmd/llama-mtmd-cli
: && /usr/bin/g++ -pthread -B /home/anaconda3/envs/llmtuner/compiler_compat -O3 -DNDEBUG vendor/llama.cpp/tools/mtmd/CMakeFiles/mtmd.dir/mtmd.cpp.o vendor/llama.cpp/tools/mtmd/CMakeFiles/mtmd.dir/clip.cpp.o vendor/llama.cpp/tools/mtmd/CMakeFiles/llama-mtmd-cli.dir/mtmd-cli.cpp.o -o vendor/llama.cpp/tools/mtmd/llama-mtmd-cli -Wl,-rpath,/tmp/tmptjbwss02/build/bin: vendor/llama.cpp/common/libcommon.a bin/libllama.so bin/libggml.so bin/libggml-cpu.so bin/libggml-cuda.so bin/libggml-base.so && :
/home/anaconda3/envs/llmtuner/compiler_compat/ld: warning: libgomp.so.1, needed by bin/libggml-cpu.so, not found (try using -rpath or -rpath-link)
/home/anaconda3/envs/llmtuner/compiler_compat/ld: warning: libcuda.so.1, needed by bin/libggml-cuda.so, not found (try using -rpath or -rpath-link)
/home/anaconda3/envs/llmtuner/compiler_compat/ld: warning: libdl.so.2, needed by /usr/local/cuda/lib64/libcudart.so.12, not found (try using -rpath or -rpath-link)
/home/anaconda3/envs/llmtuner/compiler_compat/ld: warning: libpthread.so.0, needed by /usr/local/cuda/lib64/libcudart.so.12, not found (try using -rpath or -rpath-link)
/home/anaconda3/envs/llmtuner/compiler_compat/ld: warning: librt.so.1, needed by /usr/local/cuda/lib64/libcudart.so.12, not found (try using -rpath or -rpath-link)
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemCreate'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cpu.so: undefined reference to `GOMP_barrier@GOMP_1.0'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemAddressReserve'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemUnmap'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cpu.so: undefined reference to `GOMP_parallel@GOMP_4.0'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemSetAccess'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuDeviceGet'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cpu.so: undefined reference to `omp_get_thread_num@OMP_1.0'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemAddressFree'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuGetErrorString'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cpu.so: undefined reference to `GOMP_single_start@GOMP_1.0'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuDeviceGetAttribute'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemMap'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemRelease'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cpu.so: undefined reference to `omp_get_num_threads@OMP_1.0'
/home/anaconda3/envs/llmtuner/compiler_compat/ld: bin/libggml-cuda.so: undefined reference to `cuMemGetAllocationGranularity'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
*** CMake build failed
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Failed to build installable wheels for some pyproject.toml based projects (llama-cpp-python)
```

Judging from the error log, the main problem is that the link stage cannot find key CUDA and OpenMP shared libraries (`libcuda.so.1`, `libgomp.so.1`, and so on), which also shows up as undefined symbols such as `cuMemCreate`. The complete fix follows.
5. Fix the CUDA shared-library paths
1. Fix the system library paths (the key step)

Add the CUDA and system library paths to the environment:

```bash
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
```

To make this permanent:

```bash
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
```

Verify that the libraries exist:

```bash
ls /usr/lib/x86_64-linux-gnu/libgomp.so.1   # OpenMP library
ls /usr/local/cuda/lib64/libcudart.so.12    # CUDA runtime
ls /usr/lib/x86_64-linux-gnu/libcuda.so.1   # NVIDIA driver library
```
```bash
$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH' >> ~/.bashrc
$ source ~/.bashrc
$ ls /usr/lib/x86_64-linux-gnu/libgomp.so.1
/usr/lib/x86_64-linux-gnu/libgomp.so.1
$ ls /usr/local/cuda/lib64/libcudart.so.12
/usr/local/cuda/lib64/libcudart.so.12
$ ls /usr/lib/x86_64-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcuda.so.1
```
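The same check can be done at runtime with `ctypes` (a sketch; the library names are exactly the ones the linker warned about above):

```python
import ctypes

# Shared libraries that the link step reported as missing or unresolved.
for name in ("libgomp.so.1", "libcuda.so.1", "libcudart.so.12"):
    try:
        ctypes.CDLL(name)  # searches LD_LIBRARY_PATH and the ldconfig cache
        print(f"{name}: OK")
    except OSError as exc:
        print(f"{name}: NOT loadable ({exc})")
```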
6. Reinstall with the full set of CMake arguments
```bash
CMAKE_ARGS="-DGGML_CUDA=ON \
    -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
    -DCMAKE_CUDA_ARCHITECTURES=80 \
    -DLLAMA_CUDA_FORCE_DMMV=ON" \
pip install llama-cpp-python \
    --force-reinstall \
    --no-cache-dir \
    --verbose \
    -i https://pypi.tuna.tsinghua.edu.cn/simple
```
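Note that `-DCMAKE_CUDA_ARCHITECTURES=80` targets compute capability 8.0 (Ampere, e.g. A100) and should be adjusted to the GPU actually in the machine. One way to look the value up (a sketch that assumes a reasonably recent driver, since older `nvidia-smi` versions do not support the `compute_cap` query field):

```python
import subprocess

# Query each GPU's compute capability via nvidia-smi;
# "8.0" maps to -DCMAKE_CUDA_ARCHITECTURES=80, "8.6" to 86, and so on.
out = subprocess.run(
    ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout
for line in out.strip().splitlines():
    print("CMAKE_CUDA_ARCHITECTURES =", line.strip().replace(".", ""))
```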
Finally, the install succeeds:

```bash
Successfully installed MarkupSafe-3.0.2 diskcache-5.6.3 jinja2-3.1.6 llama-cpp-python-0.3.9 numpy-2.2.6 typing-extensions-4.14.0
```
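To confirm the wheel was really built with CUDA support, a quick check from Python (a minimal sketch; `/path/to/model.gguf` is a placeholder for any local GGUF file, and `llama_supports_gpu_offload` is the low-level llama.cpp binding, assumed to be exposed in this version):

```python
import llama_cpp

print(llama_cpp.__version__)                   # expect 0.3.9
print(llama_cpp.llama_supports_gpu_offload())  # True when the CUDA backend is compiled in

# n_gpu_layers=-1 offloads all layers to the GPU; the model path is a
# placeholder, not a file from this walkthrough.
llm = llama_cpp.Llama(model_path="/path/to/model.gguf", n_gpu_layers=-1)
print(llm("Q: What is 2 + 2? A:", max_tokens=8)["choices"][0]["text"])
```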