Installing auto-gptq, and problems (with fixes) caused by a mismatched software or hardware environment

1. What is auto-gptq?

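AutoGPTQ is a library for quantizing deep learning models. Its main goal is to use quantization to substantially reduce the size and computational cost of models such as large language models (LLMs), which improves inference efficiency while preserving model quality as far as possible.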

2. Installing auto-gptq

On Linux and Windows, AutoGPTQ can be installed from prebuilt wheels for specific PyTorch versions:

AutoGPTQ version	CUDA/ROCm version	Installation	Built against PyTorch
latest (0.7.1)	CUDA 11.8	`pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/`	2.2.1+cu118
latest (0.7.1)	CUDA 12.1	`pip install auto-gptq`	2.2.1+cu121
latest (0.7.1)	ROCm 5.7	`pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/rocm571/`	2.2.1+rocm5.7
0.7.0	CUDA 11.8	`pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/`	2.2.0+cu118
0.7.0	CUDA 12.1	`pip install auto-gptq`	2.2.0+cu121
0.7.0	ROCm 5.7	`pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/rocm571/`	2.2.0+rocm5.7
0.6.0	CUDA 11.8	`pip install auto-gptq==0.6.0 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/`	2.1.1+cu118
0.6.0	CUDA 12.1	`pip install auto-gptq==0.6.0`	2.1.1+cu121
0.6.0	ROCm 5.6	`pip install auto-gptq==0.6.0 --extra-index-url https://huggingface.github.io/autogptq-index/whl/rocm561/`	2.1.1+rocm5.6
0.5.1	CUDA 11.8	`pip install auto-gptq==0.5.1 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/`	2.1.0+cu118
0.5.1	CUDA 12.1	`pip install auto-gptq==0.5.1`	2.1.0+cu121
0.5.1	ROCm 5.6	`pip install auto-gptq==0.5.1 --extra-index-url https://huggingface.github.io/autogptq-index/whl/rocm561/`	2.1.0+rocm5.6

AutoGPTQ is not available on macOS.
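Note: the installed auto-gptq version must match both your CUDA version and your PyTorch version. If inference is very slow after installation, you may need to install from source.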
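To compare an environment against the table above, a minimal sketch like the following prints the versions that have to agree. It assumes the distributions are registered under the names `torch` and `auto-gptq`; the helper names `installed_version` and `describe_environment` are made up for this example.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def describe_environment():
    """Collect the versions that should line up with one row of the wheel table."""
    info = {
        "torch": installed_version("torch"),
        "auto-gptq": installed_version("auto-gptq"),
        "torch_cuda": None,
        "torch_hip": None,
        "cuda_available": None,
    }
    try:
        import torch  # imported lazily so the script still runs without PyTorch
    except ImportError:
        return info
    # CUDA version PyTorch was built against (None on CPU-only or ROCm builds).
    info["torch_cuda"] = torch.version.cuda
    # ROCm/HIP version PyTorch was built against (None on CUDA builds).
    info["torch_hip"] = getattr(torch.version, "hip", None)
    # Whether a usable GPU is actually visible at runtime.
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the printed PyTorch version or CUDA version does not correspond to the row you installed from, reinstalling with the matching command is the first thing to try.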

3. Problems that an incorrect auto-gptq installation can cause

(1) The message `CUDA extension not installed.` appears

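For a long time I assumed this was caused by a misconfigured CUDA or PyTorch setup, by incompatible hardware, or even by a missing cuDNN installation. In the end, the cause turned out to be that the installed auto-gptq build did not match the current environment.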
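One way to narrow this down is to check whether the compiled kernels can be located at all. The sketch below assumes the extension modules are named `autogptq_cuda_64` and `autogptq_cuda_256`, which is how I recall them from the AutoGPTQ 0.x source; other releases may use different names, so treat the list as an assumption rather than a definitive reference. The helper `missing_modules` is hypothetical.

```python
import importlib.util

# Assumed names of AutoGPTQ's compiled CUDA kernels; adjust for your release.
ASSUMED_EXTENSION_MODULES = ("autogptq_cuda_64", "autogptq_cuda_256")


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_EXTENSION_MODULES)
    if missing:
        print("Compiled extensions not found:", ", ".join(missing))
    else:
        print("All assumed extension modules were found.")
```

Finding a module only shows that the file exists. It can still fail to load if it was built against a different CUDA or PyTorch version, so a clean result here does not rule out a mismatch.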

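Note that even after installing auto-gptq as described above, you may still get errors or a mismatched build. In that case, install from source. For guidance, see the official README (AutoGPTQ/README_zh.md at main · AutoGPTQ/AutoGPTQ) or the CSDN blog post "Fixing slow token generation after loading a GPTQ model (reinstalling Auto-GPTQ from source)".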

(The following is excerpted from the official documentation.)

Clone the source:

git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ

Then install from the project directory:

pip install .

As in the quick-install section, you can set BUILD_CUDA_EXT=0 to skip building the CUDA extension.

If you want triton acceleration and your operating system supports it, install with .[triton] (in place of . in the command above).

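For AMD GPUs, to install from source with ROCm support, set the ROCM_VERSION environment variable. Setting PYTORCH_ROCM_ARCH as well can speed up compilation; for MI200-series devices, for example, it can be set to gfx90a. Example: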

ROCM_VERSION=5.6 pip install .

On ROCm systems, the following packages must also be installed before building from source: rocsparse-dev, hipsparse-dev, rocthrust-dev, rocblas-dev and hipblas-dev.

(2) No error, but inference is extremely slow

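In this case, check the installed auto-gptq version. If the version string has no cu1xx suffix (for example +cu118), you may need to install from source.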
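The suffix check can be sketched as follows. The tag pattern is modeled on the PyTorch-style labels in the table (`+cu118`, `+rocm5.7`) and is an assumption about how a given wheel is labeled; `build_tag` is a hypothetical helper. A missing tag is only a hint, because PyPI does not accept local version labels, so a wheel installed directly from PyPI carries no suffix either.

```python
import re
from importlib import metadata

# Matches a local version label such as "+cu118", "+cu121" or "+rocm5.7".
BUILD_TAG = re.compile(r"\+(cu\d+|rocm[\d.]+)")


def build_tag(version):
    """Return the CUDA/ROCm tag in a version string, or None if there is none."""
    match = BUILD_TAG.search(version)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        version = metadata.version("auto-gptq")
    except metadata.PackageNotFoundError:
        print("auto-gptq is not installed")
    else:
        print(version, "->", build_tag(version) or "no CUDA/ROCm tag")
```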