LLM之本地部署GraphRAG（GLM-4+Xinference的embedding模型）（附带ollma部署方式）

前言

有空再写

微软开源的GraphRAG默认是使用openai的接口的（GPT的接口那是要money的），于是就研究了如何使用开源模型本地部署。

源码地址：https://github.com/microsoft/graphrag

操作文档：https://microsoft.github.io/graphrag/posts/get_started/

注：1、其实本次操作也可以不用下载源码，使用pip install graphrag 就能将graphrag下载下来变成包，建议大家先跑起来之后，再去看源码

2、要有足够的显存，给大家看看我的使用情况

一、启动GLM-4

由于GraphRAG默认是使用openai的接口的，而GLM-4使用VLLM提供了相应的接口，这里就不在对GLM-4以及VLLM进行介绍啦，感兴趣的小伙伴可以自己去搜索一下

GLM-4的github地址： https://github.com/THUDM/GLM-4

huggingface地址**：** https://huggingface.co/collections/THUDM/glm-4-665fcf188c414b03c2f7e3b7

百度网盘链接（glm-4-9b-chat）：

链接：https://pan.baidu.com/s/1dSMVbFg8GTfS901MZYOiMw?pwd=o2xt

提取码：o2xt

下载好之后如下：THUDM是模型存放的文件夹

然后我们执行

pip install uvicorn vllm fastapi

安装好相应的包之后，在GLM-4\basic_demo\openai_api_server.py文件中修改

复制代码

MODEL_PATH为自己存放模型的所在路径（建议使用绝对路径）

代码默认的host是0.0.0.0，大家可以按需要自己修改想要的IP和端口号

运行如下：

二、启动Xinference

Xorbits Inference (Xinference) 是一个开源平台，用于简化各种 AI 模型的运行和集成。借助 Xinference，您可以使用任何开源 LLM、嵌入模型和多模态模型在云端或本地环境中运行推理，并创建强大的 AI 应用

官网：https://inference.readthedocs.io/zh-cn/latest/index.html

快速安装文档：https://github.com/xorbitsai/inference/blob/main/README_zh_CN.md

Transformers 引擎

复制代码

pip install "xinference[transformers]"

vLLM 引擎

复制代码

pip install "xinference[vllm]"

Llama.cpp 引擎

初始步骤：

复制代码

pip install xinference

不同硬件的安装方式：

Apple M系列：

复制代码

CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

英伟达显卡：

复制代码

CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

AMD 显卡：

复制代码

CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python

全部

复制代码

pip install "xinference[all]"

由于我们前面使用vllm运行GLM-4，所以我们运行 pip install "xinference[vllm]"

后台启动服务

如果是服务器端，得到服务器的IP再改上去运行就可以了

xinference-local --host 0.0.0.0 --port 9997

例如我的就是 xinference-local --host 192.0.0.181 --port 9997

在浏览器上输入 http://192.0.0.181:9997/

在 EMBEDDING MODELS 中选一个，我这边选择的是bge-m3 大家也可以选择bge-base-en先尝试尝试，按需求选择

点击小火箭进行下载

下载完成之后会自动跳入Running Models中

三、启动GraphGAG

参考我上面发的GraphGAG操作文档

1、创建目录

复制代码

mkdir -p ./ragtest/input

2、下载文档

复制代码

curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt > ./ragtest/input/book.txt

或者大家自己随便找个txt文件或者小说啥的，塞进去也行

3、初始化

复制代码

python -m graphrag.index --init --root ./ragtest

执行的是

在ragtest下会多出个目录prompts，里边存放着一些prompts

4、更改settings.yaml

更改内容如下：对llm、embeddings部分内容进行了修改：

改动了api_key, model, 增加了api_base，但type不需要变动。

encoding_model: cl100k_base

skip_workflows: []

llm:

api_key: glm-4

type: openai_chat # or azure_openai_chat

model: glm-4-9b-chat

model_supports_json: true # recommended if this is available for your model.

api_base: http://0.0.0.0:8081/v1

parallelization:

stagger: 0.3

async_mode: threaded # or asyncio

embeddings:

async_mode: threaded # or asyncio

llm:

api_key: xinference

type: openai_embedding # or azure_openai_embedding

model: bge-m3

api_base: http://192.0.0.181:9997/v1

5、Running the Indexing pipeline

复制代码

python -m graphrag.index --root ./ragtest

这是最容易出错的地方了

1、

11:11:54,823 graphrag.llm.openai.utils ERROR error loading json, json=

Traceback (most recent call last):

File "/home/nlp/graphrag-main/graphrag/llm/openai/utils.py", line 93, in try_parse_json_object

result = json.loads(input)

File "/home/anaconda3/envs/nlp/lib/python3.10/json/init.py", line 346, in loads

return _default_decoder.decode(s)

File "/home/anaconda3/envs/nlp/lib/python3.10/json/decoder.py", line 337, in decode

obj, end = self.raw_decode(s, idx=_w(s, 0).end())

File "/home/anaconda3/envs/nlp/lib/python3.10/json/decoder.py", line 355, in raw_decode

raise JSONDecodeError("Expecting value", s, err.value) from None

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

是因为在graphrag\llm\openai\openai_chat_llm.py 中try_parse_json_object(output)函数中接收的output对应的格式不对，不知道咋描述，大家可以自己瞅瞅

解决办法：

在graphrag-main\graphrag\llm\openai\utils.py中替换掉try_parse_json_object的定义，如下：

复制代码

def try_parse_json_object(input: str) -> dict:
    """Generate JSON-string output using best-attempt prompting & parsing techniques."""
    try:
        clean_json = clean_up_json(input)
        result = json.loads(clean_json)
    except json.JSONDecodeError:
        log.exception("error loading json, json=%s", input)
        raise
    else:
        if not isinstance(result, dict):
            raise TypeError
        return result

def clean_up_json(json_str: str) -> str:
    """Clean up json string."""
    json_str = (
        json_str.replace("\\n", "")
        .replace("\n", "")
        .replace("\r", "")
        .replace('"[{', "[{")
        .replace('}]"', "}]")
        .replace("\\", "")
        # Refer: graphrag\llm\openai\_json.py,graphrag\index\utils\json.py
        .replace("{{", "{")
        .replace("}}", "}")
        .strip()
    )

    # Remove JSON Markdown Frame
    if json_str.startswith("```json"):
        json_str = json_str[len("```json"):]
    if json_str.endswith("```"):
        json_str = json_str[: len(json_str) - len("```")]
    return json_str

2、如果输出的output是空的，不妨将max_tokens调小一点

如果出现🚀 All workflows completed successfully，恭喜你完成了安装

5、Running the Query Engine

使用Global search：

复制代码

python -m graphrag.query \
--root ./ragtest \
--method global \
"What are the top themes in this story?"

然后你就会得到这个故事的主题

使用 Local search

复制代码

python -m graphrag.query \
--root ./ragtest \
--method local \
"Who is Scrooge, and what are his main relationships?"

然后你就会得到Scrooge是谁以及他相应的关系介绍

如果在执行Local search，模型输出为空

解决办法：在settings.yaml找到local_search:调小你的max_tokens

如果在执行 Global search，出现报错，回复你：I am sorry but I am unable to answer this question given the provided data

raise JSONDecodeError("Expecting value", s, err.value) from None

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

解决办法：在settings.yaml找到global_search，调小你的max_tokens与data_max_tokens:

给大家看看我的输出，我额外加了一段斗破苍穹的小说txt，并且魔改了一下，加进了自己的名字，嘿嘿

6、ollama部署

累了，看链接吧

颠覆传统RAG！GraphRAG结合本地大模型：Gemma 2+Nomic Embed齐上阵，轻松掌握GraphRAG+Chainlit+Ollama

GraphRAG本地运行（Ollama的LLM接口＋Xinference的embedding模型）无需gpt的api

欢迎大家点赞或收藏。

大家的点赞或收藏可以鼓励作者加快更新哟~

其他系列的文章也不错哟~