Notes on setting up a local Langchain-Chatchat knowledge base on Windows 10

1. Clone the source code

git clone https://github.com/chatchat-space/Langchain-Chatchat.git

2. Environment setup

conda create -n Chatchat python=3.10
conda activate Chatchat

3. Model configuration

In model_config.py:

# Name of the embedding model to use
EMBEDDING_MODEL = "m3e-base"

LLM_MODELS = ["zhipu-api"]

ONLINE_LLM_MODEL = {
    # To register and obtain an API key, go to http://open.bigmodel.cn
    "zhipu-api": {
        "api_key": "your own Zhipu API key",
        "version": "chatglm_turbo",  # options include "chatglm_turbo"
        "provider": "ChatGLMWorker",
    },
}

MODEL_PATH = {
    "embed_model": {
        "zhipu-api": "lucidrains/GLM-130B",
        # Raw string so the backslashes in the Windows path are kept literally
        "m3e-base": r"G:\AIGC\Langchain\m3e-base-main",
    },

    "llm_model": {
        "zhipu-api": "lucidrains/GLM-130B",
    },
}
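Since several of the errors later in these notes trace back to the embedding model not loading, it may help to sanity-check the configuration before initializing anything. The following is a minimal sketch, not part of Langchain-Chatchat: the helper name check_embedding_config and the "looks like a local path" heuristic are assumptions made for illustration.

```python
import os


def check_embedding_config(embedding_model, model_path):
    """Return a list of problems found; an empty list means the config looks consistent."""
    problems = []
    embed_paths = model_path.get("embed_model", {})
    if embedding_model not in embed_paths:
        problems.append(
            f"{embedding_model!r} has no entry in MODEL_PATH['embed_model']"
        )
        return problems

    location = embed_paths[embedding_model]
    # Heuristic (assumption): an absolute path or a Windows drive letter means
    # a local directory; anything else is treated as a Hugging Face repo id.
    looks_local = os.path.isabs(location) or (len(location) > 1 and location[1] == ":")
    if looks_local and not os.path.isdir(location):
        problems.append(f"local model directory does not exist: {location}")
    return problems


if __name__ == "__main__":
    # Example values only; substitute the ones from your own model_config.py.
    example_paths = {"embed_model": {"bge-large-zh": "BAAI/bge-large-zh"}}
    print(check_embedding_config("m3e-base", example_paths))
```

An empty list only means the name and path line up; it does not prove that the model files themselves are intact.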

4. Errors encountered

Running python init_database.py --recreate-vs fails to initialize the database:

github.com/chatchat-sp...

github.com/chatchat-sp...

github.com/chatchat-sp...

(langchain) G:\AIGC\Langchain\Langchain-Chatchat>python init_database.py --recreate-vs
recreating all vector stores
2023-12-19 17:02:47,732 - faiss_cache.py[line:80] - INFO: loading vector store in 'samples/vector_store/bge-large-zh' from disk.
2023-12-19 17:02:51,277 - SentenceTransformer.py[line:66] - INFO: Load pretrained SentenceTransformer: BAAI/bge-large-zh
2023-12-19 17:03:33,432 - embeddings_api.py[line:39] - ERROR: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/BAAI/bge-large-zh (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000001F06C868BE0>, 'Connection to huggingface.co timed out. (connect timeout=None)'))"), '(Request ID: 149213c1-2ec8-4340-90cd-f6d60fdde1da)')
AttributeError: 'NoneType' object has no attribute 'conjugate'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "G:\AIGC\Langchain\Langchain-Chatchat\init_database.py", line 108, in <module>
    folder2db(kb_names=args.kb_name, mode="recreate_vs", embed_model=args.embed_model)
  File "G:\AIGC\Langchain\Langchain-Chatchat\server\knowledge_base\migrate.py", line 121, in folder2db
    kb.create_kb()
  File "G:\AIGC\Langchain\Langchain-Chatchat\server\knowledge_base\kb_service\base.py", line 81, in create_kb
    self.do_create_kb()
  File "G:\AIGC\Langchain\Langchain-Chatchat\server\knowledge_base\kb_service\faiss_kb_service.py", line 47, in do_create_kb
    self.load_vector_store()
  File "G:\AIGC\Langchain\Langchain-Chatchat\server\knowledge_base\kb_service\faiss_kb_service.py", line 28, in load_vector_store
    return kb_faiss_pool.load_vector_store(kb_name=self.kb_name,
  File "G:\AIGC\Langchain\Langchain-Chatchat\server\knowledge_base\kb_cache\faiss_cache.py", line 90, in load_vector_store
    vector_store = self.new_vector_store(embed_model=embed_model, embed_device=embed_device)
  File "G:\AIGC\Langchain\Langchain-Chatchat\server\knowledge_base\kb_cache\faiss_cache.py", line 48, in new_vector_store
    vector_store = FAISS.from_documents([doc], embeddings, normalize_L2=True)
  File "F:\Anaconda3\envs\langchain\lib\site-packages\langchain_core\vectorstores.py", line 510, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "F:\Anaconda3\envs\langchain\lib\site-packages\langchain\vectorstores\faiss.py", line 911, in from_texts
    embeddings = embedding.embed_documents(texts)
  File "G:\AIGC\Langchain\Langchain-Chatchat\server\knowledge_base\kb_service\base.py", line 399, in embed_documents
    return normalize(embeddings).tolist()
  File "G:\AIGC\Langchain\Langchain-Chatchat\server\knowledge_base\kb_service\base.py", line 38, in normalize
    norm = np.linalg.norm(embeddings, axis=1)
  File "<__array_function__ internals>", line 200, in norm
  File "F:\Anaconda3\envs\langchain\lib\site-packages\numpy\linalg\linalg.py", line 2541, in norm
    s = (x.conj() * x).real
TypeError: loop of ufunc does not support argument 0 of type NoneType which has no callable conjugate method

Fix:

Change EMBEDDING_MODEL to bge-large-zh.

Then empty the knowledge_base directory and re-initialize the vector store.
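Reading the log above, the numpy TypeError looks like a follow-on failure: embeddings_api.py first logs an ERROR (the huggingface.co timeout), and normalize is then called on None. The sketch below shows how a fail-fast guard would surface the real cause earlier. It is a pure-Python illustration, not the project's code; normalize_rows is a hypothetical name.

```python
import math


def normalize_rows(embeddings):
    """L2-normalize each row, failing fast if the embedding step returned nothing."""
    if embeddings is None:
        # Without this guard the failure only shows up later as a confusing
        # "NoneType ... conjugate" error inside the norm computation.
        raise RuntimeError(
            "embedding backend returned None; check the earlier ERROR log line "
            "(for example, the model could not be downloaded or loaded)"
        )
    normalized = []
    for row in embeddings:
        norm = math.sqrt(sum(x * x for x in row))
        # Leave all-zero rows as zeros instead of dividing by zero.
        normalized.append([x / norm if norm else 0.0 for x in row])
    return normalized


if __name__ == "__main__":
    print(normalize_rows([[3.0, 4.0]]))
```

When debugging this kind of trace, the first ERROR line is usually the one worth reading, not the final exception.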

Start the services with startup.py:

python startup.py -a

github.com/chatchat-sp...

pip install zhipuai -i https://pypi.tuna.tsinghua.edu.cn/simple
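The zhipu-api worker presumably needs the zhipuai package, which is why it is installed here. A quick way to confirm that the package is visible to the active conda environment is sketched below; is_installed is a hypothetical helper name, and only the standard library is used.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


print("zhipuai installed:", is_installed("zhipuai"))
```

Run it with the same interpreter that runs startup.py, otherwise the check says nothing about the environment that matters.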

2023-12-19 15:44:46,117 - utils.py[line:24] - ERROR: object of type 'NoneType' has no len()
Traceback (most recent call last):
  File "G:\AIGC\Langchain\Langchain-Chatchat\server\utils.py", line 22, in wrap_done
    await fn
  File "F:\Anaconda3\envs\langchain\lib\site-packages\langchain\chains\base.py", line 381, in acall
    raise e
  File "F:\Anaconda3\envs\langchain\lib\site-packages\langchain\chains\base.py", line 375, in acall
    await self._acall(inputs, run_manager=run_manager)
  File "F:\Anaconda3\envs\langchain\lib\site-packages\langchain\chains\llm.py", line 275, in _acall
    response = await self.agenerate([inputs], run_manager=run_manager)
  File "F:\Anaconda3\envs\langchain\lib\site-packages\langchain\chains\llm.py", line 142, in agenerate
    return await self.llm.agenerate_prompt(
  File "F:\Anaconda3\envs\langchain\lib\site-packages\langchain_core\language_models\chat_models.py", line 501, in agenerate_prompt
    return await self.agenerate(
  File "F:\Anaconda3\envs\langchain\lib\site-packages\langchain_core\language_models\chat_models.py", line 461, in agenerate
    raise exceptions[0]
  File "F:\Anaconda3\envs\langchain\lib\site-packages\langchain_core\language_models\chat_models.py", line 564, in _agenerate_with_cache
    return await self._agenerate(
  File "F:\Anaconda3\envs\langchain\lib\site-packages\langchain_community\chat_models\openai.py", line 518, in _agenerate
    return await agenerate_from_stream(stream_iter)
  File "F:\Anaconda3\envs\langchain\lib\site-packages\langchain_core\language_models\chat_models.py", line 81, in agenerate_from_stream
    async for chunk in stream:
  File "F:\Anaconda3\envs\langchain\lib\site-packages\langchain_community\chat_models\openai.py", line 489, in _astream
    if len(chunk["choices"]) == 0:
TypeError: object of type 'NoneType' has no len()
2023-12-19 15:44:46,122 - utils.py[line:27] - ERROR: TypeError: Caught exception: object of type 'NoneType' has no len()

Starting the webui:

streamlit run webui.py

(langchain) G:\AIGC\Langchain\Langchain-Chatchat>streamlit run webui.py

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://192.168.43.195:8501

2023-12-19 14:21:27,722 - utils.py[line:95] - ERROR: ConnectError: error when post /llm_model/list_running_models: [WinError 10061] No connection could be made because the target machine actively refused it.
2023-12-19 14:21:29,726 - utils.py[line:95] - ERROR: ConnectError: error when post /llm_model/list_running_models: [WinError 10061] No connection could be made because the target machine actively refused it.
[... 10 more ConnectError lines with the same message ...]
2023-12-19 14:21:37.859 Uncaught app exception
Traceback (most recent call last):
  File "F:\Anaconda3\envs\langchain\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 534, in _run_script
    exec(code, module.__dict__)
  File "G:\AIGC\Langchain\Langchain-Chatchat\webui.py", line 64, in <module>
    pages[selected_page]["func"](api=api, is_lite=is_lite)
  File "G:\AIGC\Langchain\Langchain-Chatchat\webui_pages\dialogue\dialogue.py", line 165, in dialogue_page
    running_models = list(api.list_running_models())
TypeError: 'NoneType' object is not iterable
[... the same ConnectError lines and traceback repeat ...]
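WinError 10061 means the TCP connection was refused, so nothing seems to have been listening on the API address when webui.py was started on its own. A small pre-flight check like the one below can confirm that before launching Streamlit. This is a sketch under assumptions: is_port_open is a hypothetical helper, and 127.0.0.1:7861 is taken from the Chatchat API Server address shown in the startup output in section 5.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_port_open("127.0.0.1", 7861):
        print("API server port is reachable.")
    else:
        print("Nothing is listening on 127.0.0.1:7861; "
              "start the backend first (python startup.py -a).")
```

Starting everything through python startup.py -a, as done elsewhere in these notes, brings up the API server and the webui together, which avoids this ordering problem.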

Creating a knowledge base fails

2023-12-20 10:43:16,728 - SentenceTransformer.py[line:66] - INFO: Load pretrained SentenceTransformer: G:\AIGC\Langchain\m3e-base-main
2023-12-20 10:43:21,466 - embeddings_api.py[line:39] - ERROR: Error while deserializing header: HeaderTooLarge
2023-12-20 10:43:21,483 - kb_api.py[line:34] - ERROR: TypeError: error creating knowledge base: loop of ufunc does not support argument 0 of type NoneType which has no callable conjugate method

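Fix:

Change EMBEDDING_MODEL to bge-large-zh and download the model locally:

$ git lfs install
$ git clone https://huggingface.co/BAAI/bge-large-zh

Then empty the knowledge_base directory and run python init_database.py --recreate-vs to re-initialize the vector store. This resolved all of the problems above.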
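One common cause of a HeaderTooLarge error (an assumption here, the source does not confirm it) is that the model was cloned without Git LFS, so the weight files are small text pointer stubs instead of real weights. The sketch below scans a model directory for such stubs; find_lfs_pointers is a hypothetical helper, and the prefix it looks for is the first line of a Git LFS pointer file.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir):
    """Return files under model_dir that are Git LFS pointer stubs."""
    pointers = []
    for path in sorted(Path(model_dir).rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(path)
    return pointers


if __name__ == "__main__":
    # Example path from this write-up; replace it with your own model directory.
    for stub in find_lfs_pointers(r"G:\AIGC\Langchain\m3e-base-main"):
        print("LFS pointer, not real weights:", stub)
```

If any weight file shows up in the output, running git lfs install and cloning again (or git lfs pull inside the repository) should fetch the actual files.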

5. Startup output

(langchain) G:\AIGC\Langchain\Langchain-Chatchat>python startup.py -a

==============================Langchain-Chatchat Configuration==============================
OS: Windows-10-10.0.18363-SP0.
Python version: 3.10.12 | packaged by Anaconda, Inc. | (main, Jul  5 2023, 19:01:18) [MSC v.1916 64 bit (AMD64)]
Project version: v0.2.8
langchain version: 0.0.344. fastchat version: 0.2.34

Current text splitter: ChineseRecursiveTextSplitter
Currently running LLM models: ['zhipu-api'] @ cpu
{'api_key': 'your own API key',
 'device': 'cpu',
 'host': '127.0.0.1',
 'infer_turbo': False,
 'model_path': 'lucidrains/GLM-130B',
 'online_api': True,
 'port': 21001,
 'provider': 'ChatGLMWorker',
 'version': 'chatglm_turbo',
 'worker_class': <class 'server.model_workers.zhipu.ChatGLMWorker'>}
Current Embeddings model: m3e-base @ cpu
==============================Langchain-Chatchat Configuration==============================

2023-12-20 10:09:39,873 - startup.py[line:650] - INFO: Starting services:
2023-12-20 10:09:39,873 - startup.py[line:651] - INFO: To view the llm_api logs, go to G:\AIGC\Langchain\Langchain-Chatchat\logs
2023-12-20 10:09:52 | INFO | model_worker | Register to controller
2023-12-20 10:09:54 | ERROR | stderr | INFO:     Started server process [27468]
2023-12-20 10:09:54 | ERROR | stderr | INFO:     Waiting for application startup.
2023-12-20 10:09:54 | ERROR | stderr | INFO:     Application startup complete.
2023-12-20 10:09:54 | ERROR | stderr | INFO:     Uvicorn running on http://127.0.0.1:20000 (Press CTRL+C to quit)
INFO:     Started server process [25024]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:7861 (Press CTRL+C to quit)

==============================Langchain-Chatchat Configuration==============================
OS: Windows-10-10.0.18363-SP0.
Python version: 3.10.12 | packaged by Anaconda, Inc. | (main, Jul  5 2023, 19:01:18) [MSC v.1916 64 bit (AMD64)]
Project version: v0.2.8
langchain version: 0.0.344. fastchat version: 0.2.34

Current text splitter: ChineseRecursiveTextSplitter
Currently running LLM models: ['zhipu-api'] @ cpu
{'api_key': 'your own API key',
 'device': 'cpu',
 'host': '127.0.0.1',
 'infer_turbo': False,
 'model_path': 'lucidrains/GLM-130B',
 'online_api': True,
 'port': 21001,
 'provider': 'ChatGLMWorker',
 'version': 'chatglm_turbo',
 'worker_class': <class 'server.model_workers.zhipu.ChatGLMWorker'>}
Current Embeddings model: m3e-base @ cpu

Server runtime information:
    OpenAI API Server: http://127.0.0.1:20000/v1
    Chatchat  API  Server: http://127.0.0.1:7861
    Chatchat WEBUI Server: http://127.0.0.1:8501
==============================Langchain-Chatchat Configuration==============================

  You can now view your Streamlit app in your browser.

  URL: http://127.0.0.1:8501

The start page looks like this:

6. Notes

New knowledge base names cannot contain Chinese characters, and parsing imported PDFs is slow.
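To avoid running into the naming limitation by accident, a name can be checked before it is submitted. The sketch below is only an illustration: the source says no more than that Chinese names are unsupported, so the stricter rule used here (ASCII letters, digits, underscore and hyphen) and the helper name is_safe_kb_name are assumptions.

```python
def is_safe_kb_name(name):
    """Assumed rule: non-empty, ASCII only, made of letters, digits, '_' or '-'."""
    return (
        bool(name)
        and name.isascii()
        and all(ch.isalnum() or ch in "_-" for ch in name)
    )


if __name__ == "__main__":
    for candidate in ["samples", "my_kb-01", "知识库"]:
        print(candidate, is_safe_kb_name(candidate))
```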

