【心酸报错】ImportError: failed to find libmagic. Check your installation

目录

  • [报错信息:ImportError: failed to find libmagic. Check your installation](#报错信息:ImportError: failed to find libmagic. Check your installation)
  • 按照网络上找的办法修改
  • [还是报错:LookupError:Resource punkt not found.](#还是报错:LookupError:Resource punkt not found.)
  • 下载nltk_data
  • [又报错:AttributeError: 'tuple' object has no attribute 'page_content'](#又报错:AttributeError: 'tuple' object has no attribute 'page_content')
  • 怀疑是头文件的问题,修改头文件
  • 终成功!

报错信息:ImportError: failed to find libmagic. Check your installation

bash 复制代码
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 22, in <module>
    split_data = main_embedding()
                 ^^^^^^^^^^^^^^^^
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 11, in main_embedding
    data = loader.load()# 加载数据
           ^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_core\document_loaders\base.py", line 30, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 150, in lazy_load
    yield from load_file(f=self.file, f_path=self.file_path)
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 184, in lazy_load
    else self._elements_json
         ^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 203, in _elements_json
    return self._convert_elements_to_dicts(self._elements_via_local)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 221, in _elements_via_local      
    return partition(
           ^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\auto.py", line 186, in partition
    file_type = detect_filetype(
                ^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 100, in detect_filetype
    return _FileTypeDetector.file_type(ctx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 133, in file_type
    return cls(ctx)._file_type
           ^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 143, in _file_type
    if file_type := self._file_type_from_guessed_mime_type:
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 183, in _file_type_from_guessed_mime_type
    mime_type = self._ctx.mime_type
                ^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\utils.py", line 155, in __get__
    value = self._fget(obj)
            ^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 364, in mime_type
    import magic
  File "D:\mydatapro\venv_net\Lib\site-packages\magic\__init__.py", line 209, in <module>
    libmagic = loader.load_lib()
               ^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\magic\loader.py", line 49, in load_lib
    raise ImportError('failed to find libmagic.  Check your installation')
ImportError: failed to find libmagic.  Check your installation

按照网络上找的办法修改

还是报错:LookupError:Resource punkt not found.

bash 复制代码
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 22, in <module>
    split_data = main_embedding()
                 ^^^^^^^^^^^^^^^^
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 11, in main_embedding
    data = loader.load()# 加载数据
           ^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_core\document_loaders\base.py", line 30, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 150, in lazy_load
    yield from load_file(f=self.file, f_path=self.file_path)
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 184, in lazy_load
    else self._elements_json
         ^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 203, in _elements_json
    return self._convert_elements_to_dicts(self._elements_via_local)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 221, in _elements_via_local      
    return partition(
           ^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\auto.py", line 415, in partition
    elements = partition_text(
               ^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\text.py", line 102, in partition_text
    return _partition_text(
           ^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\documents\elements.py", line 605, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 706, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 662, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\chunking\dispatch.py", line 74, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\text.py", line 181, in _partition_text
    file_content = _split_by_paragraph(
                   ^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\text.py", line 361, in _split_by_paragraph
    _split_content_to_fit_max(
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\text.py", line 393, in _split_content_to_fit_max
    sentences = sent_tokenize(content)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\nlp\tokenize.py", line 131, in sent_tokenize
    return _sent_tokenize(text)
           ^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\nltk\tokenize\__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\nltk\data.py", line 750, in load
    opened_resource = _open(resource_url)
                      ^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\nltk\data.py", line 876, in _open
    return find(path_, path + [""]).open()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\nltk\data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/english.pickle

  Searched in:
    - 'C:\\Users\\shuhu/nltk_data'
    - 'D:\\mydatapro\\venv_net\\nltk_data'
    - 'D:\\mydatapro\\venv_net\\share\\nltk_data'
    - 'D:\\mydatapro\\venv_net\\lib\\nltk_data'
    - 'C:\\Users\\shuhu\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - ''
**********************************************************************

下载nltk_data

  • 网络一直不太稳定下载了很久,还设置了环境变量。

又报错:AttributeError: 'tuple' object has no attribute 'page_content'

  • 这个函数可不是我写的,这个是官方文件里面的。
bash 复制代码
D:\mydatapro\venv_net\Lib\site-packages\langchain_core\_api\deprecation.py:141: LangChainDeprecationWarning: The class `HuggingFaceEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 0.3.0. An updated version of the class exists in the langchain-huggingface package and should be used instead. To use it run `pip install -U langchain-huggingface` and import as `from langchain_huggingface import HuggingFaceEmbeddings`.
  warn_deprecated(
INFO: Use pytorch device_name: cpu
INFO: Load pretrained SentenceTransformer: F:\\moka-ai_m3e-base
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 24, in <module>
INFO: Use pytorch device_name: cpu
INFO: Load pretrained SentenceTransformer: F:\\moka-ai_m3e-base
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 24, in <module>
INFO: Load pretrained SentenceTransformer: F:\\moka-ai_m3e-base
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 24, in <module>
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 24, in <module>
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 24, in <module>
    split_data = main_embedding()
    split_data = main_embedding()
                 ^^^^^^^^^^^^^^^^
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 18, in main_embedding
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 18, in main_embedding
    db = FAISS.from_documents(embeddings,split_data)# 构建向量库
    db = FAISS.from_documents(embeddings,split_data)# 构建向量库
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_core\vectorstores\base.py", line 831, in from_documents
    texts = [d.page_content for d in documents]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_core\vectorstores\base.py", line 831, in <listcomp>
    texts = [d.page_content for d in documents]
             ^^^^^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'page_content'
  • 不知道为什么这个报错信息很多重复的,难道是因为网络?不太懂。

怀疑是头文件的问题,修改头文件

我只改了头文件,所以我只能把错误原因归为网络问题or库有问题,这个是我最后用的所有头文件:

python 复制代码
from langchain_unstructured import UnstructuredLoader# 加载文档
from langchain_text_splitters import RecursiveCharacterTextSplitter# 切分文档
from langchain_huggingface import HuggingFaceEmbeddings# 向量化
from langchain_community.vectorstores import FAISS# 向量库 

终成功!

bash 复制代码
(venv_net) PS D:\mydatapro\myweb> python AutoTokenizer.py
INFO: Use pytorch device_name: cpu
INFO: Load pretrained SentenceTransformer: F:\\moka-ai_m3e-base
INFO: Loading faiss with AVX2 support.
INFO: Successfully loaded faiss with AVX2 support.
[Document(metadata={'source': './dataset/test.txt', 'file_directory': './dataset', 'filename': 'test.txt', 'last_modified': '2024-08-16T16:11:37', 'languages': ['zho'], 'filetype': 'text/plain', 'category': 'Title', 'element_id': '2ec66fdb03bd40ec722fd30005d3739a'}, page_content='国家建立的负责收集和保存本国出版物,担负国家总书库职能的图书馆。'), Document(metadata={'source': './dataset/test.txt', 'file_directory': './dataset', 'filename': 'test.txt', 'last_modified': '2024-08-16T16:11:37', 'languages': ['zho'], 'filetype': 'text/plain', 'category': 'Title', 'element_id': '39a938c715ce1a4b38af2b878c2d29d4'}, page_content='国家图书馆一般除收藏本国出版物外,还收藏大量外文出版物 (包括有关本国的外文书刊), 并负责编制国家书目和联合目录。'), Document(metadata={'source': './dataset/test.txt', 'file_directory': './dataset', 'filename': 'test.txt', 'last_modified': '2024-08-16T16:11:37', 'languages': ['zho'], 'filetype': 'text/plain', 'category': 'Title', 'element_id': '2ddfef3787246755bfd1955ef3eacb54'}, page_content='国家图书馆是一个国家 图书事业的推动者,是面向全国的中心图书馆,既是全国的藏书中心、馆际互借中心、国际书刊交换中心,'), Document(metadata={'source': './dataset/test.txt', 'file_directory': './dataset', 'filename': 'test.txt', 'last_modified': '2024-08-16T16:11:37', 'languages': ['zho'], 'filetype': 'text/plain', 'category': 'Title', 'element_id': 'ca80db5e9d73b32e59eb3dc122b274c6'}, page_content='也是全国的书目 和图书馆学研究的中心。')]
相关推荐
九年义务漏网鲨鱼2 小时前
【大模型学习 | MINIGPT-4原理】
人工智能·深度学习·学习·语言模型·多模态
蹦蹦跳跳真可爱5893 小时前
Python----OpenCV(图像増强——高通滤波(索贝尔算子、沙尔算子、拉普拉斯算子),图像浮雕与特效处理)
人工智能·python·opencv·计算机视觉
nananaij3 小时前
【Python进阶篇 面向对象程序设计(3) 继承】
开发语言·python·神经网络·pycharm
雷羿 LexChien3 小时前
从 Prompt 管理到人格稳定:探索 Cursor AI 编辑器如何赋能 Prompt 工程与人格风格设计(上)
人工智能·python·llm·编辑器·prompt
敲键盘的小夜猫4 小时前
LLM复杂记忆存储-多会话隔离案例实战
人工智能·python·langchain
高压锅_12204 小时前
Django Channels WebSocket实时通信实战:从聊天功能到消息推送
python·websocket·django
胖达不服输5 小时前
「日拱一码」020 机器学习——数据处理
人工智能·python·机器学习·数据处理
吴佳浩5 小时前
Python入门指南-番外-LLM-Fingerprint(大语言模型指纹):从技术视角看AI开源生态的边界与挑战
python·llm·mcp
吴佳浩6 小时前
Python入门指南-AI模型相似性检测方法:技术原理与实现
人工智能·python·llm
叶 落6 小时前
计算阶梯电费
python·python 基础·python 入门