[Painful Debugging] ImportError: failed to find libmagic. Check your installation

Contents

  • Error: ImportError: failed to find libmagic. Check your installation
  • Applying a fix found online
  • Still failing: LookupError: Resource punkt not found.
  • Downloading nltk_data
  • Another error: AttributeError: 'tuple' object has no attribute 'page_content'
  • Suspecting the imports, changing them
  • Finally, success!

Error: ImportError: failed to find libmagic. Check your installation

bash
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 22, in <module>
    split_data = main_embedding()
                 ^^^^^^^^^^^^^^^^
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 11, in main_embedding
    data = loader.load()# 加载数据
           ^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_core\document_loaders\base.py", line 30, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 150, in lazy_load
    yield from load_file(f=self.file, f_path=self.file_path)
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 184, in lazy_load
    else self._elements_json
         ^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 203, in _elements_json
    return self._convert_elements_to_dicts(self._elements_via_local)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 221, in _elements_via_local      
    return partition(
           ^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\auto.py", line 186, in partition
    file_type = detect_filetype(
                ^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 100, in detect_filetype
    return _FileTypeDetector.file_type(ctx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 133, in file_type
    return cls(ctx)._file_type
           ^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 143, in _file_type
    if file_type := self._file_type_from_guessed_mime_type:
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 183, in _file_type_from_guessed_mime_type
    mime_type = self._ctx.mime_type
                ^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\utils.py", line 155, in __get__
    value = self._fget(obj)
            ^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 364, in mime_type
    import magic
  File "D:\mydatapro\venv_net\Lib\site-packages\magic\__init__.py", line 209, in <module>
    libmagic = loader.load_lib()
               ^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\magic\loader.py", line 49, in load_lib
    raise ImportError('failed to find libmagic.  Check your installation')
ImportError: failed to find libmagic.  Check your installation

Applying a fix found online
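The original doesn't record which fix was applied. As an assumption on my part, the change most commonly suggested for this error on Windows is to replace `python-magic` with `python-magic-bin`, which bundles the prebuilt libmagic DLLs that `import magic` fails to find otherwise:

```shell
# Common Windows workaround (an assumption, not the author's documented steps):
# python-magic-bin ships the libmagic DLLs, so magic.loader can find them.
pip uninstall -y python-magic
pip install python-magic-bin
```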

Still failing: LookupError: Resource punkt not found.

bash
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 22, in <module>
    split_data = main_embedding()
                 ^^^^^^^^^^^^^^^^
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 11, in main_embedding
    data = loader.load()# 加载数据
           ^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_core\document_loaders\base.py", line 30, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 150, in lazy_load
    yield from load_file(f=self.file, f_path=self.file_path)
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 184, in lazy_load
    else self._elements_json
         ^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 203, in _elements_json
    return self._convert_elements_to_dicts(self._elements_via_local)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 221, in _elements_via_local      
    return partition(
           ^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\auto.py", line 415, in partition
    elements = partition_text(
               ^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\text.py", line 102, in partition_text
    return _partition_text(
           ^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\documents\elements.py", line 605, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 706, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 662, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\chunking\dispatch.py", line 74, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\text.py", line 181, in _partition_text
    file_content = _split_by_paragraph(
                   ^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\text.py", line 361, in _split_by_paragraph
    _split_content_to_fit_max(
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\text.py", line 393, in _split_content_to_fit_max
    sentences = sent_tokenize(content)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\nlp\tokenize.py", line 131, in sent_tokenize
    return _sent_tokenize(text)
           ^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\nltk\tokenize\__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\nltk\data.py", line 750, in load
    opened_resource = _open(resource_url)
                      ^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\nltk\data.py", line 876, in _open
    return find(path_, path + [""]).open()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\nltk\data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/english.pickle

  Searched in:
    - 'C:\\Users\\shuhu/nltk_data'
    - 'D:\\mydatapro\\venv_net\\nltk_data'
    - 'D:\\mydatapro\\venv_net\\share\\nltk_data'
    - 'D:\\mydatapro\\venv_net\\lib\\nltk_data'
    - 'C:\\Users\\shuhu\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - ''
**********************************************************************

Downloading nltk_data

  • The network was quite unstable, so the download took a long time; I also set an environment variable.
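A minimal sketch of the two usual approaches (the exact steps the author took aren't recorded, and the directory below is an example, not the author's path): let NLTK's downloader fetch `punkt`, or unpack a manually downloaded `nltk_data` and point NLTK at it:

```python
# Sketch: obtaining the punkt tokenizer that unstructured's sent_tokenize needs.
# Assumes nltk is installed; D:\nltk_data is a placeholder directory.
import os
import nltk

# Option 1: download via NLTK (requires a working network connection).
nltk.download("punkt")

# Option 2: if nltk_data was downloaded manually, tell NLTK where it lives.
# The directory must contain tokenizers\punkt, matching the searched paths
# listed in the traceback above.
os.environ["NLTK_DATA"] = r"D:\nltk_data"
nltk.data.path.append(r"D:\nltk_data")
```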

Another error: AttributeError: 'tuple' object has no attribute 'page_content'

  • This function isn't something I wrote; it comes from the official documentation.
bash
D:\mydatapro\venv_net\Lib\site-packages\langchain_core\_api\deprecation.py:141: LangChainDeprecationWarning: The class `HuggingFaceEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 0.3.0. An updated version of the class exists in the langchain-huggingface package and should be used instead. To use it run `pip install -U langchain-huggingface` and import as `from langchain_huggingface import HuggingFaceEmbeddings`.
  warn_deprecated(
INFO: Use pytorch device_name: cpu
INFO: Load pretrained SentenceTransformer: F:\\moka-ai_m3e-base
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 24, in <module>
INFO: Use pytorch device_name: cpu
INFO: Load pretrained SentenceTransformer: F:\\moka-ai_m3e-base
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 24, in <module>
INFO: Load pretrained SentenceTransformer: F:\\moka-ai_m3e-base
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 24, in <module>
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 24, in <module>
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 24, in <module>
    split_data = main_embedding()
    split_data = main_embedding()
                 ^^^^^^^^^^^^^^^^
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 18, in main_embedding
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 18, in main_embedding
    db = FAISS.from_documents(embeddings,split_data)# 构建向量库
    db = FAISS.from_documents(embeddings,split_data)# 构建向量库
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_core\vectorstores\base.py", line 831, in from_documents
    texts = [d.page_content for d in documents]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_core\vectorstores\base.py", line 831, in <listcomp>
    texts = [d.page_content for d in documents]
             ^^^^^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'page_content'
  • Not sure why the error output contains so many repeated lines. Could it be the network? I don't really understand it. (Duplicated output like this on Windows is often a sign of spawned worker processes re-importing the script, rather than a network problem.)

Suspecting the imports, changing them

I only changed the import statements, so all I can do is attribute the error to network problems or a broken library. These are all the imports I ended up using:

python
from langchain_unstructured import UnstructuredLoader                 # document loading
from langchain_text_splitters import RecursiveCharacterTextSplitter   # document splitting
from langchain_huggingface import HuggingFaceEmbeddings               # embeddings
from langchain_community.vectorstores import FAISS                    # vector store
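For what it's worth, the earlier traceback points at a likely cause beyond the imports: in `FAISS.from_documents(embeddings, split_data)` the arguments are swapped. LangChain's `from_documents` takes the documents first and the embedding model second; passing the embeddings object first means the loop over `documents` iterates a pydantic model, which yields `(field, value)` tuples, hence `'tuple' object has no attribute 'page_content'`. A minimal stub (not the real LangChain code) illustrating the call shape:

```python
# Minimal stub showing why argument order matters for from_documents:
# whatever is passed first must be an iterable of objects with .page_content.
from dataclasses import dataclass, field

@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)

def from_documents(documents, embedding):
    # langchain_core's VectorStore.from_documents begins essentially like this.
    texts = [d.page_content for d in documents]
    return texts  # the real method would embed and index these texts

split_data = [Document("some chunk of text")]
print(from_documents(split_data, embedding=None))  # documents first: works
# from_documents(embeddings_object, split_data) fails exactly as in the log
```

So `FAISS.from_documents(split_data, embeddings)` is the order that works, regardless of which import provides the classes.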

Finally, success!

bash
(venv_net) PS D:\mydatapro\myweb> python AutoTokenizer.py
INFO: Use pytorch device_name: cpu
INFO: Load pretrained SentenceTransformer: F:\\moka-ai_m3e-base
INFO: Loading faiss with AVX2 support.
INFO: Successfully loaded faiss with AVX2 support.
[Document(metadata={'source': './dataset/test.txt', 'file_directory': './dataset', 'filename': 'test.txt', 'last_modified': '2024-08-16T16:11:37', 'languages': ['zho'], 'filetype': 'text/plain', 'category': 'Title', 'element_id': '2ec66fdb03bd40ec722fd30005d3739a'}, page_content='国家建立的负责收集和保存本国出版物,担负国家总书库职能的图书馆。'), Document(metadata={'source': './dataset/test.txt', 'file_directory': './dataset', 'filename': 'test.txt', 'last_modified': '2024-08-16T16:11:37', 'languages': ['zho'], 'filetype': 'text/plain', 'category': 'Title', 'element_id': '39a938c715ce1a4b38af2b878c2d29d4'}, page_content='国家图书馆一般除收藏本国出版物外,还收藏大量外文出版物 (包括有关本国的外文书刊), 并负责编制国家书目和联合目录。'), Document(metadata={'source': './dataset/test.txt', 'file_directory': './dataset', 'filename': 'test.txt', 'last_modified': '2024-08-16T16:11:37', 'languages': ['zho'], 'filetype': 'text/plain', 'category': 'Title', 'element_id': '2ddfef3787246755bfd1955ef3eacb54'}, page_content='国家图书馆是一个国家 图书事业的推动者,是面向全国的中心图书馆,既是全国的藏书中心、馆际互借中心、国际书刊交换中心,'), Document(metadata={'source': './dataset/test.txt', 'file_directory': './dataset', 'filename': 'test.txt', 'last_modified': '2024-08-16T16:11:37', 'languages': ['zho'], 'filetype': 'text/plain', 'category': 'Title', 'element_id': 'ca80db5e9d73b32e59eb3dc122b274c6'}, page_content='也是全国的书目 和图书馆学研究的中心。')]