[Painful Errors] ImportError: failed to find libmagic. Check your installation

Contents

  • Error: ImportError: failed to find libmagic. Check your installation
  • Applying a fix found online
  • Still failing: LookupError: Resource punkt not found.
  • Downloading nltk_data
  • Another error: AttributeError: 'tuple' object has no attribute 'page_content'
  • Suspecting the imports, and changing them
  • Success at last!

Error: ImportError: failed to find libmagic. Check your installation

bash
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 22, in <module>
    split_data = main_embedding()
                 ^^^^^^^^^^^^^^^^
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 11, in main_embedding
    data = loader.load()# load the data
           ^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_core\document_loaders\base.py", line 30, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 150, in lazy_load
    yield from load_file(f=self.file, f_path=self.file_path)
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 184, in lazy_load
    else self._elements_json
         ^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 203, in _elements_json
    return self._convert_elements_to_dicts(self._elements_via_local)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 221, in _elements_via_local      
    return partition(
           ^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\auto.py", line 186, in partition
    file_type = detect_filetype(
                ^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 100, in detect_filetype
    return _FileTypeDetector.file_type(ctx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 133, in file_type
    return cls(ctx)._file_type
           ^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 143, in _file_type
    if file_type := self._file_type_from_guessed_mime_type:
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 183, in _file_type_from_guessed_mime_type
    mime_type = self._ctx.mime_type
                ^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\utils.py", line 155, in __get__
    value = self._fget(obj)
            ^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 364, in mime_type
    import magic
  File "D:\mydatapro\venv_net\Lib\site-packages\magic\__init__.py", line 209, in <module>
    libmagic = loader.load_lib()
               ^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\magic\loader.py", line 49, in load_lib
    raise ImportError('failed to find libmagic.  Check your installation')
ImportError: failed to find libmagic.  Check your installation

Applying a fix found online
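unstructured detects file types through the `magic` Python package, and on Windows the plain `python-magic` package does not ship the native libmagic DLL it wraps. The fix most commonly suggested online, and presumably the one applied here, is to install `python-magic-bin`, which bundles the DLL:

bash
pip uninstall python-magic
pip install python-magic-bin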

Still failing: LookupError: Resource punkt not found.

bash
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 22, in <module>
    split_data = main_embedding()
                 ^^^^^^^^^^^^^^^^
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 11, in main_embedding
    data = loader.load()# load the data
           ^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_core\document_loaders\base.py", line 30, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 150, in lazy_load
    yield from load_file(f=self.file, f_path=self.file_path)
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 184, in lazy_load
    else self._elements_json
         ^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 203, in _elements_json
    return self._convert_elements_to_dicts(self._elements_via_local)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_unstructured\document_loaders.py", line 221, in _elements_via_local      
    return partition(
           ^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\auto.py", line 415, in partition
    elements = partition_text(
               ^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\text.py", line 102, in partition_text
    return _partition_text(
           ^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\documents\elements.py", line 605, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 706, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\file_utils\filetype.py", line 662, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\chunking\dispatch.py", line 74, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\text.py", line 181, in _partition_text
    file_content = _split_by_paragraph(
                   ^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\text.py", line 361, in _split_by_paragraph
    _split_content_to_fit_max(
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\partition\text.py", line 393, in _split_content_to_fit_max
    sentences = sent_tokenize(content)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\unstructured\nlp\tokenize.py", line 131, in sent_tokenize
    return _sent_tokenize(text)
           ^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\nltk\tokenize\__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\nltk\data.py", line 750, in load
    opened_resource = _open(resource_url)
                      ^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\nltk\data.py", line 876, in _open
    return find(path_, path + [""]).open()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\nltk\data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/english.pickle

  Searched in:
    - 'C:\\Users\\shuhu/nltk_data'
    - 'D:\\mydatapro\\venv_net\\nltk_data'
    - 'D:\\mydatapro\\venv_net\\share\\nltk_data'
    - 'D:\\mydatapro\\venv_net\\lib\\nltk_data'
    - 'C:\\Users\\shuhu\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - ''
**********************************************************************

Downloading nltk_data

  • The network was unstable the whole time, so the download took very long; I also had to set the environment variable (see the sketch below).
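The download the error message asks for is just two lines of Python. When the network is flaky, `nltk.download` also accepts a `download_dir`, and NLTK honors the `NLTK_DATA` environment variable when searching for resources; the traceback above lists every directory it checks. A minimal sketch, with the target directory taken from that search list:

python
import nltk

# Fetch tokenizers/punkt into a directory NLTK already searches;
# D:\nltk_data appears in the search list printed in the traceback.
nltk.download("punkt", download_dir=r"D:\nltk_data")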

Another error: AttributeError: 'tuple' object has no attribute 'page_content'

  • I didn't write this function myself; it comes straight from the official documentation.
bash
D:\mydatapro\venv_net\Lib\site-packages\langchain_core\_api\deprecation.py:141: LangChainDeprecationWarning: The class `HuggingFaceEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 0.3.0. An updated version of the class exists in the langchain-huggingface package and should be used instead. To use it run `pip install -U langchain-huggingface` and import as `from langchain_huggingface import HuggingFaceEmbeddings`.
  warn_deprecated(
INFO: Use pytorch device_name: cpu
INFO: Load pretrained SentenceTransformer: F:\\moka-ai_m3e-base
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 24, in <module>
INFO: Use pytorch device_name: cpu
INFO: Load pretrained SentenceTransformer: F:\\moka-ai_m3e-base
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 24, in <module>
INFO: Load pretrained SentenceTransformer: F:\\moka-ai_m3e-base
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 24, in <module>
Traceback (most recent call last):
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 24, in <module>
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 24, in <module>
    split_data = main_embedding()
    split_data = main_embedding()
                 ^^^^^^^^^^^^^^^^
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 18, in main_embedding
  File "D:\mydatapro\myweb\AutoTokenizer.py", line 18, in main_embedding
    db = FAISS.from_documents(embeddings,split_data)# build the vector store
    db = FAISS.from_documents(embeddings,split_data)# build the vector store
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_core\vectorstores\base.py", line 831, in from_documents
    texts = [d.page_content for d in documents]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mydatapro\venv_net\Lib\site-packages\langchain_core\vectorstores\base.py", line 831, in <listcomp>
    texts = [d.page_content for d in documents]
             ^^^^^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'page_content'
  • I have no idea why the error output contains so many duplicated lines. Could it be the network? I don't really understand. (The likely cause of the error itself is discussed below.)
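Looking at the traceback, though, the likely culprit is the argument order: `FAISS.from_documents(documents, embedding)` expects the list of documents first and the embedding object second. Passing the embeddings object first makes LangChain iterate over it, which yields `(field, value)` tuples, and that is exactly why `.page_content` fails on a tuple. The corrected call, with the two arguments swapped back:

python
# FAISS.from_documents takes the documents first and the embeddings second.
db = FAISS.from_documents(split_data, embeddings)# build the vector store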

Suspecting the imports, and changing them

I only changed the imports, so all I could do was put the error down to a network problem or a library problem. These are all the imports I ended up using:

python
from langchain_unstructured import UnstructuredLoader  # document loading
from langchain_text_splitters import RecursiveCharacterTextSplitter  # document splitting
from langchain_huggingface import HuggingFaceEmbeddings  # vectorization
from langchain_community.vectorstores import FAISS  # vector store
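Putting the pieces together, here is a minimal sketch of what the working script plausibly looked like, reconstructed from the tracebacks and the final output below. The file path `./dataset/test.txt` and the model path `F:/moka-ai_m3e-base` appear in the logs; the splitter parameters are assumptions for illustration.

python
from langchain_unstructured import UnstructuredLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

def main_embedding():
    loader = UnstructuredLoader("./dataset/test.txt")
    data = loader.load()  # load the data
    # chunk_size/chunk_overlap are assumed values, not from the original post
    splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10)
    split_data = splitter.split_documents(data)
    embeddings = HuggingFaceEmbeddings(model_name="F:/moka-ai_m3e-base")
    db = FAISS.from_documents(split_data, embeddings)  # documents first, embeddings second
    return split_data

if __name__ == "__main__":
    print(main_embedding())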

Success at last!

bash
(venv_net) PS D:\mydatapro\myweb> python AutoTokenizer.py
INFO: Use pytorch device_name: cpu
INFO: Load pretrained SentenceTransformer: F:\\moka-ai_m3e-base
INFO: Loading faiss with AVX2 support.
INFO: Successfully loaded faiss with AVX2 support.
[Document(metadata={'source': './dataset/test.txt', 'file_directory': './dataset', 'filename': 'test.txt', 'last_modified': '2024-08-16T16:11:37', 'languages': ['zho'], 'filetype': 'text/plain', 'category': 'Title', 'element_id': '2ec66fdb03bd40ec722fd30005d3739a'}, page_content='国家建立的负责收集和保存本国出版物,担负国家总书库职能的图书馆。'), Document(metadata={'source': './dataset/test.txt', 'file_directory': './dataset', 'filename': 'test.txt', 'last_modified': '2024-08-16T16:11:37', 'languages': ['zho'], 'filetype': 'text/plain', 'category': 'Title', 'element_id': '39a938c715ce1a4b38af2b878c2d29d4'}, page_content='国家图书馆一般除收藏本国出版物外,还收藏大量外文出版物 (包括有关本国的外文书刊), 并负责编制国家书目和联合目录。'), Document(metadata={'source': './dataset/test.txt', 'file_directory': './dataset', 'filename': 'test.txt', 'last_modified': '2024-08-16T16:11:37', 'languages': ['zho'], 'filetype': 'text/plain', 'category': 'Title', 'element_id': '2ddfef3787246755bfd1955ef3eacb54'}, page_content='国家图书馆是一个国家 图书事业的推动者,是面向全国的中心图书馆,既是全国的藏书中心、馆际互借中心、国际书刊交换中心,'), Document(metadata={'source': './dataset/test.txt', 'file_directory': './dataset', 'filename': 'test.txt', 'last_modified': '2024-08-16T16:11:37', 'languages': ['zho'], 'filetype': 'text/plain', 'category': 'Title', 'element_id': 'ca80db5e9d73b32e59eb3dc122b274c6'}, page_content='也是全国的书目 和图书馆学研究的中心。')]