Loading Hugging Face Pretrained Models Locally

Preface

Background

Hugging Face is an open-source machine learning model hub and platform with significant influence in natural language processing (NLP). The following sections introduce it in more detail, covering its underlying principles, pros and cons, and origins.

1. Introduction

Hugging Face's tooling is built on deep learning and NLP techniques, with the Transformer architecture at its core. The Transformer is a powerful model structure that processes input in parallel and, through stacked self-attention layers and an encoder-decoder design, achieves efficient text understanding and generation. Thanks to its strong performance and scalability, the Transformer has become the foundation of many state-of-the-art NLP models.
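
At the heart of every Transformer layer is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The following minimal PyTorch sketch is illustrative only, not Hugging Face's internal implementation:

python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k); each output position is a weighted
    # sum of the values, weighted by query-key similarity.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v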

The Hugging Face platform offers a wide range of pretrained NLP models. Built on the Transformer architecture, these models generalize well and are highly flexible, letting developers tackle a broad variety of NLP tasks. Users can pick a suitable model and either fine-tune it or use it as-is; fine-tuning typically involves adjusting and optimizing the model's parameters for the target task.
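
For direct use without any fine-tuning, the pipeline API is the shortest path. A minimal sketch (it downloads a default sentiment model on first use, so it assumes a working network connection):

python
from transformers import pipeline

# Builds tokenizer + model + post-processing in one object.
classifier = pipeline("sentiment-analysis")
print(classifier("Hugging Face makes NLP easy."))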

2. Pros and Cons

Pros
  1. Rich model library: Hugging Face hosts a huge collection of models, covering classic NLP architectures such as BERT and GPT. Pretrained on large corpora, these models deliver strong out-of-the-box performance.
  2. Ease of use: Hugging Face offers a clean, well-documented API, so developers can get up to speed quickly and use its features flexibly. It also supports multiple programming languages to suit different developers.
  3. Community support: Hugging Face has a large developer community, so users who run into problems can quickly get help from peers around the world.
  4. Comprehensive tooling: Hugging Face ships a full toolset, including Pipeline, AutoClass, datasets, model utilities, and evaluation tools (see the AutoClass sketch below); these tools simplify NLP workflows and foster community collaboration and knowledge sharing.
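
As an example of the AutoClass interface mentioned in item 4, the Auto* classes resolve the concrete tokenizer and model classes from a checkpoint name; a minimal sketch (assumes network access on first use):

python
from transformers import AutoTokenizer, AutoModel

# For this checkpoint, these resolve to BertTokenizer and BertModel.
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")
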
Cons

Despite its clear strengths in NLP, Hugging Face also has some potential drawbacks and challenges:

  1. High compute requirements: the models Hugging Face provides are typically large and complex, so training and inference demand substantial compute. This can be a hurdle for users or organizations with limited resources.
  2. Data privacy and security: using the Hugging Face platform may involve uploading and processing large amounts of data, which can raise privacy and security concerns, especially with sensitive or confidential information.
  3. Model selection and tuning: despite the rich set of models and tools, users still need a certain amount of expertise and experience to choose and tune the model best suited to their task, which can be challenging for beginners.

3. Origins and Background

Hugging Face was founded in 2016 by French serial entrepreneur Clément Delangue (who had previously founded several successful projects) together with Thomas Wolf and Julien Chaumond, and is headquartered in New York. Two of the founders, Clément Delangue and Thomas Wolf, are experts in natural language processing. Their original goal was an entertainment-oriented, open-domain chatbot for young people, one that could chat about the weather, friends, love, sports, and other everyday topics. That is also why the company is named after the open-armed hugging-face emoji.

Over time, Hugging Face expanded its product line, launching core products such as the Transformers library, which quickly became one of the most popular libraries in NLP. Today Hugging Face is an indispensable part of the NLP ecosystem, providing strong technical support and solutions to many companies and researchers.

Notably, while Hugging Face itself has not published a dedicated paper on its platform or architecture, the papers behind the Transformer architecture and related models (such as BERT and GPT) have had a broad impact in both academia and industry. These papers lay out the Transformer's principles, model designs, and experimental results, giving the Hugging Face platform a solid theoretical foundation.

In summary, Hugging Face is an influential open-source model hub and platform for NLP. It offers a wealth of models, tools, and support that let developers handle complex NLP tasks with ease. That said, users should keep compute requirements, data privacy and security, and model selection and tuning in mind.

Prerequisites

Experimental Environment

bash
Package                       Version
----------------------------- ------------
matplotlib                    3.3.4
numpy                         1.19.5
Pillow                        8.4.0
pip                           21.2.2
protobuf                      3.19.6
requests                      2.27.1
scikit-learn                  0.24.2
scipy                         1.5.4
sentencepiece                 0.1.91
setuptools                    58.0.4
threadpoolctl                 3.1.0
thulac                        0.2.2
tokenizers                    0.9.3
torch                         1.9.1+cu111
torchaudio                    0.9.1
torchvision                   0.10.1+cu111
tornado                       6.1
tqdm                          4.64.1
traitlets                     4.3.3
transformers                  3.5.1
urllib3                       1.26.20

Loading the bert-base-chinese Pretrained Model from Hugging Face Locally

Download the Model Files

  • config.json: the model configuration file
  • pytorch_model.bin: the PyTorch model weights
  • tokenizer.json, tokenizer_config.json, vocab.txt: the tokenizer files

All of these files can be downloaded manually from the model page on the Hugging Face Hub (https://huggingface.co/google-bert/bert-base-chinese/tree/main); a programmatic alternative is sketched below.
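
If a recent version of the huggingface_hub package is available (it is not part of the pinned environment listed above, so treat its use here as an assumption), the whole checkpoint can be fetched in one call; a minimal sketch:

python
from huggingface_hub import snapshot_download

# Download config.json, pytorch_model.bin, and the tokenizer files
# into a local folder that from_pretrained() can point at later.
snapshot_download(repo_id="google-bert/bert-base-chinese",
                  local_dir="bert-base-chinese")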

Loading the Pretrained Model Online

python
from transformers import get_linear_schedule_with_warmup, AdamW
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese') # fetched online; may fail with a network error
model = BertForSequenceClassification.from_pretrained('bert-base-chinese') # loads the pretrained model online

Loading the pretrained model online may fail with an error like the following.

bash
{
	"name": "ValueError",
	"message": "Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.",
	"stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-85b5904e99d1> in <module>
----> 1 tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
      2 class MyDataSet(torch.utils.data.Dataset):
      3     def __init__(self, examples):
      4         self.examples = examples
      5 

d:\\anaconda3\\envs\\nlp\\lib\\site-packages\\transformers\\tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1627                         proxies=proxies,
   1628                         resume_download=resume_download,
-> 1629                         local_files_only=local_files_only,
   1630                     )
   1631                 except requests.exceptions.HTTPError as err:

d:\\anaconda3\\envs\\nlp\\lib\\site-packages\\transformers\\file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, local_files_only)
    953             resume_download=resume_download,
    954             user_agent=user_agent,
--> 955             local_files_only=local_files_only,
    956         )
    957     elif os.path.exists(url_or_filename):

d:\\anaconda3\\envs\\nlp\\lib\\site-packages\\transformers\\file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, local_files_only)
   1123                 else:
   1124                     raise ValueError(
-> 1125                         \"Connection error, and we cannot find the requested files in the cached path.\"
   1126                         \" Please try again or make sure your Internet connection is on.\"
   1127                     )

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on."
}

For this reason, this article uses files that have already been downloaded and loads the pretrained model from the local disk, which sidesteps problems such as the online download failing due to network connectivity.
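
Incidentally, the traceback above shows that from_pretrained() accepts a local_files_only flag; when the files are already in the local cache, passing it avoids any network access. A small sketch (assuming the cache was populated on an earlier run):

python
from transformers import BertTokenizer

# Resolve the checkpoint from the local cache only; raises an error
# instead of attempting a network request if the files are missing.
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese', local_files_only=True)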

Loading the Pretrained Model Locally

The local directory structure is as follows:

bash
bert-base-chinese/
    config.json
    pytorch_model.bin
    tokenizer.json
    tokenizer_config.json
    vocab.txt
python
from transformers import get_linear_schedule_with_warmup, AdamW
from transformers import BertTokenizer, BertForSequenceClassification, BertConfig, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese/') # note: this points at the local folder

# Load the pretrained model from local files
config = BertConfig.from_json_file("bert-base-chinese/config.json")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese/pytorch_model.bin", config=config)

Printing the loaded model (e.g. print(model)) shows the full architecture:

bash
BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(21128, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        ... layers (1) through (11) are identical in structure to layer (0) and are omitted here for brevity ...
      )
    )
    (pooler): BertPooler(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (activation): Tanh()
    )
  )
  (dropout): Dropout(p=0.1, inplace=False)
  (classifier): Linear(in_features=768, out_features=2, bias=True)
)
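
As a quick smoke test that the locally loaded tokenizer and model work together, the snippet below pushes one sentence through the classifier. Under transformers 3.5.1 the model returns a tuple with the logits first; the sentence is an arbitrary example, and the two-class classifier head is freshly initialized, so the scores themselves are meaningless until the model is fine-tuned:

python
import torch

# Tokenize one sentence and run a single forward pass without gradients.
inputs = tokenizer("这部电影真好看", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs[0]            # shape (1, 2): one score per class
print(torch.softmax(logits, dim=-1))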

