Loading Hugging Face Pretrained Models Locally

Preface

Background

Hugging Face is an open-source machine learning model hub and platform with significant influence in natural language processing (NLP). The following is a detailed introduction, covering its fundamentals, strengths and weaknesses, and origins:

1. Overview

Hugging Face's foundations lie in deep learning and natural language processing, with the Transformer architecture as the core component. The Transformer is a powerful model structure that processes large amounts of data in parallel and, through stacked self-attention layers and encoder-decoder designs, achieves efficient text understanding and generation. Thanks to its performance and scalability, the Transformer has become the basis of many state-of-the-art NLP models.

The Hugging Face platform provides a wide range of pretrained NLP models. Built on the Transformer architecture, they generalize well and are highly flexible, letting developers handle complex NLP tasks with ease. Users can pick a suitable model to fine-tune or use it directly; fine-tuning typically involves adjusting and optimizing the model's parameters to fit the target task.
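
As a quick illustration of direct use, the sketch below calls the high-level `pipeline` API (a minimal sketch: the task name is standard, but the default model it downloads and the exact output format can vary by library version, and the call needs network access):

python
from transformers import pipeline

# Downloads a default pretrained sentiment model on first use
classifier = pipeline("sentiment-analysis")
print(classifier("Hugging Face makes NLP approachable."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]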

2. Strengths and Weaknesses

Strengths
  1. Rich model hub: Hugging Face hosts a vast collection of models, including classics such as BERT and GPT. Pretrained on large corpora, these models deliver strong performance.
  2. Ease of use: Hugging Face offers clean API interfaces and thorough documentation, so developers can get up to speed quickly and apply its features flexibly. It also supports multiple programming languages to suit different developers.
  3. Community support: Hugging Face has a large developer community, so when users run into problems, help and answers from peers around the world arrive quickly.
  4. Comprehensive tooling: Hugging Face provides a full toolset, including Pipeline, AutoClass, datasets, model utilities, and evaluation tools; these simplify NLP tasks and foster community collaboration and knowledge sharing (see the AutoClass sketch after this list).
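
A minimal AutoClass sketch (assumes network access; bert-base-chinese is the checkpoint used later in this article):

python
from transformers import AutoTokenizer, AutoModel

# AutoClass infers the right tokenizer/model classes from the checkpoint
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")

inputs = tokenizer("你好,世界", return_tensors="pt")
outputs = model(**inputs)  # a tuple in transformers 3.x; outputs[0] is the last hidden state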
Weaknesses

Despite its clear strengths in natural language processing, Hugging Face also comes with some potential drawbacks and challenges:

  1. High compute requirements: the models Hugging Face provides are typically large and complex, so training and inference demand substantial compute. This can be a hurdle for users or organizations with limited resources.
  2. Data privacy and security: using the platform can involve uploading and processing large amounts of data, which raises privacy and security concerns, especially for sensitive or confidential information.
  3. Model selection and tuning: despite the rich catalog of models and tools, users still need some expertise and experience to select and tune the model best suited to their task, which can be challenging for beginners.

3. Origins and Background

Hugging Face was founded in 2016 by French serial entrepreneur Clément Delangue (who had previously founded several successful ventures) together with Thomas Wolf and Julien Chaumond, and is headquartered in New York. Two of the founders, Clément Delangue and Thomas Wolf, are experts in natural language processing. Their original goal was an "entertainment-oriented", open-domain chatbot for young people that could chat about the weather, friends, love, sports, and more. Fittingly, the company takes its name from the hugging-face emoji with open arms.

Over time, Hugging Face broadened its product line, releasing core products such as the Transformers library, which quickly became one of the most popular libraries in NLP. Today, Hugging Face is an indispensable part of the NLP landscape, providing strong technical support and solutions to companies and researchers alike.

Notably, although Hugging Face itself has not published a dedicated paper on its platform or architecture, the papers behind the Transformer architecture and related models (such as BERT and GPT) have been broadly influential in both academia and industry. They lay out the Transformer's principles, model designs, and experimental results, giving the Hugging Face platform a solid theoretical foundation.

In summary, Hugging Face is an influential open-source model hub and platform for natural language processing. It offers rich models, tools, and support, letting developers tackle complex NLP tasks with ease, though users should keep compute requirements, data privacy and security, and model selection and tuning in mind.

Prerequisites

Experimental Environment

bash
Package                       Version
----------------------------- ------------
matplotlib                    3.3.4
numpy                         1.19.5
Pillow                        8.4.0
pip                           21.2.2
protobuf                      3.19.6
requests                      2.27.1
scikit-learn                  0.24.2
scipy                         1.5.4
sentencepiece                 0.1.91
setuptools                    58.0.4
threadpoolctl                 3.1.0
thulac                        0.2.2
tokenizers                    0.9.3
torch                         1.9.1+cu111
torchaudio                    0.9.1
torchvision                   0.10.1+cu111
tornado                       6.1
tqdm                          4.64.1
traitlets                     4.3.3
transformers                  3.5.1
urllib3                       1.26.20

Loading the bert-base-chinese Pretrained Model Locally

Download the Model Files

Download the following files from the bert-base-chinese repository on Hugging Face [2] (a download sketch follows the list):

  • config.json: model configuration file
  • pytorch_model.bin: PyTorch model weights
  • tokenizer.json, tokenizer_config.json, vocab.txt: tokenizer files
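
The files can be fetched from the repository page [2] in a browser, or with a script like the sketch below (an assumption-laden sketch: it uses the requests package from the environment above and Hugging Face's resolve/main URL pattern, and it still needs a working connection on whatever machine runs it):

python
import os

import requests

REPO = "https://huggingface.co/bert-base-chinese/resolve/main"
FILES = ["config.json", "pytorch_model.bin",
         "tokenizer.json", "tokenizer_config.json", "vocab.txt"]

os.makedirs("bert-base-chinese", exist_ok=True)
for name in FILES:
    resp = requests.get(f"{REPO}/{name}", stream=True)
    resp.raise_for_status()
    with open(os.path.join("bert-base-chinese", name), "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
            f.write(chunk)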

Loading the Pretrained Model Online

python
from transformers import get_linear_schedule_with_warmup, AdamW
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese') # loads online; may fail with a network error
model = BertForSequenceClassification.from_pretrained('bert-base-chinese') # loads the pretrained model online

Loading the pretrained model online may fail with an error like the following.

bash
{
	"name": "ValueError",
	"message": "Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.",
	"stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-85b5904e99d1> in <module>
----> 1 tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
      2 class MyDataSet(torch.utils.data.Dataset):
      3     def __init__(self, examples):
      4         self.examples = examples
      5 

d:\\anaconda3\\envs\\nlp\\lib\\site-packages\\transformers\\tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1627                         proxies=proxies,
   1628                         resume_download=resume_download,
-> 1629                         local_files_only=local_files_only,
   1630                     )
   1631                 except requests.exceptions.HTTPError as err:

d:\\anaconda3\\envs\\nlp\\lib\\site-packages\\transformers\\file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, local_files_only)
    953             resume_download=resume_download,
    954             user_agent=user_agent,
--> 955             local_files_only=local_files_only,
    956         )
    957     elif os.path.exists(url_or_filename):

d:\\anaconda3\\envs\\nlp\\lib\\site-packages\\transformers\\file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, local_files_only)
   1123                 else:
   1124                     raise ValueError(
-> 1125                         \"Connection error, and we cannot find the requested files in the cached path.\"
   1126                         \" Please try again or make sure your Internet connection is on.\"
   1127                     )

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on."
}

This article therefore uses pre-downloaded files and loads the Hugging Face pretrained model locally, mainly to sidestep connection failures and similar problems with online loading.
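
As an aside, if a model was downloaded successfully at least once, the cached copy can be reused without a network round trip: from_pretrained accepts a local_files_only flag (visible in the traceback above). A sketch, assuming the cache is already populated:

python
from transformers import BertTokenizer

# Read from the local cache only; raises an error if the files were never cached
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese', local_files_only=True)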

Loading the Pretrained Model Locally

The local directory structure is as follows:

bash
bert-base-chinese/
    config.json
    pytorch_model.bin
    tokenizer.json
    tokenizer_config.json
    vocab.txt
python
from transformers import get_linear_schedule_with_warmup, AdamW
from transformers import BertTokenizer, BertForSequenceClassification, BertConfig, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese/') # note: a local directory, not a model name

# Load the pretrained model from local files
config = BertConfig.from_json_file("bert-base-chinese/config.json")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese/pytorch_model.bin", config=config)
print(model) # prints the architecture shown below
bash
BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(21128, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (1): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (2): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (3): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (4): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (5): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (6): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (7): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (8): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (9): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (10): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (11): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
    (pooler): BertPooler(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (activation): Tanh()
    )
  )
  (dropout): Dropout(p=0.1, inplace=False)
  (classifier): Linear(in_features=768, out_features=2, bias=True)
)
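
To confirm the locally loaded model actually runs, here is a quick sanity check continuing the session above (a sketch; note that the classification head on top of the base BERT checkpoint is freshly initialized, so the logits are meaningless until the model is fine-tuned):

python
import torch

inputs = tokenizer("今天天气真好", return_tensors="pt")  # encode a sample sentence
model.eval()
with torch.no_grad():
    outputs = model(**inputs)  # a tuple in transformers 3.x
logits = outputs[0]
print(logits.shape)  # torch.Size([1, 2]) for the default 2-label head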

References

[1] https://huggingface.co/

[2] https://huggingface.co/google-bert/bert-base-chinese/tree/main

[3] https://huggingface.co/docs/transformers/model_doc/bert

[4] https://blog.csdn.net/ewen_lee/article/details/130992494
