Loading Hugging Face Pretrained Models Locally

Preface

Background

Hugging Face is an open-source machine learning model library and platform with significant influence in the field of natural language processing (NLP). The following introduces Hugging Face in more detail, covering its underlying principles, strengths and weaknesses, and origins:

1. Overview

Hugging Face builds on deep learning and natural language processing techniques, and its core component is the Transformer architecture. The Transformer is a powerful model structure that processes sequences in parallel and uses stacked self-attention layers together with encoder-decoder designs to understand and generate text efficiently. Thanks to its strong performance and scalability, the Transformer has become the foundation of many state-of-the-art NLP models.

The Hugging Face platform provides a wide range of pretrained NLP models. Built on the Transformer architecture, these models offer strong generalization ability and flexibility, allowing developers to tackle a variety of complex NLP tasks with little effort. Users can pick a suitable model and either use it directly or fine-tune it; fine-tuning typically involves adjusting and optimizing the model parameters to fit a specific task.

2. Advantages and Disadvantages

Advantages
  1. Rich model hub: Hugging Face hosts a large collection of models, including classic NLP models such as BERT and GPT. These models are pretrained on large corpora and deliver strong performance.
  2. Ease of use: Hugging Face provides clean, well-documented APIs, so developers can get started quickly and use its features flexibly. It also supports multiple programming languages to meet the needs of different developers.
  3. Community support: Hugging Face has a large developer community, which means that users who run into problems can quickly get help and answers from peers around the world.
  4. Comprehensive tool set: Hugging Face offers a full set of tools, including Pipeline, AutoClass, datasets, model utilities, and evaluation tools. These not only simplify NLP tasks but also foster collaboration and knowledge sharing within the community (a small usage sketch follows this list).
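
To give a concrete feel for the tool set mentioned in item 4, here is a minimal Pipeline sketch. It is an illustrative addition, not from the original article, and running it requires internet access because the library downloads a small default English sentiment model:

python
from transformers import pipeline

# Minimal Pipeline sketch: a one-line sentiment classifier using the
# library's default model (downloaded from the Hub on first run).
classifier = pipeline("sentiment-analysis")
print(classifier("Hugging Face makes NLP easy"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]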
Disadvantages

Although Hugging Face offers clear advantages in natural language processing, it also has some potential drawbacks and challenges:

  1. High compute requirements: the models provided by Hugging Face are usually complex and large, so training and inference demand substantial computing resources. This can be a challenge for users or organizations with limited resources.
  2. Data privacy and security: using the platform may involve uploading and processing large amounts of data, which can raise privacy and security concerns, especially when handling sensitive or confidential information.
  3. Model selection and tuning: although Hugging Face offers a rich set of models and tools, users still need a certain amount of expertise and experience to select and tune the model that best fits their task, which can be challenging for beginners.

3. Origin and Background

Hugging Face was founded in 2016 by French serial entrepreneur Clément Delangue (who had previously started several successful projects) together with Thomas Wolf and Julien Chaumond, and is headquartered in New York. Two of the founders, Clément Delangue and Thomas Wolf, are experts in natural language processing. Their original goal was to build a fun, open-domain chatbot for young people that could chat about the weather, friends, love, sports, and other everyday topics. That is also why the company is named after the hugging face emoji, a cute smiley with open hands.

Over time, Hugging Face expanded its product line and released core products such as the Transformers library, which quickly became one of the most popular libraries in NLP. Today Hugging Face is an indispensable part of the natural language processing landscape, providing strong technical support and solutions for many companies and researchers.

It is worth noting that the theoretical foundations of the platform come largely from the papers behind the Transformer architecture and related models (such as BERT and GPT), which have had a broad impact in both academia and industry. These papers describe the architecture's basic principles, model designs, and experimental results, and they provide a solid theoretical basis for the Hugging Face ecosystem.

In summary, Hugging Face is an open-source machine learning model library and platform with significant influence in natural language processing. It provides rich models, tools, and support that let developers handle a variety of complex NLP tasks with ease. At the same time, users should keep in mind the compute requirements, data privacy and security concerns, and the effort needed for model selection and tuning.

Prerequisites

Experimental Environment

bash
Package                       Version
----------------------------- ------------
matplotlib                    3.3.4
numpy                         1.19.5
Pillow                        8.4.0
pip                           21.2.2
protobuf                      3.19.6
requests                      2.27.1
scikit-learn                  0.24.2
scipy                         1.5.4
sentencepiece                 0.1.91
setuptools                    58.0.4
threadpoolctl                 3.1.0
thulac                        0.2.2
tokenizers                    0.9.3
torch                         1.9.1+cu111
torchaudio                    0.9.1
torchvision                   0.10.1+cu111
tornado                       6.1
tqdm                          4.64.1
traitlets                     4.3.3
transformers                  3.5.1
urllib3                       1.26.20

Loading the bert-base-chinese Pretrained Model from Hugging Face Locally

Download the Required Model Files

  • config.json: the model configuration file
  • pytorch_model.bin: the PyTorch model weights file
  • tokenizer.json, tokenizer_config.json, vocab.txt: the tokenizer files
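
These files can be downloaded by hand from the bert-base-chinese repository page on the Hub (linked in the references), or fetched with a short script. The sketch below is a hypothetical helper, not part of the original article; it assumes the machine running it can reach huggingface.co and relies on the Hub's resolve/main URL pattern:

python
import os
import requests

# Hypothetical download helper: fetch the files listed above into a local
# bert-base-chinese/ directory. Files in a Hub model repo are served at
# https://huggingface.co/<repo>/resolve/main/<filename>.
base_url = "https://huggingface.co/google-bert/bert-base-chinese/resolve/main"
files = ["config.json", "pytorch_model.bin",
         "tokenizer.json", "tokenizer_config.json", "vocab.txt"]

os.makedirs("bert-base-chinese", exist_ok=True)
for name in files:
    resp = requests.get(f"{base_url}/{name}", stream=True)
    resp.raise_for_status()
    with open(os.path.join("bert-base-chinese", name), "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MB chunks
            f.write(chunk)
    print(f"downloaded {name}")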

Loading the Pretrained Model Online

python
from transformers import get_linear_schedule_with_warmup, AdamW
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese') # load online; may fail with a network error
model = BertForSequenceClassification.from_pretrained('bert-base-chinese') # load the pretrained model online

Loading the pretrained model online may fail with an error like the following.

bash
{
	"name": "ValueError",
	"message": "Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.",
	"stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-85b5904e99d1> in <module>
----> 1 tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
      2 class MyDataSet(torch.utils.data.Dataset):
      3     def __init__(self, examples):
      4         self.examples = examples
      5 

d:\\anaconda3\\envs\\nlp\\lib\\site-packages\\transformers\\tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1627                         proxies=proxies,
   1628                         resume_download=resume_download,
-> 1629                         local_files_only=local_files_only,
   1630                     )
   1631                 except requests.exceptions.HTTPError as err:

d:\\anaconda3\\envs\\nlp\\lib\\site-packages\\transformers\\file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, local_files_only)
    953             resume_download=resume_download,
    954             user_agent=user_agent,
--> 955             local_files_only=local_files_only,
    956         )
    957     elif os.path.exists(url_or_filename):

d:\\anaconda3\\envs\\nlp\\lib\\site-packages\\transformers\\file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, local_files_only)
   1123                 else:
   1124                     raise ValueError(
-> 1125                         \"Connection error, and we cannot find the requested files in the cached path.\"
   1126                         \" Please try again or make sure your Internet connection is on.\"
   1127                     )

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on."
}

For this reason, this article loads the Hugging Face pretrained model from files that have already been downloaded, which effectively avoids network problems such as failing to reach the Hub during online loading.

Loading the Pretrained Model Locally

The local directory structure is as follows:

bash
bert-base-chinese/
    config.json
    pytorch_model.bin
    tokenizer.json
    tokenizer_config.json
    vocab.txt
python
from transformers import get_linear_schedule_with_warmup, AdamW
from transformers import BertTokenizer, BertForSequenceClassification, BertConfig, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese/') # note: this points to the local folder

# Load the pretrained model locally from the downloaded files
config = BertConfig.from_json_file("bert-base-chinese/config.json")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese/pytorch_model.bin", config=config)
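
As a brief aside, from_pretrained also accepts a local directory path directly, in which case it reads config.json and pytorch_model.bin from that folder by itself; the following one-liner is an equivalent way to load the same weights:

python
# Equivalent local load: point from_pretrained at the directory instead of
# passing the weight file and the config separately.
model = BertForSequenceClassification.from_pretrained("bert-base-chinese/")

Printing the loaded model shows the following structure: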
bash
BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(21128, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (1): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (2): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (3): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (4): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (5): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (6): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (7): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (8): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (9): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (10): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (11): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
    (pooler): BertPooler(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (activation): Tanh()
    )
  )
  (dropout): Dropout(p=0.1, inplace=False)
  (classifier): Linear(in_features=768, out_features=2, bias=True)
)
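
To quickly verify that the locally loaded tokenizer and model work together, the short sketch below runs a single forward pass. It is an illustrative addition: the example sentence is arbitrary, and the classification head is freshly initialized, so the logits are not meaningful until the model is fine-tuned.

python
import torch

# Quick sanity check: tokenize one Chinese sentence and run it through the model.
inputs = tokenizer("今天天气真好", return_tensors="pt")
model.eval()
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs[0]  # first element is the logits, for both tuple and ModelOutput returns
print(logits.shape)  # torch.Size([1, 2]): one sentence, two classes by default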

References

[1] https://huggingface.co/
[2] https://huggingface.co/google-bert/bert-base-chinese/tree/main
[3] https://huggingface.co/docs/transformers/model_doc/bert
[4] https://blog.csdn.net/ewen_lee/article/details/130992494
