Loading a Hugging Face Pretrained Model Locally
- Preface
- Background
- Prerequisites
- Experimental Environment
- Loading the bert-base-chinese Pretrained Model Locally
- References
Preface
- My knowledge is limited, so errors and omissions are inevitable; corrections are welcome.
- For more content, see the Python Tips column, the OpenCV-Python column, the YOLO series column, the NLP column, or my profile page:
- Face-forgery detection based on DETR
- YOLOv7: training on a custom dataset (mask detection)
- YOLOv8: training on a custom dataset (football detection)
- YOLOv10: training on a custom dataset (traffic-sign detection)
- YOLO11: training on a custom dataset (smoking and fall detection)
- YOLOv5: accelerating YOLOv5 inference with TensorRT
- YOLOv5: IoU, GIoU, DIoU, CIoU, EIoU
- Fun with Jetson Nano (5): accelerating YOLOv5 object detection with TensorRT
- YOLOv5: adding SE, CBAM, CoordAtt, and ECA attention mechanisms
- YOLOv5: interpreting the yolov5s.yaml config file and adding a small-object detection layer
- Converting a COCO-format instance-segmentation dataset to YOLO format with Python
- YOLOv5: training a custom instance-segmentation model with version 7.0 (vehicles, pedestrians, road signs, lane lines, etc.)
- Trying the open-source Stable Diffusion project for free with Kaggle GPU resources
Background
Hugging Face is an open-source machine-learning model hub and platform with major influence in natural language processing (NLP). The following sections introduce it in detail, covering its fundamentals, strengths and weaknesses, and origins.
1. Overview
Hugging Face builds on deep learning and NLP techniques, and its core component is the Transformer architecture. A Transformer processes large amounts of data in parallel and, through stacked self-attention layers and an encoder-decoder design, understands and generates text efficiently. Thanks to this performance and scalability, the Transformer has become the foundation of many state-of-the-art NLP models.
The platform hosts a wide range of pretrained NLP models based on this architecture. These models generalize well and are highly flexible, so developers can tackle complex NLP tasks with little effort: pick a suitable model and either use it directly or fine-tune it, adjusting and optimizing its parameters to fit the task at hand.
2. Strengths and Weaknesses
Strengths
- Rich model hub: Hugging Face hosts a vast collection of classic NLP models such as BERT and GPT. Pretrained on large corpora, they deliver strong out-of-the-box performance.
- Ease of use: clean API interfaces and thorough documentation let developers get started quickly, and multiple programming languages are supported.
- Community support: a large developer community means that when problems arise, help and answers from peers around the world are usually close at hand.
- Comprehensive tooling: Pipeline, AutoClass, datasets, model utilities, and evaluation tools simplify NLP work and encourage collaboration and knowledge sharing.
Weaknesses
Despite its clear advantages in NLP, Hugging Face also brings some potential drawbacks and challenges:
- High compute requirements: the hosted models are often large and complex, so training and inference demand substantial compute, which can be a hurdle for users or organizations with limited resources.
- Data privacy and security: using the platform may involve uploading and processing large amounts of data, raising privacy and security concerns, especially when handling sensitive or confidential information.
- Model selection and tuning: even with the rich catalog of models and tools, choosing and tuning the model best suited to a task still requires expertise and experience, which can challenge beginners.
3. Origins and Background
Hugging Face was founded in 2016 by French serial entrepreneur Clément Delangue together with Thomas Wolf and Julien Chaumond, with headquarters in New York. Two of the founders, Delangue and Wolf, are experts in natural language processing. Their original goal was an entertainment-oriented, open-domain chatbot for young people that could chat about the weather, friends, love, sports, and other everyday topics, which is also why the company is named after the open-armed hugging-face emoji.
Over time, Hugging Face expanded its product line, releasing core products such as the Transformers library, which quickly became one of the most popular libraries in NLP. Today the platform is an indispensable part of the field, providing technical support and solutions to many companies and researchers.
Notably, while Hugging Face itself has not published a dedicated paper on its platform or architecture, the papers behind the Transformer architecture and related models such as BERT and GPT have had broad impact in both academia and industry. They lay out the architecture's principles, model designs, and experimental results, providing a solid theoretical foundation for the platform.
In summary, Hugging Face is an influential open-source model hub and platform for NLP. It offers rich models, tools, and support that make complex NLP tasks approachable, though users should keep compute requirements, data privacy and security, and model selection and tuning in mind.
Prerequisites
- Familiarity with Python
Experimental Environment
```bash
Package                       Version
----------------------------- ------------
matplotlib                    3.3.4
numpy                         1.19.5
Pillow                        8.4.0
pip                           21.2.2
protobuf                      3.19.6
requests                      2.27.1
scikit-learn                  0.24.2
scipy                         1.5.4
sentencepiece                 0.1.91
setuptools                    58.0.4
threadpoolctl                 3.1.0
thulac                        0.2.2
tokenizers                    0.9.3
torch                         1.9.1+cu111
torchaudio                    0.9.1
torchvision                   0.10.1+cu111
tornado                       6.1
tqdm                          4.64.1
traitlets                     4.3.3
transformers                  3.5.1
urllib3                       1.26.20
```
Loading the bert-base-chinese Pretrained Model Locally
Download the model files
- config.json: model configuration file
- pytorch_model.bin: PyTorch model weights
- tokenizer.json, tokenizer_config.json, vocab.txt: tokenizer files
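After downloading, it helps to check that all required files are in place before attempting to load the model. A minimal sketch using only the standard library (the helper name and the split into required vs. optional files are my own; the slow BertTokenizer used later only strictly needs vocab.txt):

```python
import os

# Files needed to load bert-base-chinese offline. tokenizer.json and
# tokenizer_config.json are useful but not strictly required by the
# slow BertTokenizer, which reads vocab.txt.
REQUIRED = ["config.json", "pytorch_model.bin", "vocab.txt"]
OPTIONAL = ["tokenizer.json", "tokenizer_config.json"]

def missing_files(model_dir, required=REQUIRED):
    """Return the required files that are absent from model_dir."""
    return [f for f in required if not os.path.isfile(os.path.join(model_dir, f))]

# Example: report what still needs to be downloaded.
gaps = missing_files("bert-base-chinese")
if gaps:
    print("missing:", ", ".join(gaps))
```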
Loading the pretrained model online
```python
from transformers import get_linear_schedule_with_warmup, AdamW
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')  # fetched online; may fail with a network error
model = BertForSequenceClassification.from_pretrained('bert-base-chinese')  # fetches the pretrained model online
```
Loading online may fail with an error like the following.
```bash
{
	"name": "ValueError",
	"message": "Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.",
	"stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-85b5904e99d1> in <module>
----> 1 tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
      2 class MyDataSet(torch.utils.data.Dataset):
      3     def __init__(self, examples):
      4         self.examples = examples
      5 

d:\\anaconda3\\envs\\nlp\\lib\\site-packages\\transformers\\tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1627                 proxies=proxies,
   1628                 resume_download=resume_download,
-> 1629                 local_files_only=local_files_only,
   1630             )
   1631         except requests.exceptions.HTTPError as err:

d:\\anaconda3\\envs\\nlp\\lib\\site-packages\\transformers\\file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, local_files_only)
    953             resume_download=resume_download,
    954             user_agent=user_agent,
--> 955             local_files_only=local_files_only,
    956         )
    957     elif os.path.exists(url_or_filename):

d:\\anaconda3\\envs\\nlp\\lib\\site-packages\\transformers\\file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, local_files_only)
   1123         else:
   1124             raise ValueError(
-> 1125                 \"Connection error, and we cannot find the requested files in the cached path.\"
   1126                 \" Please try again or make sure your Internet connection is on.\"
   1127             )

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on."
}
```
Therefore, this article loads the pretrained Hugging Face model from files downloaded ahead of time, which avoids the connection failures that can occur when loading online.
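The two approaches can also be combined: try the online load first and fall back to the local copy when the connection fails. A minimal, library-agnostic sketch (the function and argument names are my own; in practice load_fn would be a from_pretrained-style callable such as BertTokenizer.from_pretrained):

```python
def load_with_fallback(load_fn, remote_id, local_dir):
    """Try to load from the Hub; on failure, load from a local directory.

    load_fn   -- a from_pretrained-style callable
    remote_id -- model id on the Hub, e.g. 'bert-base-chinese'
    local_dir -- path to the downloaded files, e.g. 'bert-base-chinese/'
    """
    try:
        return load_fn(remote_id)
    except Exception as err:  # e.g. the connection ValueError shown above
        print(f"online load failed ({err}); falling back to {local_dir}")
        return load_fn(local_dir)
```

For example: `tokenizer = load_with_fallback(BertTokenizer.from_pretrained, 'bert-base-chinese', 'bert-base-chinese/')`.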
Loading the pretrained model locally
The local directory structure is as follows:
```bash
bert-base-chinese/
    config.json
    pytorch_model.bin
    tokenizer.json
    tokenizer_config.json
    vocab.txt
```
```python
from transformers import get_linear_schedule_with_warmup, AdamW
from transformers import BertTokenizer, BertForSequenceClassification, BertConfig, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese/')  # note: a local directory, not a model id
# Load the pretrained model locally
config = BertConfig.from_json_file("bert-base-chinese/config.json")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese/pytorch_model.bin", config=config)
```
Printing the model shows the following structure.
```bash
BertForSequenceClassification(
(bert): BertModel(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(21128, 768, padding_idx=0)
(position_embeddings): Embedding(512, 768)
(token_type_embeddings): Embedding(2, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(1): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(2): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(3): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(4): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(5): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(6): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(7): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(8): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(9): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(10): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(11): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(pooler): BertPooler(
(dense): Linear(in_features=768, out_features=768, bias=True)
(activation): Tanh()
)
)
(dropout): Dropout(p=0.1, inplace=False)
(classifier): Linear(in_features=768, out_features=2, bias=True)
)
```
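In the printout, the classifier head has out_features=2 because BertConfig.num_labels defaults to 2. For a task with a different number of classes, set num_labels before loading, either in code (config.num_labels = 3) or by editing config.json. A sketch of the fields to add or change in config.json, merged into the existing file (the three-class sentiment labels are purely illustrative):

```json
{
  "num_labels": 3,
  "id2label": {"0": "negative", "1": "neutral", "2": "positive"},
  "label2id": {"negative": 0, "neutral": 1, "positive": 2}
}
```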
References
[2] https://huggingface.co/google-bert/bert-base-chinese/tree/main
[3] https://huggingface.co/docs/transformers/model_doc/bert
[4] https://blog.csdn.net/ewen_lee/article/details/130992494