Hugging Face实战-系列教程3:AutoModelForSequenceClassification文本2分类

🚩🚩🚩Hugging Face 实战系列 总目录

有任何问题欢迎在下面留言
本篇文章的代码运行界面均在notebook中进行
本篇文章配套的代码资源已经上传

下篇内容:
Hugging Face实战-系列教程4:padding与attention_mask

​输出我们需要几个输出呢?比如说这个cls分类,我们做一个10分类,可以吗?对每一个词做10分类可以吗?预测下一个词是什么可以吗?是不是也可以!

在我们的NLP任务中,相比图像任务有分类有回归,NLP有回归这一说吗?我们要做的所有任务都是分类,就是把分类做到哪儿而已,不管做什么都是分类。

比如我们刚刚导入的两个英语句子,是对序列做情感分析,就是一个二分类,用序列做分类,你想导什么输出头,你就导入什么东西就可以了,简不简单?好简单是不是,上代码:

python 复制代码
from transformers import AutoModelForSequenceClassification
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)
print(outputs.logits.shape)

导入一个序列分类的包,还是选择checkpoint这个名字,选择分词器,导入模型,将模型打印一下:

DistilBertForSequenceClassification(

(distilbert): DistilBertModel(

(embeddings): Embeddings(

(word_embeddings): Embedding(30522, 768, padding_idx=0)

(position_embeddings): Embedding(512, 768)

(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)

(dropout): Dropout(p=0.1, inplace=False)

)

(transformer): Transformer(

(layer): ModuleList(

(0): TransformerBlock(

(attention): MultiHeadSelfAttention(

(dropout): Dropout(p=0.1, inplace=False)

(q_lin): Linear(in_features=768, out_features=768, bias=True)

(k_lin): Linear(in_features=768, out_features=768, bias=True)

(v_lin): Linear(in_features=768, out_features=768, bias=True)

(out_lin): Linear(in_features=768, out_features=768, bias=True)

)

(sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)

(ffn): FFN(

(dropout): Dropout(p=0.1, inplace=False)

(lin1): Linear(in_features=768, out_features=3072, bias=True)

(lin2): Linear(in_features=3072, out_features=768, bias=True)

)

(output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)

)

(1): TransformerBlock(

(attention): MultiHeadSelfAttention(

(dropout): Dropout(p=0.1, inplace=False)

(q_lin): Linear(in_features=768, out_features=768, bias=True)

(k_lin): Linear(in_features=768, out_features=768, bias=True)

(v_lin): Linear(in_features=768, out_features=768, bias=True)

(out_lin): Linear(in_features=768, out_features=768, bias=True)

)

(sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)

(ffn): FFN(

(dropout): Dropout(p=0.1, inplace=False)

(lin1): Linear(in_features=768, out_features=3072, bias=True)

(lin2): Linear(in_features=3072, out_features=768, bias=True)

)

(output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)

)

(2): TransformerBlock(

(attention): MultiHeadSelfAttention(

(dropout): Dropout(p=0.1, inplace=False)

(q_lin): Linear(in_features=768, out_features=768, bias=True)

(k_lin): Linear(in_features=768, out_features=768, bias=True)

(v_lin): Linear(in_features=768, out_features=768, bias=True)

(out_lin): Linear(in_features=768, out_features=768, bias=True)

)

(sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)

(ffn): FFN(

(dropout): Dropout(p=0.1, inplace=False)

(lin1): Linear(in_features=768, out_features=3072, bias=True)

(lin2): Linear(in_features=3072, out_features=768, bias=True)

)

(output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)

)

(3): TransformerBlock(

(attention): MultiHeadSelfAttention(

(dropout): Dropout(p=0.1, inplace=False)

(q_lin): Linear(in_features=768, out_features=768, bias=True)

(k_lin): Linear(in_features=768, out_features=768, bias=True)

(v_lin): Linear(in_features=768, out_features=768, bias=True)

(out_lin): Linear(in_features=768, out_features=768, bias=True)

)

(sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)

(ffn): FFN(

(dropout): Dropout(p=0.1, inplace=False)

(lin1): Linear(in_features=768, out_features=3072, bias=True)

(lin2): Linear(in_features=3072, out_features=768, bias=True)

)

(output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)

)

(4): TransformerBlock(

(attention): MultiHeadSelfAttention(

(dropout): Dropout(p=0.1, inplace=False)

(q_lin): Linear(in_features=768, out_features=768, bias=True)

(k_lin): Linear(in_features=768, out_features=768, bias=True)

(v_lin): Linear(in_features=768, out_features=768, bias=True)

(out_lin): Linear(in_features=768, out_features=768, bias=True)

)

(sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)

(ffn): FFN(

(dropout): Dropout(p=0.1, inplace=False)

(lin1): Linear(in_features=768, out_features=3072, bias=True)

(lin2): Linear(in_features=3072, out_features=768, bias=True)

)

(output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)

)

(5): TransformerBlock(

(attention): MultiHeadSelfAttention(

(dropout): Dropout(p=0.1, inplace=False)

(q_lin): Linear(in_features=768, out_features=768, bias=True)

(k_lin): Linear(in_features=768, out_features=768, bias=True)

(v_lin): Linear(in_features=768, out_features=768, bias=True)

(out_lin): Linear(in_features=768, out_features=768, bias=True)

)

(sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)

(ffn): FFN(

(dropout): Dropout(p=0.1, inplace=False)

(lin1): Linear(in_features=768, out_features=3072, bias=True)

(lin2): Linear(in_features=3072, out_features=768, bias=True)

)

(output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)

)

)

)

)

(pre_classifier): Linear(in_features=768, out_features=768, bias=True)

(classifier): Linear(in_features=768, out_features=2, bias=True)

(dropout): Dropout(p=0.2, inplace=False)

)

看看多了什么?前面我们说对每一个词生成一个768向量,最后就连了两个全连接层:

(pre_classifier): Linear(in_features=768, out_features=768, bias=True)

(classifier): Linear(in_features=768, out_features=2, bias=True)

(dropout): Dropout(p=0.2, inplace=False)

这个logits就是输出结果了:

print(outputs.logits.shape)

torch.Size(2, 2)

这个2*2表示的就是样本为2(两个英语句子),分类是2分类,但是我们需要得到最后的分类概率,再加上softmax:

python 复制代码
import torch
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)

dim=-1就是沿着最后一个维度进行计算,最后返回的就是概率值:

tensor(\[1.5446e-02, 9.8455e-01, 9.9946e-01, 5.4418e-04], grad_fn=SoftmaxBackward0)

概率知道了,类别的概率是什么呢?调一个内置的id to label配置:

python 复制代码
model.config.id2label
{0: 'NEGATIVE', 1: 'POSITIVE'}

也就是说,第一个句子负面情感的概率为1.54%,正面的概率情感为98.46%

下篇内容:
Hugging Face实战-系列教程4:padding与attention_mask

相关推荐
程序猿追5 天前
那个右下角的小数字怎么“卡”住我打字——我用 HarmonyOS 自己写了一个字数限制输入框
pytorch·华为·harmonyos
renhongxia15 天前
世界模型作为AGI落地底层底座的作用
人工智能·深度学习·生成对抗网络·自然语言处理·知识图谱·agi
闵孚龙6 天前
《PyTorch 深度修炼》Dataset 和 DataLoader:数据如何喂给模型
人工智能·pytorch·python
大模型最新论文速读6 天前
06-16 · LLM 最新论文速览
论文阅读·人工智能·深度学习·机器学习·自然语言处理
宝贝儿好6 天前
【LLM】第二章:HuggingFace入门学习
人工智能·深度学习·神经网络·学习·算法·自然语言处理
bryant_meng6 天前
【VAE】From Pixels to Faces: Building a VAE from Scratch
pytorch·vae·log-sigma2·重参数
小小工匠6 天前
拆解大语言模型:从词向量到注意力机制的内部运行原理
人工智能·语言模型·自然语言处理
星川皆无恙6 天前
大数据k-means聚类算法:基于k-means聚类算法+NLP微博舆情数据爬虫可视化分析推荐系统(新版)
大数据·人工智能·爬虫·算法·机器学习·自然语言处理·kmeans
财经资讯数据_灵砚智能6 天前
基于全球经济类多源新闻的NLP情感分析与数据可视化(夜间-次晨)2026年6月15日
大数据·人工智能·python·ai·信息可视化·自然语言处理·灵砚智能