GPT
Based on the Transformer decoder: given only the "past", it predicts the "future" (a causal mask blocks attention to later positions; sketched below).
Paper: Improving Language Understanding by Generative Pre-Training
Semi-supervised learning: pre-train the model on an unlabeled corpus, then fine-tune it on labeled data.
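A minimal NumPy sketch of what "only knowing the past" means inside the decoder: a causal mask removes attention to later positions before the softmax. The sequence length and random scores here are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Causal ("look-ahead") mask: position i may attend only to positions <= i,
# so each token is predicted from its past alone. Toy shapes and random
# scores are assumptions for illustration only.
seq_len = 5
mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # lower triangle = allowed

scores = np.random.rand(seq_len, seq_len)        # stand-in attention scores
scores = np.where(mask, scores, -np.inf)         # block attention to the "future"
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row softmax

print(weights.round(2))  # upper triangle is all zeros: no peeking ahead
```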
BERT
Based on the Transformer encoder: a cloze task that, given both the "past" and the "future", predicts the token in between (see the fill-mask example below).
Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
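To see the cloze behavior directly, here is a small sketch assuming the Hugging Face transformers package and the public bert-base-uncased checkpoint are available (an illustration of masked-token prediction, not the paper's training code):

```python
from transformers import pipeline

# Cloze in practice: BERT sees context on BOTH sides of [MASK] and fills it in.
fill = pipeline("fill-mask", model="bert-base-uncased")
for cand in fill("The capital of France is [MASK]."):
    print(cand["token_str"], round(cand["score"], 3))
```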
GPT-2
Zero-shot: explores the model's ability to generalize, so downstream tasks can be handled without any task-specific fine-tuning (sketched below).
Paper: Language Models are Unsupervised Multitask Learners
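A hedged sketch of the zero-shot interface, again assuming the Hugging Face transformers package and the public gpt2 checkpoint: the task is stated only in the prompt, with no fine-tuning. With a checkpoint this small the output quality will be poor; the point is the usage pattern, not the result.

```python
from transformers import pipeline

# Zero-shot: the task is described entirely in the prompt; the pretrained
# model is used as-is, with no fine-tuning and no labeled examples.
gen = pipeline("text-generation", model="gpt2")
prompt = "Translate English to French:\ncheese =>"
print(gen(prompt, max_new_tokens=8, do_sample=False)[0]["generated_text"])
```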
GPT-3
Introduces prompts with in-context demonstrations, making GPT-2's fine-tuning-free approach actually effective (see the prompt sketch below).
论文地址:language models are few-shot learners
Final note: if this article helped you, please give it a like ٩(๑•̀ω•́๑)۶