The Evolution of GPT: From GPT to ChatGPT

Transformer

Paper

Attention Is All You Need: https://arxiv.org/abs/1706.03762

The Illustrated Transformer

The Illustrated Transformer: https://jalammar.github.io/illustrated-transformer/

The Annotated Transformer

The Annotated Transformer (harvard.edu)

GPT Series

GPT-1: Improving Language Understanding by Generative Pre-Training

Pre-training + fine-tuning

Abstract: We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task.

1. Unsupervised pre-training

2. Supervised fine-tuning
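
As a compact reminder of what the two stages optimize, the objectives can be written as below. This is a summary sketch in the notation of the GPT-1 paper as recalled here, not a substitute for the paper itself: U is the unlabeled token corpus, k the context window size, Θ the model parameters, C the labeled dataset of input sequences x^1..x^m with label y, and λ a weighting hyperparameter for the auxiliary language-modeling term.

```latex
% Stage 1: unsupervised pre-training (standard language-modeling likelihood)
L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \dots, u_{i-1}; \Theta)

% Stage 2: supervised fine-tuning on labeled examples (x^1, \dots, x^m, y)
L_2(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x^1, \dots, x^m)

% Fine-tuning with language modeling kept as an auxiliary objective
L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \, L_1(\mathcal{C})
```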

(left) Transformer architecture and training objectives used in this work. (right) Input transformations for fine-tuning on different tasks. We convert all structured inputs into token sequences to be processed by our pre-trained model, followed by a linear+softmax layer.
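
The input transformations in the caption can be illustrated with a small sketch. It assumes whitespace splitting as a stand-in for the real tokenizer, and the special-token strings `<s>`, `$` and `<e>` are hypothetical placeholders for the start, delimiter and extract tokens; only the overall layout (structured input flattened into one token sequence per candidate) is taken from the text above.

```python
from typing import List

# Hypothetical placeholder strings for the start, delimiter and extract tokens.
START, DELIM, EXTRACT = "<s>", "$", "<e>"


def tokenize(text: str) -> List[str]:
    # Assumption: whitespace splitting stands in for the real subword tokenizer.
    return text.split()


def single(text: str) -> List[str]:
    """Classification: one text wrapped in start/extract tokens."""
    return [START, *tokenize(text), EXTRACT]


def pair(first: str, second: str) -> List[str]:
    """Two texts joined by a delimiter, e.g. premise and hypothesis for entailment."""
    return [START, *tokenize(first), DELIM, *tokenize(second), EXTRACT]


def similarity(a: str, b: str) -> List[List[str]]:
    """Similarity has no inherent ordering, so both orderings are produced."""
    return [pair(a, b), pair(b, a)]


def multiple_choice(context: str, answers: List[str]) -> List[List[str]]:
    """One sequence per candidate answer; each is scored independently."""
    return [pair(context, answer) for answer in answers]


if __name__ == "__main__":
    print(pair("a man is sleeping", "a person rests"))
    print(multiple_choice("the sky is", ["blue", "loud"]))
```

Each resulting sequence would then be fed through the pre-trained model, with the linear+softmax layer applied on top of the final position.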

GPT2: Language Models are Unsupervised Multitask Learners

We demonstrate language models can perform down-stream tasks in a zero-shot setting -- without any parameter or architecture modification.

Main change: a new training set, WebText.

GPT3: Language Models are Few-Shot Learners

Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art finetuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting.

Zero-shot, one-shot and few-shot, contrasted with traditional fine-tuning. The panels above show four methods for performing a task with a language model -- fine-tuning is the traditional method, whereas zero-, one-, and few-shot, which we study in this work, require the model to perform the task with only forward passes at test time. We typically present the model with a few dozen examples in the few shot setting.
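
The difference between the three settings comes down to how many demonstrations are placed in the prompt; no weights are updated. A minimal sketch follows, where the function name `build_prompt`, the `=>` separator and the demonstration pairs are illustrative assumptions rather than a fixed format.

```python
from typing import List, Tuple


def build_prompt(task_description: str,
                 demonstrations: List[Tuple[str, str]],
                 query: str,
                 k: int) -> str:
    """Build an in-context prompt with k demonstrations.

    k = 0 gives a zero-shot prompt, k = 1 one-shot, k > 1 few-shot.
    The model only runs forward passes on this text; nothing is fine-tuned.
    """
    lines = [task_description]
    for source, target in demonstrations[:k]:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is expected to continue from here
    return "\n".join(lines)


# Illustrative demonstration pairs (assumed for this example).
demos = [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")]
task = "Translate English to French:"

zero_shot = build_prompt(task, demos, "cheese", k=0)
one_shot = build_prompt(task, demos, "cheese", k=1)
few_shot = build_prompt(task, demos, "cheese", k=2)

if __name__ == "__main__":
    print(few_shot)
```

In the few-shot setting the number of demonstrations is limited mainly by the model's context window, which is why the caption speaks of a few dozen examples.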

Training set:

The evolution of transfer learning in NLP

  1. word2vec (embedding): word vectors were learned and used as inputs to task-specific architectures
  2. the contextual representations of recurrent networks were transferred (still applied to task-specific architectures)
  3. pre-trained recurrent or transformer language models have been directly fine-tuned, entirely removing the need for task-specific architectures

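Limitation of the pre-training + fine-tuning approach: to perform well on a particular task, the model has to be fine-tuned on a dataset specific to that task, containing thousands to hundreds of thousands of examples.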

ChatGPT

Reinforcement Learning from Human Feedback (RLHF)

Reference: ChatGPT 背后的“功臣”: RLHF 技术详解 ("The Hero Behind ChatGPT: A Detailed Explanation of RLHF", huggingface.co)

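The idea behind RLHF: use reinforcement learning to optimize a language model according to human feedback. RLHF allows a language model trained on a general text corpus to be aligned with complex human values.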

RLHF is a complex concept that involves multiple models and distinct training stages. Here it is broken down into three steps:

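  1. Pre-train a language model (LM);

  2. Gather question-answer data and train a reward model (RM);

    Training the RM is where RLHF starts to diverge from earlier paradigms. The model takes a sequence of text and returns a scalar reward that numerically represents human preference. It can be modeled end-to-end with an LM, or as a modular system (for example, ranking the outputs and then converting the ranking into a reward). Having the reward as a single number is crucial for plugging it seamlessly into existing RL algorithms later on.

  3. Fine-tune the LM with reinforcement learning (RL).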
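
To make step 2 more concrete, here is a rough sketch of one way human rankings can be turned into a training signal for the reward model. It assumes a pairwise (Bradley-Terry style) loss, -log(sigmoid(r_preferred - r_rejected)), which is a common choice but is not stated in the text above. The function names and the hard-coded scores are hypothetical; in practice the scores would come from the RM itself.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def ranking_to_pairs(ranked: List[str]) -> List[Tuple[str, str]]:
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() preserves input order, so the first item of each pair ranks higher.
    return list(combinations(ranked, 2))


def pairwise_loss(r_preferred: float, r_rejected: float) -> float:
    """-log(sigmoid(r_preferred - r_rejected)), written as log(1 + exp(-diff))."""
    return math.log1p(math.exp(-(r_preferred - r_rejected)))


def mean_ranking_loss(ranked: List[str], rewards: Dict[str, float]) -> float:
    """Average pairwise loss over every pair implied by one human ranking."""
    pairs = ranking_to_pairs(ranked)
    return sum(pairwise_loss(rewards[w], rewards[l]) for w, l in pairs) / len(pairs)


# A human ranked three candidate answers from best to worst.
ranking = ["answer A", "answer B", "answer C"]

# Hypothetical scalar rewards from two reward models.
agrees = {"answer A": 2.0, "answer B": 0.5, "answer C": -1.0}
disagrees = {"answer A": -1.0, "answer B": 0.5, "answer C": 2.0}

if __name__ == "__main__":
    print(mean_ranking_loss(ranking, agrees))     # lower: scores match the ranking
    print(mean_ranking_loss(ranking, disagrees))  # higher: scores contradict it
```

Minimizing this loss pushes the RM to score preferred outputs above rejected ones, and the resulting scalar is what the RL step in step 3 uses as its reward.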
