Notes on the Large Language Model Survey Paper (6-15)

  • Keywords
  • [Background for LLMs](#Backgroud for LLMs)
    • [Technical Evolution of GPT-series Models](#Technical Evolution of GPT-series Models)
      • [Research of OpenAI on LLMs can be roughly divided into the following stages](#Research of OpenAI on LLMs can be roughly divided into the following stages)
        • [Early Explorations](#Early Explorations)
        • [Capacity Leap](#Capacity Leap)
        • [Capacity Enhancement](#Capacity Enhancement)
        • [The Milestones of Language Models](#The Milestones of Language Models)
  • Resources
  • Pre-training
    • [Data Collection](#Data Collection)
    • [Data Preprocessing](#Data Preprocessing)

Keywords

GPT: Generative Pre-Training

Background for LLMs

Technical Evolution of GPT-series Models

Two key points in GPT's success are (I) training decoder-only Transformer language models that can accurately predict the next word and (II) scaling up the size of language models.
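To make the "predict the next word" objective concrete, here is a minimal sketch. It uses a bigram count model as a deliberately tiny stand-in for a decoder-only Transformer (an assumption for illustration only; the function names are hypothetical). The point is only the interface: given the preceding context, estimate a distribution over the next word.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return dict(counts)


def next_word_prob(counts, prev, candidate):
    """Estimate P(candidate | prev) from the bigram counts."""
    followers = counts.get(prev)
    if not followers:
        return 0.0
    return followers[candidate] / sum(followers.values())


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if unseen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


tokens = "the cat sat on the mat the cat ran".split()
model = train_bigram(tokens)
print(predict_next(model, "the"))
```

A real GPT model conditions on the whole preceding sequence rather than on one word, but the training signal is the same kind of next-token prediction.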

Research of OpenAI on LLMs can be roughly divided into the following stages

Early Explorations

Capacity Leap

ICL (in-context learning)
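In-context learning means the model is given a task description and a few demonstrations inside the prompt, with no parameter updates. A minimal sketch of assembling such a few-shot prompt follows; the `Input:`/`Output:` template and the function name are assumptions for illustration, not a format prescribed by the survey.

```python
def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a few-shot prompt: description, worked examples, then the query."""
    lines = [task_description, ""]
    for example_input, example_output in demonstrations:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    # The final "Output:" is left empty for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
    "cheese",
)
print(prompt)
```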

Capacity Enhancement

1. Training on code data

Codex: a GPT model fine-tuned on a large corpus of GitHub code

2. Alignment with human preference

reinforcement learning from human feedback (RLHF) algorithm

Note that the term "instruction tuning" seems to be seldom used in OpenAI's papers and documentation; OpenAI instead speaks of supervised fine-tuning on human demonstrations (i.e., the first step of the RLHF algorithm).

The Milestones of Language Models

ChatGPT (based on GPT-3.5 and GPT-4) and GPT-4 (multimodal)

Resources

Stanford Alpaca is the first open instruction-following model fine-tuned based on LLaMA (7B).

Alpaca LoRA (a reproduction of Stanford Alpaca using LoRA)

Resource types: models, data, libraries

Pre-training

Data Collection

General Text Data: webpages, books, and conversational text

Specialized Text Data: multilingual text, scientific text, code

Data Preprocessing

Quality Filtering

  1. Classifier-based approaches train a selection classifier on high-quality texts and use it to identify and filter out low-quality data.
  2. Heuristic-based approaches eliminate low-quality texts through a set of well-designed rules: language-based filtering, metric-based filtering, statistic-based filtering, and keyword-based filtering.
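The heuristic route can be sketched as a chain of simple rules. The example below covers only a statistic-based rule (word count and symbol ratio) and a keyword-based rule; language-based and metric-based (e.g. perplexity) filters would need external models and are left out. All thresholds, the keyword list, and the function name are hypothetical values chosen for illustration.

```python
def passes_quality_filters(
    text,
    min_words=5,
    max_symbol_ratio=0.1,
    blocked_keywords=("lorem ipsum",),
):
    """Return True if `text` survives all heuristic rules."""
    words = text.split()

    # Statistic-based rule: drop very short texts.
    if len(words) < min_words:
        return False

    # Statistic-based rule: drop texts dominated by punctuation/symbols.
    symbols = sum(1 for ch in text if not (ch.isalnum() or ch.isspace()))
    if symbols / max(len(text), 1) > max_symbol_ratio:
        return False

    # Keyword-based rule: drop texts containing blocked phrases.
    lowered = text.lower()
    if any(keyword in lowered for keyword in blocked_keywords):
        return False

    return True


corpus = [
    "Large language models are trained on web text.",
    "$$$ ### !!!",
    "Lorem ipsum dolor sit amet consectetur",
]
kept = [doc for doc in corpus if passes_quality_filters(doc)]
print(kept)
```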

De-duplication

Existing work has found that duplicate data in a corpus would reduce the diversity of language models, which may cause the training process to become unstable and thus affect the model performance.
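A minimal sketch of document-level de-duplication, assuming exact matching after light normalization (lowercasing and whitespace collapsing). This is the simplest variant and an assumption made here for illustration; fuzzy or n-gram overlap matching is not shown.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash the same."""
    return " ".join(text.lower().split())


def deduplicate(documents):
    """Keep the first occurrence of each document, dropping exact repeats."""
    seen_digests = set()
    unique_documents = []
    for document in documents:
        digest = hashlib.sha256(normalize(document).encode("utf-8")).hexdigest()
        if digest not in seen_digests:
            seen_digests.add(digest)
            unique_documents.append(document)
    return unique_documents


docs = ["Hello  world", "hello world", "Another doc"]
print(deduplicate(docs))
```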

Privacy Redaction

Remove personally identifiable information (PII) from the pre-training corpus.

Tokenization

Tokenization segments raw text into sequences of individual tokens, which are subsequently used as the inputs of LLMs. Methods: Byte-Pair Encoding (BPE) tokenization; WordPiece tokenization.
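For BPE, the core idea is to start from characters and repeatedly merge the most frequent adjacent symbol pair. Below is a minimal sketch of learning merges from word frequencies; the toy corpus, the function names, and the omission of end-of-word markers and byte-level handling are simplifying assumptions, so this is not how any particular tokenizer library implements it.

```python
from collections import Counter


def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged_vocab = {}
    for symbols, freq in vocab.items():
        out = []
        i = 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged_vocab[key] = merged_vocab.get(key, 0) + freq
    return merged_vocab


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges, starting from single characters."""
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab


merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 2)
print(merges)
print(vocab)
```

On this toy corpus, two merges are enough to form the shared suffix `est` as a single token in "newest" and "widest".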