Large Language Models: Foundational Papers

Large Model Foundations

1、Attention Is All You Need https://arxiv.org/abs/1706.03762

2、Sequence to Sequence Learning with Neural Networks https://arxiv.org/abs/1409.3215

A sequence-to-sequence learning method based on deep neural networks (DNNs)

3、Neural Machine Translation by Jointly Learning to Align and Translate https://arxiv.org/abs/1409.0473

4、BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805

5、Scaling Laws for Neural Language Models https://arxiv.org/pdf/2001.08361.pdf

6、Emergent Abilities of Large Language Models https://openreview.net/pdf?id=yzkSU5zdwD

7、Training Compute-Optimal Large Language Models (Chinchilla scaling law) https://arxiv.org/abs/2203.15556

8、Scaling Instruction-Finetuned Language Models https://arxiv.org/pdf/2210.11416.pdf

9、Direct Preference Optimization: Your Language Model is Secretly a Reward Model https://arxiv.org/pdf/2305.18290.pdf

10、Progress measures for grokking via mechanistic interpretability https://arxiv.org/abs/2301.05217

11、Language Models Represent Space and Time https://arxiv.org/abs/2310.02207

12、GLaM: Efficient Scaling of Language Models with Mixture-of-Experts https://arxiv.org/abs/2112.06905

13、Adam: A Method for Stochastic Optimization https://arxiv.org/abs/1412.6980

14、Efficient Estimation of Word Representations in Vector Space (Word2Vec) https://arxiv.org/abs/1301.3781

15、Distributed Representations of Words and Phrases and their Compositionality https://arxiv.org/abs/1310.4546
