Large Language Models: Foundational Literature

Large Model Foundations

1. Attention Is All You Need https://arxiv.org/abs/1706.03762


2. Sequence to Sequence Learning with Neural Networks https://arxiv.org/abs/1409.3215

A sequence-to-sequence learning method based on deep neural networks (DNNs)

3. Neural Machine Translation by Jointly Learning to Align and Translate https://arxiv.org/abs/1409.0473

4. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805

5. Scaling Laws for Neural Language Models https://arxiv.org/pdf/2001.08361.pdf

6. Emergent Abilities of Large Language Models https://openreview.net/pdf?id=yzkSU5zdwD


7. Training Compute-Optimal Large Language Models (Chinchilla scaling law) https://arxiv.org/abs/2203.15556

8. Scaling Instruction-Finetuned Language Models https://arxiv.org/pdf/2210.11416.pdf

9. Direct Preference Optimization: Your Language Model is Secretly a Reward Model https://arxiv.org/pdf/2305.18290.pdf

10. Progress measures for grokking via mechanistic interpretability https://arxiv.org/abs/2301.05217

11. Language Models Represent Space and Time https://arxiv.org/abs/2310.02207

12. GLaM: Efficient Scaling of Language Models with Mixture-of-Experts https://arxiv.org/abs/2112.06905

13. Adam: A Method for Stochastic Optimization https://arxiv.org/abs/1412.6980

14. Efficient Estimation of Word Representations in Vector Space (Word2Vec) https://arxiv.org/abs/1301.3781

15. Distributed Representations of Words and Phrases and their Compositionality https://arxiv.org/abs/1310.4546

