大语言模型-大模型基础文献

大模型基础

1、Attention Is All You Needhttps://arxiv.org/abs/1706.03762

attention is all you need

2、Sequence to Sequence Learning with Neural Networks https://arxiv.org/abs/1409.3215

基于深度神经网络(DNN)的序列到序列学习方法

3、Neural Machine Translation by Jointly Learning to Align and Translate https://arxiv.org/abs/1409.0473

4、BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805

5、Scaling Laws for Neural Language Models https://arxiv.org/pdf/2001.08361.pdf

6、Emergent Abilities of Large Language Models https://openreview.net/pdf?id=yzkSU5zdwD

Emergent Abilities of Large Language Models

7、Training Compute-Optimal Large Language Models (ChinChilla scaling law) https://arxiv.org/abs/2203.15556

8、Scaling Instruction-Finetuned Language Models https://arxiv.org/pdf/2210.11416.pdf

Direct Preference Optimization:

9、Your Language Model is Secretly a Reward Model https://arxiv.org/pdf/2305.18290.pdf

10、Progress measures for grokking via mechanistic interpretability https://arxiv.org/abs/2301.05217

11、Language Models Represent Space and Time https://arxiv.org/abs/2310.02207

12、GLaM: Efficient Scaling of Language Models with Mixture-of-Experts https://arxiv.org/abs/2112.06905

13、Adam: A Method for Stochastic Optimization https://arxiv.org/abs/1412.6980

14、Efficient Estimation of Word Representations in Vector Space (Word2Vec) https://arxiv.org/abs/1301.3781

15、Distributed Representations of Words and Phrases and their Compositionality https://arxiv.org/abs/1310.4546

attention is all you need

基于深度神经网络(DNN)的序列到序列学习方法

Emergent Abilities of Large Language Models

相关推荐
飞哥数智坊5 分钟前
打造我的 AI 开发团队(四):在 Cursor 里跑通 bmad
人工智能·ai编程
aneasystone本尊21 分钟前
重温 Java 21 学习笔记
人工智能
zandy101137 分钟前
架构解析:衡石科技如何基于AI+Data Agent重构智能数据分析平台
人工智能·科技·架构
golang学习记1 小时前
免费解锁 Cursor AI 全功能:4 种开发者私藏方案揭秘
人工智能
图扑软件1 小时前
热力图可视化为何被广泛应用?| 图扑数字孪生
大数据·人工智能·信息可视化·数字孪生·可视化·热力图·电力能源
qq_436962181 小时前
AI驱动数据分析革新:奥威BI一键生成智能报告
人工智能·信息可视化·数据分析
卓码软件测评1 小时前
借助大语言模型实现高效测试迁移:Airbnb的大规模实践
开发语言·前端·javascript·人工智能·语言模型·自然语言处理
小白狮ww1 小时前
dots.ocr 基于 1.7B 参数实现多语言文档处理,性能达 SOTA
人工智能·深度学习·机器学习·自然语言处理·ocr·小红书·文档处理
hrrrrb1 小时前
【机器学习】无监督学习
人工智能·学习·机器学习
IT_陈寒1 小时前
SpringBoot 3.0实战:这套配置让我轻松扛住百万并发,性能提升300%
前端·人工智能·后端