大语言模型-大模型基础文献

大模型基础

1、Attention Is All You Needhttps://arxiv.org/abs/1706.03762

attention is all you need

2、Sequence to Sequence Learning with Neural Networks https://arxiv.org/abs/1409.3215

基于深度神经网络(DNN)的序列到序列学习方法

3、Neural Machine Translation by Jointly Learning to Align and Translate https://arxiv.org/abs/1409.0473

4、BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805

5、Scaling Laws for Neural Language Models https://arxiv.org/pdf/2001.08361.pdf

6、Emergent Abilities of Large Language Models https://openreview.net/pdf?id=yzkSU5zdwD

Emergent Abilities of Large Language Models

7、Training Compute-Optimal Large Language Models (ChinChilla scaling law) https://arxiv.org/abs/2203.15556

8、Scaling Instruction-Finetuned Language Models https://arxiv.org/pdf/2210.11416.pdf

Direct Preference Optimization:

9、Your Language Model is Secretly a Reward Model https://arxiv.org/pdf/2305.18290.pdf

10、Progress measures for grokking via mechanistic interpretability https://arxiv.org/abs/2301.05217

11、Language Models Represent Space and Time https://arxiv.org/abs/2310.02207

12、GLaM: Efficient Scaling of Language Models with Mixture-of-Experts https://arxiv.org/abs/2112.06905

13、Adam: A Method for Stochastic Optimization https://arxiv.org/abs/1412.6980

14、Efficient Estimation of Word Representations in Vector Space (Word2Vec) https://arxiv.org/abs/1301.3781

15、Distributed Representations of Words and Phrases and their Compositionality https://arxiv.org/abs/1310.4546

attention is all you need

基于深度神经网络(DNN)的序列到序列学习方法

Emergent Abilities of Large Language Models

相关推荐
Shawn_Shawn2 小时前
mcp学习笔记(一)-mcp核心概念梳理
人工智能·llm·mcp
技术路上的探险家3 小时前
8 卡 V100 服务器:基于 vLLM 的 Qwen 大模型高效部署实战
运维·服务器·语言模型
33三 三like4 小时前
《基于知识图谱和智能推荐的养老志愿服务系统》开发日志
人工智能·知识图谱
芝士爱知识a4 小时前
【工具推荐】2026公考App横向评测:粉笔、华图与智蛙面试App功能对比
人工智能·软件推荐·ai教育·结构化面试·公考app·智蛙面试app·公考上岸
腾讯云开发者5 小时前
港科大熊辉|AI时代的职场新坐标——为什么你应该去“数据稀疏“的地方?
人工智能
工程师老罗5 小时前
YoloV1数据集格式转换,VOC XML→YOLOv1张量
xml·人工智能·yolo
yLDeveloper5 小时前
从模型评估、梯度难题到科学初始化:一步步解析深度学习的训练问题
深度学习
Coder_Boy_5 小时前
技术让开发更轻松的底层矛盾
java·大数据·数据库·人工智能·深度学习
啊森要自信6 小时前
CANN ops-cv:面向计算机视觉的 AI 硬件端高效算子库核心架构与开发逻辑
人工智能·计算机视觉·架构·cann