LLaMA: Open and Efficient Foundation Language Models

This paper is inspired by the Chinchilla scaling laws, which found that, given a fixed compute budget, the best performance is achieved not by the largest models but by smaller models trained on more data. The paper therefore proposes a collection of models ranging from 7B to 65B parameters. These smaller models outperform some much larger models.

1. Architecture

It is based on the standard transformer architecture and adopts several improvements proposed in subsequent large language models. The main changes are:

  • Pre-normalization, which normalizes the input of each sub-layer instead of the output.
  • SwiGLU activation, instead of ReLU.
  • Rotary embeddings.
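
To make the three changes above more concrete, here is a minimal pure-Python sketch. It is not the paper's code: the function names (rms_norm, swiglu_ffn, pre_norm_residual, rotary), the eps and base defaults, and the use of RMS normalization as the normalizing function are assumptions made for illustration, and vectors are plain lists of floats.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (Swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """Feed-forward block with a SwiGLU gate:
    w_down @ (silu(w_gate @ x) * (w_up @ x)), with * taken element-wise."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

def pre_norm_residual(x, sublayer, norm_weight):
    """Pre-normalization: normalize the sub-layer INPUT, then add the residual.
    (Post-normalization would instead normalize x + sublayer(x).)"""
    normed = rms_norm(x, norm_weight)
    return [a + b for a, b in zip(x, sublayer(normed))]

def rotary(x, position, base=10000.0):
    """Rotate each consecutive pair of x by an angle that grows with position.
    Assumes len(x) is even."""
    dim = len(x)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c])
    return out
```

The useful property of the rotary sketch is that the dot product between a rotated query and a rotated key depends only on the distance between their positions, which is why it can replace absolute positional embeddings.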

2. Efficient Implementation

  • An efficient implementation of causal multi-head attention. I still need to explore the logic behind it further.
  • Reduce the amount of activations that are recomputed during the backward pass.
  • Save the activations that are expensive to compute by manually implementing the backward function for the transformer layers, instead of relying on PyTorch autograd.
  • Use model and sequence parallelism to reduce memory usage.
  • Overlap computation and communication between different GPUs as much as possible.
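
As a starting point for the causal attention item above, the following is a rough single-head sketch in pure Python. It only illustrates the idea that, for a causal model, scores against future positions never need to be computed or stored. The function name causal_attention and the list-of-vectors layout are assumptions for illustration; this is not the optimized implementation the paper uses.

```python
import math

def causal_attention(queries, keys, values):
    """Single-head causal attention over lists of equal-length vectors.

    For query position i, only keys 0..i are scored, so the masked
    (future) scores are never computed and no full score matrix is kept.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for i, q in enumerate(queries):
        # Scores for the visible prefix only: positions 0..i.
        scores = [scale * sum(a * b for a, b in zip(q, keys[j]))
                  for j in range(i + 1)]
        # Numerically stable softmax over the prefix.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Weighted sum of the visible values.
        out = [0.0] * len(values[0])
        for w, v in zip(weights, values[:i + 1]):
            for d, component in enumerate(v):
                out[d] += (w / total) * component
        outputs.append(out)
    return outputs
```

Because position i never reads keys or values beyond i, changing a later token cannot change an earlier output, which is the property the causal mask normally enforces.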