LLaMA: Open and Efficient Foundation Language Models

This paper is inspired by the Chinchilla scaling law, which found that given a fixed compute budget, the best performance is achieved not by the largest model, but by a smaller model trained on more data. Accordingly, the paper proposes a collection of models ranging from 7B to 65B parameters, and these comparatively small models outperform much larger ones (for example, LLaMA-13B outperforms GPT-3 175B on most benchmarks).
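A rough way to see the trade-off is with the common approximation C ≈ 6·N·D (training FLOPs for a model with N parameters on D tokens) and the Chinchilla rule of thumb of roughly 20 training tokens per parameter. Both are heuristics, not exact results from either paper, and the function below is my own sketch:

```python
import math

def compute_optimal(flop_budget, tokens_per_param=20.0):
    # Heuristic sketch: assume C ~= 6 * N * D and D ~= tokens_per_param * N.
    # Substituting gives C = 6 * tokens_per_param * N^2, so:
    n_params = math.sqrt(flop_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = compute_optimal(1e23)
print(f"compute-optimal: ~{n:.2e} params trained on ~{d:.2e} tokens")
```

Note that LLaMA deliberately deviates from this optimum: it trains models smaller than compute-optimal on even more tokens, because a smaller model is cheaper at inference time.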

1. Architecture

It is based on the original transformer architecture and incorporates several improvements proposed in subsequent large language models. The main changes are:

  • Pre-normalization: normalize the input of each transformer sub-layer (using RMSNorm) instead of the output.
  • SwiGLU activation function, instead of ReLU.
  • Rotary positional embeddings (RoPE), instead of absolute positional embeddings.
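The first two changes can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the paper's implementation; the function and weight names are my own, and shapes are simplified:

```python
import numpy as np

def rms_norm(x, g, eps=1e-6):
    # RMSNorm: rescale by the root-mean-square of the features.
    # Unlike LayerNorm, there is no mean subtraction and no bias term.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * g

def silu(x):
    # SiLU / Swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, W_gate, W_up, W_down):
    # SwiGLU feed-forward: a gated branch replaces the single ReLU branch.
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

def pre_norm_ffn_block(x, g, W_gate, W_up, W_down):
    # Pre-normalization: normalize the *input* of the sub-layer, then add
    # the residual. Post-norm would instead normalize (x + sublayer(x)).
    return x + swiglu_ffn(rms_norm(x, g), W_gate, W_up, W_down)
```

Pre-normalization is generally credited with more stable training at scale, which is why GPT-3-style models adopted it over the original post-norm placement.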

2. Efficient Implementation

  • Causal multi-head attention that does not store the attention weights and skips computing the scores that are masked out. (I still need to explore the logic behind this further.)
  • Reducing the amount of activations that are recomputed during the backward pass, by saving expensive activations such as the outputs of the linear layers.
  • This saving is achieved by manually implementing the backward function for the transformer layers, instead of relying on PyTorch autograd.
  • Using model and sequence parallelism to reduce memory usage.
  • Overlapping the computation of activations and the communication between GPUs as much as possible.
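A toy illustration of the recompute-vs-store trade-off behind the activation points above (my own hypothetical example, not the paper's code): for a single linear-plus-ReLU layer, the backward pass can either use the stored pre-activation, or recompute it from the saved input. Recomputing costs extra FLOPs but avoids keeping the intermediate tensor in memory, and both variants yield identical gradients:

```python
import numpy as np

def forward_full(x, W):
    # Standard forward: keep the pre-activation z around for backward.
    z = W @ x
    return np.maximum(z, 0), z

def backward_full(grad_a, z, x, W):
    # Backward using the stored pre-activation z.
    grad_z = grad_a * (z > 0)
    return grad_z[:, None] * x[None, :], W.T @ grad_z  # dW, dx

def backward_checkpointed(grad_a, x, W):
    # Backward that recomputes z from the saved input instead of storing it,
    # trading an extra matrix-vector product for less activation memory.
    z = W @ x
    grad_z = grad_a * (z > 0)
    return grad_z[:, None] * x[None, :], W.T @ grad_z

rng = np.random.default_rng(0)
x, W = rng.normal(size=4), rng.normal(size=(3, 4))
grad_a = rng.normal(size=3)
a, z = forward_full(x, W)
dW1, dx1 = backward_full(grad_a, z, x, W)
dW2, dx2 = backward_checkpointed(grad_a, x, W)
assert np.allclose(dW1, dW2) and np.allclose(dx1, dx2)
```

In PyTorch this kind of control over what is stored versus recomputed is what writing a custom backward function (rather than relying on autograd's default saving behavior) buys you.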