LLaMA: Open and Efficient Foundation Language Models

This paper is inspired by the Chinchilla scaling laws, which found that, for a fixed compute budget, the best performance comes not from the largest models but from smaller models trained on more data. The paper therefore proposes a collection of models ranging from 7B to 65B parameters. These smaller models outperform much larger ones; for example, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks.

1. Architecture

It is based on the original transformer architecture and adopts several improvements proposed in subsequent large language models. The main changes are:

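  • Pre-normalization: normalizes the input of each transformer sub-layer instead of the output (the paper uses RMSNorm as the normalizing function).
  • SwiGLU activation in place of ReLU.
  • Rotary embeddings (RoPE) in place of absolute positional embeddings.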
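
The three changes above can be sketched in a few lines of plain Python. This is only an illustrative sketch on single vectors, not the paper's implementation: the function names (rms_norm, swiglu_ffn, rotary_embed, pre_norm_sublayer), the eps value, and the base of 10000 are assumptions chosen for this example, and a real model would use batched tensor operations.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then apply a learned scale."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (Swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """Feed-forward block with a SwiGLU gate: W_down (SiLU(W_gate x) * (W_up x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

def rotary_embed(x, position, base=10000.0):
    """Rotate each pair (x[i], x[i+1]) by the angle position * base**(-i/d).

    Assumes len(x) is even. The hypothetical `base` default is an assumption.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + 1]
        out.extend([a * cos_t - b * sin_t, a * sin_t + b * cos_t])
    return out

def pre_norm_sublayer(x, weight, sublayer):
    """Pre-normalization: normalize the sub-layer input, then add the residual."""
    normed = rms_norm(x, weight)
    return [xi + yi for xi, yi in zip(x, sublayer(normed))]
```

One property worth noticing in rotary_embed: because each pair is rotated by an angle proportional to the position, the dot product between a rotated query and a rotated key depends only on the distance between their positions, which is what lets attention see relative position.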

2. Efficient Implementation

  • Causal multi-head attention, using an efficient implementation that reduces memory usage and runtime. I still need to explore the logic behind it further.
  • Reduce the amount of activations that are recomputed during the backward pass when using checkpointing.
  • Save the activations that are expensive to compute, such as the outputs of linear layers, by implementing the backward function manually instead of relying on PyTorch autograd.
  • Use model and sequence parallelism to reduce memory usage.
  • Overlap computation and communication between GPUs as much as possible.
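
As a starting point for the causal attention item above, here is a minimal single-head sketch in plain Python. It is not the fused GPU kernel used in practice, and the function name causal_attention is hypothetical. It only illustrates the general idea as I understand it: for query position i, scores against future keys are never computed, and the full n x n attention weight matrix is never stored.

```python
import math

def causal_attention(q, k, v):
    """Single-head causal attention over lists of vectors.

    For query position i, only keys 0..i are scored, so masked (future)
    scores are skipped entirely and only one row of weights exists at a time.
    """
    d = len(q[0])
    out = []
    for i, q_i in enumerate(q):
        # Scaled dot-product scores against visible keys only.
        scores = [
            sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted sum of the visible value vectors.
        row = [
            sum(w * v[j][c] for j, w in enumerate(weights)) / total
            for c in range(len(v[0]))
        ]
        out.append(row)
    return out
```

With this structure, the first output row is always the first value vector, and changing a later key or value never changes an earlier output row, which is exactly the causal property.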