LLaMA: Open and Efficient Foundation Language Models

This paper is inspired by the Chinchilla scaling laws, which found that for a fixed compute budget, the best performance is achieved not by the largest models, but by smaller models trained on more data. Accordingly, it proposes a collection of models ranging from 7B to 65B parameters, and these smaller models outperform much larger ones: for example, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks.

1. Architecture

It is based on the original transformer architecture and incorporates several improvements proposed in subsequent large language models. The main changes are:

  • Pre-normalization: the input of each transformer sub-layer is normalized (with RMSNorm) instead of the output, which improves training stability.
  • The SwiGLU activation function, instead of ReLU.
  • Rotary positional embeddings (RoPE), replacing absolute positional embeddings. (All three changes are illustrated in the sketch after this list.)
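
To make these three changes concrete, here is a minimal PyTorch sketch of a single pre-norm transformer block. All names (`RMSNorm`, `SwiGLU`, `rope`, `Block`) are my own simplified stand-ins, not the paper's code: the real model uses multi-head attention, a slightly narrower feed-forward width, and an interleaved-pair RoPE layout.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Pre-normalization: RMSNorm applied to each sub-layer's INPUT."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Normalize by the root mean square instead of mean/variance.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class SwiGLU(nn.Module):
    """SwiGLU feed-forward: silu(x W1) * (x W3), projected back with W2.
    Hidden width simplified to 4*dim here; LLaMA uses 2/3 of that."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, dim, bias=False)
        self.w3 = nn.Linear(dim, hidden, bias=False)

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))


def rope(x):
    """Rotary embeddings: rotate feature pairs by a position-dependent
    angle, so attention scores depend on relative positions.
    Half-split pairing used here for brevity."""
    b, t, d = x.shape
    half = d // 2
    freqs = 1.0 / (10000 ** (torch.arange(half) / half))
    angles = torch.arange(t)[:, None] * freqs[None, :]   # (t, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class Block(nn.Module):
    """Pre-norm block: x + Attn(norm(x)), then x + FFN(norm(x)).
    Single-head attention for brevity."""
    def __init__(self, dim: int):
        super().__init__()
        self.attn_norm = RMSNorm(dim)
        self.ffn_norm = RMSNorm(dim)
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)
        self.wo = nn.Linear(dim, dim, bias=False)
        self.ffn = SwiGLU(dim, hidden=4 * dim)

    def forward(self, x):
        h = self.attn_norm(x)  # normalize the input, not the output
        q, k, v = rope(self.wq(h)), rope(self.wk(h)), self.wv(h)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.wo(attn)
        x = x + self.ffn(self.ffn_norm(x))
        return x


# Quick shape check (hypothetical sizes).
block = Block(dim=64)
print(block(torch.randn(2, 16, 64)).shape)   # torch.Size([2, 16, 64])
```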

2. Efficient Implementation

  • An efficient implementation of causal multi-head attention that does not store the attention weights and does not compute the key/query scores that are masked out by the causal structure. (I still need to dig into the logic behind this.)
  • Reducing the amount of activations that are recomputed during the backward pass, by saving the activations that are expensive to compute, such as the outputs of the linear layers (see the checkpointing sketch after this list).
  • This requires manually implementing the backward function for the transformer layers, instead of relying on PyTorch autograd.
  • Using model and sequence parallelism to reduce the memory usage of the model.
  • Overlapping the computation of activations and the communication between GPUs as much as possible (see the overlap sketch after this list).
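
For the memory/recompute trade-off, the paper saves expensive activations and hand-writes the backward pass of the transformer layers. A generic version of the same trade can be sketched with PyTorch's built-in `torch.utils.checkpoint`; note this is a stand-in for illustration, not the paper's more selective manual scheme.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A hypothetical expensive sub-network (stand-in for a transformer layer).
layer = nn.Sequential(nn.Linear(512, 2048), nn.SiLU(), nn.Linear(2048, 512))

x = torch.randn(4, 128, 512, requires_grad=True)

# Without checkpointing: every intermediate activation is kept for backward.
y = layer(x)

# With checkpointing: only the input is saved; the intermediates are
# recomputed during the backward pass, cutting activation memory at the
# cost of extra forward compute.
y_ckpt = checkpoint(layer, x, use_reentrant=False)

y_ckpt.sum().backward()
```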
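
For the compute/communication overlap, the usual pattern is to launch collectives asynchronously and only wait when the result is actually needed. A minimal sketch with `torch.distributed` (tensor names are hypothetical; run under `torchrun`):

```python
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="gloo")  # "nccl" on multi-GPU setups
    grad = torch.randn(1024)        # a gradient shard to synchronize
    other_input = torch.randn(1024) # input for unrelated computation

    # Launch the all-reduce asynchronously; it runs in the background.
    handle = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True)

    # Do computation that does not depend on `grad` while the
    # communication is in flight.
    result = other_input.tanh().sum()

    # Block only at the point where the reduced gradient is needed.
    handle.wait()
    grad /= dist.get_world_size()
    print(result.item(), grad.norm().item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```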