LLaMA: Open and Efficient Foundation Language Models

This paper is inspired by the Chinchilla scaling law, which found that, given a fixed compute budget, the best performance is achieved not by the largest models but by smaller models trained on more data. Following this idea, the paper proposes a collection of models ranging from 7B to 65B parameters. These smaller models outperform some much larger models; for example, the paper reports that LLaMA-13B outperforms GPT-3 (175B) on most benchmarks.
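To make the "fixed compute budget" trade-off concrete, here is a minimal sketch. It assumes the common rule of thumb that training cost is roughly 6 × parameters × tokens in FLOPs; that approximation, the function names, and the budget value are illustrative assumptions, not figures taken from the paper.

```python
# Rule-of-thumb assumption: training FLOPs ~= 6 * parameters * tokens.
def training_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens


def tokens_for_budget(flops: float, n_params: float) -> float:
    """Number of tokens a model of n_params can see within a FLOP budget."""
    return flops / (6.0 * n_params)


# Hypothetical budget: the cost of training a 65B model on 300B tokens.
budget = training_flops(65e9, 300e9)

for n_params in (7e9, 65e9):
    tokens = tokens_for_budget(budget, n_params)
    print(f"{n_params / 1e9:.0f}B params -> {tokens / 1e9:.0f}B tokens")
```

Under the same budget, the smaller model can be trained on many more tokens, which is the trade-off the scaling law is about.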

1. Architecture

It is based on the standard transformer architecture and adopts several improvements proposed in subsequent large language models. The main changes are:

  • Pre-normalization, which normalizes the input of each sub-layer instead of the output.
  • SwiGLU activation, instead of ReLU.
  • Rotary embeddings, instead of absolute positional embeddings.
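The three changes above can be sketched in plain Python on single vectors. This is a minimal illustration, not the paper's implementation: it assumes RMSNorm as the normalization function, a three-matrix SwiGLU feed-forward layer, and rotary embeddings that rotate consecutive pairs of dimensions. All function and parameter names are hypothetical.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then apply a learned scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def pre_norm_residual(x, sublayer, weight):
    """Pre-normalization: normalize the sub-layer input, then add the residual."""
    y = sublayer(rms_norm(x, weight))
    return [a + b for a, b in zip(x, y)]


def silu(v):
    """SiLU (Swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


def rotary(x, position, base=10000.0):
    """Rotate each pair (x[i], x[i+1]), i even, by position * base**(-i/d)."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


q, k = [1.0, 0.0, 0.5, -0.5], [0.3, 0.7, -0.2, 0.1]
# Both pairs of positions are 2 apart, so the two scores agree up to rounding.
print(dot(rotary(q, 5), rotary(k, 3)), dot(rotary(q, 12), rotary(k, 10)))
```

The last line shows why rotary embeddings are attractive: the query/key dot product depends only on the relative distance between positions, not on their absolute values.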

2. Efficient Implementation

  • An efficient implementation of causal multi-head attention. I still need to explore the logic behind it further.
  • Reduce the amount of activations that are recomputed during the backward pass.
  • To do so, save the activations that are expensive to compute by manually implementing the backward function, instead of relying on PyTorch autograd.
  • Use model and sequence parallelism to reduce memory usage.
  • Overlap computation and communication between GPUs as much as possible.
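As a starting point for the first bullet, here is a minimal single-head sketch of causal attention in plain Python. It only illustrates the general idea that scores for masked (future) positions never need to be computed or stored; it is not the optimized kernel used in the paper, and the function name and list-of-vectors layout are assumptions for illustration.

```python
import math


def causal_attention(q, k, v):
    """Single-head causal attention over lists of vectors.

    For query position i, only keys 0..i are scored, so the masked
    (future) query/key products are never computed and no full
    n x n attention matrix is kept in memory.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Scores against the current and earlier positions only.
        scores = [
            sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights)) / total
            for c in range(len(v[0]))
        ])
    return out


q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[0.5, 0.1], [0.2, 0.9], [0.7, 0.3]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(causal_attention(q, k, v))
```

The first position can only attend to itself, so its output is simply the first value vector, and changing later keys or values never affects earlier outputs.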