Llama 2 Powered By ONNX


  • [1. Llama 2](#1. Llama 2)
    • [1.1. The structure of Llama 2](#1.1. The structure of Llama 2)
  • References

https://github.com/microsoft/Llama-2-Onnx

1. Llama 2

Llama 2 is a collection of pretrained and fine-tuned generative text models.

1.1. The structure of Llama 2

The Llama 2 model consists of a stack of decoder layers. Each decoder layer (or transformer block) is built from one self-attention layer and one feed-forward multi-layer perceptron (MLP).
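The stacking described above can be sketched in a few lines of Python. This is a structural illustration only, not the ONNX graph: `DecoderLayer`, `run_stack`, and the stand-in sublayers are hypothetical names, the residual additions are an assumption based on the usual transformer design, and normalization is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, used for the residual connections."""
    return [x + y for x, y in zip(a, b)]


@dataclass
class DecoderLayer:
    """One transformer block: self-attention followed by a feed-forward MLP.

    Both sublayers are passed in as plain callables (hypothetical stand-ins),
    so only the wiring of the block is shown here.
    """
    self_attention: Sublayer
    feed_forward: Sublayer

    def __call__(self, hidden: Vector) -> Vector:
        # Assumed residual wiring: each sublayer's output is added to its input.
        hidden = add(hidden, self.self_attention(hidden))
        hidden = add(hidden, self.feed_forward(hidden))
        return hidden


def run_stack(layers: List[DecoderLayer], hidden: Vector) -> Vector:
    """Apply the decoder layers one after another."""
    for layer in layers:
        hidden = layer(hidden)
    return hidden


# Toy stand-ins so the sketch runs; real sublayers hold learned weights.
toy_attention: Sublayer = lambda h: [0.5 * x for x in h]
toy_mlp: Sublayer = lambda h: [2.0 * x for x in h]
toy_model = [DecoderLayer(toy_attention, toy_mlp) for _ in range(2)]
```

Calling `run_stack(toy_model, [1.0, 2.0])` passes the vector through both toy layers in order, which mirrors how the hidden state flows through the real stack.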

Llama models use a different projection size in the feed-forward layer than classic transformers: both Llama 1 and Llama 2 project to roughly 2.7x the hidden size rather than the standard 4x.
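To make the 2.7x figure concrete, the sketch below computes a feed-forward width from a hidden size. It assumes the width is two thirds of the classic 4x projection, rounded up to a multiple of 256; that rounding rule and the function name `llama_ffn_hidden_dim` are assumptions for illustration, and some model sizes may apply an additional multiplier not shown here.

```python
def llama_ffn_hidden_dim(dim: int, multiple_of: int = 256) -> int:
    """Feed-forward width under the assumed rule: 2/3 of 4 * dim, rounded up."""
    hidden_dim = 4 * dim                 # classic transformer projection (4x)
    hidden_dim = (2 * hidden_dim) // 3   # shrink to two thirds, about 2.67x
    # Round up to the next multiple of `multiple_of`.
    return multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)


def projection_ratio(dim: int) -> float:
    """Ratio of feed-forward width to hidden size."""
    return llama_ffn_hidden_dim(dim) / dim


for dim in (4096, 5120):
    print(dim, llama_ffn_hidden_dim(dim), projection_ratio(dim))
```

For a hidden size of 4096 this gives 11008 (2.6875x), and for 5120 it gives 13824 (2.7x), against 16384 and 20480 for the standard 4x projection.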

A key architectural difference between Llama 1 and Llama 2 lies in the attention layer: Llama 2 (in its larger model variants) uses the Grouped Query Attention (GQA) mechanism to improve efficiency.
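In GQA, several query heads share a single key/value head, which shrinks the key/value cache kept during generation. The sketch below shows the head grouping and the resulting cache size per token. The function names are hypothetical, and the configuration values (64 query heads, 8 key/value heads, 80 layers, head dimension 128, 2 bytes per value) are illustrative assumptions, not figures taken from this article.

```python
def kv_head_for_query_head(query_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Index of the shared key/value head that a given query head reads from."""
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    group_size = n_heads // n_kv_heads  # query heads per key/value head
    return query_head // group_size


def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes cached per generated token: one key and one value per KV head per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


# Assumed illustrative configuration (see the note above).
N_LAYERS, N_HEADS, N_KV_HEADS, HEAD_DIM = 80, 64, 8, 128

mha_bytes = kv_cache_bytes_per_token(N_LAYERS, N_HEADS, HEAD_DIM)     # one KV head per query head
gqa_bytes = kv_cache_bytes_per_token(N_LAYERS, N_KV_HEADS, HEAD_DIM)  # grouped KV heads
print(mha_bytes, gqa_bytes, mha_bytes // gqa_bytes)
```

With these assumed values, query heads 0 through 7 all read from key/value head 0, and the cache per token is 8 times smaller than with one key/value head per query head.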


[Figure: Llama 2 model]

[Figure: Llama 2 model, top view]

[Figure: Decoder layer]

References

[1] Yongqiang Cheng, https://yongqiang.blog.csdn.net/
