Llama 2 Powered By ONNX

  • [1. Llama 2](#1-llama-2)
    • [1.1. The structure of Llama 2](#11-the-structure-of-llama-2)
  • [References](#references)

https://github.com/microsoft/Llama-2-Onnx

1. Llama 2

Llama 2 is a collection of pretrained and fine-tuned generative text models, ranging in scale from 7 billion to 70 billion parameters.

1.1. The structure of Llama 2

The Llama 2 model consists of a stack of decoder layers. Each decoder layer (or transformer block) combines one self-attention layer with one feed-forward multi-layer perceptron (MLP).
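The structure above can be sketched in NumPy. This is a minimal, illustrative sketch, not the ONNX model itself: it uses a single attention head, omits rotary position embeddings, and leaves out the learned norm scales, but it shows the pre-norm residual layout of one decoder layer.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMSNorm as used by Llama: no mean subtraction, no bias
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

def self_attention(x, Wq, Wk, Wv, Wo):
    # Single-head causal attention for illustration
    # (real Llama is multi-head with rotary position embeddings)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # causal mask: each token attends only to itself and earlier tokens
    scores += np.triu(np.full_like(scores, -1e9), k=1)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v) @ Wo

def feed_forward(x, W_gate, W_up, W_down):
    # SwiGLU MLP: silu(x W_gate) * (x W_up), projected back down
    gate = x @ W_gate
    silu = gate / (1 + np.exp(-gate))
    return (silu * (x @ W_up)) @ W_down

def decoder_layer(x, attn_weights, mlp_weights):
    # Pre-norm residual structure: x + Attn(norm(x)), then x + MLP(norm(x))
    x = x + self_attention(rms_norm(x), *attn_weights)
    x = x + feed_forward(rms_norm(x), *mlp_weights)
    return x
```

A full model stacks many such layers between a token embedding and a final output projection; the layer maps a `(seq_len, hidden)` activation to the same shape.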

In the feed-forward layer, Llama models use a different projection size than the classic transformer: both Llama 1 and Llama 2 project to roughly 2.7x the hidden size rather than the standard 4x. The reduction compensates for SwiGLU's third weight matrix, keeping the parameter count of the MLP comparable.
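The 2.7x figure falls out of the sizing rule in Meta's reference code: start from 4x the hidden size, scale by 2/3, then round up to a multiple of a fixed granularity (256 in the 7B config). A sketch of that arithmetic:

```python
def llama_ffn_dim(dim, multiple_of=256):
    # Llama's FFN sizing rule: 4*dim scaled by 2/3 (to offset SwiGLU's
    # extra gate matrix), rounded up to a multiple of `multiple_of`
    hidden = int(2 * (4 * dim) / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)
```

For the 7B model (`dim = 4096`) this gives an intermediate size of 11008, i.e. 11008 / 4096 = 2.6875 ≈ 2.7x the hidden size.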

A key architectural difference between Llama 1 and Llama 2 is the attention layer: the larger Llama 2 variants (34B and 70B) adopt Grouped Query Attention (GQA), in which several query heads share one key/value head, to improve inference efficiency and shrink the KV cache.
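The core of GQA can be sketched as follows. This is an illustrative NumPy sketch, not the production kernel: the causal mask is omitted for brevity, and each group of query heads simply reuses its shared key/value head.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (seq, n_q_heads, d); k, v: (seq, n_kv_heads, d).

    Each group of n_q_heads // n_kv_heads query heads shares one
    key/value head. With n_kv_heads == n_q_heads this reduces to
    standard multi-head attention; with n_kv_heads == 1 it is
    multi-query attention.
    """
    group = q.shape[1] // k.shape[1]
    # expand K and V so every query head sees its group's shared K/V head
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)
    d = q.shape[-1]
    scores = np.einsum('qhd,khd->hqk', q, k) / np.sqrt(d)  # per-head scores
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                     # softmax over keys
    return np.einsum('hqk,khd->qhd', w, v)
```

The efficiency gain comes from storing and streaming only `n_kv_heads` key/value heads in the KV cache instead of `n_q_heads`, which matters most at long sequence lengths.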


Figure: Llama2 Model

Figure: Llama2 Model Top View

Figure: Decoder Layer

References

[1] Yongqiang Cheng,
