Llama 2 Powered By ONNX

  • 1. Llama 2
    • 1.1. The structure of Llama 2

https://github.com/microsoft/Llama-2-Onnx

1. Llama 2

Llama 2 is a collection of pretrained and fine-tuned generative text models.

1.1. The structure of Llama 2

The Llama 2 model consists of a stack of decoder layers. Each decoder layer (or transformer block) is built from one self-attention layer and one feed-forward multi-layer perceptron.
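The stacking described above can be sketched in a few lines of Python. This is a structural illustration only, not the actual model: `DecoderLayer`, `run_decoder_stack`, `toy_attention` and `toy_feed_forward` are hypothetical names, the two sub-layers are stand-in functions, and the residual additions are an assumption about how the sub-layers are wired together (normalization is left out).

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class DecoderLayer:
    """One transformer block: a self-attention sub-layer, then a feed-forward MLP."""
    self_attention: Callable[[Vector], Vector]
    feed_forward: Callable[[Vector], Vector]

    def __call__(self, x: Vector) -> Vector:
        # Assumed wiring: each sub-layer's output is added back to its input.
        h = [a + b for a, b in zip(x, self.self_attention(x))]
        return [a + b for a, b in zip(h, self.feed_forward(h))]


def run_decoder_stack(layers: List[DecoderLayer], x: Vector) -> Vector:
    """Pass the hidden state through every decoder layer in order."""
    for layer in layers:
        x = layer(x)
    return x


# Placeholder sub-layers (hypothetical): real ones are learned projections.
def toy_attention(x: Vector) -> Vector:
    return [0.5 * v for v in x]


def toy_feed_forward(x: Vector) -> Vector:
    return [v + 1.0 for v in x]


layers = [DecoderLayer(toy_attention, toy_feed_forward) for _ in range(2)]
print(run_decoder_stack(layers, [1.0, 2.0]))  # [13.0, 22.0]
```

The point of the sketch is the shape of the computation: the same block type is repeated, and the output of one block is the input of the next.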

Llama models use a different projection size in the feed-forward layer than classic transformers: both Llama 1 and Llama 2 project to roughly 2.7x the hidden size rather than the standard 4x.
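To see where the roughly 2.7x figure comes from, here is a small arithmetic sketch. It assumes the sizing rule I believe Meta's reference implementation uses for the smaller models: take two thirds of the classic 4x size, then round up to a multiple of 256. The function name `ffn_hidden_size` and the `multiple_of` default are illustrative, not part of any ONNX API.

```python
def ffn_hidden_size(hidden_size: int, multiple_of: int = 256) -> int:
    """Feed-forward projection size under the assumed 2/3 * 4x rule."""
    # Two thirds of the classic 4x projection, truncated to an integer.
    target = int(2 * (4 * hidden_size) / 3)
    # Round up to the next multiple of `multiple_of`.
    return multiple_of * ((target + multiple_of - 1) // multiple_of)


for hidden in (4096, 5120):
    ffn = ffn_hidden_size(hidden)
    print(hidden, ffn, round(ffn / hidden, 4))
# 4096 11008 2.6875
# 5120 13824 2.7
```

With a hidden size of 4096 this gives 11008, a ratio of about 2.69x, which matches the "2.7x" quoted above.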

A key difference between Llama 1 and Llama 2 is an architectural change to the attention layer: Llama 2 uses the Grouped Query Attention (GQA) mechanism, in which groups of query heads share a single key/value head, to improve efficiency.
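A minimal sketch of the grouping idea follows. It does not compute attention; it only shows which key/value head each query head reads from, and how much smaller the key/value cache becomes. The helper names are hypothetical, and the numbers (80 layers, 64 query heads, 8 key/value heads, head dimension 128) are assumed here for illustration as a large-model configuration.

```python
def kv_head_for_query_head(query_head: int, num_query_heads: int, num_kv_heads: int) -> int:
    """Index of the key/value head shared by the group containing `query_head`."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be divisible by num_kv_heads")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


def kv_cache_values_per_token(num_layers: int, num_kv_heads: int, head_dim: int) -> int:
    """Number of cached values per token: keys plus values, across all layers."""
    return 2 * num_layers * num_kv_heads * head_dim


# Assumed configuration: 64 query heads grouped onto 8 key/value heads.
print([kv_head_for_query_head(q, 64, 8) for q in (0, 7, 8, 63)])  # [0, 0, 1, 7]

full = kv_cache_values_per_token(num_layers=80, num_kv_heads=64, head_dim=128)
grouped = kv_cache_values_per_token(num_layers=80, num_kv_heads=8, head_dim=128)
print(full // grouped)  # 8
```

Under these assumptions, each key/value head serves 8 query heads, so the cache that must be kept during generation shrinks by the same factor of 8.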

Figure: Llama 2 model

Figure: Llama 2 model, top view

Figure: Decoder layer
