Llama 2 Powered By ONNX

  • [1. Llama 2](#1. Llama 2)
    • [1.1. The structure of Llama 2](#1.1. The structure of Llama 2)
  • References

https://github.com/microsoft/Llama-2-Onnx

1. Llama 2

Llama 2 is a collection of pretrained and fine-tuned generative text models.

1.1. The structure of Llama 2

The Llama 2 model consists of a stack of decoder layers. Each decoder layer (or transformer block) is built from one self-attention layer and one feed-forward multi-layer perceptron.
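To make the layer structure concrete, here is a minimal NumPy sketch of a decoder stack. It is an illustration under stated assumptions, not the ONNX graph from the repository: attention is single-head and causal, rotary position embeddings are omitted, and the RMSNorm pre-normalization, residual connections, and gated (SiLU) feed-forward are assumptions taken from the publicly described Llama design rather than from the text above. All function and parameter names (`decoder_layer`, `make_params`, `wq`, and so on) are hypothetical.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Normalize each token vector by its root mean square (no learned scale here).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, wq, wk, wv, wo):
    # Single-head causal self-attention over a (seq, dim) input.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)  # a token cannot see later tokens
    return softmax(scores) @ v @ wo

def feed_forward(x, w_gate, w_up, w_down):
    # Gated MLP: SiLU(x @ w_gate) * (x @ w_up), projected back to dim.
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))
    return (silu * (x @ w_up)) @ w_down

def decoder_layer(x, p):
    # One transformer block: self-attention, then feed-forward, each with a residual.
    x = x + self_attention(rms_norm(x), p["wq"], p["wk"], p["wv"], p["wo"])
    x = x + feed_forward(rms_norm(x), p["w_gate"], p["w_up"], p["w_down"])
    return x

def make_params(rng, dim, hidden):
    # Random weights for illustration only.
    def w(rows, cols):
        return rng.normal(scale=0.1, size=(rows, cols))
    return {"wq": w(dim, dim), "wk": w(dim, dim), "wv": w(dim, dim), "wo": w(dim, dim),
            "w_gate": w(dim, hidden), "w_up": w(dim, hidden), "w_down": w(hidden, dim)}

def decoder_stack(x, layers):
    # The model body is the same block applied repeatedly.
    for p in layers:
        x = decoder_layer(x, p)
    return x

rng = np.random.default_rng(0)
layers = [make_params(rng, dim=8, hidden=16) for _ in range(2)]
tokens = rng.normal(size=(5, 8))      # 5 token embeddings of width 8
out = decoder_stack(tokens, layers)
print(out.shape)
```

Each layer maps a `(seq, dim)` array to another `(seq, dim)` array, which is why the layers can be stacked to any depth.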

Llama models use a different projection size in the feed-forward layer than classic transformers: in both Llama 1 and Llama 2, the feed-forward projection is about 2.7x the hidden size rather than the standard 4x.
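The 2.7x figure can be checked with a little arithmetic. The sketch below assumes the sizing rule used in the public Llama reference code: start from 4x the hidden size, take two thirds of it, and round up to a multiple of 256. The helper name `ffn_hidden_size` and the `multiple_of=256` default are assumptions for illustration, and the rule is shown only for the 7B and 13B hidden sizes (4096 and 5120).

```python
def ffn_hidden_size(dim: int, multiple_of: int = 256) -> int:
    # Start from the classic 4x expansion, keep two thirds of it,
    # then round up to the next multiple of `multiple_of`.
    hidden = (2 * 4 * dim) // 3
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)

for dim in (4096, 5120):
    hidden = ffn_hidden_size(dim)
    print(f"dim={dim} ffn={hidden} ratio={hidden / dim:.4f}")
```

For a hidden size of 4096 this gives a feed-forward width of 11008 (ratio 2.6875), and for 5120 it gives 13824 (ratio 2.7), which matches the roughly 2.7x projection described above.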

A key difference between Llama 1 and Llama 2 is an architectural change in the attention layer: Llama 2 uses the Grouped Query Attention (GQA) mechanism to improve efficiency. According to the Llama 2 paper, GQA is applied in the larger model sizes (34B and 70B).
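As a rough illustration of the idea behind GQA, the NumPy sketch below lets several query heads share one key/value head, so fewer key/value tensors need to be computed and cached. It is a simplified sketch, not the repository's implementation: the function name `grouped_query_attention`, the tensor layout, and the head counts in the usage lines are all assumptions chosen for the example.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Causal attention where groups of query heads share one key/value head.

    q: (n_heads, seq, head_dim); k and v: (n_kv_heads, seq, head_dim),
    with n_heads divisible by n_kv_heads.
    """
    n_heads, seq, head_dim = q.shape
    n_kv_heads = k.shape[0]
    assert n_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group = n_heads // n_kv_heads

    # Expand each key/value head so that `group` consecutive query heads reuse it.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)   # causal mask, broadcast over heads
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v                           # (n_heads, seq, head_dim)

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 6, 4))   # 8 query heads (illustrative numbers)
k = rng.normal(size=(2, 6, 4))   # only 2 key/value heads
v = rng.normal(size=(2, 6, 4))
print(grouped_query_attention(q, k, v).shape)
```

With 8 query heads and 2 key/value heads, the key/value tensors are a quarter of the size they would be in standard multi-head attention. Setting the number of key/value heads equal to the number of query heads recovers ordinary multi-head attention.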

Figure: Llama2 Model

Figure: Llama2 Model Top View

Figure: Decoder Layer

References

[1] Yongqiang Cheng, https://yongqiang.blog.csdn.net/
