Understanding the Transformer

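On the Transformer: the roles of Q, K and V suggest that it works much like a learnable query system. Perhaps earlier search-engine algorithms are related to this, or some branch of search algorithms is similar.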


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights: each token (a token could be a word, sentence piece, subword, character, etc.) is converted into a vector of, say, 500 trainable values. The values are ordinary real numbers and are not restricted to the range 0 to 1.
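As a rough illustration of the lookup described above, here is a minimal sketch in plain Python. The toy vocabulary, the width of 4 and the names `embedding_table` and `embed` are all assumptions made for the example; a real model would use something like PyTorch's `nn.Embedding` and update the rows during training.

```python
import random

random.seed(0)

# Hypothetical toy vocabulary mapping each token to an integer id.
vocab = {"the": 0, "cat": 1, "sat": 2}
d_model = 4  # embedding width (the text above uses 500 as its example)

# The embedding table holds one row of d_model floats per token id.
# Rows start as random noise; training would adjust them.
embedding_table = [
    [random.gauss(0.0, 1.0) for _ in range(d_model)]
    for _ in range(len(vocab))
]

def embed(tokens):
    """Look up the embedding row for each token in the sequence."""
    return [embedding_table[vocab[t]] for t in tokens]

vectors = embed(["the", "cat", "sat", "the"])
```

Both occurrences of "the" map to the same row, which is why the model needs a separate signal to tell the two positions apart.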

Positional Encoding - for each token, we want to inform the model where it is located in the sequence. This is because attention and linear layers have no built-in notion of order. So we manually pass this in by adding a vector of sine and cosine values, computed from the token's position, to the embedding vector. In the original Transformer the sine and cosine values alternate across every dimension of the vector, not just the first two.
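The sinusoidal scheme can be sketched as follows. This follows the formulation of the original Transformer paper as I understand it, where even dimensions take a sine and odd dimensions take a cosine, each at a different frequency; the function names are made up for the example.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # Each pair of dimensions (i, i + 1) shares one frequency.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def add_positions(embeddings):
    """Add the position encoding element-wise to each token embedding."""
    pe = sinusoidal_positional_encoding(len(embeddings), len(embeddings[0]))
    return [
        [e + p for e, p in zip(emb_row, pe_row)]
        for emb_row, pe_row in zip(embeddings, pe)
    ]
```

Because the encoding is added rather than concatenated, the vector width stays the same and the following layers need no changes.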

This sequence of vectors goes through an attention layer, which is basically a learnable, soft database lookup with queries, keys and values: each query is compared against every key, and the resulting scores decide how much of each value is returned. In this case, we are "searching" for the most likely next token.
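To make the "database search" analogy concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It assumes the queries, keys and values are passed in directly; in a real layer they would come from learned linear projections of the input, and the helper names here are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values, causal=True):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Score this query against every key, scaled by sqrt(d_k).
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        if causal:
            # A position may only look at itself and earlier positions.
            scores = [s if j <= i else float("-inf")
                      for j, s in enumerate(scores)]
        weights = softmax(scores)
        # The result is a weighted average of the value vectors.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs

q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out = attention(q, k, v)
```

Unlike a hard database lookup, every key contributes a little: the softmax weights blend the values instead of returning a single match.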

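The Feed Forward block is a small fully connected network (in the original Transformer, two linear layers with a ReLU between them) that is applied to each embedding in the sequence separately (i.e. the input is a 3-dim tensor instead of a 2-dim one).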
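A sketch of what "applied to each embedding separately" means in practice. It assumes the two-linear-layers-plus-ReLU form used in the original Transformer, and the weights below are hand-picked toy values rather than trained ones.

```python
def linear(x, weight, bias):
    """Compute W x + b for one vector x; weight is a list of output rows."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]

def relu(x):
    return [max(0.0, v) for v in x]

def feed_forward(sequence, w1, b1, w2, b2):
    """Apply the same two-layer network to every position independently."""
    return [linear(relu(linear(x, w1, b1)), w2, b2) for x in sequence]

# Toy shapes: d_model = 2, hidden width = 3.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b2 = [0.0, 0.0]

seq = [[1.0, -2.0], [3.0, 4.0]]
ffn_out = feed_forward(seq, w1, b1, w2, b2)
```

No information moves between positions in this step; mixing across the sequence happens only in the attention layer.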

Then the final Linear layer is where we get our prediction for the next token: it outputs one score (logit) per vocabulary entry, and applying a softmax turns those scores into a vector of probabilities, each in the range 0 to 1 and summing to 1.
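The last step can be sketched like this. The vocabulary and the raw scores are invented for illustration; in a real model the scores would be the output of the final linear layer at the last position.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["the", "cat", "sat"]  # hypothetical vocabulary
logits = [2.0, 1.0, 0.1]       # hypothetical raw scores, one per vocabulary entry

probs = softmax(logits)

# Greedy decoding: pick the token with the highest probability.
best_index = max(range(len(probs)), key=probs.__getitem__)
next_token = vocab[best_index]
```

Greedy selection is only one option; sampling from `probs` instead gives more varied output.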

There are two sides because when that diagram was developed, it was being used for language translation. Generative language models for next-token prediction typically use only the Transformer decoder and not the encoder.

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext --- PyTorch Tutorials 2.0.1+cu117 documentation

