Understanding the Transformer

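On the Transformer: the meaning of Q, K and V suggests that it behaves like a learnable query system. Earlier search-engine algorithms may be related to it, or some branch of search algorithms may work in a similar way.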


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights that convert each token (a token could be a word, sentence piece, subword, character, etc.) into a vector of, say, 500 trainable values. The values are ordinary real numbers and are not restricted to the range 0 to 1.

Positional Encoding - for each token, we want to inform the model where it is located in the sequence. This is because attention and linear layers have no built-in notion of order. So we pass this in manually by adding a vector of sine and cosine values to the embedding vector: sines on the even dimensions and cosines on the odd dimensions, across the whole vector rather than only its first 2 elements.
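To make the positional encoding concrete, here is a minimal sketch of the sinusoidal scheme from the original Transformer paper, written with only the Python standard library. The function names and the small sizes are illustrative assumptions, not part of the forum answer or of any PyTorch API.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model list of sinusoidal position vectors."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share one frequency: 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(embeddings):
    """Add the position vector for each index to the matching token embedding."""
    seq_len, d_model = len(embeddings), len(embeddings[0])
    table = sinusoidal_positional_encoding(seq_len, d_model)
    return [[e + p for e, p in zip(emb_row, pos_row)]
            for emb_row, pos_row in zip(embeddings, table)]
```

Calling add_positions on a list of token embeddings returns vectors of the same shape, so the rest of the model does not need to change.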

This sequence of vectors goes through an attention layer, which is basically a learnable, soft database lookup with keys, queries and values: each query is compared against every key, and the result is a weighted mix of the values. In this case, we are "searching" for the most likely next token.
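The lookup analogy can be sketched as scaled dot-product attention in plain Python. This is an illustrative sketch under simplifying assumptions: in a real layer the queries, keys and values come from learned linear projections of the input and there are several heads, both of which are omitted here. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn a list of scores into weights that are positive and sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, causal=False):
    """For each query, return a weighted average of the values."""
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        if causal:
            # A decoder must not look at later positions.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Blend the value vectors using the attention weights.
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs
```

With causal=True the first position can only attend to itself, so its output is exactly its own value vector.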

The Feed Forward block is a small stack of linear layers (two in the original design, with a nonlinearity such as ReLU in between), applied to each embedding in the sequence separately (i.e. the input is a 3 dim tensor instead of a 2 dim one).
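As a rough illustration of "applied to each embedding separately", the sketch below runs the same two linear layers over every position in turn. The weights are passed in as plain lists, and the names linear, relu and feed_forward are assumptions made for this example only.

```python
def linear(x, weight, bias):
    """Compute weight @ x + bias, where weight has one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def relu(x):
    """Set negative entries to zero."""
    return [max(0.0, v) for v in x]

def feed_forward(sequence, w1, b1, w2, b2):
    """Apply the same two-layer network to every token vector independently."""
    return [linear(relu(linear(token, w1, b1)), w2, b2) for token in sequence]
```

Because the weights are shared across positions, the same token vector yields the same output wherever it appears in the sequence.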

Then the final Linear layer is where we get out our prediction for the next token. It produces a vector of scores (logits), one per vocabulary entry, and we apply a softmax to turn them into probabilities that lie in the range of 0 to 1 and sum to 1.
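The last step can be sketched as follows: project the final hidden vector to one score per vocabulary entry, then apply a softmax. The tiny vocabulary, the weights and the function name next_token_distribution are made up for illustration.

```python
import math

def next_token_distribution(hidden, out_weight, out_bias):
    """Map a hidden vector to a probability for every vocabulary entry."""
    # Final linear layer: one logit per vocabulary entry.
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(out_weight, out_bias)]
    # Softmax, shifted by the largest logit for numerical stability.
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-word vocabulary and a 2-dimensional hidden state.
vocab = ["cat", "dog", "fish"]
probs = next_token_distribution([1.0, 2.0],
                                [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
                                [0.0, 0.0, 0.0])
predicted = vocab[probs.index(max(probs))]
```

Picking the highest-probability entry is only one decoding choice; sampling from the distribution is another common option.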

There are two sides, an encoder and a decoder, because when that diagram was developed the model was being used for language translation. Generative language models for next token prediction just use the Transformer decoder and not the encoder.

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext --- PyTorch Tutorials 2.0.1+cu117 documentation

