Understanding the Transformer

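In a Transformer, the roles of Q, K, and V (queries, keys, and values) suggest that attention works more like a learnable lookup system. Earlier search-engine algorithms may be related to this idea, or some branch of search algorithms may work in a similar way.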


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights where each token (a token could be a word, sentence piece, subword, character, etc.) is converted into a vector of, say, 500 trainable values. The values are ordinary real numbers and are not restricted to a fixed range.
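As a rough illustration of the lookup described above, here is a minimal sketch in plain Python. The toy vocabulary, the embedding size of 4, and the names `embedding_table` and `embed` are assumptions made for this example; in PyTorch the same role is played by `torch.nn.Embedding`, whose rows are updated during training.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical toy vocabulary and embedding size (the text suggests ~500).
vocab = {"the": 0, "cat": 1, "sat": 2}
embed_dim = 4

# One row of trainable weights per token id; values are unbounded reals.
embedding_table = [
    [random.gauss(0.0, 1.0) for _ in range(embed_dim)]
    for _ in range(len(vocab))
]

def embed(tokens):
    """Map each token to its row in the embedding table."""
    return [embedding_table[vocab[tok]] for tok in tokens]

vectors = embed(["the", "cat", "sat", "the"])
```

The same token always maps to the same row, so the first and last vectors here are identical; what distinguishes the two positions is added in the next step.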

Positional Encoding - for each token, we want to inform the model where it is located in the sequence. This is because attention and linear layers on their own have no notion of token order. So we manually pass this in by adding, element by element, a vector of sine and cosine values that has the same length as the embedding vector (in the original Transformer, sines on the even dimensions and cosines on the odd ones).
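A minimal sketch of the fixed sinusoidal scheme from the original Transformer paper, assuming the usual base of 10000. The function names `positional_encoding` and `add_positions` are made up for this example, and many models use learned position embeddings instead.

```python
import math

def positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position values."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)          # even dimension
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd dimension
    return pe

def add_positions(embeddings):
    """Add the position vector to each token embedding, element-wise."""
    pe = positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(emb, pos)]
            for emb, pos in zip(embeddings, pe)]

pe = positional_encoding(seq_len=3, d_model=4)
```

Position 0 always encodes as alternating 0 and 1 (sin 0 and cos 0), and every later position gets a different pattern, which is what lets the model tell positions apart.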

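This sequence of vectors goes through an attention layer, which is basically a learnable, soft database lookup with keys, queries and values: each token's query is compared against every key, and the resulting weights decide how much of each value is mixed into that token's output. In this case, we are "searching" for the most likely next token.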
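The lookup analogy can be sketched as scaled dot-product attention for a single head. This is an illustrative plain-Python version, not a full layer: the learned linear projections that produce the queries, keys and values are left out, and the example vectors are invented. PyTorch 2.0 and later provide `torch.nn.functional.scaled_dot_product_attention` for the same computation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query, return a weighted average of the values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Mix the values according to the attention weights.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Hypothetical data: the query matches the first key more closely.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out = attention([[1.0, 0.0]], keys, values)
```

Unlike a hard database lookup, every value contributes to the result; the query only shifts the weighting toward the values whose keys it matches best.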

The Feed Forward is a small position-wise network (in the original Transformer, two linear layers with a ReLU in between) that is applied to each embedding in the sequence separately (i.e. to a 3-dim tensor instead of a 2-dim one).
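To make "applied to each embedding separately" concrete, here is a small sketch assuming the two-layer ReLU form used in the original Transformer. The weights below are hand-picked toy values rather than trained ones, and the helper names are hypothetical.

```python
def linear(x, weight, bias):
    """y = W x + b for one vector x; weight has one row per output."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def feed_forward(sequence, w1, b1, w2, b2):
    """Apply the same two-layer network to every position independently."""
    return [linear(relu(linear(x, w1, b1)), w2, b2) for x in sequence]

# Toy sizes: model width 2, hidden width 3.
w1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
b2 = [0.0, 0.0]

result = feed_forward([[1.0, 2.0], [-1.0, -2.0]], w1, b1, w2, b2)
```

Because the same weights are reused at every position and no position sees another, mixing information between tokens is left entirely to the attention layer.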

Then the final Linear layer is where we get our prediction for the next token, as a vector with one score (logit) per vocabulary entry. We apply a softmax to turn these scores into probabilities, which lie in the range of 0 to 1 and sum to 1.
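A sketch of this last step, assuming a three-word vocabulary and made-up output weights: the final hidden vector is projected to one logit per vocabulary entry, and the softmax turns the logits into a probability distribution. Picking the highest-probability token (greedy decoding) is just one possible choice; sampling is also common.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["the", "cat", "sat"]       # hypothetical vocabulary
hidden = [1.0, -1.0]                # hypothetical final hidden vector
output_weight = [[1.0, 0.0],        # one row of weights per vocabulary entry
                 [0.0, 1.0],
                 [1.0, 1.0]]

# Final linear layer: one logit per vocabulary entry.
logits = [sum(w * h for w, h in zip(row, hidden)) for row in output_weight]
probs = softmax(logits)

# Greedy choice: take the most probable token.
next_token = vocab[probs.index(max(probs))]
```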

There are two sides because when that diagram was developed, it was being used for language translation. But generative language models for next-token prediction typically use only the Transformer decoder and not the encoder.

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext --- PyTorch Tutorials 2.0.1+cu117 documentation

