Understanding the Transformer

Regarding the Transformer: the Q/K/V formulation suggests it behaves like a learnable retrieval system. Perhaps earlier search-engine algorithms were related to this, or some branch of search algorithms works in a similar way.


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights where each token (a token could be a word, sentence piece, subword, character, etc.) is converted into a vector of, say, 500 trainable values.
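
A minimal PyTorch sketch of this step; the vocabulary size of 10,000 is an illustrative assumption, and the 500-dimensional embedding follows the figure quoted above:

```python
import torch
import torch.nn as nn

vocab_size, d_model = 10_000, 500          # both illustrative values
embedding = nn.Embedding(vocab_size, d_model)

token_ids = torch.tensor([[3, 17, 42]])    # (batch=1, seq_len=3) integer token ids
vectors = embedding(token_ids)             # (1, 3, 500), trainable lookup table rows
```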

Positional Encoding - for each token, we want to inform the model where it is located in the sequence, since attention and linear layers are order-agnostic and cannot infer sequential information on their own. So we pass this in manually by adding a vector of sine and cosine values, computed at different frequencies across the embedding dimensions, to each token's embedding.
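
A sketch of the sinusoidal encoding from "Attention Is All You Need" (the function name here is mine, not from the post):

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Sine on even dims, cosine on odd dims, frequency decreasing with dim."""
    position = torch.arange(seq_len).unsqueeze(1)                 # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2)
                         * (-math.log(10000.0) / d_model))        # (d_model/2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)                  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)                  # odd dimensions
    return pe

# Added to the token embeddings before the first attention layer:
# x = embedding(token_ids) + sinusoidal_positional_encoding(seq_len, d_model)
```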

This sequence of vectors goes through an attention layer, which is essentially a learnable, digitized database-search function with keys, queries and values. In this case, we are "searching" for the most likely next token.
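
A minimal single-head self-attention sketch of that "search" (real Transformers use multiple heads; the class name and single-head simplification are mine):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadSelfAttention(nn.Module):
    """Each position's query is matched against every key; the resulting
    scores weight a sum over the values, like a soft database lookup."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, seq, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))  # query-key similarity
        weights = F.softmax(scores, dim=-1)                       # "search results" as a distribution
        return weights @ v                                        # weighted sum of values
```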

The Feed Forward is a small position-wise MLP (two linear layers with a nonlinearity between them), applied to each embedding in the sequence separately (i.e., it operates position by position on a 3-dim tensor rather than flattening to 2 dims).
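
A minimal sketch of that block; d_ff = 2048 follows the original paper's choice, and since nn.Linear acts on the last dimension, the same weights are applied to every position of the (batch, seq, d_model) tensor:

```python
import torch
import torch.nn as nn

d_model, d_ff = 500, 2048
feed_forward = nn.Sequential(
    nn.Linear(d_model, d_ff),   # expand
    nn.ReLU(),                  # nonlinearity
    nn.Linear(d_ff, d_model),   # project back
)

x = torch.randn(1, 3, d_model)  # (batch, seq_len, d_model)
y = feed_forward(x)             # same shape; each position transformed independently
```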

Then the final Linear layer is where we get our predicted next token: it maps each hidden state to a vector of scores (logits) over the vocabulary, and we apply a softmax so the values become probabilities between 0 and 1 that sum to 1.
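
A minimal sketch of that output head, reusing the illustrative sizes from above (the name lm_head is mine):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model = 10_000, 500
lm_head = nn.Linear(d_model, vocab_size)

hidden = torch.randn(1, 3, d_model)        # decoder output, (batch, seq, d_model)
logits = lm_head(hidden[:, -1, :])         # scores for the last position: (batch, vocab_size)
probs = F.softmax(logits, dim=-1)          # non-negative, sums to 1
next_token = torch.argmax(probs, dim=-1)   # greedy pick of the predicted next token
```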

There are two sides (an encoder and a decoder) because when that diagram was developed, the architecture was being used for language translation. Generative language models that do next-token prediction use only the Transformer decoder, not the encoder.
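
One practical consequence for the decoder-only setup: each position must be prevented from attending to future tokens. A minimal sketch of the causal mask that enforces this (PyTorch also provides it as nn.Transformer.generate_square_subsequent_mask):

```python
import torch

seq_len = 4
# -inf strictly above the diagonal, 0 elsewhere; added to attention scores
# before softmax, so future positions receive zero attention weight.
mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
print(mask)
```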

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext — PyTorch Tutorials 2.0.1+cu117 documentation

