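Understanding the Transformer

On the Transformer: the roles of Q, K, and V suggest that it works much like a learnable query system. Earlier search-engine algorithms, or some branch of search algorithms, may be related or similar to this.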


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights that convert each token (a token could be a word, sentence piece, subword, character, etc.) into a vector of, say, 500 trainable values. The values are ordinary real numbers and are not restricted to the range 0 to 1.
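
As a rough illustration of the lookup described above, here is a minimal sketch in plain Python. The names `make_embedding_table` and `embed`, the Gaussian initialization, and the tiny sizes are assumptions made for this example; in a real model the table is a trainable parameter (for instance `torch.nn.Embedding`) updated during training.

```python
import random

def make_embedding_table(vocab_size, dim, seed=0):
    # One row of `dim` random values per token id (hypothetical initialization).
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

def embed(token_ids, table):
    # An embedding is just a row lookup: token id -> vector.
    return [table[t] for t in token_ids]

table = make_embedding_table(vocab_size=10, dim=4)
vectors = embed([3, 1, 3], table)  # the same token id always maps to the same vector
```

Because the lookup depends only on the token id, the two occurrences of token 3 get identical vectors, which is why position information has to be supplied separately.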

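Positional Encoding - for each token, we want to inform the model where it is located in the sequence. This is because the attention and linear layers have no built-in notion of token order. So we pass this in manually by adding a vector of sine and cosine values to the embedding vector. In the original Transformer, sines fill the even-indexed elements and cosines the odd-indexed ones, with each pair using a different frequency, so the whole vector is covered rather than only the first 2 elements.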
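
A minimal sketch of such an encoding, assuming the sinusoidal scheme of the original Transformer paper (sine on even indices, cosine on odd indices, base 10000). The function names `positional_encoding` and `add_positions` are chosen for this example only.

```python
import math

def positional_encoding(seq_len, dim):
    # pe[pos][i] = sin(pos / 10000^(i/dim)) for even i,
    # pe[pos][i+1] = cos of the same angle.
    pe = [[0.0] * dim for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, dim, 2):
            angle = pos / (10000 ** (i / dim))
            pe[pos][i] = math.sin(angle)
            if i + 1 < dim:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def add_positions(embeddings, pe):
    # Element-wise sum of each token embedding with its position vector.
    return [[e + p for e, p in zip(emb, pos)] for emb, pos in zip(embeddings, pe)]

pe = positional_encoding(seq_len=3, dim=4)
summed = add_positions([[1.0] * 4 for _ in range(3)], pe)
```

Position 0 always encodes to alternating 0 and 1 (sin 0 and cos 0), and later positions differ from it, which gives otherwise identical token vectors a distinct signature per position.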

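This sequence of vectors goes through an attention layer, which is basically a learnable, soft database search function with keys, queries and values: each token's query is compared against the keys of the tokens in the sequence, and the resulting weights decide how much of each value to retrieve. In this case, we are "searching" for the most likely next token.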
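
The "database search" analogy can be sketched as scaled dot-product attention. This is a simplified single-head version in plain Python; the learned projections that produce queries, keys and values from the embeddings are omitted, and the names `softmax` and `attention` are assumptions for this example.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Soft lookup: weighted average of the values.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

result = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
```

Unlike a hard database lookup that returns one matching row, the result is a blend of all values; here both keys match the query equally, so the output is the average of the two values.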

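The Feed Forward block is a small fully connected network (in the original Transformer, two linear layers with a ReLU between them) that is applied to each embedding in the sequence separately (i.e. the input is a 3-dim tensor instead of a 2-dim one).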
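
To make "applied to each embedding separately" concrete, here is a small sketch. It assumes the two-linear-layers-with-ReLU form used in the original Transformer; the helper names and the toy weights are hypothetical.

```python
def linear(x, weight, bias):
    # y = W x + b, with W given as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def feed_forward(x, w1, b1, w2, b2):
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]  # ReLU
    return linear(hidden, w2, b2)

def position_wise(sequence, w1, b1, w2, b2):
    # The same weights are reused at every position; positions do not interact.
    return [feed_forward(x, w1, b1, w2, b2) for x in sequence]

w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[1.0, 1.0]], [0.0]
ffn_out = position_wise([[1.0, -2.0], [3.0, 4.0]], w1, b1, w2, b2)
```

Mixing information between positions is left entirely to the attention layer; this block only transforms each vector on its own.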

Then the final Linear layer produces one score (logit) per token in the vocabulary. We apply a softmax to turn these scores into a vector of probabilities for the next token, with values in the range of 0 to 1 that sum to 1.
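
A minimal sketch of this last step, assuming a toy vocabulary of three tokens. The name `next_token_probs` and the example weights are made up for illustration, and greedy argmax is only one of several ways to pick the next token.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_probs(hidden, weight, bias):
    # Final linear layer: one logit per vocabulary entry.
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(weight, bias)]
    return softmax(logits)

hidden = [1.0, 2.0]                                 # output vector for the last position
weight = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]     # 3 vocabulary rows (hypothetical)
bias = [0.0, 0.0, 0.0]
probs = next_token_probs(hidden, weight, bias)
predicted = max(range(len(probs)), key=probs.__getitem__)  # greedy choice
```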

There are two sides because when that diagram was developed, it was being used for language translation, with an encoder for the source sentence and a decoder for the target sentence. But generative language models for next-token prediction typically use just the Transformer decoder and not the encoder.

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext --- PyTorch Tutorials 2.0.1+cu117 documentation

