Understanding the Transformer

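On the Transformer: the roles of Q, K and V (query, key, value) suggest it behaves more like a learnable lookup system. Perhaps earlier search-engine algorithms were related to this idea, or some branch of search algorithms works in a similar way.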
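To make the "lookup system" analogy concrete, here is a small sketch contrasting an ordinary key-value store (exact match, one value returned) with an attention-style soft lookup (the query is scored against every key and the values are blended by softmax weights). The keys, values and query are hand-picked toy numbers, and nothing here is learned; in a real Transformer, Q, K and V are produced by trained projection matrices.

```python
import math

def hard_lookup(table, key):
    """Classic key-value store: the key must match exactly; returns one value."""
    return table[key]

def soft_lookup(keys, values, query):
    """Attention-style lookup: score the query against every key,
    softmax the scores into weights, and return the weighted
    average of all values (plus the weights, for inspection)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]  # dot products
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    blended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return blended, weights

table = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
print(hard_lookup(table, "cat"))  # [1.0, 0.0]

# Same data as vectors: the query is "mostly like the first key"
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
blended, weights = soft_lookup(keys, values, query=[2.0, 0.0])
print([round(w, 3) for w in weights])  # [0.881, 0.119]
```

The soft version never fails on a "missing key"; it always returns a mixture, which is what makes it differentiable and therefore trainable.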


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights: each token (a token could be a word, sentence piece, subword, character, etc.) is mapped to a vector of, say, 500 trainable real-valued numbers. The values are not restricted to the range 0 to 1; they are ordinary weights adjusted during training.
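A minimal sketch of what an embedding layer does, assuming a hypothetical three-word vocabulary: it is a table with one row per token id, and "embedding" a sequence is just a row lookup. Training would update the rows by gradient descent, which is omitted here. In PyTorch the same role is played by `torch.nn.Embedding(num_embeddings, embedding_dim)`.

```python
import random

class Embedding:
    """Toy embedding table: one trainable vector per token id.
    Stored as nested lists here; in a real model this is a weight
    matrix updated by gradient descent."""

    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        # Ordinary real-valued initial weights; nothing restricts them to [0, 1]
        self.weight = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(vocab_size)]

    def __call__(self, token_ids):
        # The "layer" is just a row lookup: token id -> its vector
        return [self.weight[t] for t in token_ids]

# Hypothetical toy vocabulary
vocab = {"the": 0, "cat": 1, "sat": 2}
emb = Embedding(vocab_size=len(vocab), dim=4)
vectors = emb([vocab[w] for w in ["the", "cat", "sat", "the"]])
print(len(vectors), len(vectors[0]))  # 4 4
```

Note that both occurrences of "the" get the identical vector: the embedding alone carries no information about position.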

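Positional Encoding - for each token, we want to tell the model where it sits in the sequence. This is needed because attention on its own has no built-in notion of token order, so that information has to be supplied explicitly. In the original Transformer this is done by adding, element-wise, a vector of sine and cosine values at different frequencies to each token's embedding vector. The encoding covers every dimension of the embedding, not just the first two elements.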
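Below is a sketch of the sinusoidal encoding from the original Transformer paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), written in plain Python. `add_positions` is a hypothetical helper name; it simply adds the encoding to every dimension of each embedding.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos][2i] = sin(pos / 10000**(2i/d_model)),
    PE[pos][2i+1] = cos(pos / 10000**(2i/d_model))."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i plays the role of 2i in the formula
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def add_positions(embeddings):
    """Add the encoding element-wise to every dimension of each token's embedding."""
    pe = sinusoidal_positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(row, pe_row)]
            for row, pe_row in zip(embeddings, pe)]

pe = sinusoidal_positional_encoding(seq_len=3, d_model=4)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0]
```

Low dimensions oscillate quickly with position and high dimensions slowly, so each position gets a distinct pattern across the whole vector.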

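This sequence of vectors then goes through an attention layer, which works much like a learnable, differentiable database search with queries, keys and values: each token's query is compared with every token's key, and the resulting scores decide how much of each token's value is mixed into that token's output. In a generative model, the context gathered this way is what the model ultimately uses to predict the most likely next token.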
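A minimal single-head sketch of scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V, where Q, K and V are obtained by multiplying the input by three weight matrices. Here those matrices are random stand-ins for what training would learn, the sizes are arbitrary toy values, and no masking is applied.

```python
import math
import random

def matmul(a, b):
    """(n x m) @ (m x p) for matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention:
    softmax(Q K^T / sqrt(d_k)) V, with Q = x W_q, K = x W_k, V = x W_v."""
    q = matmul(x, w_q)  # queries: what each position is looking for
    k = matmul(x, w_k)  # keys: what each position can be matched on
    v = matmul(x, w_v)  # values: what each position hands back when matched
    scale = math.sqrt(len(w_q[0]))
    scores = matmul(q, transpose(k))  # (seq_len x seq_len) query-key similarities
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights

def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
seq_len, d_model, d_k = 5, 4, 3
x = random_matrix(rng, seq_len, d_model)  # e.g. embeddings + positional encodings
w_q, w_k, w_v = (random_matrix(rng, d_model, d_k) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(len(out), len(out[0]))  # 5 3
```

Each row of `weights` is one token's "search result": a probability distribution over all tokens in the sequence. The division by sqrt(d_k) keeps the scores from growing with the vector size and saturating the softmax.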

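The Feed Forward block is a small two-layer network (linear layer, nonlinearity such as ReLU, linear layer) applied to each position's embedding separately, with the same weights at every position. Positions do not interact here; it simply operates on a 3-dim (batch × sequence × features) tensor instead of a 2-dim one.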
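A sketch of the position-wise feed-forward block, FFN(x) = max(0, x W1 + b1) W2 + b2, applied to each position independently with shared weights. The weights are random placeholders; the hidden size of 4 × d_model follows the ratio used in the original paper (512 and 2048), scaled down to toy sizes.

```python
import random

def linear(x, w, b):
    """y = x @ w + b for one vector x; w has shape (len(x) x len(b))."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]

def feed_forward(x, w1, b1, w2, b2):
    """Linear -> ReLU -> Linear on a single position's vector."""
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)

def position_wise_ffn(seq, w1, b1, w2, b2):
    """Apply the same feed-forward weights to every position independently."""
    return [feed_forward(vec, w1, b1, w2, b2) for vec in seq]

rng = random.Random(0)
d_model, d_ff, seq_len = 4, 16, 3  # d_ff = 4 * d_model

def rand(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

w1, b1 = rand(d_model, d_ff), [0.0] * d_ff
w2, b2 = rand(d_ff, d_model), [0.0] * d_model
seq = rand(seq_len, d_model)
out = position_wise_ffn(seq, w1, b1, w2, b2)
```

Because no position sees another, reordering the input sequence just reorders the output; mixing information across positions is left entirely to attention.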

Then the final Linear layer projects each position's vector to one raw score (logit) per vocabulary token. A softmax turns these scores into probabilities between 0 and 1 that sum to 1, and the most probable entry (or one sampled from the distribution) is the predicted next token.
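A toy sketch of this last step, using a hypothetical three-word vocabulary and made-up numbers for the final hidden vector and output weights: the Linear layer produces logits, softmax converts them into probabilities, and greedy decoding picks the largest one.

```python
import math

def lm_head(hidden, w_out):
    """Final Linear layer: one logit per vocabulary entry.
    hidden has length d_model; w_out has shape (d_model x vocab_size)."""
    return [sum(h * w_out[i][j] for i, h in enumerate(hidden))
            for j in range(len(w_out[0]))]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["the", "cat", "sat"]  # hypothetical toy vocabulary
hidden = [1.0, -0.5]           # final vector at the last position (made-up numbers)
w_out = [[2.0, 0.0, -1.0],
         [0.0, -1.0, 1.0]]
logits = lm_head(hidden, w_out)  # [2.0, 0.5, -1.5]: raw scores, not probabilities
probs = softmax(logits)          # each in (0, 1), summing to 1
next_token = vocab[probs.index(max(probs))]  # greedy choice
print(next_token)  # the
```

Softmax preserves the ordering of the logits, so greedy decoding could pick the argmax of the logits directly; the probabilities matter when sampling or when computing the training loss.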

There are two sides (an encoder and a decoder) because the architecture in that diagram was originally designed for machine translation. Generative language models that do next-token prediction use only the Transformer decoder, not the encoder.
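What makes the decoder suitable for next-token prediction is its masked (causal) self-attention: position i may only attend to positions 0..i, so the model cannot peek at the tokens it is supposed to predict. A minimal sketch, with made-up attention scores: disallowed positions simply receive zero weight. Frameworks usually get the same effect by adding negative infinity to the masked scores before the softmax.

```python
import math

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask):
    """Row-wise softmax in which disallowed (future) positions get weight 0."""
    out = []
    for row, allowed in zip(scores, mask):
        m = max(s for s, ok in zip(row, allowed) if ok)  # diagonal is always allowed
        exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

scores = [[0.1, 2.0, 3.0],   # made-up query-key scores for 3 positions
          [1.0, 1.0, 5.0],
          [0.5, 0.5, 0.5]]
weights = masked_softmax(scores, causal_mask(3))
print(weights[0])  # [1.0, 0.0, 0.0]
print(weights[1])  # [0.5, 0.5, 0.0]
```

Even though position 1 has a large score for position 2, that future token is ignored entirely.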

Here is a PyTorch tutorial that may help you work through how it all fits together.

Language Modeling with nn.Transformer and torchtext --- PyTorch Tutorials 2.0.1+cu117 documentation