Understanding the Transformer

Regarding the Transformer, the meaning of Q, K, and V suggests that it works like a learnable query system; older search-engine algorithms, or some branch of search algorithms, may well be related or similar to this.


The following explanation is quoted from the PyTorch Forums thread "Can anyone help me to understand this image?" (#2 by J_Johnson, nlp category, PyTorch Forums).

Embeddings - these are learnable weights where each token (a token could be a word, a sentence piece, a subword, a character, etc.) is converted into a vector of, say, 500 trainable real values.
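A minimal sketch of such a learnable embedding table in PyTorch; the vocabulary size (10,000) is an illustrative assumption, while the dimension 500 follows the example above:

```python
import torch
import torch.nn as nn

vocab_size, d_model = 10_000, 500           # illustrative sizes
embedding = nn.Embedding(vocab_size, d_model)  # one trainable vector per token id

token_ids = torch.tensor([[5, 42, 7]])      # (batch=1, seq_len=3) integer ids
vectors = embedding(token_ids)              # (1, 3, 500) trainable float values
print(vectors.shape)                        # torch.Size([1, 3, 500])
```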

Positional Encoding - for each token, we want to tell the model where it sits in the sequence, because attention and linear layers do not handle order on their own. So we inject this information manually by adding a vector of sine and cosine values to each embedding, with sines at the even dimensions and cosines at the odd ones.
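A sketch of the fixed sinusoidal encoding from "Attention Is All You Need", which is added to (not concatenated with) the embeddings; `seq_len` and `d_model` are whatever the model uses:

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Fixed sin/cos position vectors, one row per position."""
    position = torch.arange(seq_len).unsqueeze(1)                 # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2)
                         * (-math.log(10000.0) / d_model))        # (d_model/2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions get sine
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions get cosine
    return pe

# usage: x = embedding(token_ids) + sinusoidal_positional_encoding(3, 500)
```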

This sequence of vectors goes through an attention layer, which works like a learnable, differentiable database lookup with queries, keys, and values. In effect, each position "searches" the sequence for the information most relevant to predicting the next token.
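A minimal sketch of that lookup as scaled dot-product attention: queries are matched against keys, and the match scores weight the values. (Recent PyTorch versions also ship a built-in `torch.nn.functional.scaled_dot_product_attention`; this hand-written version is just to show the mechanics.)

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Each query 'searches' the keys; match scores weight the values."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (..., seq_q, seq_k) similarity
    weights = F.softmax(scores, dim=-1)             # soft "lookup" over positions
    return weights @ v                              # weighted sum of values

# In self-attention, q, k, v are separate learnable linear projections
# of the same input sequence x:
q = k = v = torch.randn(1, 3, 64)                   # illustrative shapes
out = scaled_dot_product_attention(q, k, v)         # (1, 3, 64)
```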

The Feed Forward block is just a small stack of linear layers, applied to each position's embedding independently; that is, it acts only on the last dimension of the (batch, seq_len, d_model) tensor.
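A sketch of that position-wise feed-forward block; the hidden width `d_ff = 2048` is an assumption borrowed from the original paper (which pairs it with d_model = 512):

```python
import torch
import torch.nn as nn

d_model, d_ff = 500, 2048   # d_ff is illustrative

# nn.Linear acts on the last dimension only, so this same two-layer MLP
# is applied to every position in the sequence independently.
feed_forward = nn.Sequential(
    nn.Linear(d_model, d_ff),
    nn.ReLU(),
    nn.Linear(d_ff, d_model),
)

x = torch.randn(1, 3, d_model)   # (batch, seq_len, d_model)
out = feed_forward(x)            # same shape: (1, 3, 500)
```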

Then the final Linear layer is where we get our predicted next token: it projects each position back to vocabulary size, giving one score (logit) per candidate token, and a softmax turns those scores into probabilities between 0 and 1 that sum to 1.
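A minimal sketch of that output head, reusing the illustrative sizes from above; the greedy `argmax` pick at the end is just one simple decoding choice:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, vocab_size = 500, 10_000
lm_head = nn.Linear(d_model, vocab_size)   # projects each position to vocab logits

hidden = torch.randn(1, 3, d_model)        # stand-in for the decoder output
logits = lm_head(hidden)                   # (1, 3, 10_000) unnormalized scores
probs = F.softmax(logits, dim=-1)          # each row now sums to 1
next_token = probs[0, -1].argmax()         # greedy pick for the last position
```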

There are two towers in that diagram because the architecture was originally developed for language translation: an encoder reads the source sentence and a decoder generates the target. Generative language models that just do next-token prediction use only the Transformer decoder, not the encoder.

Here is a PyTorch tutorial that might help you work through how it all fits together:

Language Modeling with nn.Transformer and torchtext (PyTorch Tutorials)

