Understanding the Transformer

In the Transformer, the roles of Q, K and V suggest that attention works much like a learnable query system; perhaps older search-engine algorithms are related to this idea, or some branch of search algorithms works in a similar way.


The explanation below is quoted from: Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights where each token (a token could be a word, sentence piece, subword, character, etc.) is converted into a vector of, say, 500 trainable values.
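As a minimal sketch of what "learnable embeddings" means in PyTorch (the sizes below are illustrative, with 500 mirroring the example above):

```python
import torch
import torch.nn as nn

vocab_size = 10000   # number of distinct tokens (illustrative)
d_model = 500        # embedding width, mirroring the "500 values" above

# nn.Embedding is a trainable lookup table: token id -> vector of d_model weights
embedding = nn.Embedding(vocab_size, d_model)

token_ids = torch.tensor([[12, 7, 431]])   # a batch with one 3-token sequence
vectors = embedding(token_ids)             # shape: (1, 3, 500)
print(vectors.shape)
```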

Positional Encoding - for each token, we want to tell the model where it sits in the sequence, since attention and linear layers have no built-in notion of token order. So we inject this information manually by adding a vector of sine and cosine values, at different frequencies across the embedding dimensions, to each token's embedding.
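A sketch of the sinusoidal encoding used in the original paper and the PyTorch tutorial linked below; the function name and sizes are illustrative, and it assumes an even d_model:

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Return a (seq_len, d_model) matrix of sine/cosine position codes."""
    position = torch.arange(seq_len).unsqueeze(1)                      # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions get sine
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions get cosine
    return pe

# The encoding is added (not concatenated) to the token embeddings:
# x = embedding(token_ids) + sinusoidal_positional_encoding(seq_len, d_model)
```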

This sequence of vectors then goes through an attention layer, which is essentially a learnable, differentiable database-search function built from queries, keys and values. In this case, we are "searching" the sequence for the context that best predicts the next token.
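The "search" analogy maps directly onto scaled dot-product attention: queries are compared against keys, and the scores decide how much of each value to retrieve. A minimal single-head, unmasked sketch (class and variable names are my own):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention (illustrative, no masking)."""
    def __init__(self, d_model: int):
        super().__init__()
        # Learnable projections: each token produces a query, a key and a value
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, seq, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # "Search": compare every query against every key ...
        scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))  # (batch, seq, seq)
        weights = F.softmax(scores, dim=-1)
        # ... then return a weighted mix of the matching values
        return weights @ v
```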

The Feed Forward block is a small MLP (two linear layers with a nonlinearity in between) applied to each position in the sequence independently (i.e., it acts on the last dimension of a 3-dim tensor rather than a 2-dim one).
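A sketch of that position-wise feed-forward block, assuming a ReLU nonlinearity and an illustrative hidden width:

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """Position-wise feed-forward block: applied to each token vector independently."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, seq, d_model); nn.Linear acts on the last dimension,
        # so every position in the sequence is transformed separately.
        return self.net(x)
```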

Then the final Linear layer projects each output vector onto the vocabulary, producing a score (logit) for every possible next token; applying a softmax turns these scores into probabilities between 0 and 1 that sum to 1, from which the predicted next token is read off.
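A small sketch of that final step, with illustrative sizes and a stand-in tensor in place of the decoder output:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, vocab_size = 500, 10000            # illustrative sizes
to_logits = nn.Linear(d_model, vocab_size)  # projects each hidden state onto the vocabulary

hidden = torch.randn(1, 3, d_model)         # stand-in for the decoder output
logits = to_logits(hidden)                  # (1, 3, vocab_size), unnormalized scores
probs = F.softmax(logits, dim=-1)           # probabilities in [0, 1] that sum to 1
next_token = probs[:, -1].argmax(dim=-1)    # greedy pick of the next token
```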

There are two sides (encoder and decoder) because, when that diagram was developed, the architecture was being used for language translation. Generative language models that do next-token prediction use only the Transformer decoder, not the encoder.

Here is a PyTorch tutorial that may help you walk through how it works.

Language Modeling with nn.Transformer and torchtext --- PyTorch Tutorials 2.0.1+cu117 documentation

