Understanding the Transformer

In the Transformer, the Q, K, V formulation suggests that attention works like a learnable query system; it may well be that older search-engine algorithms, or some branch of search algorithms, are related to or resemble this idea.


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights where each token (a token could be a word, a sentence piece, a subword, a character, etc.) is converted into a vector of, say, 500 trainable values.
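A minimal PyTorch sketch of that lookup, using `nn.Embedding` (the vocabulary size and vector width here are illustrative, not from the original post):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: a vocabulary of 10_000 tokens, each mapped
# to a 500-dimensional trainable vector.
vocab_size, d_model = 10_000, 500
embedding = nn.Embedding(vocab_size, d_model)

# A batch containing one sequence of 4 token ids.
token_ids = torch.tensor([[5, 42, 7, 900]])

# Each id is replaced by its learned vector: shape (1, 4, 500).
vectors = embedding(token_ids)
```

The embedding table is just a weight matrix, so its rows are updated by backpropagation like any other layer's weights.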

Positional Encoding - for each token, we want to tell the model where it sits in the sequence, since linear layers on their own do not capture order. So we pass this in manually by adding a vector of interleaved sine and cosine values to the embedding vector.
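The standard sinusoidal scheme from the original Transformer paper puts sines on the even indices and cosines on the odd indices; a self-contained sketch (the function name is ours):

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Sine on even indices, cosine on odd indices, one row per position."""
    position = torch.arange(seq_len).unsqueeze(1)            # (seq_len, 1)
    # Geometric progression of frequencies across the embedding dimension.
    div_term = torch.exp(
        torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

pe = sinusoidal_positional_encoding(seq_len=10, d_model=8)
# The encoding is ADDED to the embeddings, not concatenated:
# x = embedding(token_ids) + pe
```

Because each position gets a distinct pattern of phases, the model can recover relative order from the sums alone.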

This sequence of vectors goes through an attention layer, which is basically like a learnable, differentiable database search with keys, queries, and values. In this case, we are "searching" for the most likely next token.
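The "search" is scaled dot-product attention: each query is scored against every key, the scores are softmaxed into weights, and those weights mix the values. A minimal sketch (shapes are illustrative):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # Similarity of every query against every key, scaled by sqrt(d_k)
    # to keep the dot products from growing with dimension.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v                       # weighted mix of the values

# (batch, seq_len, d_k) tensors; in self-attention q, k, v are all
# linear projections of the same input sequence.
q = torch.randn(1, 4, 64)
k = torch.randn(1, 4, 64)
v = torch.randn(1, 4, 64)
out = scaled_dot_product_attention(q, k, v)  # (1, 4, 64)
```

The "learnable" part is the projection matrices that produce q, k, and v from the input; the search itself is just this fixed arithmetic.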

The Feed Forward block is a small position-wise network (in the original architecture, two linear layers with a nonlinearity in between), applied to each embedding in the sequence independently (i.e. it acts on the last dimension of a 3-dim tensor rather than a 2-dim one).
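In PyTorch this "applied per position" behaviour comes for free, because `nn.Linear` acts on the last dimension of its input; a sketch with the sizes used in the original paper:

```python
import torch
import torch.nn as nn

d_model, d_ff = 512, 2048  # widths from the original Transformer paper
ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),
    nn.ReLU(),
    nn.Linear(d_ff, d_model),
)

# nn.Linear operates on the last dimension, so the SAME weights are
# applied to every position of the (batch, seq_len, d_model) tensor.
x = torch.randn(2, 10, d_model)
y = ffn(x)  # shape unchanged: (2, 10, 512)
```

No information moves between positions here; mixing across the sequence happens only in the attention layers.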

Then the final Linear layer projects each hidden state to a vector of scores (logits) over the vocabulary; applying a softmax maps these scores into the range 0 to 1 so they can be read as probabilities for the predicted next token.
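A sketch of that last step, from hidden state to a probability distribution over a hypothetical vocabulary (sizes are illustrative):

```python
import torch
import torch.nn as nn

d_model, vocab_size = 512, 10_000  # illustrative sizes
to_vocab = nn.Linear(d_model, vocab_size)

# Final hidden state of the last position in the sequence.
hidden = torch.randn(1, d_model)

logits = to_vocab(hidden)              # raw scores, one per vocab token
probs = torch.softmax(logits, dim=-1)  # in (0, 1), summing to 1
next_token = probs.argmax(dim=-1)      # greedy pick of the next token id
```

In practice one would usually sample from `probs` (with temperature, top-k, etc.) rather than always taking the argmax, and training losses like `nn.CrossEntropyLoss` consume the raw logits directly.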

There are two sides because, when that diagram was published, the architecture was being used for language translation. Generative language models doing next-token prediction use only the Transformer decoder, not the encoder.
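Decoder-only models enforce next-token prediction with a causal mask: each position may only attend to itself and earlier positions. A sketch of how that mask is applied to the attention scores:

```python
import torch

seq_len = 5
# True entries above the diagonal mark "future" positions to hide.
causal_mask = torch.triu(
    torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1
)

scores = torch.randn(seq_len, seq_len)          # raw attention scores
scores = scores.masked_fill(causal_mask, float("-inf"))
weights = torch.softmax(scores, dim=-1)         # row i weights positions <= i
```

Setting masked scores to -inf makes their softmax weight exactly zero, so no information leaks from later tokens during training.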

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext - PyTorch Tutorials 2.0.1+cu117 documentation

