Understanding the Transformer

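On the Transformer: the meaning of Q, K, and V (query, key, value) suggests that it behaves more like a learnable query system. Perhaps earlier search-engine algorithms were related to this, or some branch of search algorithms works in a similar way.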
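The "learnable query system" intuition can be made concrete by comparing an exact dictionary lookup with the "soft" lookup that attention performs. The sketch below is illustrative and not taken from the source. It uses NumPy, and the keys, values, and query are made-up toy numbers. In a real Transformer, q, k, and v come from learned projections of the token vectors.

```python
import numpy as np

def hard_lookup(table: dict, query: str):
    # Classic key-value store: the key must match exactly, and one value comes back.
    return table[query]

def soft_lookup(query: np.ndarray, keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    # Score the query against every key with a dot product.
    scores = keys @ query                       # shape: (num_keys,)
    # Softmax turns scores into non-negative weights that sum to 1.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The result blends all values, weighted by how well each key matched.
    return weights @ values                     # shape: (value_dim,)

table = {"cat": 1.0, "dog": 2.0}
print(hard_lookup(table, "cat"))                # 1.0

keys = np.array([[1.0, 0.0], [0.0, 1.0]])      # toy key vectors for "cat" and "dog"
values = np.array([[1.0], [2.0]])              # the stored values
query = np.array([4.0, 0.0])                   # a query that mostly resembles "cat"
print(soft_lookup(query, keys, values))        # ≈ [1.018]: mostly "cat", a little "dog"
```

Because every step is differentiable, training can adjust how queries and keys are produced. That is what makes the lookup "learnable", unlike a fixed search index.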


Source: "Can anyone help me to understand this image?", reply #2 by J_Johnson, nlp category, PyTorch Forums

Embeddings - these are learnable weights: each token (a token could be a word, sentence piece, subword, character, etc.) is converted into a vector of, say, 500 trainable values. These values are ordinary real numbers and are not restricted to the range 0 to 1.
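An embedding layer is just a trainable table with one row per token, indexed by token id. The following is a minimal NumPy sketch under stated assumptions: the four-word `vocab` and the size `d_model = 8` are hypothetical toy choices. PyTorch's `nn.Embedding` plays the same role, and by default it also initializes its rows from a standard normal distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary mapping tokens to integer ids.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}
d_model = 8  # embedding size (the forum post's example uses 500)

# The embedding table: one trainable row per token. Entries are ordinary
# real numbers (drawn here from N(0, 1)), not limited to [0, 1].
embedding = rng.standard_normal((len(vocab), d_model))

def embed(tokens):
    # Unknown tokens fall back to the <unk> row.
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens]
    # "Embedding" is simply row indexing into the table.
    return embedding[ids]            # shape: (len(tokens), d_model)

x = embed(["the", "cat", "sat"])
print(x.shape)  # (3, 8)
```

During training, gradients flow only into the rows that were looked up. This is how each token's vector gets learned.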

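Positional Encoding - for each token, we want to tell the model where it is located in the sequence. This is needed because attention by itself treats its input as an unordered set: shuffling the input tokens just shuffles the outputs, so the model cannot otherwise tell word order. So we pass this information in manually. We add a vector of sine and cosine values, at a range of frequencies, to the whole embedding vector (sine on the even dimensions, cosine on the odd ones).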
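The sinusoidal encoding from the original Transformer paper is PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A minimal NumPy sketch of it follows. It assumes an even `d_model`, and the random `token_embeddings` are only stand-ins. Some models learn their positional embeddings instead, which this sketch does not cover.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    # Assumes d_model is even, as in the original formulation.
    positions = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]                 # even dimension indices 0, 2, 4, ...
    angles = positions / np.power(10000.0, two_i / d_model)   # (seq_len, d_model // 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions: cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=3, d_model=8)
# Stand-in token embeddings; the encoding is added to every dimension, not just the first two.
token_embeddings = np.random.default_rng(0).standard_normal((3, 8))
x = token_embeddings + pe
```

Low dimensions oscillate quickly with position and high dimensions slowly. Together they give each position a distinct pattern.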

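This sequence of vectors goes through an attention layer, which works like a learnable, "soft" database search with queries, keys, and values. Each query is compared with every key, and the output is a weighted average of the values. In a generative model, each position uses this to gather context from the tokens before it. That context is then used to predict the most likely next token.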
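The search described above is usually implemented as scaled dot-product attention, softmax(QKᵀ / √d_k)·V. Here Q, K, and V are the input multiplied by learnable matrices. This is a single-head NumPy sketch with no batch dimension and no mask. The random `w_q`, `w_k`, `w_v` are hypothetical stand-ins for trained weights. Real models typically run several such heads in parallel, which is what PyTorch's `nn.MultiheadAttention` does.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) learnable projections.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    # Every query is scored against every key; dividing by sqrt(d_k) keeps scores moderate.
    scores = q @ k.T / np.sqrt(d_k)           # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights               # (seq_len, d_k), (seq_len, seq_len)

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

Row i of `weights` shows how much position i "retrieves" from each other position.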

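The Feed Forward block is a small network: two linear layers with a nonlinearity (e.g. ReLU) in between. It is applied to each embedding in the sequence separately, using the same weights for every position. This is why it operates on a 3-dim tensor (batch, sequence, embedding) rather than a 2-dim one.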
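The sketch below shows the position-wise behaviour. Applying the block to a whole (batch, seq_len, d_model) tensor gives the same result as looping over positions one at a time. The sizes are arbitrary toy values. `d_ff = 4 * d_model` mirrors the ratio used in the original Transformer, and the random weights are placeholders.

```python
import numpy as np

def position_wise_ffn(x, w1, b1, w2, b2):
    # The matmuls act on the last axis only, so every position is
    # transformed independently by the same weights.
    hidden = np.maximum(0.0, x @ w1 + b1)   # Linear -> ReLU: (..., d_ff)
    return hidden @ w2 + b2                  # Linear back to (..., d_model)

rng = np.random.default_rng(0)
batch, seq_len, d_model, d_ff = 2, 5, 8, 32
x = rng.standard_normal((batch, seq_len, d_model))
w1, b1 = rng.standard_normal((d_model, d_ff)), np.zeros(d_ff)
w2, b2 = rng.standard_normal((d_ff, d_model)), np.zeros(d_model)

y = position_wise_ffn(x, w1, b1, w2, b2)
# Equivalent loop over each (batch, position) vector separately.
y_loop = np.stack([np.stack([position_wise_ffn(x[b, t], w1, b1, w2, b2)
                             for t in range(seq_len)]) for b in range(batch)])
print(y.shape, np.allclose(y, y_loop))  # (2, 5, 8) True
```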

Then the final Linear layer maps each vector to one score (logit) per vocabulary token. A softmax turns those scores into probabilities between 0 and 1 that sum to 1, and this is where we get our predicted next token.
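The output head can be sketched in a few lines of NumPy. A linear projection produces raw logits over the vocabulary, and softmax turns them into a probability distribution. `h_last`, `w_out`, `b_out`, and the tiny vocabulary size are hypothetical placeholders for a trained model's values. Greedy `argmax` is just one way to pick the next token.

```python
import numpy as np

def next_token_distribution(h_last, w_out, b_out):
    # h_last: (d_model,) hidden state at the last position.
    logits = h_last @ w_out + b_out        # (vocab_size,): unbounded real scores
    logits = logits - logits.max()         # numerical stability; does not change the softmax
    exp = np.exp(logits)
    return exp / exp.sum()                 # softmax: each entry in (0, 1), total 1

rng = np.random.default_rng(0)
d_model, vocab_size = 8, 10
h_last = rng.standard_normal(d_model)
w_out, b_out = rng.standard_normal((d_model, vocab_size)), np.zeros(vocab_size)

probs = next_token_distribution(h_last, w_out, b_out)
next_id = int(np.argmax(probs))   # greedy choice; sampling from probs is the usual alternative
```

During training in PyTorch, the softmax is usually not applied explicitly. `nn.CrossEntropyLoss` takes the raw logits and applies log-softmax internally.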

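There are two sides because, when that diagram was developed, the architecture was being used for language translation. The encoder reads the source sentence, and the decoder generates the translation. Many generative language models for next-token prediction (GPT-style models, for example) use only the Transformer decoder and not the encoder.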
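A key feature of decoder self-attention is the causal mask. When predicting the next token, position i may only attend to positions 0..i and never to future tokens. This is a minimal NumPy sketch of how the mask is applied to raw attention scores. The all-zero `scores` input is a toy choice so that the resulting pattern is easy to read.

```python
import numpy as np

def causal_attention_weights(scores: np.ndarray) -> np.ndarray:
    # scores: (seq_len, seq_len) raw query-key scores.
    seq_len = scores.shape[0]
    # True strictly above the diagonal = "future" positions that must be hidden.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    masked = np.where(future, -np.inf, scores)
    # The diagonal is never masked, so every row has a finite max.
    masked = masked - masked.max(axis=-1, keepdims=True)
    e = np.exp(masked)                      # exp(-inf) == 0, so future positions get zero weight
    return e / e.sum(axis=-1, keepdims=True)

weights = causal_attention_weights(np.zeros((4, 4)))
# Row i spreads its weight evenly over positions 0..i only.
print(np.round(weights, 2))
```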

Here is a PyTorch tutorial that might help you go through how it works: Language Modeling with nn.Transformer and torchtext (PyTorch Tutorials 2.0.1+cu117 documentation).

