Understanding the Transformer

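Regarding the Transformer: the roles of Q, K and V suggest that it works like a learnable query system. Earlier search-engine algorithms may be related to this, or some branch of search algorithms may work in a similar way.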


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights where each token (a token could be a word, sentence piece, subword, character, etc.) is converted into a vector of, say, 500 trainable values. The values are ordinary real numbers and are not restricted to the range 0 to 1.
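To make the lookup concrete, here is a minimal sketch in plain Python. The three-word vocabulary, the dimension of 4 and the helper names make_embedding_table and embed are made up for illustration; in PyTorch this role is played by nn.Embedding, and the random values would be updated during training.

```python
import random

def make_embedding_table(vocab_size, dim, seed=0):
    """Build one randomly initialised vector per token id (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

def embed(token_ids, table):
    """Embedding is just a row lookup: token id -> its vector."""
    return [table[t] for t in token_ids]

# Toy vocabulary, assumed for this example only.
vocab = {"the": 0, "cat": 1, "sat": 2}
table = make_embedding_table(len(vocab), dim=4)

token_ids = [vocab[w] for w in ["the", "cat", "sat"]]
vectors = embed(token_ids, table)  # one 4-value vector per token
```

Training would adjust the numbers in table; the lookup itself stays the same.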

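Positional Encoding - for each token, we want to inform the model where it is located in the sequence. This is because attention and linear layers have no built-in notion of token order. So we manually pass this in by adding a vector of sine and cosine values to the embedding vector. In the original Transformer these values span every dimension of the embedding, with sine on the even indices and cosine on the odd ones, not just the first 2 elements.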
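As a rough sketch of the sinusoidal scheme from the original Transformer paper, where the sinusoids fill every dimension of the embedding: the function names positional_encoding and add_positions are my own, and the sizes are arbitrary.

```python
import math

def positional_encoding(seq_len, dim):
    """Sinusoidal encoding: sin on even indices, cos on odd indices."""
    pe = [[0.0] * dim for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, dim, 2):
            # i is the even index, so i / dim equals 2k / d_model in the paper.
            angle = pos / (10000 ** (i / dim))
            pe[pos][i] = math.sin(angle)
            if i + 1 < dim:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def add_positions(embeddings, pe):
    """Element-wise sum of token embeddings and their positional vectors."""
    return [[e + p for e, p in zip(emb, pos)] for emb, pos in zip(embeddings, pe)]

pe = positional_encoding(seq_len=4, dim=6)
dummy_embeddings = [[1.0] * 6 for _ in range(4)]  # placeholder embeddings
combined = add_positions(dummy_embeddings, pe)
```

Each position gets a different pattern, so two identical tokens at different positions end up with different input vectors.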

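This sequence of vectors goes through an attention layer, which is basically a learnable, soft database lookup with queries, keys and values: each query is compared against every key, and the resulting scores decide how much of each value is returned. In this case, we are "searching" the context for whatever best predicts the next token.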
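The "database search" analogy can be sketched with scaled dot-product attention in plain Python. This is a simplified illustration: in a real layer the queries, keys and values come from learned linear projections of the input, whereas here they are hand-picked toy vectors, and the function names are my own.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query: score every key, softmax the scores, mix the values."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Toy "database": two keys with one value each (assumed data).
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
queries = [[5.0, 0.0]]  # points in the same direction as the first key

outputs, weights = attention(queries, keys, values)
```

Unlike a hard database lookup, the result is a weighted blend of all values, which is what makes the operation differentiable and therefore trainable.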

The Feed Forward block is a small network of linear layers (two, with a nonlinearity in between, in the original Transformer), applied to each embedding in the sequence separately (i.e. the input is a 3-dim tensor instead of a 2-dim one).
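A small sketch of "applied to each embedding separately", assuming the two-layer form with a ReLU in between that the original Transformer uses. The weights below are arbitrary hand-picked numbers, not trained values, and the helper names are hypothetical.

```python
def linear(x, weight, bias):
    """y = W x + b, with weight stored as out_dim rows of in_dim columns."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def feed_forward(sequence, w1, b1, w2, b2):
    """Run the same two-layer network on every position independently."""
    return [linear(relu(linear(x, w1, b1)), w2, b2) for x in sequence]

# Toy weights: model dim 2, hidden dim 3 (assumed for illustration).
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, -1.0]
w2 = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
b2 = [0.0, 0.0]

sequence = [[1.0, 2.0], [-1.0, 0.5]]  # two positions, each a 2-value vector
out = feed_forward(sequence, w1, b1, w2, b2)
```

Because the same weights are reused at every position and no position looks at another, this block mixes information within a token's vector only; mixing across tokens is left to attention.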

Then the final Linear layer is where we get our prediction for the next token, as a vector with one score (logit) per vocabulary entry. We apply a softmax to turn these scores into probabilities in the range 0 to 1 that sum to 1.
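The last step can be sketched as follows. The four-word vocabulary and the scores are invented for the example; in a real model the scores would come out of the final Linear layer.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up vocabulary and raw scores, one score per vocabulary entry.
vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 0.5, -1.0, 3.0]

probs = softmax(logits)

# Greedy choice: pick the token with the highest probability.
best_index = max(range(len(probs)), key=probs.__getitem__)
next_token = vocab[best_index]
```

Greedy selection is only one option; sampling from probs is another common way to choose the next token.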

There are two sides because when that diagram was developed, it was being used for language translation: the encoder reads the source sentence and the decoder generates the translation. Generative language models for next-token prediction use just the Transformer decoder and not the encoder.

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext - PyTorch Tutorials 2.0.1+cu117 documentation

