关于Transformer的理解

关于Transformer, QKV的意义表示其更像是一个可学习的查询系统,或许以前搜索引擎的算法就与此有关或者某个分支的搜索算法与此类似。


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights where each token(token could be a word, sentence piece, subword, character, etc) are converted into a vector, say, with 500 values between 0 and 1 that are trainable.

Positional Encoding - for each token, we want to inform the model where it's located, orderwise. This is because linear layers are not ideal for handling sequential information. So we manually pass this in by adding a vector of sine and cosine values on the first 2 elements in the embedding vector.

This sequence of vectors goes through an attention layer, which basically is like a learnable digitized database search function with keys, queries and values. In this case, we are "searching" for the most likely next token.

The Feed Forward is just a basic linear layer, but is applied across each embedding in the sequence separately(i.e. 3 dim tensor instead of 2 dim).

Then the final Linear layer is where we want to get out our predicted next token in the form of a vector of probabilities, which we apply a softmax to put the values in the range of 0 to 1.

There are two sides because when that diagram was developed, it was being used in language translations. But generative language models for next token prediction just use the Transformer decoder and not the encoder.

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext --- PyTorch Tutorials 2.0.1+cu117 documentation


相关推荐
克里斯蒂亚诺·罗纳尔达3 分钟前
智能体学习23——资源感知优化(Resource-Aware Optimization)
人工智能·学习
橙露29 分钟前
特征选择实战:方差、卡方、互信息法筛选有效特征
人工智能·深度学习·机器学习
TechMasterPlus1 小时前
LangGraph 实战指南:构建状态驱动的 LLM 应用架构
人工智能·架构
海森大数据1 小时前
数据与特征“协同进化”:机器学习加速发现高性能光合成过氧化氢COF催化剂
人工智能·机器学习
xiaotao1311 小时前
01-编程基础与数学基石: Python核心数据结构完全指南
数据结构·人工智能·windows·python
SteveSenna1 小时前
Trossen Arm MuJoCo自定义1:改变目标物体
人工智能·学习·算法·机器人
不熬夜的熬润之1 小时前
YOLOv5-OBB 训练避坑笔记
人工智能·yolo·计算机视觉
实证小助手1 小时前
世界各国经济政策不确定指数(1997-2024年)月度数据
大数据·人工智能
Wcowin2 小时前
Hermes Agent:自进化的 AI Agent
人工智能
努力学习_小白2 小时前
ResNet-50——pytorch版
人工智能·pytorch·python