关于Transformer的理解

关于Transformer, QKV的意义表示其更像是一个可学习的查询系统,或许以前搜索引擎的算法就与此有关或者某个分支的搜索算法与此类似。


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights where each token(token could be a word, sentence piece, subword, character, etc) are converted into a vector, say, with 500 values between 0 and 1 that are trainable.

Positional Encoding - for each token, we want to inform the model where it's located, orderwise. This is because linear layers are not ideal for handling sequential information. So we manually pass this in by adding a vector of sine and cosine values on the first 2 elements in the embedding vector.

This sequence of vectors goes through an attention layer, which basically is like a learnable digitized database search function with keys, queries and values. In this case, we are "searching" for the most likely next token.

The Feed Forward is just a basic linear layer, but is applied across each embedding in the sequence separately(i.e. 3 dim tensor instead of 2 dim).

Then the final Linear layer is where we want to get out our predicted next token in the form of a vector of probabilities, which we apply a softmax to put the values in the range of 0 to 1.

There are two sides because when that diagram was developed, it was being used in language translations. But generative language models for next token prediction just use the Transformer decoder and not the encoder.

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext --- PyTorch Tutorials 2.0.1+cu117 documentation


相关推荐
大大花猫3 分钟前
为了重温儿时回忆,我用AI做了一个小游戏合集APP【附源码】
人工智能·ai编程·游戏开发
万粉变现经纪人14 分钟前
如何解决pip安装报错ModuleNotFoundError: No module named ‘transformers’问题
人工智能·python·beautifulsoup·pandas·scikit-learn·pip·ipython
cver12323 分钟前
塑料可回收物检测数据集-10,000 张图片 智能垃圾分类系统 环保回收自动化 智慧城市环卫管理 企业环保合规检测 教育环保宣传 供应链包装优化
人工智能·安全·计算机视觉·目标跟踪·分类·自动化·智慧城市
jz_ddk29 分钟前
[科普] AI加速器架构全景图:从GPU到光计算的算力革命
人工智能·学习·算法·架构
idaretobe35 分钟前
宝龙地产债务化解解决方案二:基于资产代币化与轻资产转型的战略重构
人工智能·web3·去中心化·区块链·智能合约·信任链
摆烂工程师43 分钟前
教你如何从GPT-5 切换到 GPT-4o。Plus 用户切换 GPT-4o 旧模型的入口在哪里?
人工智能·chatgpt·程序员
Lee_Serena1 小时前
bert学习
人工智能·深度学习·自然语言处理·bert·transformer
仪器科学与传感技术博士1 小时前
Matplotlib库:Python数据可视化的基石,发现它的美
开发语言·人工智能·python·算法·信息可视化·matplotlib·图表可视化
小王爱学人工智能1 小时前
svm的一些应用
人工智能·机器学习·支持向量机
极限实验室1 小时前
喜报!极限科技 Coco AI 荣获 2025 首届人工智能应用创新大赛全国一等奖
人工智能