Understanding the Transformer

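On the Transformer: the role of Q, K, and V (query, key, value) suggests that it works more like a learnable query (retrieval) system. Perhaps earlier search-engine algorithms were related to this idea, or some branch of search algorithms works in a similar way.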
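To make the "query system" analogy concrete, here is a minimal sketch contrasting an exact dictionary lookup with an attention-style soft lookup. It assumes plain dot-product similarity and fixed toy vectors; in a real Transformer the queries, keys, and values are produced by learned projections, which is what makes the lookup "learnable". The function names are hypothetical.

```python
import math

def hard_lookup(query, table):
    """Classic database/dictionary lookup: exact key match, exactly one value returned."""
    return table[query]

def soft_lookup(query, keys, values):
    """Attention-style lookup: score every key against the query with a dot product,
    turn the scores into weights with softmax, and return the weighted mix of values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed

table = {"cat": 1.0, "dog": 2.0}
print(hard_lookup("cat", table))  # 1.0

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
weights, mixed = soft_lookup([2.0, 0.0], keys, values)
```

The soft version never fails on a "missing" key: every value contributes, and the query only shifts how much weight each one gets. That differentiability is what lets gradient descent tune the lookup.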


Source: "Can anyone help me to understand this image?", reply #2 by J_Johnson, nlp category, PyTorch Forums

Embeddings - these are learnable weights: each token (a token could be a word, sentence piece, subword, character, etc.) is mapped to a vector of, say, 500 trainable real values. These values are not restricted to the range 0 to 1; they are ordinary parameters that training can push anywhere.
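An embedding layer is essentially a lookup table with one trainable row per token id. The sketch below shows only the lookup, not the training; PyTorch's real counterpart is `torch.nn.Embedding`. The toy vocabulary and the `Embedding` class are hypothetical, made up for illustration.

```python
import random

class Embedding:
    """Minimal embedding table: one trainable vector (row) per token id.
    In a real framework these rows are updated by gradient descent; here we only look them up."""

    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        # Initialized from a standard normal distribution: values are unconstrained
        # real numbers, not restricted to [0, 1].
        self.weight = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

    def __call__(self, token_ids):
        # Each token id simply selects its row of the table.
        return [self.weight[t] for t in token_ids]

vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}  # hypothetical toy vocabulary
emb = Embedding(vocab_size=len(vocab), dim=8)
ids = [vocab[w] for w in "the cat sat".split()]
vectors = emb(ids)
print(len(vectors), len(vectors[0]))  # 3 8
```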

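Positional Encoding - for each token, we want to tell the model where it sits in the sequence. This is needed because attention on its own ignores order: shuffling the input tokens just shuffles the outputs, and the feed-forward layers process each position independently. So we inject order information manually by adding a positional vector to each embedding. In the original Transformer this vector consists of sine and cosine values at different frequencies and covers every dimension of the embedding, not just the first 2 elements.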
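Here is a sketch of the fixed sinusoidal encoding from the original Transformer paper ("Attention Is All You Need"), where even dimensions use sine, odd dimensions use cosine, and each sin/cos pair has its own frequency. Many later models learn positional embeddings instead; this sketch covers only the sinusoidal variant. The helper names are hypothetical.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos][2k]   = sin(pos / 10000**(2k / d_model))
    PE[pos][2k+1] = cos(pos / 10000**(2k / d_model))
    Every dimension gets a value; the frequency decreases as the dimension index grows."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i is the even index 2k
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def add_positions(embeddings):
    """Add the encoding element-wise to token embeddings of shape (seq_len x d_model)."""
    pe = sinusoidal_positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(row, prow)] for row, prow in zip(embeddings, pe)]

pe = sinusoidal_positional_encoding(seq_len=4, d_model=6)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Because the encoding is added rather than concatenated, the embedding size stays the same, and nearby positions get similar vectors.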

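This sequence of vectors then goes through an attention layer, which behaves like a learnable, soft database lookup with queries, keys, and values: each position issues a query, compares it against the keys of all positions, and takes a similarity-weighted average of their values. Attention itself does not search for the next token; it gathers context from the other positions, and the next-token prediction is made by the output layer at the end.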
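A minimal single-head sketch of scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V, in plain Python. The projection matrices `w_q`, `w_k`, `w_v` are the learnable part; here they are random stand-ins (an assumption for illustration), and multi-head splitting, masking, and batching are left out.

```python
import math
import random

def matmul(a, b):
    """Plain-Python product of an (n x m) and an (m x p) list-of-lists matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len x d_model); w_q, w_k, w_v: (d_model x d_k).
    Returns the attended outputs (seq_len x d_k) and the attention weights."""
    q = matmul(x, w_q)  # queries: what each position is looking for
    k = matmul(x, w_k)  # keys: what each position can be matched on
    v = matmul(x, w_v)  # values: the content that gets mixed together
    scale = math.sqrt(len(k[0]))
    # scores[i][j] = similarity of query i with key j, scaled by sqrt(d_k)
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights

rng = random.Random(0)
seq_len, d_model, d_k = 3, 4, 2
x = random_matrix(seq_len, d_model, rng)
# In a trained model these are learned; random values here are only placeholders.
w_q, w_k, w_v = (random_matrix(d_model, d_k, rng) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
```

Each output row is a convex combination of the value rows, which is the "weighted average of values" described above.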

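The Feed Forward block is a small two-layer network (a linear layer, a ReLU activation, then another linear layer), applied to each position's vector in the sequence separately with the same weights (i.e. it operates on a 3-dim tensor of batch, sequence, and features instead of a 2-dim one).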
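A sketch of the position-wise feed-forward network, FFN(x) = ReLU(x W1 + b1) W2 + b2, using toy sizes and random weights as stand-ins for trained ones (an assumption). The key property it demonstrates is that each position is transformed independently with the same weights.

```python
import random

def linear(vec, weight, bias):
    """vec: length d_in; weight: (d_in x d_out); bias: length d_out."""
    return [sum(v * weight[i][j] for i, v in enumerate(vec)) + bias[j] for j in range(len(bias))]

def relu(vec):
    return [max(0.0, v) for v in vec]

def position_wise_ffn(x, w1, b1, w2, b2):
    """Apply ReLU(row W1 + b1) W2 + b2 to every row of x (seq_len x d_model) independently."""
    return [linear(relu(linear(row, w1, b1)), w2, b2) for row in x]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_model, d_ff = 4, 8  # toy sizes; the original base model uses 512 and 2048
w1, b1 = rand_matrix(d_model, d_ff, rng), [0.0] * d_ff
w2, b2 = rand_matrix(d_ff, d_model, rng), [0.0] * d_model
x = rand_matrix(3, d_model, rng)
y = position_wise_ffn(x, w1, b1, w2, b2)
```

Running the network on one row alone gives exactly the same result as that row inside the full sequence, so no information flows between positions here; mixing across positions is the job of attention.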

Then the final Linear layer projects each vector to one score (logit) per token in the vocabulary, and a softmax turns those scores into a probability distribution over the next token: values between 0 and 1 that sum to 1.
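The output step can be sketched as a linear projection to vocabulary logits followed by softmax. The vocabulary, hidden state, and weights below are hypothetical toy values chosen for illustration, and greedy argmax is just one possible decoding rule.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_distribution(hidden, w_out, b_out):
    """Project a hidden vector (length d_model) to vocabulary logits with a linear layer
    (d_model x vocab_size), then apply softmax to get probabilities."""
    logits = [sum(h * w_out[i][j] for i, h in enumerate(hidden)) + b_out[j]
              for j in range(len(b_out))]
    return logits, softmax(logits)

vocab = ["the", "cat", "sat", "mat"]  # hypothetical toy vocabulary
hidden = [0.5, -1.0, 2.0]             # hypothetical final hidden state of the last position
w_out = [[0.1, 0.4, -0.2, 0.0],
         [0.3, -0.5, 0.1, 0.2],
         [0.0, 0.6, 0.3, -0.1]]
b_out = [0.0, 0.0, 0.0, 0.0]
logits, probs = next_token_distribution(hidden, w_out, b_out)
predicted = vocab[max(range(len(probs)), key=probs.__getitem__)]  # greedy choice
```

Here the logits are [-0.25, 1.9, 0.4, -0.4], so greedy decoding picks "cat". Since softmax is monotonic, the highest logit always gets the highest probability; softmax matters when you need actual probabilities, for example for sampling or for the cross-entropy loss.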

The diagram has two sides (encoder and decoder) because it was originally designed for machine translation. Generative language models for next-token prediction, such as GPT-style models, typically use only the Transformer decoder (with masked self-attention and without cross-attention) and drop the encoder.
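What makes a decoder-only model suitable for next-token prediction is the causal mask: position i may only attend to positions up to i, so the model cannot peek at the tokens it is supposed to predict. Below is a minimal sketch of causal attention weights, assuming single-head dot-product scores and toy query/key vectors; the function name is hypothetical.

```python
import math

def softmax(row):
    m = max(row)  # finite, because position i can always attend to itself
    exps = [math.exp(x - m) for x in row]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(q, k):
    """Scaled dot-product attention weights where future positions (j > i)
    get a score of -inf before the softmax, so they receive weight 0."""
    scale = math.sqrt(len(k[0]))
    weights = []
    for i, qi in enumerate(q):
        row = [sum(a * b for a, b in zip(qi, kj)) / scale if j <= i else float("-inf")
               for j, kj in enumerate(k)]
        weights.append(softmax(row))
    return weights

q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy query/key vectors
w = causal_attention_weights(q, k)
print(w[0])  # [1.0, 0.0, 0.0]: the first token can only see itself
```

The result is lower-triangular: every weight above the diagonal is zero, which is why such a model can be trained on all positions of a sequence in parallel while still predicting each token only from the ones before it.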

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext --- PyTorch Tutorials 2.0.1+cu117 documentation

