关于Transformer的理解

关于Transformer, QKV的意义表示其更像是一个可学习的查询系统,或许以前搜索引擎的算法就与此有关或者某个分支的搜索算法与此类似。


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights where each token(token could be a word, sentence piece, subword, character, etc) are converted into a vector, say, with 500 values between 0 and 1 that are trainable.

Positional Encoding - for each token, we want to inform the model where it's located, orderwise. This is because linear layers are not ideal for handling sequential information. So we manually pass this in by adding a vector of sine and cosine values on the first 2 elements in the embedding vector.

This sequence of vectors goes through an attention layer, which basically is like a learnable digitized database search function with keys, queries and values. In this case, we are "searching" for the most likely next token.

The Feed Forward is just a basic linear layer, but is applied across each embedding in the sequence separately(i.e. 3 dim tensor instead of 2 dim).

Then the final Linear layer is where we want to get out our predicted next token in the form of a vector of probabilities, which we apply a softmax to put the values in the range of 0 to 1.

There are two sides because when that diagram was developed, it was being used in language translations. But generative language models for next token prediction just use the Transformer decoder and not the encoder.

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext --- PyTorch Tutorials 2.0.1+cu117 documentation


相关推荐
个入资料4 小时前
阿里云ecs+飞书搭建openclaw
人工智能
CoovallyAIHub4 小时前
OpenClaw一脚踩碎传统CV?机器终于不再只是看世界
深度学习·算法·计算机视觉
CoovallyAIHub5 小时前
仅凭单目相机实现3D锥桶定位?UNet-RKNet破解自动驾驶锥桶检测难题
深度学习·算法·计算机视觉
孤烟5 小时前
【RAG 实战系列 02】检索精度翻倍!混合检索(稀疏 + 稠密)实战教程
人工智能·llm
明明如月学长6 小时前
OpenClaw 帮我睡后全自动完成了老板交代的任务
人工智能
uuware6 小时前
Lupine.Press + AI 助您分分钟搞定技术项目的文档网站
人工智能·前端框架
海上日出6 小时前
使用 QuantStats 进行投资组合绩效分析:Python 量化实战指南
人工智能
Qinana6 小时前
150行代码搞定私有知识库!Node.js + LangChain 打造最小化 RAG 系统全流程
人工智能·程序员·node.js
猿猿长成记6 小时前
AI专栏 | AI大法则之思维链、自洽性、思维树
人工智能
用户5191495848456 小时前
CrushFTP 条件竞争认证绕过漏洞利用工具 (CVE-2025-54309)
人工智能·aigc