Understanding the Transformer

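The roles of Q, K and V (query, key, value) suggest that the Transformer behaves more like a learnable retrieval system. Perhaps earlier search-engine algorithms were related to this idea, or some branch of search algorithms works in a similar way.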
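To make the "learnable query system" analogy concrete, here is a minimal Python sketch contrasting an exact-match dictionary lookup with the soft lookup that attention performs. The function names hard_lookup and soft_lookup are hypothetical, chosen for illustration; in a real Transformer the queries, keys and values are themselves produced by learned projections.

```python
import math

def hard_lookup(query, table):
    """Exact-match key-value lookup, like a classic dictionary or index."""
    return table.get(query)  # None when the key is absent

def soft_lookup(query, keys, values):
    """Attention-style lookup: every value contributes, weighted by how
    similar its key is to the query (dot product followed by softmax)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

print(hard_lookup("cat", {"cat": [10.0, 0.0]}))
# The query is close to the first key, so the result is mostly the first value.
print(soft_lookup([5.0, 0.0], keys=[[1.0, 0.0], [0.0, 1.0]],
                  values=[[10.0, 0.0], [0.0, 10.0]]))
```

Unlike hard_lookup, soft_lookup never fails outright and is differentiable with respect to the query, keys and values, which is what allows the "search" to be learned by gradient descent.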


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights: each token (a token could be a word, a sentence piece, a subword, a character, etc.) is mapped to a vector of, say, 500 trainable values. These are ordinary real numbers and are not restricted to the range 0 to 1.
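A minimal sketch of what an embedding layer does, assuming a made-up three-word vocabulary. ToyEmbedding and sgd_step are hypothetical names for illustration (in PyTorch the corresponding layer is torch.nn.Embedding); the point is that the forward pass is just a row lookup and training only moves the rows that were used.

```python
import random

class ToyEmbedding:
    """Hypothetical toy embedding table: one trainable vector per token id."""

    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        # Values are ordinary real numbers drawn from N(0, 1), not squashed into [0, 1].
        self.weight = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(vocab_size)]

    def __call__(self, token_ids):
        # The forward pass is just row selection by index.
        return [self.weight[t] for t in token_ids]

    def sgd_step(self, token_id, grad, lr=0.1):
        # Training nudges only the rows of tokens that appeared in the batch.
        self.weight[token_id] = [w - lr * g for w, g in zip(self.weight[token_id], grad)]

vocab = {"the": 0, "cat": 1, "sat": 2}  # toy vocabulary, assumed for illustration
emb = ToyEmbedding(vocab_size=len(vocab), dim=4)
vectors = emb([vocab[w] for w in ["the", "cat", "sat", "the"]])
print(len(vectors), len(vectors[0]))  # 4 tokens, 4 values each
```

The same token ("the") always maps to the same row, which is why the embedding alone carries no information about position.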

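Positional Encoding - for each token, we want to tell the model where it sits in the sequence. Attention by itself treats its input as an unordered set, and the feed-forward layers process each token separately, so without extra information the model cannot tell word order. We therefore inject position manually by adding a vector of sine and cosine values to each embedding. In the original Transformer this encoding spans every dimension of the embedding (sine on even indices, cosine on odd indices, each pair at a different frequency), not just the first 2 elements.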
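A minimal pure-Python sketch of the sinusoidal encoding from the original Transformer paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The function names are hypothetical; the resulting table is added element-wise to the token embeddings.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sine/cosine position codes."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):  # i plays the role of 2i in the formula
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)          # even dimensions: sine
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd dimensions: cosine
    return pe

def add_positional_encoding(embeddings):
    """Add the position code to every dimension of every token embedding."""
    pe = sinusoidal_positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(row, pe_row)] for row, pe_row in zip(embeddings, pe)]

pe = sinusoidal_positional_encoding(seq_len=3, d_model=4)
print(pe[0])  # position 0: sin(0) = 0.0 and cos(0) = 1.0 alternate
```

Low dimensions oscillate quickly and high dimensions slowly, so each position gets a distinct pattern across the whole vector rather than in one or two slots.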

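This sequence of vectors then goes through an attention layer, which works like a learnable, soft database search with keys, queries and values: each token's query is compared with every token's key, and the resulting weights decide how much of each token's value is mixed into that token's output. In a language model, this builds the context the model uses when "searching" for the most likely next token.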
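A single-head scaled dot-product self-attention layer, softmax(Q K^T / sqrt(d_k)) V, sketched in pure Python. The matrices w_q, w_k and w_v stand in for the learnable parameters; their values below are arbitrary placeholders, and real implementations use multiple heads and batched tensor operations.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """x: seq_len x d_model; returns (output, attention weights)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Every query is scored against every key, scaled by sqrt(d_k).
    scores = [[sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in k] for q_row in q]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 tokens, d_model = 2
w_q = w_k = w_v = [[1.0, 0.0], [0.0, 1.0]]  # placeholder "learned" weights
out, weights = self_attention(x, w_q, w_k, w_v)
```

Because w_q, w_k and w_v are trained, the model learns what to "ask" (query), how to "index" each token (key) and what to "return" (value), which is the learnable part of the search analogy.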

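The Feed Forward block is a small two-layer network (a linear layer, a ReLU, then another linear layer) applied to each embedding in the sequence separately, with the same weights at every position (i.e. it operates on a 3-dim tensor of batch × sequence × embedding instead of a 2-dim one).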
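A pure-Python sketch of the position-wise feed-forward block, FFN(x) = max(0, x W1 + b1) W2 + b2, applied independently to each position. The helper names and the tiny weight values are hypothetical; in the original Transformer the hidden layer is wider than the embedding (d_ff = 2048 for d_model = 512).

```python
def linear(x, w, b):
    """x: vector of length n_in; w: n_in x n_out (list of rows); b: length n_out."""
    return [sum(xi * wi for xi, wi in zip(x, col)) + bj for col, bj in zip(zip(*w), b)]

def position_wise_ffn(seq, w1, b1, w2, b2):
    """Apply the same Linear -> ReLU -> Linear to every position separately."""
    out = []
    for x in seq:  # no information flows between positions here
        hidden = [max(0.0, h) for h in linear(x, w1, b1)]  # ReLU
        out.append(linear(hidden, w2, b2))
    return out

w1, b1 = [[1.0, -1.0], [0.0, 1.0]], [0.0, 0.0]  # d_model = 2 -> d_ff = 2 (toy size)
w2, b2 = [[1.0, 0.0], [1.0, 1.0]], [0.0, 0.0]   # d_ff = 2 -> back to d_model = 2
seq = [[1.0, 2.0], [2.0, 1.0], [0.0, 3.0]]
result = position_wise_ffn(seq, w1, b1, w2, b2)
print(result)  # → [[2.0, 1.0], [2.0, 0.0], [3.0, 3.0]]
```

Mixing between tokens happens only in attention; the feed-forward block then transforms each token's vector on its own.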

Finally, the last Linear layer maps each vector to one score (logit) per vocabulary entry, and a softmax turns these scores into a vector of probabilities in the range 0 to 1 that sum to 1, from which the predicted next token is taken.
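A small sketch of this last step: a linear layer produces one logit per vocabulary entry and a softmax converts the logits into probabilities. The vocabulary, hidden vector and output weights are made-up toy values, and next_token_distribution is a hypothetical helper.

```python
import math

def next_token_distribution(hidden, w_out, vocab):
    """hidden: final vector for the last position; w_out: d_model x vocab_size."""
    logits = [sum(h * w for h, w in zip(hidden, col)) for col in zip(*w_out)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # softmax, shifted for stability
    total = sum(exps)
    return {token: e / total for token, e in zip(vocab, exps)}

vocab = ["cat", "dog", "sat"]
w_out = [[2.0, 0.0, -1.0],
         [0.0, 1.0, 0.0]]
probs = next_token_distribution([1.0, 0.0], w_out, vocab)
print(max(probs, key=probs.get))  # greedy choice → cat
```

Taking the highest-probability token is greedy decoding; sampling from probs is the common alternative.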

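The diagram has two sides because it was originally developed for language translation: the encoder reads the source sentence and the decoder generates the target sentence. Generative language models for next-token prediction (e.g. GPT-style models) typically use only the Transformer decoder, dropping the encoder and the cross-attention that connects to it.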
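What keeps a decoder-only model honest during training is the causal mask on its self-attention (labelled "Masked Multi-Head Attention" in the standard Transformer diagram): position i may only attend to positions 0..i, so it cannot peek at the tokens it is supposed to predict. A minimal sketch applying that mask to a square matrix of raw attention scores; causal_softmax is a hypothetical name.

```python
import math

def causal_softmax(scores):
    """scores: n x n raw attention scores; returns masked, row-normalised weights."""
    n = len(scores)
    weights = []
    for i in range(n):
        # Future positions j > i get -inf, so exp() turns them into exactly 0.0.
        masked = [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        m = max(masked)  # finite, because position 0 is never masked
        exps = [math.exp(s - m) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

w = causal_softmax([[0.0, 0.0, 0.0]] * 3)
for row in w:
    print([round(p, 3) for p in row])
```

With equal raw scores, the first token can only attend to itself, the second splits its attention over two tokens and the third over all three.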

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext --- PyTorch Tutorials 2.0.1+cu117 documentation

