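Understanding the Transformer

On the Transformer: the meaning of Q, K and V suggests that it works more like a learnable query system. Perhaps earlier search engine algorithms are related to this, or some branch of search algorithms is similar.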


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights that map each token (a token could be a word, a sentence piece, a subword, a character, etc.) to a vector of, say, 500 trainable values. The values usually start as small random numbers and are not restricted to the range 0 to 1.
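To make the lookup concrete, here is a minimal pure-Python sketch of an embedding table. The toy vocabulary, the dimension of 4 and the helper names `make_embedding_table` and `embed` are all assumptions for illustration; a real model would use something like `torch.nn.Embedding` and update the table during training.

```python
import random

def make_embedding_table(vocab_size, dim, seed=0):
    """Create a vocab_size x dim table of random floats (the trainable weights)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

def embed(token_ids, table):
    """Look up one row per token id; the result is a sequence of vectors."""
    return [table[i] for i in token_ids]

# Hypothetical toy vocabulary, not taken from the forum post.
vocab = {"the": 0, "cat": 1, "sat": 2}
table = make_embedding_table(len(vocab), dim=4)
vectors = embed([vocab[w] for w in ["the", "cat", "sat"]], table)
```

Each token simply selects a row of the table, so training the embeddings means adjusting those rows.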

Positional Encoding - for each token, we want to inform the model where it is located in the sequence. This is needed because attention and linear layers have no built-in notion of order. So we pass this in manually by adding a positional vector to the embedding vector; in the original Transformer this vector is made of sine and cosine values of different frequencies spread across all of the embedding dimensions.
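As a rough sketch of the sinusoidal scheme from the original Transformer formulation, the code below fills every dimension of the positional vector (sine on even indices, cosine on odd indices) and adds it to the embeddings. The function names `positional_encoding` and `add_positions` are assumptions for illustration, and learned positional embeddings are a common alternative that is not shown here.

```python
import math

def positional_encoding(seq_len, dim):
    """Sinusoidal encoding: even indices use sine, odd indices use cosine."""
    pe = [[0.0] * dim for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, dim, 2):
            # Each pair of dimensions gets its own frequency.
            angle = pos / (10000 ** (i / dim))
            pe[pos][i] = math.sin(angle)
            if i + 1 < dim:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def add_positions(embeddings):
    """Add the positional vector for each position to its embedding vector."""
    pe = positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(row, pe_row)]
            for row, pe_row in zip(embeddings, pe)]
```

Because the same token gets a different sum at each position, later layers can tell "cat" at position 0 apart from "cat" at position 5.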

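This sequence of vectors goes through an attention layer, which is basically a learnable, differentiable database lookup with queries, keys and values: each query is compared against every key, and the resulting weights decide how much of each value is mixed into the output. In this case, we are "searching" for the most likely next token.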
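The "database search" analogy can be sketched as scaled dot-product attention in plain Python. This is a simplified, single-head version under stated assumptions: the queries, keys and values are passed in directly, whereas a real layer would first produce them from the input with learned linear projections, and the names `softmax` and `attention` are illustrative.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query: score it against every key, softmax, then mix the values."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

Unlike a hard database lookup, every value contributes a little; a query that matches one key strongly just gets an output dominated by that key's value.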

The Feed Forward block is a small stack of linear layers (two, with a nonlinearity in between, in the original Transformer), applied to each embedding in the sequence separately (i.e. the input is a 3-dim tensor instead of a 2-dim one).
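A small sketch may help show what "applied to each embedding separately" means. It assumes the two-layer form with a ReLU in between, as in the original Transformer; the helper names and the weight layout (a list of rows) are illustrative choices, not any library's API.

```python
def linear(x, weight, bias):
    """y = W x + b for a single vector x; weight is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def relu(x):
    """Zero out negative entries."""
    return [max(0.0, v) for v in x]

def feed_forward(sequence, w1, b1, w2, b2):
    """Apply the same two-layer network to every position independently."""
    return [linear(relu(linear(x, w1, b1)), w2, b2) for x in sequence]
```

The same weights are reused at every position, and no position sees any other one here; mixing information across positions is left to the attention layer.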

Then the final Linear layer is where we get our prediction for the next token: it outputs one score (logit) per vocabulary entry, and we apply a softmax to turn these scores into probabilities, each in the range 0 to 1 and summing to 1.
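As a minimal illustration of this last step, the sketch below converts a vector of scores into probabilities and picks the most likely token. The vocabulary and the score values are made up for the example, and taking the highest-probability token (greedy decoding) is only one of several ways to choose the next token.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["the", "cat", "sat"]   # hypothetical toy vocabulary
logits = [1.0, 3.0, 0.5]        # made-up scores from the final Linear layer
probs = softmax(logits)

# Greedy choice: take the token with the highest probability.
next_token = vocab[probs.index(max(probs))]
```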

There are two sides because when that diagram was developed, it was being used for language translation. But generative language models for next-token prediction use just the Transformer decoder and not the encoder.

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext --- PyTorch Tutorials 2.0.1+cu117 documentation

