Understanding the Transformer

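On the Transformer: the roles of Q, K and V suggest that attention works like a learnable query system. Perhaps earlier search-engine algorithms are related to it, or some branch of search algorithms works in a similar way.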


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights: each token (a token could be a word, sentence piece, subword, character, etc.) is converted into a vector of, say, 500 trainable values. The values are ordinary real numbers and are not restricted to the range 0 to 1.
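As a rough illustration of the lookup described above, here is a minimal NumPy sketch. The sizes `vocab_size` and `d_model` and the token ids are made up for the example; in a real model the table would be trained rather than left at its random initial values.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 10  # hypothetical vocabulary size
d_model = 8      # hypothetical embedding width (the text uses 500 as an example)

# The embedding table is a weight matrix with one row per token id.
embedding_table = rng.normal(size=(vocab_size, d_model))

# A sequence of token ids, as a tokenizer might produce (values are made up).
token_ids = np.array([3, 1, 4, 1])

# Looking up embeddings is plain row indexing into the table.
embedded = embedding_table[token_ids]
print(embedded.shape)
```

Note that the two occurrences of token id 1 get exactly the same vector, which is why position information has to be supplied separately.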

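Positional Encoding - for each token, we want to inform the model where it is located in the sequence. This is needed because attention and linear layers by themselves have no notion of token order. So we pass this in manually by adding a vector of sine and cosine values to the embedding vector. In the original Transformer, this vector has the same length as the embedding, with sines in the even-indexed elements and cosines in the odd-indexed ones, each pair at a different frequency.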
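One common way to build such a vector is the sinusoidal scheme from the original Transformer paper. The sketch below assumes that scheme and an even `d_model`; the function name is chosen for illustration only.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) array of encodings; d_model is assumed even."""
    positions = np.arange(seq_len)[:, None]        # shape (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[None, :]  # shape (1, d_model // 2)
    # Each pair of dimensions uses its own frequency.
    angles = positions / np.power(10000.0, even_dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even-indexed elements
    pe[:, 1::2] = np.cos(angles)  # odd-indexed elements
    return pe

pe = sinusoidal_positional_encoding(seq_len=5, d_model=8)
print(pe.shape)
```

The result is added element-wise to the token embeddings, so it must have the same shape as the embedded sequence.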

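This sequence of vectors goes through an attention layer, which is basically a learnable, soft database lookup with queries, keys and values: each token's query is scored against every token's key, and the scores decide how much of each value is mixed into the output. In this case, we are "searching" for the context that helps predict the most likely next token.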
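The "search" analogy can be made concrete with a sketch of single-head scaled dot-product attention. The projection matrices `w_q`, `w_k` and `w_v` are random stand-ins for trained weights, and the sizes are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)     # how well each query matches each key
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights         # weighted mix of the values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))  # stand-in for embedded tokens
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, weights = scaled_dot_product_attention(x @ w_q, x @ w_k, x @ w_v)
print(output.shape, weights.shape)
```

Unlike a hard database lookup that returns one record, every query receives a blend of all values, weighted by how well its key matches.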

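The Feed Forward block is a small network of linear layers (in the original Transformer, two linear layers with a ReLU in between). It is applied to each embedding in the sequence separately, with the same weights at every position (i.e. the input is a 3-dim tensor of batch, sequence and features instead of a 2-dim one).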
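To show what "applied to each embedding separately" means, here is a sketch that assumes the two-layer form with a ReLU from the original Transformer paper. The sizes and random weights are illustrative only.

```python
import numpy as np

def feed_forward(x, w1, b1, w2, b2):
    # The matmul acts on the last axis only, so every position is
    # transformed independently with the same weights.
    hidden = np.maximum(0.0, x @ w1 + b1)  # linear layer followed by ReLU
    return hidden @ w2 + b2                # project back to d_model

rng = np.random.default_rng(0)
batch, seq_len, d_model, d_ff = 2, 4, 8, 32  # hypothetical sizes
w1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
w2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

x = rng.normal(size=(batch, seq_len, d_model))  # 3-dim input tensor
y = feed_forward(x, w1, b1, w2, b2)
print(y.shape)
```

Feeding a single position such as `x[0, 1]` through `feed_forward` gives the same vector as `y[0, 1]`, which is the sense in which positions do not interact in this block.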

The final Linear layer is where we get our prediction for the next token: it outputs one score (logit) per vocabulary entry, and we apply a softmax to turn those scores into a vector of probabilities, each in the range 0 to 1 and summing to 1.
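The last step can be sketched as a projection to vocabulary size followed by a softmax. The weights, sizes and the greedy `argmax` choice are assumptions made for the example; real models often sample from the distribution instead.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
d_model, vocab_size = 8, 10  # hypothetical sizes

w_out = rng.normal(size=(d_model, vocab_size))  # stand-in for trained weights
b_out = np.zeros(vocab_size)

last_hidden = rng.normal(size=d_model)  # hidden vector at the final position
logits = last_hidden @ w_out + b_out    # one raw score per vocabulary entry
probs = softmax(logits)                 # values in [0, 1] that sum to 1

next_token_id = int(np.argmax(probs))   # greedy pick of the likeliest token
print(probs.sum(), next_token_id)
```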

There are two sides because the diagram was developed for language translation: the encoder reads the source sentence and the decoder generates the translation. Generative language models for next-token prediction typically use just the Transformer decoder and not the encoder.
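A detail that makes the decoder suitable for next-token prediction is causal masking: each position may attend only to itself and earlier positions. The sketch below assumes the usual upper-triangular mask; the helper names are made up for illustration.

```python
import numpy as np

def causal_mask(seq_len):
    # True above the diagonal marks "future" positions that must be hidden.
    return np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

def masked_softmax(scores, mask):
    # Masked entries become -inf, so they get zero weight after the softmax.
    scores = np.where(mask, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)

# With all-equal scores, each row spreads its weight evenly over the
# positions it is allowed to see.
scores = np.zeros((4, 4))
weights = masked_softmax(scores, causal_mask(4))
print(weights)
```

The first token can only attend to itself, the second splits its weight over the first two positions, and so on.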

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext --- PyTorch Tutorials 2.0.1+cu117 documentation

