Understanding the Transformer

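In the Transformer, the roles of Q, K and V suggest that attention works like a learnable query system. Earlier search-engine algorithms may be related to it, or some branch of search algorithms may work in a similar way.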


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - a table of learnable weights that maps each token (a token could be a word, sentence piece, subword, character, etc.) to a vector of, say, 500 trainable values. The values are ordinary real numbers and are not restricted to the range 0 to 1.
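As a rough illustration of the lookup, here is a minimal sketch in plain Python (no PyTorch, to keep it dependency-free). The vocabulary, the size `d_model = 4` and the helper name `embed` are made up for this example; in PyTorch the same role is played by `nn.Embedding`.

```python
import random

# Hypothetical toy vocabulary and embedding size, chosen only for illustration.
vocab = {"the": 0, "cat": 1, "sat": 2}
d_model = 4

random.seed(0)
# One row of trainable numbers per token id; training would adjust these values.
embedding_table = [
    [random.gauss(0.0, 1.0) for _ in range(d_model)] for _ in vocab
]

def embed(tokens):
    """Return the embedding row for each token in the list."""
    return [embedding_table[vocab[t]] for t in tokens]

vectors = embed(["the", "cat", "sat"])  # 3 vectors, each of length d_model
```

The lookup itself involves no arithmetic: it only selects rows, and the rows are what gets learned.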

Positional Encoding - for each token, we want to inform the model where it is located in the sequence. This is because the attention and linear layers have no built-in notion of order. So we pass this in manually by adding a vector of sine and cosine values, the same length as the embedding, to each embedding vector.
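In the sinusoidal scheme from the original Transformer paper, the encoding vector has the same length as the embedding and is added element-wise, with sine on even indices and cosine on odd ones. A minimal sketch in plain Python, where the function names `positional_encoding` and `add_positions` are chosen for this example:

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def add_positions(embeddings):
    """Add the positional vector to each token embedding, element-wise."""
    pe = positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(emb, pos)] for emb, pos in zip(embeddings, pe)]
```

Many models learn the positional vectors instead of computing them; the fixed sinusoid is only one option.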

This sequence of vectors goes through an attention layer, which is basically a learnable, soft database lookup with queries, keys and values: each query is compared against every key, and the result is a weighted mix of the values. In this case, we are "searching" for the context that is most useful for predicting the next token.
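To make the search analogy concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It assumes the queries, keys and values are passed in directly; in a real layer they come from learned linear projections of the input, and there are several heads. The `causal` flag is an assumption of this example that mimics the decoder-style masking used for next-token prediction.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, causal=True):
    """Each query scores every key; the output is a weighted mix of values."""
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # With a causal mask, position i may only look at positions 0..i.
        last = i + 1 if causal else len(keys)
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k)
            for k in keys[:last]
        ]
        weights = softmax(scores)
        out = [
            sum(w * v[j] for w, v in zip(weights, values[:last]))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs

x = [[1.0, 0.0], [0.0, 1.0]]
out = attention(x, x, x)  # self-attention: Q, K and V all come from x
```

Unlike a hard database lookup, every visible key contributes to the result, just with a different weight.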

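The Feed Forward block is a small fully connected network (in the original Transformer, two linear layers with a ReLU in between) that is applied to each embedding in the sequence separately (i.e. the input is a 3-dim tensor instead of a 2-dim one).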
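In the original Transformer this block is two linear layers with a ReLU in between, and the same weights are reused at every position. A minimal sketch in plain Python, with hand-picked toy weights that are purely illustrative:

```python
def linear(x, weight, bias):
    """y = W x + b for a single vector x (weight holds one row per output)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def feed_forward(x, w1, b1, w2, b2):
    """Linear -> ReLU -> Linear, applied to one position's vector."""
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)

def apply_to_sequence(seq, w1, b1, w2, b2):
    """Run the same feed-forward weights over each position independently."""
    return [feed_forward(x, w1, b1, w2, b2) for x in seq]

# Toy weights: 2 -> 3 -> 2 dimensions.
w1 = [[1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]]
b2 = [0.0, 0.0]

result = apply_to_sequence([[1.0, 2.0], [1.0, 2.0]], w1, b1, w2, b2)
```

Because positions do not interact here, identical input vectors always give identical outputs; mixing information across positions is left to the attention layer.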

Then the final Linear layer produces a vector of scores (logits) for the predicted next token, one score per token in the vocabulary. Applying a softmax turns these scores into probabilities between 0 and 1 that sum to 1.
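The last step can be sketched as a linear projection to one score per vocabulary entry, followed by a softmax. This is a plain-Python illustration; the three-token vocabulary and the weights are invented for the example.

```python
import math

def softmax(logits):
    """Turn arbitrary scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_probs(hidden, out_weight, out_bias):
    """Project the last hidden vector to one logit per vocabulary token."""
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(out_weight, out_bias)
    ]
    return softmax(logits)

# Hypothetical 3-token vocabulary and a 2-dim hidden state.
out_weight = [[2.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
out_bias = [0.0, 0.0, 0.0]
probs = next_token_probs([1.0, 0.0], out_weight, out_bias)
predicted_id = probs.index(max(probs))  # greedy choice of the next token
```

Picking the highest-probability token is only one decoding strategy; sampling from `probs` is another common choice.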

There are two sides because the diagram was developed for language translation, which uses both an encoder and a decoder. Generative language models for next-token prediction use only the Transformer decoder, not the encoder.

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext --- PyTorch Tutorials 2.0.1+cu117 documentation

