关于Transformer的理解

关于Transformer, QKV的意义表示其更像是一个可学习的查询系统,或许以前搜索引擎的算法就与此有关或者某个分支的搜索算法与此类似。


Can anyone help me to understand this image? - #2 by J_Johnson - nlp - PyTorch Forums

Embeddings - these are learnable weights where each token(token could be a word, sentence piece, subword, character, etc) are converted into a vector, say, with 500 values between 0 and 1 that are trainable.

Positional Encoding - for each token, we want to inform the model where it's located, orderwise. This is because linear layers are not ideal for handling sequential information. So we manually pass this in by adding a vector of sine and cosine values on the first 2 elements in the embedding vector.

This sequence of vectors goes through an attention layer, which basically is like a learnable digitized database search function with keys, queries and values. In this case, we are "searching" for the most likely next token.

The Feed Forward is just a basic linear layer, but is applied across each embedding in the sequence separately(i.e. 3 dim tensor instead of 2 dim).

Then the final Linear layer is where we want to get out our predicted next token in the form of a vector of probabilities, which we apply a softmax to put the values in the range of 0 to 1.

There are two sides because when that diagram was developed, it was being used in language translations. But generative language models for next token prediction just use the Transformer decoder and not the encoder.

Here is a PyTorch tutorial that might help you go through how it works.

Language Modeling with nn.Transformer and torchtext --- PyTorch Tutorials 2.0.1+cu117 documentation


相关推荐
月岛雫-5 分钟前
“单标签/多标签” vs “二分类/多分类”
人工智能·分类·数据挖掘
似乎很简单9 分钟前
卷积神经网络(CNN)
深度学习·神经网络·cnn
云卓SKYDROID20 分钟前
无人机飞行速度模块技术要点概述
人工智能·无人机·飞行速度·高科技·云卓科技
币须赢1 小时前
英伟达Thor芯片套件9月发货 “物理AI”有哪些?
大数据·人工智能
盼小辉丶1 小时前
Transformer实战(18)——微调Transformer语言模型进行回归分析
深度学习·语言模型·回归·transformer
格林威1 小时前
机器视觉检测如何使用360 度全景成像镜头进行AI 瑕疵检测
人工智能·深度学习·数码相机·机器学习·计算机视觉·视觉检测·相机
互联网之声1 小时前
崔传波教授:以科技与人文之光,点亮近视患者的清晰视界‌
人工智能
lily363926046a1 小时前
智联未来 点赋科技
大数据·人工智能
聚客AI1 小时前
🍬传统工程师转型:智能体架构师的技能图谱
人工智能·agent·mcp