🚀 Day03 - NLP Intensive Training
📖 Overview:
Day three: intensive practice to master more of the details.
🔄 Core Knowledge Review
1. Tokenization and Vectorization
```python
import jieba
from tensorflow.keras.preprocessing.text import Tokenizer

# Tokenize a sentence with jieba
words = jieba.lcut("自然语言处理")  # list of word tokens

# Vectorize: Keras's Tokenizer expects space-separated strings,
# so join the jieba tokens first (sample sentences for illustration)
texts = ["自然语言处理", "深度学习"]
corpus = [" ".join(jieba.lcut(t)) for t in texts]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
X = tokenizer.texts_to_sequences(corpus)  # lists of integer word indices
```
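texts_to_sequences returns variable-length lists, while models usually need fixed-length input. A minimal sketch using Keras's pad_sequences (maxlen=10 is an arbitrary choice):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Pad (or truncate) every sequence to length 10; 'post' pads at the end
X_padded = pad_sequences(X, maxlen=10, padding='post')  # shape: (num_texts, 10)
```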
💪 Deep Dive into the RNN Family
2.1 Bidirectional RNN
```python
import torch.nn as nn

# bidirectional=True concatenates forward and backward hidden states
bi_rnn = nn.RNN(input_size, hidden_size, num_layers=2, bidirectional=True)
output, hidden = bi_rnn(x, h0)
# output.shape: (seq_len, batch, hidden_size * 2)
```
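To make the shapes concrete, a minimal runnable sketch (all dimensions here are arbitrary):

```python
import torch
import torch.nn as nn

input_size, hidden_size, seq_len, batch = 8, 16, 5, 3
bi_rnn = nn.RNN(input_size, hidden_size, num_layers=2, bidirectional=True)

x = torch.randn(seq_len, batch, input_size)
# h0 shape: (num_layers * num_directions, batch, hidden_size)
h0 = torch.zeros(2 * 2, batch, hidden_size)

output, hidden = bi_rnn(x, h0)
print(output.shape)   # torch.Size([5, 3, 32])  -- hidden_size * 2
print(hidden.shape)   # torch.Size([4, 3, 16])
```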
2.2 Deep (Stacked) RNN
```python
# Stacking layers adds capacity, but the more layers,
# the harder it is for gradients to propagate
deep_rnn = nn.RNN(input_size, hidden_size, num_layers=3)
```
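Two common mitigations, sketched below with arbitrary dimensions: inter-layer dropout (nn.RNN's dropout argument, applied between stacked layers) and a gated cell such as nn.LSTM, whose gating eases gradient flow:

```python
import torch.nn as nn

# dropout between stacked layers (only active when num_layers > 1)
deep_rnn = nn.RNN(input_size=8, hidden_size=16, num_layers=3, dropout=0.2)

# gated alternative: LSTM gates make deep/long gradient paths easier
deep_lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=3, dropout=0.2)
```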
🎯 Attention Mechanism in Detail
3.1 The Attention Formula
Attention(Q, K, V) = softmax(QK^T / √d_k) * V

where d_k is the key dimension; scaling by √d_k keeps the dot products from growing with dimension and saturating the softmax.
3.2 Batch Matrix Multiplication (bmm)
```python
import torch

# torch.bmm multiplies two batches of matrices: (b, n, m) x (b, m, p) -> (b, n, p)
A = torch.randn(3, 4, 5)
B = torch.randn(3, 5, 6)
C = torch.bmm(A, B)  # C.shape: (3, 4, 6)
```
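Combining 3.1 and 3.2, a minimal sketch of scaled dot-product attention built on torch.bmm (shapes are arbitrary):

```python
import math
import torch

batch, seq_len, d_k = 2, 4, 8
Q = torch.randn(batch, seq_len, d_k)
K = torch.randn(batch, seq_len, d_k)
V = torch.randn(batch, seq_len, d_k)

# softmax(Q K^T / sqrt(d_k)) V, batched with bmm
scores = torch.bmm(Q, K.transpose(1, 2)) / math.sqrt(d_k)  # (batch, seq_len, seq_len)
weights = torch.softmax(scores, dim=-1)                    # each row sums to 1
context = torch.bmm(weights, V)                            # (batch, seq_len, d_k)
```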
🔥 Transformer Deep Dive
4.1 Multi-Head Attention
```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, embed_size, heads):
        super().__init__()
        self.heads = heads
        self.head_dim = embed_size // heads  # embed_size must be divisible by heads
        self.W_q = nn.Linear(embed_size, embed_size)
        self.W_k = nn.Linear(embed_size, embed_size)
        self.W_v = nn.Linear(embed_size, embed_size)

    def forward(self, q, k, v, mask=None):
        N, L, E = q.shape  # (batch, seq_len, embed_size)
        # Split into heads: (N, heads, L, head_dim)
        q = self.W_q(q).view(N, L, self.heads, self.head_dim).transpose(1, 2)
        k = self.W_k(k).view(N, -1, self.heads, self.head_dim).transpose(1, 2)
        v = self.W_v(v).view(N, -1, self.heads, self.head_dim).transpose(1, 2)
        # Split -> attend per head -> merge
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))
        out = torch.softmax(scores, dim=-1) @ v
        return out.transpose(1, 2).reshape(N, L, E)
```
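A quick self-attention shape check against the class above (dimensions are arbitrary):

```python
mha = MultiHeadAttention(embed_size=64, heads=8)
x = torch.randn(2, 10, 64)   # (batch, seq_len, embed_size)
out = mha(x, x, x)           # self-attention: q = k = v
print(out.shape)             # torch.Size([2, 10, 64])
```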
4.2 Positional Encoding
```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, embed_size, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, embed_size)
        position = torch.arange(0, max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, embed_size, 2) * (-math.log(10000.0) / embed_size))
        pe[:, 0::2] = torch.sin(position * div_term)  # even dims: sine
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dims: cosine
        # Add a batch dimension so pe broadcasts over (seq_len, batch, embed_size)
        self.register_buffer('pe', pe.unsqueeze(1))

    def forward(self, x):
        # x: (seq_len, batch, embed_size)
        return x + self.pe[:x.size(0)]
```
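Usage sketch, assuming (seq_len, batch, embed_size) inputs as in the RNN examples above:

```python
pos_enc = PositionalEncoding(embed_size=64)
x = torch.randn(10, 2, 64)   # (seq_len, batch, embed_size)
x = pos_enc(x)               # same shape, with position information added
```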
🚀 Practical Tips
5.1 Gradient Clipping
```python
# Rescale all gradients so their global L2 norm is at most max_norm
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```
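Clipping must happen after backward() (so the gradients exist) and before step() (so the clipped values are the ones applied). A minimal loop sketch, assuming model, loss_fn, optimizer, and loader already exist:

```python
for x, y in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()               # gradients now exist
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()              # clipped gradients are applied
```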
5.2 Learning Rate Scheduling
```python
# Multiply the learning rate by gamma every step_size epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)
```
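With step_size=2 and gamma=0.1, the learning rate drops by 10x every two epochs, provided scheduler.step() is called once per epoch. A sketch (train_one_epoch is a hypothetical placeholder):

```python
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

for epoch in range(6):
    train_one_epoch()   # hypothetical placeholder for the training loop
    scheduler.step()    # lr per epoch: 0.1, 0.1, 0.01, 0.01, 0.001, 0.001
```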
📝 Summary
Day03 went deeper into the RNN family and the core mechanisms of the Transformer.