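Introduction
A recurrent neural network (RNN) is a class of recursive neural network that takes sequence data as input, recurses along the direction in which the sequence evolves, and connects all of its nodes (recurrent units) in a chain [1].
Research on recurrent neural networks began in the 1980s and 1990s, and in the early 2000s they developed into one of the deep learning algorithms [2]. Bidirectional RNNs (Bi-RNN) and long short-term memory networks (LSTM) are common recurrent neural networks [3].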
Contents:
- Model
- Forward
- Backward
- nn.RNN
- nn.RNNCell
1 Model
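$x_t$: input sample at time t
$h_t$: hidden state at time t
$o_t$: output at time t
$\hat{y}_t$: predicted class at time t (classification only)
$L_t$: loss at time t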
2 RNN forward propagation (Forward)
2.1 Hidden state update at time t
$$z_t = U x_t + W h_{t-1} + b$$
$$h_t = \sigma(z_t)$$
where the activation function $\sigma$ is usually tanh.
2.2 Output at time t
$$o_t = V h_t + c$$
$$\hat{y}_t = \sigma(o_t)$$
where the activation function $\sigma$ here is softmax.
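The forward pass can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the common vanilla-RNN form h_t = tanh(U x_t + W h_{t-1} + b) and ŷ_t = softmax(V h_t + c); the parameter names U, W, V, b, c follow the ones used in section 3, while the helper names and all sizes below are hypothetical.

```python
import numpy as np

def softmax(o):
    # Subtract the max for numerical stability
    e = np.exp(o - o.max())
    return e / e.sum()

def rnn_forward(xs, U, W, V, b, c, h0):
    """Run a vanilla RNN over a sequence; return hidden states and predictions."""
    h, hs, y_hats = h0, [], []
    for x_t in xs:
        h = np.tanh(U @ x_t + W @ h + b)   # 2.1: hidden state update
        o_t = V @ h + c                    # 2.2: output
        hs.append(h)
        y_hats.append(softmax(o_t))
    return np.stack(hs), np.stack(y_hats)

# Hypothetical sizes: T=4 time steps, d=3 features, 5 hidden units, 2 classes
rng = np.random.default_rng(0)
T, d, n_h, n_cls = 4, 3, 5, 2
U = rng.normal(size=(n_h, d))
W = rng.normal(size=(n_h, n_h))
V = rng.normal(size=(n_cls, n_h))
b, c = np.zeros(n_h), np.zeros(n_cls)
xs = rng.normal(size=(T, d))

hs, y_hats = rnn_forward(xs, U, W, V, b, c, np.zeros(n_h))
print(hs.shape, y_hats.shape)
```

Note that the same U, W, V, b, c are reused at every time step; only the hidden state changes as the loop advances.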
3 RNN backpropagation: BPTT (back-propagation through time)
3.1 Gradients of the output-layer parameters V and c
For a softmax output trained with cross-entropy loss, where $L = \sum_t L_t$:
$$\frac{\partial L}{\partial c} = \sum_{t=1}^{T} (\hat{y}_t - y_t)$$
$$\frac{\partial L}{\partial V} = \sum_{t=1}^{T} (\hat{y}_t - y_t)\, h_t^\top$$
3.2 Hidden-layer gradient
Define (with tanh as the hidden activation and a softmax output with cross-entropy loss)
$$\delta_t = \frac{\partial L}{\partial h_t}$$
$$\delta_t = V^\top (\hat{y}_t - y_t) + W^\top \mathrm{diag}(1 - h_{t+1}^2)\, \delta_{t+1}$$
Proof: $h_t$ affects the loss through $o_t$ and through $h_{t+1}$, so by the chain rule
$$\delta_t = \left(\frac{\partial o_t}{\partial h_t}\right)^\top \frac{\partial L}{\partial o_t} + \left(\frac{\partial h_{t+1}}{\partial h_t}\right)^\top \frac{\partial L}{\partial h_{t+1}}$$
$$\frac{\partial o_t}{\partial h_t} = V, \qquad \frac{\partial L}{\partial o_t} = \hat{y}_t - y_t$$
$$\frac{\partial h_{t+1}}{\partial h_t} = \mathrm{diag}(1 - h_{t+1}^2)\, W$$
$$\delta_t = V^\top (\hat{y}_t - y_t) + W^\top \mathrm{diag}(1 - h_{t+1}^2)\, \delta_{t+1}$$
For the last time step T there is no later hidden state, so only the output term remains:
$$\delta_T = V^\top (\hat{y}_T - y_T)$$
3.3 Gradients of the weights U, W, b
With $\delta_t = \partial L / \partial h_t$ as defined in 3.2 and tanh as the hidden activation:
$$\frac{\partial L}{\partial W} = \sum_{t=1}^{T} \mathrm{diag}(1 - h_t^2)\, \delta_t\, h_{t-1}^\top$$
$$\frac{\partial L}{\partial U} = \sum_{t=1}^{T} \mathrm{diag}(1 - h_t^2)\, \delta_t\, x_t^\top$$
$$\frac{\partial L}{\partial b} = \sum_{t=1}^{T} \mathrm{diag}(1 - h_t^2)\, \delta_t$$
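The backward pass can be checked numerically. The sketch below is one possible NumPy implementation of BPTT, assuming a tanh hidden layer, a softmax output with cross-entropy loss, integer class labels, and h_0 = 0; these choices, the function names, and the sizes are assumptions made for illustration. It compares one analytic entry of the gradient with respect to W against a central finite difference.

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def forward(xs, ys, p):
    """Forward pass; returns total cross-entropy loss, hidden states, predictions."""
    U, W, V, b, c = p
    hs, y_hats, loss = [np.zeros(W.shape[0])], [], 0.0   # hs[0] is h_0 = 0
    for x_t, y_t in zip(xs, ys):
        hs.append(np.tanh(U @ x_t + W @ hs[-1] + b))
        y_hats.append(softmax(V @ hs[-1] + c))
        loss -= np.log(y_hats[-1][y_t])
    return loss, hs, y_hats

def bptt(xs, ys, p):
    """Backward pass through time; returns gradients in the order (U, W, V, b, c)."""
    U, W, V, b, c = p
    _, hs, y_hats = forward(xs, ys, p)
    gU, gW, gV, gb, gc = (np.zeros_like(a) for a in p)
    carry = np.zeros(W.shape[0])        # term flowing back from step t+1
    for t in reversed(range(len(xs))):
        h_t, h_prev = hs[t + 1], hs[t]
        do = y_hats[t].copy()
        do[ys[t]] -= 1.0                # y_hat_t - y_t for a one-hot target
        gV += np.outer(do, h_t)
        gc += do
        delta = V.T @ do + carry        # delta_t = dL/dh_t
        dz = (1.0 - h_t ** 2) * delta   # diag(1 - h_t^2) delta_t
        gW += np.outer(dz, h_prev)
        gU += np.outer(dz, xs[t])
        gb += dz
        carry = W.T @ dz                # passed on to step t-1
    return gU, gW, gV, gb, gc

rng = np.random.default_rng(0)
T, d, n_h, n_cls = 5, 3, 4, 2           # hypothetical sizes
shapes = [(n_h, d), (n_h, n_h), (n_cls, n_h), (n_h,), (n_cls,)]
params = [rng.normal(scale=0.5, size=s) for s in shapes]
xs = rng.normal(size=(T, d))
ys = rng.integers(0, n_cls, size=T)
grads = bptt(xs, ys, params)

# Central finite difference on a single entry of W as a sanity check
eps = 1e-5
W = params[1]
W[0, 1] += eps
loss_plus = forward(xs, ys, params)[0]
W[0, 1] -= 2 * eps
loss_minus = forward(xs, ys, params)[0]
W[0, 1] += eps                          # restore the original value
numeric = (loss_plus - loss_minus) / (2 * eps)
print(abs(numeric - grads[1][0, 1]))
```

The printed difference should be very small if the recursion is implemented consistently; in practice PyTorch's autograd performs this computation automatically.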
4 nn.RNN
This section introduces the RNN class in PyTorch, nn.RNN.
4.1 Update rule:
$$h_t = \tanh(x_t W_{ih}^\top + b_{ih} + h_{t-1} W_{hh}^\top + b_{hh})$$
By default the input $x$ has shape $[L, N, d]$:
| Parameter | Description |
|----|--------------------|
| L | Sequence length (number of time steps T, or sentence length) |
| N | batch_size |
| d | Input feature dimension |
# -*- coding: utf-8 -*-
"""
Created on Wed Jul 19 15:30:01 2023
@author: chengxf2
"""
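import torch
import torch.nn as nn

# Single-layer RNN: 100 input features, 5 hidden units
rnn = nn.RNN(input_size=100, hidden_size=5)
# Collect the names of the learnable parameters through the public API
param_names = [name for name, _ in rnn.named_parameters()]
print("\n parameter names", param_names)
# weight_ih_l0 maps input to hidden, so its shape is [hidden_size, input_size]
print(rnn.weight_ih_l0.shape)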
Output: the parameter names weight_ih_l0, weight_hh_l0, bias_ih_l0 and bias_hh_l0, followed by torch.Size([5, 100]).
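nn.RNN parameters:
| Parameter | Description |
|----------------|-------------------------------------------------------------------------------------------------|
| input_size = d | Input feature dimension |
| hidden_size = h | Hidden state dimension |
| num_layers | Number of recurrent layers; the default is 1. A value greater than 1 forms a stacked RNN, also called a multi-layer or deep RNN |
| nonlinearity | Nonlinear activation function: tanh (the default) or relu |
| bias | Whether to use bias terms. Enabled by default |
| batch_first | Whether batch_size = N is the first dimension of the input. The default is False, giving shape L × N × d; with batch_first=True the shape is N × L × d |
| dropout | Dropout probability. When nonzero, a dropout layer is added after every RNN layer except the last. The default is 0, meaning no dropout |
| bidirectional | Whether to use a bidirectional RNN. Disabled by default |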
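The effect of batch_first and bidirectional on tensor shapes is easiest to see by running the module. The following is a small sketch with illustrative sizes (L=10, N=3, d=100, h=20, chosen here and not prescribed by the text above).

```python
import torch
import torch.nn as nn

L, N, d, h = 10, 3, 100, 20   # illustrative sizes only

# batch_first=True: input and output are [N, L, ...]; h_n keeps [num_layers, N, h]
rnn_bf = nn.RNN(input_size=d, hidden_size=h, batch_first=True)
out_bf, h_bf = rnn_bf(torch.randn(N, L, d))
print(out_bf.shape, h_bf.shape)

# bidirectional=True: both directions are concatenated on the last dim of out,
# and h_n gets one entry per direction
rnn_bi = nn.RNN(input_size=d, hidden_size=h, bidirectional=True)
out_bi, h_bi = rnn_bi(torch.randn(L, N, d))
print(out_bi.shape, h_bi.shape)
```

Note that batch_first changes the layout of the input and of out, but not of the returned hidden state.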
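4.2 Single-layer example
import torch.nn as nn
import torch

# Single-layer RNN: 100 input features, 20 hidden units
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=1)
X = torch.randn(10, 3, 100)    # input [L, N, d]
h_0 = torch.zeros(1, 3, 20)    # initial hidden state [num_layers, N, hidden_size]
out, h = rnn(X, h_0)
print("\n out.shape", out.shape)
print("\n h.shape", h.shape)
out: the hidden state at every time step, shape [10, 3, 20]
h: the hidden state at the last time step, shape [1, 3, 20]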
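The relation between out and h, and the update rule itself, can be verified directly. The sketch below recomputes the first time step by hand from the module's own weights, assuming the default tanh nonlinearity; the tolerance atol=1e-5 is an arbitrary choice to absorb float32 rounding.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=1)
X = torch.randn(10, 3, 100)      # [L, N, d]
h_0 = torch.zeros(1, 3, 20)      # [num_layers, N, hidden_size]
out, h = rnn(X, h_0)

# The last time slice of out is the final hidden state of the only layer
same_last = torch.allclose(out[-1], h[0])
print(same_last)

# Recompute the first time step by hand from the learned parameters
h_1 = torch.tanh(
    X[0] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
    + h_0[0] @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
)
same_first = torch.allclose(out[0], h_1, atol=1e-5)
print(same_first)
```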
4.3 Multi-layer RNN
The hidden output of each layer is used as the input to the next layer.
Output of the first hidden layer:
$$h_t^{(1)} = \tanh\left(x_t (W_{ih}^{(1)})^\top + b_{ih}^{(1)} + h_{t-1}^{(1)} (W_{hh}^{(1)})^\top + b_{hh}^{(1)}\right)$$
Output of the second hidden layer:
$$h_t^{(2)} = \tanh\left(h_t^{(1)} (W_{ih}^{(2)})^\top + b_{ih}^{(2)} + h_{t-1}^{(2)} (W_{hh}^{(2)})^\top + b_{hh}^{(2)}\right)$$
# -*- coding: utf-8 -*-
"""
Created on Mon Jul 24 11:43:30 2023
@author: chengxf2
"""
import torch.nn as nn
import torch
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=2)
print(rnn)
x = torch.randn(10, 3, 100)  # the default layout is [L, N, d]
out, h = rnn(x)              # h_0 defaults to zeros when omitted
print(out.shape, h.shape)
Output: out has shape torch.Size([10, 3, 20]) (the top layer's hidden state at every time step) and h has shape torch.Size([2, 3, 20]) (the final hidden state of each of the two layers).
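A quick way to see how the layers are wired is to inspect the per-layer weights and compare out with h. This is a small sketch using the same sizes as the example above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=100, hidden_size=20, num_layers=2)
x = torch.randn(10, 3, 100)
out, h = rnn(x)

# Layer 0 reads the input (d=100); layer 1 reads layer 0's hidden state (h=20)
print(rnn.weight_ih_l0.shape, rnn.weight_ih_l1.shape)

# out comes from the top layer only, so its last time step matches h[-1]
top_matches = torch.allclose(out[-1], h[-1])
print(top_matches)
```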
5 nn.RNNCell
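nn.RNN wraps the whole RNN computation, including the loop over time steps. PyTorch also provides nn.RNNCell, which computes a single time step, so you can write the loop yourself.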
5.1 Single-layer RNN
# -*- coding: utf-8 -*-
"""
Created on Mon Jul 24 11:43:30 2023
@author: chengxf2
"""
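import torch
from torch import nn

def main():
    # One RNN cell: 10 input features -> 20 hidden units
    model = nn.RNNCell(input_size=10, hidden_size=20)
    h1 = torch.zeros(3, 20)            # initial hidden state [N, hidden_size]
    trainData = torch.randn(8, 3, 10)  # input sequence [L, N, d]
    # Iterate over the time dimension, feeding the hidden state back in
    for xt in trainData:
        h1 = model(xt, h1)
    print(h1.shape)

if __name__ == "__main__":
    main()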
Output: h1 has shape torch.Size([3, 20]).
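To confirm that looping over an nn.RNNCell reproduces what nn.RNN does internally, the two can be given identical weights and compared. The sketch below copies the cell's parameters into a single-layer nn.RNN; the tolerance is an arbitrary choice to absorb float32 rounding.

```python
import torch
from torch import nn

torch.manual_seed(0)
cell = nn.RNNCell(input_size=10, hidden_size=20)
rnn = nn.RNN(input_size=10, hidden_size=20)

# Share parameters so both modules compute the same function
with torch.no_grad():
    rnn.weight_ih_l0.copy_(cell.weight_ih)
    rnn.weight_hh_l0.copy_(cell.weight_hh)
    rnn.bias_ih_l0.copy_(cell.bias_ih)
    rnn.bias_hh_l0.copy_(cell.bias_hh)

x = torch.randn(8, 3, 10)        # [L, N, d]
h = torch.zeros(3, 20)
steps = []
for xt in x:                     # manual loop over time steps
    h = cell(xt, h)
    steps.append(h)
manual_out = torch.stack(steps)  # [L, N, hidden_size]

out, h_n = rnn(x)
matches = torch.allclose(manual_out, out, atol=1e-5)
print(matches)
```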
5.2 Multi-layer RNN
# -*- coding: utf-8 -*-
"""
Created on Mon Jul 24 11:43:30 2023
@author: chengxf2
"""
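import torch
from torch import nn

def main():
    # Two stacked cells: 40 -> 30 -> 20
    layer1 = nn.RNNCell(input_size=40, hidden_size=30)
    layer2 = nn.RNNCell(input_size=30, hidden_size=20)
    h1 = torch.zeros(3, 30)            # hidden state of layer 1 [N, 30]
    h2 = torch.zeros(3, 20)            # hidden state of layer 2 [N, 20]
    trainData = torch.randn(8, 3, 40)  # input sequence [L, N, d]
    for xt in trainData:
        h1 = layer1(xt, h1)
        h2 = layer2(h1, h2)            # layer 2 takes layer 1's hidden state as input
    print(h1.shape)
    print(h2.shape)

if __name__ == "__main__":
    main()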
Output: h1 has shape torch.Size([3, 30]) and h2 has shape torch.Size([3, 20]).
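One reason to stack cells by hand is that each layer can have its own hidden size, whereas nn.RNN uses a single hidden_size for all layers. The class below is a hypothetical helper (StackedRNNCells is not part of PyTorch) that sketches how the two-cell loop could be wrapped in a reusable module, assuming zero initial hidden states.

```python
import torch
from torch import nn

class StackedRNNCells(nn.Module):
    """Hypothetical helper: stacks RNNCell layers with per-layer hidden sizes."""

    def __init__(self, input_size, hidden_sizes):
        super().__init__()
        sizes = [input_size] + list(hidden_sizes)
        self.cells = nn.ModuleList(
            [nn.RNNCell(sizes[i], sizes[i + 1]) for i in range(len(hidden_sizes))]
        )

    def forward(self, x):
        # x: [L, N, d]; every layer starts from a zero hidden state
        hs = [x.new_zeros(x.size(1), cell.hidden_size) for cell in self.cells]
        outputs = []
        for xt in x:
            inp = xt
            for i, cell in enumerate(self.cells):
                hs[i] = cell(inp, hs[i])
                inp = hs[i]          # this layer's output feeds the next layer
            outputs.append(inp)
        return torch.stack(outputs), hs

model = StackedRNNCells(40, [30, 20])
out, hs = model(torch.randn(8, 3, 40))
print(out.shape, [h.shape for h in hs])
```

Using nn.ModuleList registers the cells as submodules, so their parameters appear in model.parameters() and are picked up by an optimizer.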
Reference:
Pytorch 循环神经网络 nn.RNN() nn.RNNCell() nn.Parameter()不同方法实现_老光头_ME2CS的博客-CSDN博客