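Deep Learning in Practice (with PyTorch) Series, Part 43: Implementing Deep Recurrent Neural Networks in PyTorch

Implementing Deep Recurrent Neural Networks in PyTorch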

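- Architecture Overview
- Mathematical Definition
  - Input and Output Definitions
  - Computing the First Hidden Layer
  - Computing the $l$-th Hidden Layer ($l \geq 2$)
  - Computing the Output Layer
- Further Notes
- PyTorch Implementation
- Summary
- Series Index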

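Architecture Overview

The recurrent neural networks introduced so far are unidirectional and have a single hidden layer. In practical deep learning applications, we usually use recurrent neural networks with multiple hidden layers, known as deep recurrent neural networks. Figure 6.11 shows a deep recurrent neural network with $L$ hidden layers, in which each hidden state is passed on in two directions:

- to the next time step of the current layer (the time dimension)
- to the next layer at the current time step (the depth dimension)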
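
The two directions can be made concrete with PyTorch's built-in `nn.RNN`. The sketch below is only an illustration with arbitrarily chosen sizes: `output` collects the top layer's hidden state at every time step (what is passed on in depth toward the output layer), while `h_n` collects every layer's hidden state at the last time step (what would be passed on in time).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Sizes chosen arbitrarily for illustration
seq_len, batch_size, input_size, hidden_size, num_layers = 5, 2, 3, 4, 3

rnn = nn.RNN(input_size, hidden_size, num_layers=num_layers)
x = torch.randn(seq_len, batch_size, input_size)

output, h_n = rnn(x)

print(output.shape)  # (seq_len, batch_size, hidden_size): top layer, all time steps
print(h_n.shape)     # (num_layers, batch_size, hidden_size): all layers, last time step

# The top layer's state at the last time step appears in both tensors
print(torch.allclose(output[-1], h_n[-1]))
```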

Mathematical Definition

Input and Output Definitions

At time step $t$, let the minibatch input be $\boldsymbol{X}_t \in \mathbb{R}^{n \times d}$ (number of examples $n$, input dimension $d$), let the hidden state of the $l$-th hidden layer ($l = 1, 2, \ldots, L$) be $\boldsymbol{H}_t^{(l)} \in \mathbb{R}^{n \times h}$ (number of hidden units $h$), let the output-layer variable be $\boldsymbol{O}_t \in \mathbb{R}^{n \times q}$ (number of outputs $q$), and let the hidden-layer activation function be $\phi$.

Computing the First Hidden Layer

The hidden state of the first hidden layer is computed exactly as in a standard recurrent neural network:

$$\boldsymbol{H}_t^{(1)} = \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(1)} + \boldsymbol{H}_{t-1}^{(1)} \boldsymbol{W}_{hh}^{(1)} + \boldsymbol{b}_h^{(1)})$$

The model parameters are:

- $\boldsymbol{W}_{xh}^{(1)} \in \mathbb{R}^{d \times h}$: the weight matrix from the input to the first hidden layer
- $\boldsymbol{W}_{hh}^{(1)} \in \mathbb{R}^{h \times h}$: the recurrent weight matrix of the first hidden layer
- $\boldsymbol{b}_h^{(1)} \in \mathbb{R}^{1 \times h}$: the bias vector of the first hidden layer
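
As a rough check of the first-layer formula, the sketch below evaluates it by hand and compares the result with `nn.RNNCell`. The sizes are illustrative assumptions, and $\phi$ is assumed to be tanh. Note that PyTorch stores the weights transposed relative to the formula and keeps two bias vectors (`bias_ih` and `bias_hh`), whose sum plays the role of $\boldsymbol{b}_h^{(1)}$.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, h = 4, 10, 32  # illustrative sizes: batch, input dimension, hidden units

cell = nn.RNNCell(d, h, nonlinearity='tanh')

X_t = torch.randn(n, d)      # minibatch input at time step t
H_prev = torch.randn(n, h)   # stand-in for the previous hidden state H_{t-1}^{(1)}

# Map PyTorch's parameters onto the symbols used in the formula
W_xh = cell.weight_ih.T             # (d, h)
W_hh = cell.weight_hh.T             # (h, h)
b_h = cell.bias_ih + cell.bias_hh   # (h,), broadcast over the batch

with torch.no_grad():
    H_t_manual = torch.tanh(X_t @ W_xh + H_prev @ W_hh + b_h)
    H_t_cell = cell(X_t, H_prev)

print(H_t_manual.shape)
print(torch.allclose(H_t_manual, H_t_cell, atol=1e-5))
```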

Computing the $l$-th Hidden Layer ($l \geq 2$)

For the $l$-th hidden layer ($l \geq 2$), the hidden state is computed from the hidden state of the layer below at the same time step:

$$\boldsymbol{H}_t^{(l)} = \phi(\boldsymbol{H}_t^{(l-1)} \boldsymbol{W}_{xh}^{(l)} + \boldsymbol{H}_{t-1}^{(l)} \boldsymbol{W}_{hh}^{(l)} + \boldsymbol{b}_h^{(l)})$$

The model parameters are:

- $\boldsymbol{W}_{xh}^{(l)} \in \mathbb{R}^{h \times h}$: the weight matrix from layer $l-1$ to layer $l$
- $\boldsymbol{W}_{hh}^{(l)} \in \mathbb{R}^{h \times h}$: the recurrent weight matrix of layer $l$
- $\boldsymbol{b}_h^{(l)} \in \mathbb{R}^{1 \times h}$: the bias vector of layer $l$
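
The parameter shapes above can be compared with what `nn.RNN` actually allocates. This is a small sketch with assumed sizes; keep in mind that PyTorch stores each weight matrix transposed (so the first layer's input weight has shape `(h, d)`) and uses two bias vectors per layer instead of one.

```python
import torch.nn as nn

d, h, L = 10, 32, 3  # illustrative sizes: input dimension, hidden units, layers
rnn = nn.RNN(input_size=d, hidden_size=h, num_layers=L)

# Parameter names end in _l0, _l1, ... for layers 1, 2, ...
shapes = {name: tuple(p.shape) for name, p in rnn.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

# Count implied by the formulas, with two bias vectors per layer as in PyTorch
formula_count = (d * h + h * h + 2 * h) + (L - 1) * (h * h + h * h + 2 * h)
total = sum(p.numel() for p in rnn.parameters())
print(total == formula_count)
```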

Computing the Output Layer

The final output is computed from the hidden state of the last hidden layer only:

$$\boldsymbol{O}_t = \boldsymbol{H}_t^{(L)} \boldsymbol{W}_{hq} + \boldsymbol{b}_q$$

The output-layer parameters are:

- $\boldsymbol{W}_{hq} \in \mathbb{R}^{h \times q}$: the weight matrix from the hidden layer to the output layer
- $\boldsymbol{b}_q \in \mathbb{R}^{1 \times q}$: the bias vector of the output layer
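
The output layer is a plain affine map, so in PyTorch it can be applied to all time steps at once with `nn.Linear`, which acts on the last dimension. The sketch below uses a random tensor as a stand-in for the top-layer hidden states (an assumption for illustration only) and checks the result against the formula.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, n, h, q = 8, 4, 32, 5  # illustrative sizes

fc = nn.Linear(h, q)
H_top = torch.randn(seq_len, n, h)  # stand-in for H_t^{(L)} at every time step

with torch.no_grad():
    O = fc(H_top)                   # (seq_len, n, q)

    # Same computation written as in the formula
    W_hq = fc.weight.T              # (h, q); PyTorch stores the transpose
    b_q = fc.bias                   # (q,), broadcast over time steps and batch
    O_manual = H_top @ W_hq + b_q

print(O.shape)
print(torch.allclose(O, O_manual, atol=1e-5))
```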

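Further Notes

- Hyperparameter selection: the number of hidden layers $L$ and the number of hidden units $h$ are hyperparameters that need to be tuned.
- Gated extensions: replacing the basic recurrent unit with a gated recurrent unit (GRU) or a long short-term memory (LSTM) unit yields a deep gated recurrent neural network.
- Initialization: the initial hidden state $\boldsymbol{H}_0^{(l)}$ of each layer is usually initialized to zeros.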
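
The last two notes can be illustrated with PyTorch's built-in modules. In this sketch (sizes are arbitrary assumptions), omitting the initial state of `nn.GRU` gives the same result as passing explicit zeros, and `nn.LSTM` returns a pair of states, the hidden state and the cell state, each with one slice per layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, n, d, h, L = 6, 3, 10, 16, 2  # illustrative sizes
x = torch.randn(seq_len, n, d)

# Deep GRU: a missing initial state defaults to zeros
gru = nn.GRU(d, h, num_layers=L)
with torch.no_grad():
    out_default, _ = gru(x)
    out_zeros, _ = gru(x, torch.zeros(L, n, h))
print(torch.allclose(out_default, out_zeros))

# Deep LSTM: the state is a tuple (hidden state, cell state)
lstm = nn.LSTM(d, h, num_layers=L)
with torch.no_grad():
    out, (h_n, c_n) = lstm(x)
print(out.shape, h_n.shape, c_n.shape)
```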

PyTorch Implementation

import torch
import torch.nn as nn

class DeepRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers,
                 rnn_type='rnn', activation='tanh', dropout=0.0):
        """
        Deep recurrent neural network.

        Args:
            input_size: input feature dimension d
            hidden_size: number of hidden units h
            output_size: output dimension q
            num_layers: number of hidden layers L
            rnn_type: recurrent unit type ('rnn', 'gru', 'lstm')
            activation: activation function ('tanh', 'relu'); only used when rnn_type is 'rnn'
            dropout: dropout probability between layers
        """
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # Select the recurrent unit type
        if rnn_type.lower() == 'rnn':
            self.rnn = nn.RNN(
                input_size=input_size,
                hidden_size=hidden_size,
                num_layers=num_layers,
                batch_first=False,  # input is (seq_len, batch, input_size)
                nonlinearity=activation,
                dropout=dropout
            )
        elif rnn_type.lower() == 'gru':
            self.rnn = nn.GRU(
                input_size=input_size,
                hidden_size=hidden_size,
                num_layers=num_layers,
                batch_first=False,
                dropout=dropout
            )
        elif rnn_type.lower() == 'lstm':
            self.rnn = nn.LSTM(
                input_size=input_size,
                hidden_size=hidden_size,
                num_layers=num_layers,
                batch_first=False,
                dropout=dropout
            )
        else:
            raise ValueError("rnn_type must be 'rnn', 'gru', or 'lstm'")

        # Output layer
        self.fc = nn.Linear(hidden_size, output_size)
    
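    def forward(self, x, hidden=None):
        """
        Forward pass.

        Args:
            x: input sequence, shape: (seq_len, batch_size, input_size)
            hidden: initial hidden state; if None, it defaults to zeros

        Returns:
            output: output sequence, shape: (seq_len, batch_size, output_size)
            hidden: final hidden state (a tuple (h_n, c_n) for LSTM)
        """
        # Recurrent layers: rnn_output holds the top layer's state at every time step
        rnn_output, hidden = self.rnn(x, hidden)  # (seq_len, batch_size, hidden_size)

        # Fully connected output layer, applied at every time step
        output = self.fc(rnn_output)  # (seq_len, batch_size, output_size)

        return output, hidden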
    
    def init_hidden(self, batch_size, device=None):
        """Create an all-zero initial hidden state."""
        if device is None:
            device = next(self.parameters()).device

        shape = (self.num_layers, batch_size, self.hidden_size)
        if isinstance(self.rnn, nn.LSTM):
            # LSTM needs both a hidden state and a cell state
            h0 = torch.zeros(shape, device=device)
            c0 = torch.zeros(shape, device=device)
            return (h0, c0)
        else:
            # RNN and GRU only need a hidden state
            return torch.zeros(shape, device=device)


# Manual version (closer to the mathematical formulas)
class ManualDeepRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, activation='tanh'):
        """
        Deep recurrent neural network built layer by layer from RNN cells.

        Args:
            input_size: input feature dimension d
            hidden_size: number of hidden units h
            output_size: output dimension q
            num_layers: number of hidden layers L
            activation: activation function ('tanh' or 'relu')
        """
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # Validate the activation function; each nn.RNNCell applies it internally
        if activation not in ('tanh', 'relu'):
            raise ValueError("activation must be 'tanh' or 'relu'")

        # Create one cell per hidden layer
        self.layers = nn.ModuleList()

        # Layer 1 (input to hidden)
        self.layers.append(nn.RNNCell(input_size, hidden_size, nonlinearity=activation))

        # Layers 2..L (hidden to hidden)
        for _ in range(1, num_layers):
            self.layers.append(nn.RNNCell(hidden_size, hidden_size, nonlinearity=activation))

        # Output layer
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        """
        Forward pass.

        Args:
            x: input sequence, shape: (seq_len, batch_size, input_size)

        Returns:
            outputs: output sequence, shape: (seq_len, batch_size, output_size)
        """
        seq_len, batch_size, _ = x.shape

        # Initialize the hidden state of every layer to zeros
        hidden_states = [torch.zeros(batch_size, self.hidden_size, device=x.device, dtype=x.dtype)
                         for _ in range(self.num_layers)]

        outputs = []

        for t in range(seq_len):
            # Layer 1: driven by the input at time step t
            hidden_states[0] = self.layers[0](x[t], hidden_states[0])

            # Layers 2..L: driven by the layer below at the same time step
            for layer in range(1, self.num_layers):
                hidden_states[layer] = self.layers[layer](
                    hidden_states[layer-1], hidden_states[layer]
                )

            # Output layer, using the top layer's hidden state
            output_t = self.fc(hidden_states[-1])
            outputs.append(output_t)

        return torch.stack(outputs)


# Test code
def test_deep_rnn():
    # Settings
    input_size = 10
    hidden_size = 32
    output_size = 5
    num_layers = 3
    batch_size = 4
    seq_len = 8

    # Create the model
    model = DeepRNN(input_size, hidden_size, output_size, num_layers, rnn_type='rnn')

    # Create test data
    x = torch.randn(seq_len, batch_size, input_size)

    # Forward pass
    output, hidden = model(x)

    print(f"Input shape: {x.shape}")
    print(f"Output shape: {output.shape}")
    print(f"Hidden state shape: {hidden.shape if not isinstance(hidden, tuple) else hidden[0].shape}")

    # Check the output shape
    assert output.shape == (seq_len, batch_size, output_size)
    print("Test passed!")


if __name__ == "__main__":
    test_deep_rnn()
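
One way to build confidence that stacking cells by hand matches the built-in multi-layer module is to copy the weights of an `nn.RNN` into a list of `nn.RNNCell` objects and compare outputs. The following is a self-contained sketch under assumed sizes; the variable names (`cells`, `states`, `manual_out`) are illustrative and not part of the classes above, and the output layer is left out because it is identical in both versions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, n, d, h, L = 7, 3, 10, 16, 3  # illustrative sizes

rnn = nn.RNN(d, h, num_layers=L)  # built-in stacked RNN (tanh by default)
cells = nn.ModuleList(
    [nn.RNNCell(d if l == 0 else h, h) for l in range(L)]
)

# Copy layer l's parameters from nn.RNN into the l-th cell
with torch.no_grad():
    for l, cell in enumerate(cells):
        cell.weight_ih.copy_(getattr(rnn, f"weight_ih_l{l}"))
        cell.weight_hh.copy_(getattr(rnn, f"weight_hh_l{l}"))
        cell.bias_ih.copy_(getattr(rnn, f"bias_ih_l{l}"))
        cell.bias_hh.copy_(getattr(rnn, f"bias_hh_l{l}"))

x = torch.randn(seq_len, n, d)
states = [torch.zeros(n, h) for _ in range(L)]  # zero initial state per layer
top = []

with torch.no_grad():
    for t in range(seq_len):
        inp = x[t]
        for l, cell in enumerate(cells):
            states[l] = cell(inp, states[l])  # update in time
            inp = states[l]                   # pass upward in depth
        top.append(states[-1])
    manual_out = torch.stack(top)
    builtin_out, h_n = rnn(x)

print(torch.allclose(manual_out, builtin_out, atol=1e-5))
print(torch.allclose(torch.stack(states), h_n, atol=1e-5))
```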

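Summary

- A deep recurrent neural network increases the model's expressive power by stacking multiple hidden layers.
- Each hidden state passes information along both the time dimension (to the next time step) and the depth dimension (to the next layer).
- Replacing the basic recurrent unit with a GRU or LSTM unit yields a more powerful deep gated recurrent neural network.
- The number of hidden layers $L$ and the number of hidden units $h$ are important hyperparameters that need careful tuning.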

Series Index