机器学习(10.7-10.13)(Pytorch LSTM和LSTMP的原理及其手写复现)

文章目录

    • 摘要
    • Abstract
    • [1 LSTM](#1 LSTM)
    • [1.1 使用Pytorch LSTM](#1.1 使用Pytorch LSTM)
      • [1.1.1 LSTM API代码实现](#1.1.1 LSTM API代码实现)
      • [1.1.2 LSTMP代码实现](#1.1.2 LSTMP代码实现)
    • [1.2 手写一个lstm_forward函数 实现单向LSTM的计算原理](#1.2 手写一个lstm_forward函数 实现单向LSTM的计算原理)
    • [1.3 手写一个lstmp_forward函数 实现单向LSTMP的计算原理](#1.3 手写一个lstmp_forward函数 实现单向LSTMP的计算原理)
    • 总结

摘要

LSTM是RNN的一个优秀的变种模型,继承了大部分RNN模型的特性;在本次学习中,展示了LSTM的手动推导过程,用代码逐行模拟实现LSTM的运算过程,并与Pytorch API输出的结验证是否保持一致。

Abstract

LSTM is an excellent variant model of RNN, inheriting most of the characteristics of RNN model. In this study, the manual derivation process of LSTM is demonstrated, and the operation process of LSTM is simulated line by line with the code, and whether it is consistent with the junction output of Pytorch API is verified.

1 LSTM

1.1 使用Pytorch LSTM

实例化LSTM类,需要传递的参数:

input_size :输入数据的特征维度
hidden_size :隐含状态 h t h_t ht的大小
num_layers :默认值为1,大于1,表示多个RNN堆叠起来
batch_first :默认是False;若为True,输入输出格式为:(batch, seq, feature) ;若为False:输入输出格式为: (seq, batch, feature);
bidirectional :默认为False;若为True,则是双向RNN,同时输出长度为2*hidden_size
proj_size:若不为None,将使用具有相应大小投影的 LSTM

函数输入值:(input,(h_0,c_0))

  1. input:当batch_first=True 输入格式为(N,L, H i n H_{in} Hin);当batch_first=False 输入格式为(L,N, H i n H_{in} Hin);
  2. h_0:默认输入值为0;输入格式为:(D*num_layers,N, H o u t H_{out} Hout)
  3. c_0:默认输入值为0,输入格式为:(D*num_layers,N, H c e l l H_{cell} Hcell)

其中:

N = batch size

L = sequence length

D = 2 if bidirectional=True otherwise 1
H i n H_{in} Hin = input_size
H o u t H_{out} Hout = hidden_size
H c e l l H_{cell} Hcell = proj_size if proj_size>0 否则 hidden_size

函数输出值:(output,(h_n, c_n))

  1. output:当batch_first=True 输出格式为(N,L,D* H o u t H_{out} Hout);当batch_first=False 输出格式为(L,N,D* H o u t H_{out} Hout);
  2. h_n:输出格式为:(D*num_layers,N, H o u t H_{out} Hout)
  3. c_n:输出格式为:(D*num_layers,N, H c e l l H_{cell} Hcell)

1.1.1 LSTM API代码实现

首先实例化一些参数:

python 复制代码
# 定义常量
bs, T, input_size, hidden_size = 2, 3, 4, 5

#输入序列
input = torch.randn(bs, T, input_size)
c0 = torch.randn(bs, hidden_size)
h0 = torch.randn(bs, hidden_size)

调用Pytorch中的LSTM API:

并查看返回的结果及其形状

python 复制代码
# 调用官方的LSTM API
lstm_layer = nn.LSTM(input_size, hidden_size, batch_first=True)
output, (hn, cn) = lstm_layer(input, (h0.unsqueeze(0), c0.unsqueeze(0)))

print("----官方LSTM的输出----")
print(output)
print(output.shape) # torch.Size([2, 3, 5])---[bs, T, hidden_size]
print(hn)
print(hn.shape) # torch.Size([1, 2, 5])---[num_layers,bs,hidden_size]
print(cn)
print(cn.shape) # torch.Size([1, 2, 5])---[num_layers,bs,hidden_size]

输出为:

python 复制代码
----官方LSTM的输出----
tensor([[[ 0.3045, -0.0770,  0.0110, -0.1592, -0.2298],
         [ 0.1142, -0.1522, -0.0975, -0.1512,  0.2834],
         [ 0.1298,  0.0090,  0.0183, -0.2742,  0.0428]],

        [[ 0.1908, -0.1505, -0.0659,  0.0520,  0.0340],
         [ 0.2583, -0.0599, -0.0442, -0.1428, -0.0102],
         [ 0.1604,  0.0004, -0.1105, -0.2062,  0.0760]]],
       grad_fn=<TransposeBackward0>)
torch.Size([2, 3, 5])
tensor([[[ 0.1298,  0.0090,  0.0183, -0.2742,  0.0428],
         [ 0.1604,  0.0004, -0.1105, -0.2062,  0.0760]]],
       grad_fn=<StackBackward0>)
torch.Size([1, 2, 5])
tensor([[[ 0.3637,  0.0263,  0.0326, -0.5326,  0.0537],
         [ 0.4828,  0.0008, -0.2197, -0.3999,  0.1016]]],
       grad_fn=<StackBackward0>)
torch.Size([1, 2, 5])

获取 pytorch LSTM内置参数的形状

python 复制代码
for k, v in lstm_layer.named_parameters():
    print(k, v)

其中:

python 复制代码
weight_ih_l0维度为:[4*hidden_size,input_size]
weight_hh_l0维度为:[4*hidden_size,hidden_size]
bias_ih_l0维度为:[4*hidden_size]
bias_hh_l0维度为:[4*hidden_size]

1.1.2 LSTMP代码实现

如果指定proj_size >0 ,将会使用带投影的LSTM,以下列方式更改LSTM单元。

首先, h t h_t ht的维度将从hidden_size更改为:proj_size

其次,每一层的输出隐藏状态将乘以一个可学习的投影矩阵, h t = w h r h t h_t=w_{hr}h_t ht=whrht

因此,LSTMP 网络的输出也将具有不同的形状。

首先实例化一些参数:

python 复制代码
bs, T, input_size, hidden_size = 2, 3, 4, 5
proj_size = 3

input = torch.randn(bs, T, input_size)
h0 = torch.randn(bs, proj_size)
c0 = torch.randn(bs, hidden_size)

调用PyTorch中的 LSTMP API:

并查看返回的结果及其形状

python 复制代码
# 使用官方的LSTMP
layer_LSTMP = nn.LSTM(input_size, hidden_size, batch_first=True, proj_size=proj_size)
output, (h_n, c_n) = layer_LSTMP(input, (h0.unsqueeze(0), c0.unsqueeze(0)))
print(output)
print(output.shape) # torch.Size([2, 3, 3])---[bs,T,proj_size]
print(h_n)
print(h_n.shape)    # torch.Size([1, 2, 3])---[num_layers,bs,proj_size]
print(c_n)
print(c_n.shape)    # torch.Size([1, 2, 5])---[num_layers,bs,hidden_size]

输出为:

python 复制代码
tensor([[[-0.1677, -0.0150, -0.0289],
         [-0.1369, -0.0738, -0.0311],
         [ 0.0168, -0.0184,  0.0303]],

        [[-0.0935, -0.0204, -0.1073],
         [-0.0137, -0.0056, -0.0085],
         [-0.0654, -0.0376,  0.0562]]], grad_fn=<TransposeBackward0>)
torch.Size([2, 3, 3])
tensor([[[ 0.0168, -0.0184,  0.0303],
         [-0.0654, -0.0376,  0.0562]]], grad_fn=<StackBackward0>)
torch.Size([1, 2, 3])
tensor([[[ 0.1691, -0.0772,  0.0209,  0.1153,  0.2290],
         [ 0.4153, -0.0653, -0.0928,  0.3042,  0.2146]]],
       grad_fn=<StackBackward0>)
torch.Size([1, 2, 5])

获取 pytorch LSTMP内置参数

python 复制代码
for k, v in layer_LSTMP.named_parameters():
    print(k, v.shape)

其中:

python 复制代码
weight_ih_l0维度为:[4*hidden_size,input_size]
weight_hh_l0维度为:[4*hidden_size,Proj_size]
bias_ih_l0维度为:[4*hidden_size]
bias_hh_l0维度为:[4*hidden_size]
weight_hr_l0维度为:[Proj_size,hidden_size]

1.2 手写一个lstm_forward函数 实现单向LSTM的计算原理

这里先将lstm_forward函数中的每个参数的维度写出来:

python 复制代码
def lstm_forward(input, w_ih, w_hh, b_ih, b_hh, initial_states):
    h0, c0 = initial_states # 初始状态
    bs, T, input_size = input.shape
    hidden_size = w_ih.shape[0]//4

    prev_h = h0
    prev_c = c0
    # input 的维度为:[batch_size, T, input_size]
    # 其中 w_ih和w_hh 的维度为:[4*hidden_size, input_size],[4*hidden_size, hidden_size]
    batch_w_ih = w_ih.unsqueeze(0).tile(bs, 1, 1) # 这样w_ih的变为:[batch_size, 4*hidden_size, input_size]
    batch_w_hh = w_hh.unsqueeze(0).tile(bs, 1, 1) # 这样w_hh的变为:[batch_size, 4*hidden_size, hidden_size]

    output_size = hidden_size
    print(output_size)    # 5
    # 输出序列
    output = torch.zeros(bs, T, output_size)

    for t in range(T):
        # 获取当前时刻的x
        x = input[:, t, :]  #这时候x的维度为:[bs, input_size]
        # 进行矩阵运算,先将x的维度扩为:[bs, input_size, 1]
        # 矩阵运算之后的维度为:[bs, 4*hidden_size, 1]
        w_times_x = torch.bmm(batch_w_ih, x.unsqueeze(-1))
        w_times_x = w_times_x.squeeze(-1)    # 此时维度为:[bs, 4*hidden_size]

        w_times_h_prev = torch.bmm(batch_w_hh, prev_h.unsqueeze(-1))    # [bs, 4*hidden_size,1]
        w_times_h_prev = w_times_h_prev.squeeze(-1) # 维度为:[bs, 4*hidden_size]

        # 计算输入门(i),遗忘门(f),cell门(g),输出门(o)
        i_t = torch.sigmoid(w_times_x[:, :hidden_size] + w_times_h_prev[:, :hidden_size] + b_ih[:hidden_size] + b_hh[:hidden_size])
        f_t = torch.sigmoid(w_times_x[:, hidden_size:2*hidden_size] + w_times_h_prev[:, hidden_size:2*hidden_size] +
                            b_ih[hidden_size:2*hidden_size] + b_hh[hidden_size:2*hidden_size])
        g_t = torch.tanh(w_times_x[:, 2*hidden_size:3*hidden_size] + w_times_h_prev[:, 2*hidden_size:3*hidden_size] +
                            b_ih[2*hidden_size:3*hidden_size] + b_hh[2*hidden_size:3*hidden_size])
        o_t = torch.sigmoid(w_times_x[:, 3*hidden_size:4*hidden_size] + w_times_h_prev[:, 3*hidden_size:4*hidden_size] +
                            b_ih[3*hidden_size:4*hidden_size] + b_hh[3*hidden_size:4*hidden_size])

        prev_c = f_t * prev_c + i_t * g_t
        prev_h = o_t * torch.tanh(prev_c)   # [bs, hidden_size]

        output[:, t, :] = prev_h

    return output, (prev_h, prev_c)
  • 使用pytorch中LSTM的内置参数测试该函数
python 复制代码
c_output, (c_hn, c_cn) = lstm_forward(input, lstm_layer.weight_ih_l0, lstm_layer.weight_hh_l0,
                                      lstm_layer.bias_ih_l0, lstm_layer.bias_hh_l0, (h0, c0))
print("----手写LSTM的输出----")
print(c_output)
print(c_hn)
print(c_cn)

查看两者的输出结果完全相同

1.3 手写一个lstmp_forward函数 实现单向LSTMP的计算原理

python 复制代码
def lstmp_forward(input, initial_states, w_ih, w_hh, b_ih, b_hh, w_hr=None):
    h0, c0 = initial_states
    bs, T, input_size = input.shape
    hidden_size = w_ih.shape[0]//4

    prev_h = h0 # [bs, proj_size]
    prev_c = c0 # [bs, hidden_size]

    # w_ih维度:[4*h_size, input_size] w_hh维度:[4*h_size, proj_size]    w_hr维度:[proj_size, h_size]
    batch_w_ih = w_ih.unsqueeze(0).tile(bs, 1, 1)
    batch_w_hh = w_hh.unsqueeze(0).tile(bs, 1, 1)


    if w_hr is not None:
        proj_size = w_hr.shape[0]
        output_size = proj_size
        batch_w_hr = w_hr.unsqueeze(0).tile(bs, 1, 1)
    else:
        output_size = hidden_size

    output = torch.zeros(bs, T, output_size)

    for t in range(T):
        x = input[:, t, :]  # [bs, input_size]1

        w_times_x = torch.bmm(batch_w_ih, x.unsqueeze(-1)).squeeze(-1) # [bs, 4*input_size]
        w_times_prev_h = torch.bmm(batch_w_hh, prev_h.unsqueeze(-1)).squeeze(-1)    #[ba, 4*hidden_size]

        # 分明计算输入门(i),遗忘门(f),cell门(c), 输出门(o)
        i_t = torch.sigmoid(w_times_x[:, :hidden_size] + w_times_prev_h[:, :hidden_size] + b_ih[:hidden_size] + b_hh[:hidden_size])
        f_t = torch.sigmoid(w_times_x[:, hidden_size:2*hidden_size] + w_times_prev_h[:, hidden_size:2*hidden_size]
                            + b_ih[hidden_size:2*hidden_size] + b_hh[hidden_size:2*hidden_size])
        g_t = torch.tanh(w_times_x[:, 2*hidden_size:3*hidden_size] + w_times_prev_h[:, 2*hidden_size:3*hidden_size]
                            + b_ih[2*hidden_size:3*hidden_size] + b_hh[2*hidden_size:3*hidden_size])
        o_t = torch.sigmoid(w_times_x[:, 3*hidden_size:4*hidden_size] + w_times_prev_h[:, 3*hidden_size:4*hidden_size]
                            + b_ih[3*hidden_size:4*hidden_size] + b_hh[3*hidden_size:4*hidden_size])

        prev_c = f_t * prev_c + i_t * g_t
        prev_h = o_t *torch.tanh(prev_c) #[bs, hidden_size]

        if w_hr is not None:
            #  w_hr维度:[proj_size, h_size]
            prev_h = torch.bmm(batch_w_hr, prev_h.unsqueeze(-1))      # [bs, proj_size, 1]
            prev_h = prev_h.squeeze(-1) # [bs, proj_size]

        output[:, t, :] = prev_h

    return output, (prev_h, prev_c)
  • 使用pytorch中LSTM的内置参数测试该函数
python 复制代码
c_output, (c_h_n, c_c_n) = lstmp_forward(input, (h0, c0), layer_LSTMP.weight_ih_l0, layer_LSTMP.weight_hh_l0,
                                         layer_LSTMP.bias_ih_l0, layer_LSTMP.bias_hh_l0, layer_LSTMP.weight_hr_l0)
print("---手写lstmp_forward函数输出---")
print(c_output)

查看两者的输出结果完全相同

总结

在本次学习中,通过对LSTM运算过程的代码逐行实现,了解到LSTM模型,以及加深了自己对LSTM模型的理解与推导。在上次的学习中,我学习到RNN不能处理长依赖问题,而LSTM是一种特殊的RNN,比较适用于解决长依赖问题,而且LSTM与RNN一样是链式结构,但是结构不相同,有4个input,分别指外界存储到Memory里面的值和3个gate(Input Gate、Forget Gate和Output Gate)的信号,它们都是以比较简单的方式起作用。

相关推荐
智算菩萨1 小时前
【Generative AI For Autonomous Driving】1 生成式AI重塑自动驾驶的技术浪潮与体系化挑战
论文阅读·人工智能·深度学习·机器学习·ai·自动驾驶
智算菩萨1 小时前
【Generative AI For Autonomous Driving】7 生成式AI驱动自动驾驶的未来图景:开放挑战、社会机遇与技术展望
论文阅读·人工智能·深度学习·机器学习·ai·自动驾驶
B站_计算机毕业设计之家1 小时前
计算机毕业设计:Python当当网图书数据全链路处理平台 Django框架 爬虫 Pandas 可视化 大数据 大模型 书籍(建议收藏)✅
爬虫·python·机器学习·django·flask·pandas·课程设计
散峰而望1 小时前
【基础算法】从入门到实战:递归型枚举与回溯剪枝,暴力搜索的初级优化指南
数据结构·c++·后端·算法·机器学习·github·剪枝
q_35488851534 小时前
计算机毕业设计:Python当当网图书大数据分析平台 Django框架 爬虫 Pandas 可视化 大数据 大模型 书籍(建议收藏)✅
大数据·爬虫·python·机器学习·数据分析·django·课程设计
云和数据.ChenGuang5 小时前
鸿蒙餐饮系统:全场景智慧餐饮新范式
人工智能·机器学习·华为·数据挖掘·harmonyos·鸿蒙·鸿蒙系统
Lab_AI5 小时前
科学智能AI4S应用:人工智能加速加速抗生素发现(AIDD助力药物研发)
人工智能·神经网络·机器学习·ai4s·药物研发·aidd
yeflx7 小时前
三维空间坐标转换早期笔记
人工智能·算法·机器学习
GIS数据转换器7 小时前
洪水时空大数据分析与评估系统
大数据·人工智能·机器学习·数据挖掘·数据分析·无人机·宠物
放下华子我只抽RuiKe58 小时前
从零开源:如何将自定义 AI Skill 发布到 GitHub
人工智能·机器学习·开源·github·集成学习·skills·openclaw