RNN 循环神经网络计算过程（通俗+公式版+运行实例）

RNN 循环神经网络计算过程（通俗+公式版）

RNN 的核心是对序列数据按时间步依次计算，每个时刻都复用同一套权重，并把上一时刻的状态传给下一时刻。

1. 符号定义

-xt\Large x_txt ：第t\Large tt 时刻输入

-ht−1\Large h_{t-1}ht−1 ：上一时刻隐藏状态

-ht\Large h_tht ：当前时刻隐藏状态

-Wxh\Large W_{xh}Wxh ：输入到隐藏层权重

-Whh\Large W_{hh}Whh ：隐藏层自循环权重

-bh\Large b_hbh ：隐藏层偏置

-Why\Large W_{hy}Why ：隐藏到输出权重

-by\Large b_yby ：输出偏置

-tanh⁡\Large \tanhtanh ：激活函数

2. 单步计算流程

对每个时间步t\Large tt ：

① 计算隐藏状态

ht=tanh⁡(Wxhxt+Whhht−1+bh) \Large h_t = \tanh\big(W_{xh} x_t + W_{hh} h_{t-1} + b_h\big) ht=tanh(Wxhxt+Whhht−1+bh)

初始状态：h0=0\Large h_0 = \mathbf{0}h0=0 （全零向量）

② 计算输出

yt=Whyht+by \Large y_t = W_{hy} h_t + b_y yt=Whyht+by

（分类任务后接 softmax，回归任务直接输出）

3. 完整前向传播（序列展开）

给定序列x1,x2,...,xT\Large x_1,x_2,\dots,x_Tx1,x2,...,xT ：

1.t=1\Large t=1t=1
h1=tanh⁡(Wxhx1+Whhh0+bh),y1=Whyh1+by \Large h_1 = \tanh(W_{xh}x_1 + W_{hh}h_0 + b_h),\quad y_1 = W_{hy}h_1 + b_y h1=tanh(Wxhx1+Whhh0+bh),y1=Whyh1+by

2.t=2\Large t=2t=2
h2=tanh⁡(Wxhx2+Whhh1+bh),y2=Whyh2+by \Large h_2 = \tanh(W_{xh}x_2 + W_{hh}h_1 + b_h),\quad y_2 = W_{hy}h_2 + b_y h2=tanh(Wxhx2+Whhh1+bh),y2=Whyh2+by

一直重复到t=T\Large t=Tt=T
hT=tanh⁡(WxhxT+WhhhT−1+bh),yT=WhyhT+by\Large h_T = \tanh(W_{xh}x_T + W_{hh}h_{T-1} + b_h),\quad y_T = W_{hy}h_T + b_yhT=tanh(WxhxT+WhhhT−1+bh),yT=WhyhT+by

关键：所有权重Wxh,Whh,Why\Large W_{xh},W_{hh},W_{hy}Wxh,Whh,Why 在所有时间步共享。

4. 反向传播 BPTT 要点

损失：L=∑t=1TLt(yt,y^t)\Large L = \sum_{t=1}^T L_t(y_t, \hat{y}_t)L=∑t=1TLt(yt,y^t)
梯度沿时间反向传播，对Wxh,Whh,Why\Large W_{xh},W_{hh},W_{hy}Wxh,Whh,Why 求导并累加
容易出现梯度消失/爆炸，所以后来才有 LSTM、GRU

5. 极简总结

RNN 就是：
输入 + 上一时刻状态 → 激活 → 当前状态 → 输出

一步一步沿着序列走，权重全程不变。

好，我们直接来一个标准矩阵版 RNN 手算例子，维度小、步骤全，能看清每一步矩阵运算。

1. 设定参数

输入维度：dx=2d_x = 2dx=2
隐藏层维度：dh=2d_h = 2dh=2
输出维度：dy=1d_y = 1dy=1
序列长度 T=2T=2T=2，输入：
x1= $10$ ,x2= $01$ \Large x_1=\begin{bmatrix}1\\0\end{bmatrix},\quad x_2=\begin{bmatrix}0\\1\end{bmatrix} x1= 10 ,x2= 01
初始隐藏状态：
h0= $00$ h_0 = \begin{bmatrix}0\\0\end{bmatrix} h0= $00$

权重矩阵（手动给定）

Wxh= $0.10.20.30.4$ ,Whh= $0.50.60.70.8$ ,bh= $0.10.2$ \Large W_{xh} = \begin{bmatrix} 0.1 & 0.2\\ 0.3 & 0.4 \end{bmatrix},\quad W_{hh} = \begin{bmatrix} 0.5 & 0.6\\ 0.7 & 0.8 \end{bmatrix},\quad b_h = \begin{bmatrix}0.1\\0.2\end{bmatrix} Wxh= 0.10.30.20.4 ,Whh= 0.50.70.60.8 ,bh= 0.10.2

Why= $0.91.0$ ,by=0.1 \Large W_{hy} = \begin{bmatrix}0.9 & 1.0\end{bmatrix},\quad b_y = 0.1 Why= $0.91.0$ ,by=0.1

激活函数：tanh⁡\tanhtanh

2. 计算公式

ht=tanh⁡(Wxhxt+Whhht−1+bh) \Large h_t = \tanh\left(W_{xh}x_t + W_{hh}h_{t-1} + b_h\right) ht=tanh(Wxhxt+Whhht−1+bh)

yt=Whyht+by \Large y_t = W_{hy}h_t + b_y yt=Whyht+by

3. 计算 t=1

线性部分
Wxhx1= $0.10.20.30.4$ $10$ = $0.10.3$ \Large W_{xh}x_1 = \begin{bmatrix}0.1&0.2\\0.3&0.4\end{bmatrix}\begin{bmatrix}1\\0\end{bmatrix} = \begin{bmatrix}0.1\\0.3\end{bmatrix} Wxhx1= 0.10.30.20.4 10 = 0.10.3

Whhh0=0 \Large W_{hh}h_0 = 0 Whhh0=0

z1=Wxhx1+Whhh0+bh= $0.10.3$ + $0.10.2$ = $0.20.5$ \Large z_1 = W_{xh}x_1 + W_{hh}h_0 + b_h = \begin{bmatrix}0.1\\0.3\end{bmatrix} + \begin{bmatrix}0.1\\0.2\end{bmatrix} = \begin{bmatrix}0.2\\0.5\end{bmatrix} z1=Wxhx1+Whhh0+bh= 0.10.3 + 0.10.2 = 0.20.5

激活
h1=tanh⁡ $0.20.5$ ≈ $0.1970.462$ \Large h_1 = \tanh\begin{bmatrix}0.2\\0.5\end{bmatrix} \approx \begin{bmatrix}0.197\\0.462\end{bmatrix} h1=tanh 0.20.5 ≈ 0.1970.462
输出
y1= $0.91.0$ $0.1970.462$ +0.1≈0.177+0.462+0.1=0.739 \Large y_1 = \begin{bmatrix}0.9&1.0\end{bmatrix}\begin{bmatrix}0.197\\0.462\end{bmatrix} + 0.1 \approx 0.177 + 0.462 + 0.1 = 0.739 y1= $0.91.0$ 0.1970.462 +0.1≈0.177+0.462+0.1=0.739

4. 计算 t=2

线性部分
Wxhx2= $0.10.20.30.4$ $01$ = $0.20.4$ \Large W_{xh}x_2 = \begin{bmatrix}0.1&0.2\\0.3&0.4\end{bmatrix}\begin{bmatrix}0\\1\end{bmatrix} = \begin{bmatrix}0.2\\0.4\end{bmatrix} Wxhx2= 0.10.30.20.4 01 = 0.20.4

Whhh1= $0.50.60.70.8$ $0.1970.462$ ≈ $0.3760.507$ \Large W_{hh}h_1 =\begin{bmatrix}0.5&0.6\\0.7&0.8\end{bmatrix}\begin{bmatrix}0.197\\0.462\end{bmatrix} \approx\begin{bmatrix}0.376\\0.507\end{bmatrix} Whhh1= 0.50.70.60.8 0.1970.462 ≈ 0.3760.507

z2= $0.20.4$ + $0.3760.507$ + $0.10.2$ = $0.6761.107$ \Large z_2 = \begin{bmatrix}0.2\\0.4\end{bmatrix} +\begin{bmatrix}0.376\\0.507\end{bmatrix} +\begin{bmatrix}0.1\\0.2\end{bmatrix} = \begin{bmatrix}0.676\\1.107\end{bmatrix} z2= 0.20.4 + 0.3760.507 + 0.10.2 = 0.6761.107

激活
h2=tanh⁡ $0.6761.107$ ≈ $0.5880.803$ \Large h_2 = \tanh\begin{bmatrix}0.676\\1.107\end{bmatrix} \approx\begin{bmatrix}0.588\\0.803\end{bmatrix} h2=tanh 0.6761.107 ≈ 0.5880.803
输出
y2= $0.91.0$ $0.5880.803$ +0.1≈0.529+0.803+0.1=1.432 \Large y_2 = \begin{bmatrix}0.9&1.0\end{bmatrix}\begin{bmatrix}0.588\\0.803\end{bmatrix}+0.1 \approx 0.529+0.803+0.1 =1.432 y2= $0.91.0$ 0.5880.803 +0.1≈0.529+0.803+0.1=1.432

5. 最终结果

h1≈ $0.1970.462$ , y1≈0.739 \Large h_1\approx\begin{bmatrix}0.197\\0.462\end{bmatrix},\ y_1\approx0.739 h1≈ 0.1970.462 , y1≈0.739

h2≈ $0.5880.803$ , y2≈1.432 \Large h_2\approx\begin{bmatrix}0.588\\0.803\end{bmatrix},\ y_2\approx1.432 h2≈ 0.5880.803 , y2≈1.432

6. 矩阵版 RNN 核心要点

每一步都是矩阵乘法 + 向量加法
Wxh,Whh,WhyW_{xh},W_{hh},W_{hy}Wxh,Whh,Why 在所有时间步完全共享
隐藏状态 hth_tht 是向量，负责携带历史信息
计算是串行的：必须先算 h1h_1h1 才能算 h2h_2h2

RNN 循环神经网络 计算过程（通俗+公式版+运行实例）