### Preface
First, a couple of small matrix facts that will pay off later. Below, $A, B, C, W$ all denote matrices.
1. Factoring out a common right factor, with $A, B, C \in R^{a\times b}$ and $W \in R^{b\times c}$:
$$\begin{bmatrix} A\cdot W\\ B\cdot W\\ C\cdot W \end{bmatrix}=\begin{bmatrix} A\\ B\\ C \end{bmatrix}W$$

2. Factoring out a common left factor, with $W \in R^{c\times a}$ and $A, B, C \in R^{a\times b}$:

$$\begin{bmatrix} W\cdot A & W\cdot B & W\cdot C \end{bmatrix}=W\begin{bmatrix} A & B & C \end{bmatrix}$$
Other arrangements can often be massaged into one of these two forms before factoring. Be careful, though: some expressions look similar but cannot be factored directly, for example:
$$\begin{bmatrix} W\cdot A\\ W\cdot B\\ W\cdot C \end{bmatrix}=\begin{bmatrix} W & 0 & 0\\ 0 & W & 0\\ 0 & 0 & W \end{bmatrix}\begin{bmatrix} A\\ B\\ C \end{bmatrix}=(I_3 \otimes W)\begin{bmatrix} A\\ B\\ C \end{bmatrix}$$
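These identities are easy to sanity-check numerically. Below is a minimal NumPy sketch; the shapes `a, b, c = 2, 3, 4` are arbitrary choices for the check, not anything from the derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = 2, 3, 4
A, B, C = (rng.normal(size=(a, b)) for _ in range(3))

# Rule 1: common right factor, W in R^{b x c}
W = rng.normal(size=(b, c))
assert np.allclose(np.vstack([A @ W, B @ W, C @ W]),
                   np.vstack([A, B, C]) @ W)

# Rule 2: common left factor, W in R^{c x a}
V = rng.normal(size=(c, a))
assert np.allclose(np.hstack([V @ A, V @ B, V @ C]),
                   V @ np.hstack([A, B, C]))

# The look-alike case: a *vertical* stack of V·A, V·B, V·C needs
# the block-diagonal (Kronecker) form, not plain factoring
assert np.allclose(np.vstack([V @ A, V @ B, V @ C]),
                   np.kron(np.eye(3), V) @ np.vstack([A, B, C]))
print("all three block-matrix identities hold")
```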
With those tools ready, let's get back to the main topic.

### Simple Linear Regression

In the previous post we solved simple linear regression with the `least squares` method, to find the rule relating a piglet's weight to how much it eats:

$$y = wx + b$$

A single matrix computation produced the parameters; the conclusion was:

$$\begin{bmatrix} w\\ b \end{bmatrix}=\begin{bmatrix} \sum\limits_{i=1}^{m}{x_i}^2 & \sum\limits_{i=1}^{m}{x_i}\\ \sum\limits_{i=1}^{m}{x_i} & m \end{bmatrix}^{-1}\begin{bmatrix} \sum\limits_{i=1}^{m}{x_i y_i}\\ \sum\limits_{i=1}^{m}{y_i} \end{bmatrix}$$

### Multivariate Linear Regression

Now suppose the input is not a single $x$ but several:

$$y = w_1 x_1 + w_2 x_2 + \dots + b$$

Sticking with the piglet example: the piglet is now an omnivore that eats not only rice but also chaff, feed, and so on. Say the piglet starts at 1 jin (a jin is half a kilogram), so $b=1$; a bowl of rice ($x_1$) adds 1 jin ($w_1=1$), a bowl of chaff ($x_2$) adds half a jin ($w_2=0.5$), a bowl of feed ($x_3$) adds 2 jin ($w_3=2$)... Then its weight is

$$y = x_1 + \frac{1}{2}x_2 + 2x_3 + \dots + 1$$

### Matrix Form

`Everything is a matrix`. The multivariate function above can be written in matrix form:

$$y=\begin{bmatrix} x_1 & x_2 & \dots & 1 \end{bmatrix}\begin{bmatrix} w_1\\ w_2\\ \dots\\ b \end{bmatrix}$$

Notice that if we treat $b$ as just another weight $w_n$, and the trailing 1 in the $x$ row as just another input $x_n$, the expression becomes a uniform general form:

$$y=\begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix}\begin{bmatrix} w_1\\ w_2\\ \dots\\ w_n \end{bmatrix}$$

Since observation produces multiple samples, let the $i$-th sample's input be $x_i \in R^{n\times 1}$, e.g. 1 bowl of rice, 2 bowls of chaff, 1 bowl of feed ($x_{i1}=1, x_{i2}=2, x_{i3}=1$); its output (the weight) is the scalar $y_i$, its prediction is the scalar $\hat{y}_i$, and the parameters are $W \in R^{n\times 1}$:

$$x_i=\begin{bmatrix} x_{i1}\\ x_{i2}\\ \dots\\ x_{in} \end{bmatrix},\quad W=\begin{bmatrix} w_1\\ w_2\\ \dots\\ w_n \end{bmatrix}$$

$$\hat{y}_i = x_i^T W = \begin{bmatrix} x_{i1} & x_{i2} & \dots & x_{in} \end{bmatrix}\begin{bmatrix} w_1\\ w_2\\ \dots\\ w_n \end{bmatrix}$$

With $m$ samples, the predictions are

$$\hat{y}_1 = x_1^T W,\quad \hat{y}_2 = x_2^T W,\quad \dots,\quad \hat{y}_m = x_m^T W$$

which can be stacked into a single matrix expression (this is exactly factoring rule 1 from the preface, pulling out the common right factor $W$):

$$\hat{Y}=\begin{bmatrix} \hat{y}_1\\ \hat{y}_2\\ \dots\\ \hat{y}_m \end{bmatrix}=\begin{bmatrix} x_1^T W\\ x_2^T W\\ \dots\\ x_m^T W \end{bmatrix}=\begin{bmatrix} x_1^T\\ x_2^T\\ \dots\\ x_m^T \end{bmatrix}W=\begin{bmatrix} x_{11}&x_{12}&\dots&x_{1n}\\ x_{21}&x_{22}&\dots&x_{2n}\\ \dots&\dots&\dots&\dots\\ x_{m1}&x_{m2}&\dots&x_{mn} \end{bmatrix}W$$

### Residual Sum of Squares (RSS)

Let the $m$ sample inputs be $X$ and the outputs be $Y$; the sample data are known quantities:

$$X=\begin{bmatrix} x_1^T\\ x_2^T\\ \dots\\ x_m^T \end{bmatrix}=\begin{bmatrix} x_{11}&x_{12}&\dots&x_{1n}\\ x_{21}&x_{22}&\dots&x_{2n}\\ \dots&\dots&\dots&\dots\\ x_{m1}&x_{m2}&\dots&x_{mn} \end{bmatrix},\quad Y=\begin{bmatrix} y_1\\ y_2\\ \dots\\ y_m \end{bmatrix}$$

Then:

$$\hat{Y}=XW,\qquad \hat{Y}\in R^{m\times 1},\ X\in R^{m\times n},\ W\in R^{n\times 1}$$

$$RSS=\sum\limits_{i=1}^{m}(\hat{y}_i-y_i)^2=\sum\limits_{i=1}^{m}(x_i^T W-y_i)^2=\begin{bmatrix} x_1^T W-y_1\\ x_2^T W-y_2\\ \dots\\ x_m^T W-y_m \end{bmatrix}^T\begin{bmatrix} x_1^T W-y_1\\ x_2^T W-y_2\\ \dots\\ x_m^T W-y_m \end{bmatrix}$$

$$RSS=(XW-Y)^T(XW-Y)$$
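As a quick sanity check of the stacked form, here is a small sketch (random data with hypothetical sizes $m=5$, $n=3$) confirming that the per-sample sum and the compact expression $(XW-Y)^T(XW-Y)$ agree:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3                      # arbitrary sizes, purely for the check
X = rng.normal(size=(m, n))      # row i is x_i^T
W = rng.normal(size=(n, 1))
Y = rng.normal(size=(m, 1))

Y_hat = X @ W                    # all m predictions in one product

# RSS accumulated sample by sample ...
rss_loop = sum((X[i] @ W - Y[i]).item() ** 2 for i in range(m))
# ... equals the matrix form (XW - Y)^T (XW - Y)
rss_matrix = ((X @ W - Y).T @ (X @ W - Y)).item()
assert np.isclose(rss_loop, rss_matrix)
print(rss_loop, rss_matrix)
```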
### Minimizing the RSS

So that the differentiation below does not carry extra constant factors, introduce an intermediate variable $E$:

$$E=\frac{1}{2}RSS=\frac{1}{2}\sum\limits_{i=1}^{m}(x_i^T W-y_i)^2=\frac{1}{2}(XW-Y)^T(XW-Y)$$

$$u_i=x_i^T W-y_i=x_{i1}w_1+x_{i2}w_2+\dots+x_{in}w_n-y_i$$

$$E=\frac{1}{2}\sum\limits_{i=1}^{m}{u_i}^2=\frac{1}{2}({u_1}^2+{u_2}^2+\dots+{u_m}^2)$$

#### Setting Up the Derivatives

$$\frac{\partial{E}}{\partial{u_i}}=\frac{1}{2}({u_i}^2)'=u_i,\qquad \frac{\partial{u_i}}{\partial{W}}=\begin{bmatrix} \frac{\partial{u_i}}{\partial{w_1}}\\ \frac{\partial{u_i}}{\partial{w_2}}\\ \dots\\ \frac{\partial{u_i}}{\partial{w_n}} \end{bmatrix},\qquad \frac{\partial{u_i}}{\partial{w_j}}=x_{ij}$$

The parameter $w_j$ influences the output of every sample:

$$\underbrace{u_1, u_2, u_3, \dots, u_m}_{\Uparrow\ w_j}$$

so computing $\frac{\partial{E}}{\partial{w_j}}$ means summing up all of those influences.

#### Chain-Rule Differentiation

$$\frac{\partial{E}}{\partial{w_j}}=\sum\limits_{i=1}^{m}\frac{\partial{E}}{\partial{u_i}}\frac{\partial{u_i}}{\partial{w_j}}=\sum\limits_{i=1}^{m}{u_i}{x_{ij}}=\sum\limits_{i=1}^{m}(x_i^T W-y_i)x_{ij}=\begin{bmatrix} x_1^T W-y_1\\ x_2^T W-y_2\\ \dots\\ x_m^T W-y_m \end{bmatrix}^T\begin{bmatrix} x_{1j}\\ x_{2j}\\ \dots\\ x_{mj} \end{bmatrix}=(XW-Y)^T\begin{bmatrix} x_{1j}\\ x_{2j}\\ \dots\\ x_{mj} \end{bmatrix}$$
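Before stacking these per-coordinate derivatives into a full gradient, a central finite difference is a cheap way to confirm the formula. A minimal sketch, again with hypothetical random data; `j` and `eps` are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 4
X = rng.normal(size=(m, n))
Y = rng.normal(size=(m, 1))
W = rng.normal(size=(n, 1))

def E(w):
    """E = (1/2) RSS, the objective we are differentiating."""
    r = X @ w - Y
    return 0.5 * (r.T @ r).item()

j, eps = 1, 1e-6
# Analytic derivative: dE/dw_j = (XW - Y)^T [x_1j, ..., x_mj]^T
analytic = ((X @ W - Y).T @ X[:, [j]]).item()
# Central finite difference along coordinate j
dW = np.zeros_like(W)
dW[j] = eps
numeric = (E(W + dW) - E(W - dW)) / (2 * eps)
print(analytic, numeric)   # the two should agree to ~1e-8
```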
This is where the `matrix factoring` trick from the preface comes in: first transpose into the left-multiplication form, then factor out the common term $(XW-Y)^T$:

$$\frac{\partial{E}}{\partial{W}}=\begin{bmatrix} \frac{\partial{E}}{\partial{w_1}}\\ \frac{\partial{E}}{\partial{w_2}}\\ \dots\\ \frac{\partial{E}}{\partial{w_n}} \end{bmatrix}=\begin{bmatrix} (XW-Y)^T\begin{bmatrix} x_{11}\\ x_{21}\\ \dots\\ x_{m1} \end{bmatrix}\\ (XW-Y)^T\begin{bmatrix} x_{12}\\ x_{22}\\ \dots\\ x_{m2} \end{bmatrix}\\ \dots\\ (XW-Y)^T\begin{bmatrix} x_{1n}\\ x_{2n}\\ \dots\\ x_{mn} \end{bmatrix} \end{bmatrix}=\left((XW-Y)^T X\right)^T=X^T(XW-Y)=X^T XW-X^T Y$$

#### Solving for W in Matrix Form

The differentiation above gave us the gradient:

$$\frac{\partial{E}}{\partial{W}}=X^T XW-X^T Y$$

As before, set the gradient to zero to minimize the residual sum of squares:

$$\frac{\partial{E}}{\partial{W}}=0 \quad\Longrightarrow\quad X^T XW=X^T Y$$

Note that there is no such thing as "dividing by" a matrix; the system has to be solved through matrix operations. When $X^T X$ is invertible,

$$W=(X^T X)^{-1}X^T Y$$

And that is the parameter solution for multivariate linear regression.
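The closed form transcribes directly into NumPy. A minimal sketch on random data (the sizes and noise level are arbitrary); note that in practice `np.linalg.solve` on the normal equations is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 50, 4
X = rng.normal(size=(m, n))
W_true = rng.normal(size=(n, 1))
Y = X @ W_true + 0.01 * rng.normal(size=(m, 1))   # slightly noisy targets

# Literal transcription: W = (X^T X)^{-1} X^T Y
W_inv = np.linalg.inv(X.T @ X) @ X.T @ Y

# Numerically preferable: solve X^T X W = X^T Y directly
W_solve = np.linalg.solve(X.T @ X, X.T @ Y)

assert np.allclose(W_inv, W_solve)
print(np.hstack([W_true, W_solve]))   # the two columns should nearly match
```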
### Verification

#### Checking the One-Variable Case

Look back at simple linear regression:

$$y = wx + b$$

It too can be treated as having two inputs, with the second input fixed at 1, which turns it into a multivariate linear regression problem:

$$x_i=\begin{bmatrix} x_i\\ 1 \end{bmatrix},\quad W=\begin{bmatrix} w\\ b \end{bmatrix},\quad X=\begin{bmatrix} x_1 & 1\\ x_2 & 1\\ \dots\\ x_m & 1 \end{bmatrix}$$

Then

$$\hat{Y}=\begin{bmatrix} \hat{y}_1\\ \hat{y}_2\\ \dots\\ \hat{y}_m \end{bmatrix}=XW=\begin{bmatrix} x_1 & 1\\ x_2 & 1\\ \dots\\ x_m & 1 \end{bmatrix}\begin{bmatrix} w\\ b \end{bmatrix}$$

Applying the conclusion from multivariate linear regression above:
$$W=(X^T X)^{-1}X^T Y$$
$$X^T X=\begin{bmatrix} x_1 & x_2 & \dots & x_m\\ 1 & 1 & \dots & 1 \end{bmatrix}\begin{bmatrix} x_1 & 1\\ x_2 & 1\\ \dots\\ x_m & 1 \end{bmatrix}=\begin{bmatrix} \sum\limits_{i=1}^{m}{x_i}^2 & \sum\limits_{i=1}^{m}{x_i}\\ \sum\limits_{i=1}^{m}{x_i} & m \end{bmatrix}$$

$$X^T Y=\begin{bmatrix} x_1 & x_2 & \dots & x_m\\ 1 & 1 & \dots & 1 \end{bmatrix}\begin{bmatrix} y_1\\ y_2\\ \dots\\ y_m \end{bmatrix}=\begin{bmatrix} \sum\limits_{i=1}^{m}{x_i y_i}\\ \sum\limits_{i=1}^{m}{y_i} \end{bmatrix}$$
So we obtain:
$$W=(X^T X)^{-1}X^T Y=\begin{bmatrix} \sum\limits_{i=1}^{m}{x_i}^2 & \sum\limits_{i=1}^{m}{x_i}\\ \sum\limits_{i=1}^{m}{x_i} & m \end{bmatrix}^{-1}\begin{bmatrix} \sum\limits_{i=1}^{m}{x_i y_i}\\ \sum\limits_{i=1}^{m}{y_i} \end{bmatrix}$$
This echoes the simple linear regression solution from the beginning of the post; the two are indeed identical:

$$\begin{bmatrix} w\\ b \end{bmatrix}=\begin{bmatrix} \sum\limits_{i=1}^{m}{x_i}^2 & \sum\limits_{i=1}^{m}{x_i}\\ \sum\limits_{i=1}^{m}{x_i} & m \end{bmatrix}^{-1}\begin{bmatrix} \sum\limits_{i=1}^{m}{x_i y_i}\\ \sum\limits_{i=1}^{m}{y_i} \end{bmatrix}$$
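This equivalence can also be checked numerically. A small sketch on one-dimensional random data (the true `w=1.7`, `b=0.8` are made-up values), comparing the hand-built 2x2 system against the general formula:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 30
x = rng.uniform(-5, 5, m)
y = 1.7 * x + 0.8 + 0.1 * rng.normal(size=m)   # hypothetical w=1.7, b=0.8

# General multivariate formula, with a column of ones playing the role of b
X = np.column_stack([x, np.ones(m)])
w_general = np.linalg.solve(X.T @ X, X.T @ y)

# The 2x2 closed form from the start of the post
A = np.array([[np.sum(x ** 2), np.sum(x)],
              [np.sum(x),      m        ]])
v = np.array([np.sum(x * y), np.sum(y)])
w_closed = np.linalg.solve(A, v)

assert np.allclose(w_general, w_closed)
print(w_general)   # [w, b], close to [1.7, 0.8]
```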
### Verifying with Python

```python
import numpy as np
import matplotlib.pyplot as plt

# Chinese font setup - macOS
plt.rcParams['font.sans-serif'] = ['Arial Unicode MS', 'Heiti TC', 'STHeiti']
plt.rcParams['axes.unicode_minus'] = False

# 1. Generate simulated data
np.random.seed(42)  # fix the random seed for reproducibility

# True parameters
true_w1 = 2.5
true_w2 = -1.8
true_bias = 3.0

# Sample data
n_samples = 100
x1 = np.random.uniform(-5, 5, n_samples)
x2 = np.random.uniform(-5, 5, n_samples)

# True y values with added noise
noise = np.random.normal(0, 2, n_samples)
y = true_w1 * x1 + true_w2 * x2 + true_bias + noise

print(f"True parameters: w1={true_w1:.2f}, w2={true_w2:.2f}, bias={true_bias:.2f}")

# 2. RSS function
def compute_rss(w1, w2, bias, x1, x2, y):
    """Compute the residual sum of squares."""
    y_pred = w1 * x1 + w2 * x2 + bias
    rss = np.sum((y - y_pred) ** 2)
    return rss

# 3. Build a parameter grid
w1_range = np.linspace(0, 5, 50)    # range for w1
w2_range = np.linspace(-4, 1, 50)   # range for w2
W1, W2 = np.meshgrid(w1_range, w2_range)

# Fix the bias at its true value (it could also be treated as a variable)
fixed_bias = true_bias

# RSS for every parameter combination
RSS = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        RSS[i, j] = compute_rss(W1[i, j], W2[i, j], fixed_bias, x1, x2, y)

# 4. Plot the RSS surface
fig = plt.figure(figsize=(16, 6))

# Subplot 1: 3D surface
ax1 = fig.add_subplot(121, projection='3d')
surf = ax1.plot_surface(W1, W2, RSS, cmap='viridis', alpha=0.8,
                        linewidth=0, antialiased=True)

# Mark the minimum point
min_idx = np.unravel_index(np.argmin(RSS), RSS.shape)
min_w1 = W1[min_idx]
min_w2 = W2[min_idx]
min_rss = RSS[min_idx]
ax1.scatter([min_w1], [min_w2], [min_rss], color='red', s=100,
            label=f'min-RSS point\nw1={min_w1:.2f}\nw2={min_w2:.2f}')

ax1.set_xlabel('parameter w1', labelpad=10)
ax1.set_ylabel('parameter w2', labelpad=10)
ax1.set_zlabel('RSS (residual sum of squares)', labelpad=10)
ax1.set_title('RSS as a function of the parameters\n(3D view)', pad=20)
ax1.legend()

# Subplot 2: contour plot
ax2 = fig.add_subplot(122)
contour = ax2.contour(W1, W2, RSS, levels=20, cmap='viridis')
ax2.clabel(contour, inline=True, fontsize=8)

# Mark the true parameters
ax2.scatter(true_w1, true_w2, color='blue', s=100, marker='*',
            label=f'true params\nw1={true_w1:.2f}\nw2={true_w2:.2f}')
# Mark the estimated parameters
ax2.scatter(min_w1, min_w2, color='red', s=100, marker='o',
            label=f'estimated params\nw1={min_w1:.2f}\nw2={min_w2:.2f}')

ax2.set_xlabel('parameter w1')
ax2.set_ylabel('parameter w2')
ax2.set_title('RSS contours', pad=20)
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 5. Verify with the normal equation
print("\n=== Normal-equation check ===")
# Prepend the bias column
X_matrix = np.column_stack([np.ones(n_samples), x1, x2])
# Normal equation: w = (X^T X)^(-1) X^T y
w_optimal = np.linalg.inv(X_matrix.T @ X_matrix) @ X_matrix.T @ y
print(f"Normal-equation solution: bias={w_optimal[0]:.4f}, w1={w_optimal[1]:.4f}, w2={w_optimal[2]:.4f}")
print(f"True parameters: bias={true_bias:.4f}, w1={true_w1:.4f}, w2={true_w2:.4f}")
# 6. Plot the data points and the regression plane
fig2 = plt.figure(figsize=(12, 5))

# Subplot 1: scatter plot of the data
ax3 = fig2.add_subplot(121, projection='3d')
scatter = ax3.scatter(x1, x2, y, c=y, cmap='viridis', s=30, alpha=0.7)

# Build the regression plane
x1_plane = np.linspace(-5, 5, 20)
x2_plane = np.linspace(-5, 5, 20)
X1_plane, X2_plane = np.meshgrid(x1_plane, x2_plane)
Y_plane = w_optimal[1] * X1_plane + w_optimal[2] * X2_plane + w_optimal[0]

# Draw the regression plane
ax3.plot_surface(X1_plane, X2_plane, Y_plane, alpha=0.3, color='red')
ax3.set_xlabel('feature x1')
ax3.set_ylabel('feature x2')
ax3.set_zlabel('target y')
ax3.set_title('Data points and regression plane', pad=20)

# Subplot 2: predicted vs. actual values
ax4 = fig2.add_subplot(122)
y_pred = X_matrix @ w_optimal
ax4.scatter(y, y_pred, alpha=0.7)
ax4.plot([y.min(), y.max()], [y.min(), y.max()], 'r--', lw=2)
ax4.set_xlabel('actual value')
ax4.set_ylabel('predicted value')
ax4.set_title('Predicted vs. actual', pad=20)
ax4.grid(True, alpha=0.3)

# Compute R²
r_squared = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - np.mean(y)) ** 2)
ax4.text(0.05, 0.95, f'R² = {r_squared:.4f}', transform=ax4.transAxes,
         fontsize=12, verticalalignment='top',
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

plt.tight_layout()
plt.show()
```

#### Results

True parameters: w1=2.50, w2=-1.80, bias=3.00

*(figure: RSS surface and contour plots)*

=== Normal-equation check ===
Normal-equation solution: bias=3.1988, w1=2.4317, w2=-1.6561
True parameters: bias=3.0000, w1=2.5000, w2=-1.8000

*(figure: data points with the fitted regression plane, and predicted vs. actual values)*