Getting Started with Large Models: Machine Learning 3 (Multiple Linear Regression and Gradients, Vectorization, the Normal Equation)

Contents

[1 Multiple Linear Regression](#1 Multiple Linear Regression)

[1.1 Definition](#1.1 Definition)

[2 Vectorization](#2 Vectorization)

[2.1 Example](#2.1 Example)

[2.2 Code implementation](#2.2 Code implementation)

[2.3 NumPy rules](#2.3 NumPy rules)

[2.4 Vector dot product](#2.4 Vector dot product)

[2.5 Matrix creation](#2.5 Matrix creation)

[2.6 Matrix indexing](#2.6 Matrix indexing)

[3 Gradient descent for multiple linear regression](#3 Gradient descent for multiple linear regression)

[3.1 Definition](#3.1 Definition)

[3.2 Gradient function](#3.2 Gradient function)

[4 Minimizing the cost function with the normal equation](#4 Minimizing the cost function with the normal equation)

[4.1 Definition](#4.1 Definition)

[4.2 Drawbacks](#4.2 Drawbacks)


Learning objective:

Make linear regression faster and more powerful.

1 Multiple Linear Regression

1.1 Definition

$x_j$: the $j$-th feature;

$n$: the total number of features;

$\vec{x}^{(i)}$: the feature vector of the $i$-th training example;

$x_j^{(i)}$: the value of feature $j$ in the $i$-th training example.

Multiple linear regression: a linear regression model with multiple input features, with the expression

$$f_{\vec{w},b}(\vec{x}) = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b = \vec{w} \cdot \vec{x} + b$$
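
As a quick illustration of the expression above, here is a minimal sketch (the numbers are made up) that evaluates the model for a single example with n = 3 features:

import numpy as np

w = np.array([0.5, -1.2, 3.0])   # hypothetical weights w_1..w_3
b = 4.0                          # hypothetical bias
x = np.array([10.0, 2.0, 1.0])   # one example with n = 3 features

# f_wb = w_1*x_1 + w_2*x_2 + w_3*x_3 + b, written as a dot product
f_wb = np.dot(w, x) + b
print(f_wb)   # 0.5*10 - 1.2*2 + 3.0*1 + 4.0 = 9.6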

2 Vectorization

2.1 Example

Vectorization keeps the code concise and makes it run efficiently, as the sketch below illustrates.
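
A minimal sketch of the comparison (the array size and variable names are illustrative, not the original example): the same value f = w·x + b computed with an explicit Python loop and with the vectorized np.dot.

import numpy as np
import time

n = 1_000_000
w = np.random.random_sample(n)
x = np.random.random_sample(n)
b = 1.5

# non-vectorized: explicit loop over all n features
start = time.time()
f = 0.0
for j in range(n):
    f = f + w[j] * x[j]
f = f + b
print(f"loop:   {f:.4f}, {1000*(time.time() - start):.1f} ms")

# vectorized: a single dot product, executed by optimized NumPy routines
start = time.time()
f_vec = np.dot(w, x) + b
print(f"np.dot: {f_vec:.4f}, {1000*(time.time() - start):.1f} ms")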

2.2 Code implementation

The code is written in Python and uses the NumPy package.
(1) Importing the module/package

import numpy as np    # it is an unofficial standard to use np for numpy
import time

(2) Array creation

Code:

# NumPy routines which allocate memory and fill arrays with value
a = np.zeros(4);                print(f"np.zeros(4) :   a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.zeros((4,));             print(f"np.zeros(4,) :  a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.random.random_sample(4); print(f"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

Output:

np.zeros(4) :   a = [0. 0. 0. 0.], a shape = (4,), a data type = float64
np.zeros(4,) :  a = [0. 0. 0. 0.], a shape = (4,), a data type = float64
np.random.random_sample(4): a = [0.43857224 0.0596779  0.39804426 0.73799541], a shape = (4,), a data type = float64   (random values, vary between runs)

np.zeros(4): creates an array of length 4 whose elements are all 0.0 (floating point);

a.shape: the shape of the array, here (4,), i.e. a one-dimensional array of length 4;

a.dtype: the data type of the elements.

2.3 NumPy rules

(1) Indexing

Negative indices count from the end of the array; indices outside the valid range raise an error.

Example:

#vector indexing operations on 1-D vectors
a = np.arange(10)
print(a)

#access an element
print(f"a[2].shape: {a[2].shape} a[2]  = {a[2]}, Accessing an element returns a scalar")

# access the last element, negative indexes count from the end
print(f"a[-1] = {a[-1]}")

#indexes must be within the range of the vector or they will produce an error
try:
    c = a[10]
except Exception as e:
    print("The error message you'll see is:")
    print(e)

Output:

[0 1 2 3 4 5 6 7 8 9]
a[2].shape: () a[2]  = 2, Accessing an element returns a scalar
a[-1] = 9
The error message you'll see is:
index 10 is out of bounds for axis 0 with size 10

(2) Slicing

a[start : stop : step]

start: starting index, inclusive;

stop: stopping index, exclusive;

step: step size, i.e. select every step-th element.

Example:

#vector slicing operations
a = np.arange(10)
print(f"a         = {a}")

# access all elements index 3 and above
c = a[3:];        print("a[3:]    = ", c)

# access all elements below index 3
c = a[:3];        print("a[:3]    = ", c)

# access all elements
c = a[:];         print("a[:]     = ", c)

Output:

a         = [0 1 2 3 4 5 6 7 8 9]
a[3:]    =  [3 4 5 6 7 8 9]
a[:3]    =  [0 1 2]
a[:]     =  [0 1 2 3 4 5 6 7 8 9]

2.4 Vector dot product

Code:

import numpy as np

# test 1-D
a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])
c = np.dot(a, b)
print(f"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} ") 
c = np.dot(b, a)
print(f"NumPy 1-D np.dot(b, a) = {c}, np.dot(a, b).shape = {c.shape} ")

Output:

# np.dot(a, b).shape = (): the dot product of two 1-D vectors is a scalar, so its shape is empty
NumPy 1-D np.dot(a, b) = 24, np.dot(a, b).shape = () 
NumPy 1-D np.dot(b, a) = 24, np.dot(a, b).shape = () 
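
To spell out what np.dot computes, the following sketch compares it with a hand-written loop (the helper name my_dot is introduced here for illustration and is not part of the original code):

import numpy as np

def my_dot(a, b):
    """Element-wise multiply two 1-D vectors and sum the products."""
    total = 0.0
    for i in range(a.shape[0]):
        total = total + a[i] * b[i]
    return total

a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])
print(my_dot(a, b))     # 24.0, the same value as np.dot(a, b)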

2.5 Matrix creation

Code:

a = np.zeros((1, 5))                                       
print(f"a shape = {a.shape}, a = {a}")                     

a = np.zeros((2, 1))                                                                   
print(f"a shape = {a.shape}, a = {a}") 

a = np.random.random_sample((1, 1))  
print(f"a shape = {a.shape}, a = {a}") 

Output:

a shape = (1, 5), a = [[0. 0. 0. 0. 0.]]
a shape = (2, 1), a = [[0.]
 [0.]]
a shape = (1, 1), a = [[0.44236513]]   (random value, varies between runs)

2.6 Matrix indexing

Code:

#vector indexing operations on matrices
"""np.arange(6) generates the 1-D array [0, 1, 2, 3, 4, 5];
   .reshape(-1, 2) reshapes it into an N x 2 matrix;
   -1 means the number of rows is computed automatically so that the total number of elements stays the same."""
a = np.arange(6).reshape(-1, 2)   #reshape is a convenient way to create matrices
print(f"a.shape: {a.shape}, \na= {a}")

#access an element
print(f"\na[2,0].shape:   {a[2, 0].shape}, a[2,0] = {a[2, 0]},     type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\n")

#access a row
print(f"a[2].shape:   {a[2].shape}, a[2]   = {a[2]}, type(a[2])   = {type(a[2])}")

Output:

a.shape: (3, 2), 
a= [[0 1]
 [2 3]
 [4 5]]

a[2,0].shape:   (), a[2,0] = 4,     type(a[2,0]) = <class 'numpy.int64'> Accessing an element returns a scalar

a[2].shape:   (2,), a[2]   = [4 5], type(a[2])   = <class 'numpy.ndarray'>
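
Slicing also works per axis on matrices; a short illustrative sketch (not from the original post) extending the example above:

import numpy as np

a = np.arange(6).reshape(-1, 2)   # [[0 1], [2 3], [4 5]]

# rows 0..1, all columns
print(a[0:2, :])      # [[0 1]
                      #  [2 3]]

# all rows, column 1 only
print(a[:, 1])        # [1 3 5]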

3 Gradient descent for multiple linear regression

For the univariate (single-feature) case, see Section 3.5 of the earlier post:

https://blog.csdn.net/weixin_45728280/article/details/153348420?spm=1001.2014.3001.5501

3.1 Definition

Parameters: $\vec{w} = (w_1, \dots, w_n)$, $b$

Model:

$$f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b$$

Cost function:

$$J(\vec{w},b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)^2$$

Gradient descent (repeat until convergence, updating all parameters simultaneously):

$$w_j := w_j - \alpha \frac{\partial J(\vec{w},b)}{\partial w_j}, \qquad b := b - \alpha \frac{\partial J(\vec{w},b)}{\partial b}$$

Substituting the cost function, the gradients are:

$$\frac{\partial J(\vec{w},b)}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)}, \qquad \frac{\partial J(\vec{w},b)}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)$$
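
The gradient_descent routine in 3.2 below takes cost_function and gradient_function as arguments and is later called with helpers named compute_cost and compute_gradient, which are not shown in this post. A minimal sketch of what they could look like, following the formulas above (these implementations are assumptions, not the original lab code):

import numpy as np

def compute_cost(X, y, w, b):
    """Cost J(w,b) over m examples: mean squared error divided by 2."""
    m = X.shape[0]
    err = X @ w + b - y            # f_wb(x^(i)) - y^(i) for all i
    return np.sum(err ** 2) / (2 * m)

def compute_gradient(X, y, w, b):
    """Returns dJ/db (scalar) and dJ/dw (ndarray (n,))."""
    m = X.shape[0]
    err = X @ w + b - y
    dj_dw = X.T @ err / m          # (1/m) * sum_i err_i * x_j^(i), for each j
    dj_db = np.sum(err) / m
    return dj_db, dj_dw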

3.2 Gradient function
import copy, math
import numpy as np

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): 
    """
    Performs batch gradient descent to learn theta. Updates theta by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters  
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
      
    Returns:
      w (ndarray (n,)) : Updated values of parameters 
      b (scalar)       : Updated value of parameter 
      """
    
    # An array to store cost J and w's at each iteration primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  #avoid modifying global w within function
    b = b_in
    
    for i in range(num_iters):

        # Calculate the gradient and update the parameters
        dj_db,dj_dw = gradient_function(X, y, w, b)   ##None

        # Update Parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw               ##None
        b = b - alpha * dj_db               ##None
      
        # Save cost J at each iteration
        if i<100000:      # prevent resource exhaustion 
            J_history.append( cost_function(X, y, w, b))

        # Print cost at intervals, 10 times over the run (or every iteration if num_iters < 10)
        if i% math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}   ")
        
    return w, b, J_history #return final w,b and J history for graphing






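# The driver code below assumes that a training set X_train (shape (m, n)), targets
# y_train (shape (m,)) and an initial weight vector w_init (shape (n,)) were defined
# earlier, e.g. the small housing data set from the course lab. A hypothetical setup
# consistent with the printed target values (460, 232, 178) would be:
#   X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
#   y_train = np.array([460, 232, 178])
#   w_init  = np.zeros(4)   # only its shape is used below, via np.zeros_like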
# initialize parameters
initial_w = np.zeros_like(w_init)
initial_b = 0.
# some gradient descent settings
iterations = 1000
alpha = 5.0e-7
# run gradient descent 
w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,
                                                    compute_cost, compute_gradient, 
                                                    alpha, iterations)
print(f"b,w found by gradient descent: {b_final:0.2f},{w_final} ")
m,_ = X_train.shape
for i in range(m):
    print(f"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}")

Output:

Iteration    0: Cost  2529.46   
Iteration  100: Cost   695.99   
Iteration  200: Cost   694.92   
Iteration  300: Cost   693.86   
Iteration  400: Cost   692.81   
Iteration  500: Cost   691.77   
Iteration  600: Cost   690.73   
Iteration  700: Cost   689.71   
Iteration  800: Cost   688.70   
Iteration  900: Cost   687.69   
b,w found by gradient descent: -0.00,[ 0.2   0.   -0.01 -0.07] 
prediction: 426.19, target value: 460
prediction: 286.17, target value: 232
prediction: 171.47, target value: 178

4 Minimizing the cost function with the normal equation

With the normal equation, w and b can be solved for directly, without the iterative updates of gradient descent.

4.1 Definition

It applies only to linear regression and solves for w and b in closed form, without iteration. With a design matrix $X$ that includes a leading column of ones for the bias term, the solution is

$$\theta = (X^T X)^{-1} X^T \vec{y}, \qquad \theta = (b, w_1, \dots, w_n)$$
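
A minimal sketch of the closed-form solution in NumPy (the synthetic data and parameter values are purely illustrative; np.linalg.solve is applied to the normal equations rather than forming the matrix inverse explicitly):

import numpy as np

# purely illustrative synthetic data: 100 examples, 4 features, known parameters
rng = np.random.default_rng(0)
X_train = rng.random((100, 4))
y_train = X_train @ np.array([2.0, -3.5, 0.7, 1.2]) + 4.0

# design matrix with a leading column of ones for the bias term b
X = np.c_[np.ones(X_train.shape[0]), X_train]

# normal equation: theta = (X^T X)^(-1) X^T y, solved without forming the inverse
theta = np.linalg.solve(X.T @ X, X.T @ y_train)
b, w = theta[0], theta[1:]
print(f"b = {b:.2f}, w = {w}")   # recovers b = 4.0 and w = [2.0, -3.5, 0.7, 1.2]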

4.2 Drawbacks

(1) It does not generalize to other learning algorithms;

(2) It becomes slow when the number of features n is large (e.g. n > 10,000).
