1. Fundamentals of Logistic Regression
1.1 Overview of the Logistic Regression Model
Logistic regression is a statistical learning method widely used for classification, and it is particularly effective on binary classification problems. Its core idea is to pass the output of a linear model through the logistic function (also known as the Sigmoid function), mapping it into the interval (0, 1). This turns a continuous value into a probability, making the model suitable for predicting the probability that a sample belongs to a given class.
Mathematically, the logistic regression model can be written as:

$$P(y=1 \mid \mathbf{x}) = \sigma(\mathbf{w}^T\mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^T\mathbf{x} + b)}}$$

where $\mathbf{w}$ is the weight vector, $b$ is the bias term, and $\sigma$ is the Sigmoid function.
1.2 Loss Function and Optimization Objective
Logistic regression is fit by maximizing the likelihood, which is equivalent to minimizing the negative log-likelihood (log loss). For a dataset $\{(\mathbf{x}_i, y_i)\}$, the loss function is defined as:

$$L(\mathbf{w}, b) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

where $\hat{y}_i = \sigma(\mathbf{w}^T\mathbf{x}_i + b)$ is the predicted probability and $N$ is the number of samples. The optimization objective is to adjust $\mathbf{w}$ and $b$ so as to minimize this loss.
1.3 Gradient Descent
Gradient descent is the standard method for optimizing a logistic regression model. It computes the gradient of the loss with respect to the parameters and updates the parameters in the opposite direction of the gradient, gradually approaching the optimum. The update rules are:

$$\mathbf{w} := \mathbf{w} - \alpha \frac{\partial L}{\partial \mathbf{w}}, \qquad b := b - \alpha \frac{\partial L}{\partial b}$$

where $\alpha$ is the learning rate, which controls the step size of each update.
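The gradients used in these updates follow from differentiating the loss in Section 1.2; using the Sigmoid identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, the cross-entropy terms simplify to:

$$\frac{\partial L}{\partial \mathbf{w}} = \frac{1}{N}\sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)\mathbf{x}_i, \qquad \frac{\partial L}{\partial b} = \frac{1}{N}\sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)$$

These closed forms are exactly what the implementations later in this article compute in vectorized form.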
2. Implementing Logistic Regression in Python
2.1 Data Preparation and Preprocessing
Before implementing logistic regression, we first prepare and preprocess the data. Using the well-known Iris dataset as an example, we select two of its classes (Setosa and Versicolor) for a binary classification task.
```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the data
iris = datasets.load_iris()
X = iris.data[iris.target != 2]  # keep only the first two classes
y = iris.target[iris.target != 2]

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```
2.2 Implementing the Sigmoid and Loss Functions
Define the Sigmoid activation function and the loss function, which will be used to compute predicted probabilities and gradients.
```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def log_loss(y_true, y_pred):
    epsilon = 1e-15  # avoid log(0)
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```
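As a quick sanity check (a minimal sketch, not part of the original tutorial), these helpers can be verified against known values: the Sigmoid of zero is exactly 0.5, and a maximally uncertain prediction of 0.5 yields a log loss of $\log 2 \approx 0.6931$.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def log_loss(y_true, y_pred):
    epsilon = 1e-15  # avoid log(0)
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# sigmoid(0) is exactly 0.5; large inputs saturate toward 1
print(sigmoid(0.0))             # 0.5
print(round(sigmoid(10.0), 4))  # 1.0

# A uniform 0.5 prediction gives -log(0.5) = log(2) per sample
y = np.array([1, 0])
print(round(log_loss(y, np.array([0.5, 0.5])), 4))  # 0.6931
```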
2.3 Implementing the Logistic Regression Model
Next, build a logistic regression class with initialization, prediction, and training methods.
```python
class LogisticRegressionCustom:
    def __init__(self, learning_rate=0.01, num_iterations=1000):
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.weights = None
        self.bias = None

    def initialize_parameters(self, n_features):
        self.weights = np.zeros(n_features)
        self.bias = 0.0

    def predict_proba(self, X):
        z = np.dot(X, self.weights) + self.bias
        return sigmoid(z)

    def predict(self, X):
        proba = self.predict_proba(X)
        return (proba >= 0.5).astype(int)

    def train(self, X, y):
        n_samples, n_features = X.shape
        self.initialize_parameters(n_features)
        for i in range(self.num_iterations):
            # Forward pass: predicted probabilities
            y_pred = self.predict_proba(X)
            # Gradients of the log loss
            dw = np.dot(X.T, (y_pred - y)) / n_samples
            db = np.sum(y_pred - y) / n_samples
            # Parameter update
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
            # Report the loss periodically
            if (i + 1) % 100 == 0:
                loss = log_loss(y, y_pred)
                print(f"Iteration {i+1}/{self.num_iterations} - Loss: {loss:.4f}")
```
2.4 Training and Evaluating the Model
Train the logistic regression model implemented above and evaluate its performance on the test set.
```python
# Initialize the model
model = LogisticRegressionCustom(learning_rate=0.1, num_iterations=1000)

# Train the model
model.train(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = np.mean(y_pred == y_test)
print(f"Test Accuracy: {accuracy * 100:.2f}%")
```
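As a cross-check (a sketch added here, not part of the original tutorial), scikit-learn's built-in `LogisticRegression` can be trained on the same split; since Setosa and Versicolor are linearly separable, both the custom implementation and the library version should reach very high test accuracy.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Same data preparation as above
iris = datasets.load_iris()
X = iris.data[iris.target != 2]
y = iris.target[iris.target != 2]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Reference model from scikit-learn
clf = LogisticRegression()
clf.fit(X_train, y_train)
sk_accuracy = np.mean(clf.predict(X_test) == y_test)
print(f"sklearn Test Accuracy: {sk_accuracy * 100:.2f}%")
```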
3. Optimization Strategies for Logistic Regression
3.1 Feature Scaling and Regularization
3.1.1 Feature Scaling
Feature scaling (standardization or normalization) speeds up the convergence of gradient descent by avoiding the optimization difficulties caused by features on very different scales. The implementation above already standardizes the features.
3.1.2 Regularization
To prevent overfitting, a regularization term can be added to logistic regression. Common choices are L1 regularization (Lasso) and L2 regularization (Ridge). Taking L2 regularization as an example, the loss function becomes:

$$L(\mathbf{w}, b) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right] + \lambda \|\mathbf{w}\|_2^2$$

where $\lambda$ is the regularization strength. Differentiating the penalty adds the term $2\lambda\mathbf{w}$ to the weight gradient. The modified training method is as follows:
```python
class LogisticRegressionCustom:
    # ... (other methods unchanged)

    def train(self, X, y, lambda_reg=0.01):
        n_samples, n_features = X.shape
        self.initialize_parameters(n_features)
        for i in range(self.num_iterations):
            # Forward pass
            y_pred = self.predict_proba(X)
            # Gradients, including the derivative of the L2 penalty (2 * lambda * w)
            dw = np.dot(X.T, (y_pred - y)) / n_samples + 2 * lambda_reg * self.weights
            db = np.sum(y_pred - y) / n_samples
            # Parameter update
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
            # Report the regularized loss periodically
            if (i + 1) % 100 == 0:
                loss = log_loss(y, y_pred) + lambda_reg * np.sum(self.weights ** 2)
                print(f"Iteration {i+1}/{self.num_iterations} - Loss: {loss:.4f}")
```
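L1 regularization can be handled analogously. A common simple approach, sketched below under the same naming conventions as the classes in this article, replaces the smooth L2 term in the gradient with the subgradient $\lambda\,\mathrm{sign}(\mathbf{w})$ (taking the subgradient to be 0 at $w = 0$); the function name `l1_gradient` is illustrative, not from the original text.

```python
import numpy as np

def l1_gradient(X, y_pred, y, weights, lambda_reg):
    """Gradient of the L1-regularized log loss w.r.t. the weights."""
    n_samples = X.shape[0]
    # Data term is identical to the unregularized case;
    # the L1 penalty contributes lambda_reg * sign(w) (a subgradient at 0).
    return np.dot(X.T, (y_pred - y)) / n_samples + lambda_reg * np.sign(weights)

# Tiny smoke test on random data
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
dw = l1_gradient(X, np.full(4, 0.5), np.array([1, 0, 1, 0]), np.zeros(3), 0.01)
print(dw.shape)  # (3,)
```

Note that plain subgradient descent rarely produces exact zeros; when sparsity itself is the goal, proximal methods (soft-thresholding) are typically preferred.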
3.2 Learning Rate Scheduling and Early Stopping
3.2.1 Learning Rate Scheduling
Dynamically adjusting the learning rate helps the model approach the optimum quickly early in training while making finer adjustments later to avoid oscillation. A common approach is to shrink the learning rate as the iteration count grows, for example with a time-based decay schedule:

$$\alpha_t = \frac{\alpha_0}{1 + \eta t}$$

where $\alpha_0$ is the initial learning rate, $\eta$ is the decay rate, and $t$ is the current iteration.
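A minimal sketch of this schedule (the parameter values below are illustrative):

```python
# Time-based decay: alpha_t = alpha_0 / (1 + eta * t)
alpha_0, eta = 0.1, 0.001
lrs = [alpha_0 / (1 + eta * t) for t in (0, 100, 500, 1000)]
print([round(lr, 4) for lr in lrs])  # [0.1, 0.0909, 0.0667, 0.05]
```

After 1000 iterations the step size has halved, which dampens oscillation near the optimum without ever dropping to zero.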
3.2.2 Early Stopping
Early stopping is an effective strategy against overfitting: model performance is monitored on a validation set, and training is terminated once performance stops improving. Implementing it requires holding out a validation set and tracking how the validation loss evolves during training.
```python
class LogisticRegressionWithEarlyStopping:
    def __init__(self, learning_rate=0.01, num_iterations=1000, decay=0.001, early_stopping=True, patience=5):
        self.learning_rate = learning_rate  # initial learning rate (alpha_0)
        self.num_iterations = num_iterations
        self.decay = decay
        self.early_stopping = early_stopping
        self.patience = patience
        self.weights = None
        self.bias = None
        self.best_loss = np.inf
        self.counter = 0

    def initialize_parameters(self, n_features):
        self.weights = np.zeros(n_features)
        self.bias = 0.0

    def predict_proba(self, X):
        z = np.dot(X, self.weights) + self.bias
        return sigmoid(z)

    def predict(self, X):
        proba = self.predict_proba(X)
        return (proba >= 0.5).astype(int)

    def train(self, X, y, X_val=None, y_val=None, lambda_reg=0.01):
        n_samples, n_features = X.shape
        self.initialize_parameters(n_features)
        for i in range(self.num_iterations):
            # Time-based decay: alpha_t = alpha_0 / (1 + eta * t).
            # Computed from the initial rate each iteration, rather than
            # repeatedly dividing the current rate, which would decay far too fast.
            lr = self.learning_rate / (1 + self.decay * i)
            # Forward pass
            y_pred = self.predict_proba(X)
            # Gradients, including the L2 penalty term
            dw = np.dot(X.T, (y_pred - y)) / n_samples + 2 * lambda_reg * self.weights
            db = np.sum(y_pred - y) / n_samples
            # Parameter update with the decayed learning rate
            self.weights -= lr * dw
            self.bias -= lr * db
            # Monitor the validation loss for early stopping
            val_loss = None
            if X_val is not None and y_val is not None:
                y_val_pred = self.predict_proba(X_val)
                val_loss = log_loss(y_val, y_val_pred) + lambda_reg * np.sum(self.weights ** 2)
                if val_loss < self.best_loss:
                    self.best_loss = val_loss
                    self.counter = 0
                else:
                    self.counter += 1
                if self.early_stopping and self.counter >= self.patience:
                    print(f"Early stopping at iteration {i+1}")
                    break
            if (i + 1) % 100 == 0:
                train_loss = log_loss(y, y_pred) + lambda_reg * np.sum(self.weights ** 2)
                print(f"Iteration {i+1}/{self.num_iterations} - Train Loss: {train_loss:.4f}", end='')
                if val_loss is not None:
                    print(f" | Val Loss: {val_loss:.4f}")
                else:
                    print()
```
3.3 Using Vectorization and Parallel Computation for Performance
On large datasets, computing gradients sample by sample is very slow. Vectorized operations exploit NumPy's efficient matrix routines to speed up computation dramatically. For extremely large datasets, parallel computation or GPU-accelerated libraries (such as CuPy) are also worth considering. Below is a vectorized implementation:
```python
class LogisticRegressionVectorized:
    def __init__(self, learning_rate=0.01, num_iterations=1000, lambda_reg=0.01):
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.lambda_reg = lambda_reg
        self.weights = None
        self.bias = None

    def initialize_parameters(self, n_features):
        self.weights = np.zeros(n_features)
        self.bias = 0.0

    def predict_proba(self, X):
        z = np.dot(X, self.weights) + self.bias
        return sigmoid(z)

    def predict(self, X):
        proba = self.predict_proba(X)
        return (proba >= 0.5).astype(int)

    def train(self, X, y):
        n_samples, n_features = X.shape
        self.initialize_parameters(n_features)
        for i in range(self.num_iterations):
            # Forward pass
            y_pred = self.predict_proba(X)
            # Vectorized gradients over the whole batch
            error = y_pred - y
            dw = np.dot(X.T, error) / n_samples + 2 * self.lambda_reg * self.weights
            db = np.sum(error) / n_samples
            # Parameter update
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
            # Report the regularized loss periodically
            if (i + 1) % 100 == 0:
                loss = log_loss(y, y_pred) + self.lambda_reg * np.sum(self.weights ** 2)
                print(f"Iteration {i+1}/{self.num_iterations} - Loss: {loss:.4f}")
```
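To make the benefit of vectorization concrete, the sketch below (synthetic data, illustrative sizes) compares a per-sample Python loop against the single matrix product used above; both compute the same gradient, but the vectorized form is typically orders of magnitude faster.

```python
import time
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 20))
w = rng.normal(size=20)
y = (rng.random(5000) > 0.5).astype(float)
y_pred = 1 / (1 + np.exp(-(X @ w)))  # current predicted probabilities

# Per-sample loop
t0 = time.perf_counter()
dw_loop = np.zeros_like(w)
for i in range(X.shape[0]):
    dw_loop += (y_pred[i] - y[i]) * X[i]
dw_loop /= X.shape[0]
loop_time = time.perf_counter() - t0

# Vectorized: one matrix-vector product
t0 = time.perf_counter()
dw_vec = X.T @ (y_pred - y) / X.shape[0]
vec_time = time.perf_counter() - t0

print(np.allclose(dw_loop, dw_vec))  # True
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.6f}s")
```

The same idea extends to GPU arrays: with CuPy, `np` can largely be swapped for `cupy` without changing the gradient expressions.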