Machine Learning: Logistic Regression

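Logistic Regression

Definition: logistic regression is a classification algorithm built on multiple linear regression. It takes the output of the linear model and squashes it into the range 0 to 1.

Regression problems: for example linear regression, which fits an equation to data in order to predict trends, curve shapes, and so on.

Classification problems: for example male vs. female or cats vs. dogs, where the goal is to tell categories apart.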

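The sigmoid function

Here we introduce a special function. In deep learning it is known as an activation function; in classical machine learning, or at least in this post, it plays the role of a classifier. Its formula is:

$$
f(x)=\frac{1}{1+e^{-x}}
$$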
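
As a minimal sketch of this formula in plain Python: the name `sigmoid` and the branch on the sign of `x` are illustrative choices, not something from the original post. Evaluating $e^{-x}$ directly overflows for large negative $x$, so the sketch switches to an algebraically equivalent form in that case.

```python
import math

def sigmoid(x: float) -> float:
    """Evaluate f(x) = 1 / (1 + e^(-x)) without overflowing for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For x < 0, multiply numerator and denominator by e^x so that
    # math.exp never receives a large positive argument.
    ex = math.exp(x)
    return ex / (1.0 + ex)

for x in (-5.0, 0.0, 5.0):
    print(x, sigmoid(x))
```

The two branches compute the same value; only the order of operations differs.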

Its graph is shown below:

The function has the following properties:

The output of the sigmoid function lies strictly between 0 and 1, and the function is differentiable everywhere, so the slope of the curve can be computed at any point.
Its derivative can be expressed in terms of the function itself:

$$
f'(x)=\frac{e^{-x}}{(1+e^{-x})^2}
$$

Written in terms of $f(x)$:

$$
f'(x)=f(x)\,(1-f(x))
$$
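
The identity $f'=f(1-f)$ can be sanity-checked numerically. The sketch below compares it with a central finite difference; the helper names `f_prime` and `central_difference` and the step size `h` are assumptions made for this illustration.

```python
import math

def f(x):
    return 1.0 / (1.0 + math.exp(-x))

def f_prime(x):
    """Derivative written in terms of f itself: f'(x) = f(x) * (1 - f(x))."""
    return f(x) * (1.0 - f(x))

def central_difference(func, x, h=1e-5):
    """Numerical slope estimate (func(x + h) - func(x - h)) / (2h)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

for x in (-2.0, 0.0, 1.5):
    print(x, f_prime(x), central_difference(f, x))
```

The two columns should agree to many decimal places, and the slope is largest at $x=0$, where $f'(0)=0.25$.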

Because of these properties, the sigmoid can be turned into a probability function:

$$
h_{\theta}(x)=g(\theta^Tx)=\frac{1}{1+e^{-\theta^Tx}}
$$

where $z=\theta^Tx$ is the linear regression equation.

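As the classifier in logistic regression, the sigmoid has its decision boundary at $z=\theta^Tx=0$, where $h_{\theta}(x)=0.5$. For the learned $\theta$, the boundary is the set of points $x$ that satisfy this equation. A sample can then be assigned a class from the linear equation alone: $z>0$ is one class and $z<0$ is the other. (Note: this describes binary classification only!)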
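
A small sketch of this decision rule, assuming the first component of `x` is a constant 1 so that `theta[0]` acts as the intercept. The parameter values are hypothetical, and assigning $z=0$ to class 0 is an arbitrary tie-break chosen here.

```python
import math

def predict(theta, x):
    """Return (class, probability of class 1) for one sample.

    theta and x are equal-length sequences; x[0] is assumed to be 1.
    """
    z = sum(t * xi for t, xi in zip(theta, x))
    p = 1.0 / (1.0 + math.exp(-z))
    # z > 0 is equivalent to p > 0.5, so the sign of z decides the class.
    return (1 if z > 0 else 0), p

theta = [-1.0, 2.0, 0.5]                  # hypothetical parameters
print(predict(theta, [1.0, 1.0, 1.0]))    # z = 1.5, falls on the class-1 side
print(predict(theta, [1.0, 0.0, 0.0]))    # z = -1.0, falls on the class-0 side
```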

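Solving logistic regression

The solution process is shown in the figure above (using binary classification as the example). Label the positive class 1 and the negative class 0; then:

$$
\begin{aligned}
P_r(x=1)&=p\\
P_r(x=0)&=1-p
\end{aligned}
$$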

$x$ follows a Bernoulli (0-1) distribution, so it satisfies:

$$
f(x\mid p)=\begin{cases}
p^x(1-p)^{1-x} & x=0,1 \quad (x \text{ is the class label})\\
0 & x\ne 0,1
\end{cases}
$$

That is, the class-to-probability mapping is:

$$
\begin{cases}
x=1: & p\\
x=0: & 1-p
\end{cases}
$$
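
The Bernoulli formula maps directly to code. In this sketch the function name `bernoulli_pmf` and the value `p = 0.7` are made up for illustration.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """f(x | p) = p^x * (1 - p)^(1 - x) for x in {0, 1}, and 0 otherwise."""
    if x not in (0, 1):
        return 0.0
    return (p ** x) * ((1.0 - p) ** (1 - x))

p = 0.7  # illustrative probability of the positive class
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p), bernoulli_pmf(2, p))
```

The exponent trick selects `p` when `x` is 1 and `1 - p` when `x` is 0. The derivation of the likelihood reuses the same trick.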

Deriving the logistic regression formulas

Deriving the loss function: following the idea of maximum likelihood estimation, given a set of known X, y (the training set), find the θ that maximizes the probability of observing y given X.

$$
p(y\mid x;\theta)=\begin{cases}
h_{\theta}(x) & y=1\\
1-h_{\theta}(x) & y=0
\end{cases}
$$

The two cases can be combined into:

$$
p(y\mid x;\theta)=(h_{\theta}(x))^{y}(1-h_{\theta}(x))^{1-y}
$$

where $h_{\theta}$ is the sigmoid function.

Assuming the training samples are independent of each other, the likelihood function (the larger the better) can be written as:

$$
\begin{aligned}
L(\theta)&=\prod_{i=1}^{n}p(y^i\mid x^i;\theta)\\
&=\prod_{i=1}^{n}(h_{\theta}(x^i))^{y^i}(1-h_{\theta}(x^i))^{1-y^i}
\end{aligned}
$$

A product of many probabilities is costly to compute and quickly underflows towards zero, so we take the logarithm to turn the product into a sum:

$$
l(\theta)=\ln L(\theta)=\ln\left(\prod_{i=1}^{n}(h_{\theta}(x^i))^{y^i}(1-h_{\theta}(x^i))^{1-y^i}\right)
$$

Simplifying gives the log-likelihood, which is the quantity that maximum likelihood estimation maximizes:

$$
l(\theta)=\sum_{i=1}^{n}\left(y^i\ln h_{\theta}(x^i)+(1-y^i)\ln\left(1-h_{\theta}(x^i)\right)\right)
$$
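
To see why the sum of logarithms is preferred in practice, here is a small sketch with made-up numbers: 2000 samples, each given probability 0.5 by a hypothetical model.

```python
import math

# 2000 samples, each assigned probability 0.5 by a hypothetical model.
probs = [0.5] * 2000

likelihood = 1.0
for p in probs:
    likelihood *= p          # product of probabilities

log_likelihood = sum(math.log(p) for p in probs)   # sum of log-probabilities

print(likelihood)       # the product underflows in double precision
print(log_likelihood)   # the sum of logs stays a finite, usable number
```

The product falls below the smallest representable double and collapses to zero, while the log-likelihood is simply $2000\ln 0.5$.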

The loss function is therefore the negative log-likelihood:

$$
J(\theta)=-l(\theta)=-\sum_{i=1}^{n}\left(y^i\ln h_{\theta}(x^i)+(1-y^i)\ln\left(1-h_{\theta}(x^i)\right)\right)
$$

Of course, knowing the loss function is not enough; we also need to use it in gradient descent. Differentiating the loss with respect to $\theta_j$ (this is where the sigmoid derivative property $f'=f(1-f)$ comes in!) gives:

$$
\begin{aligned}
\frac{\partial}{\partial\theta_j}J(\theta)
&=-\sum_{i=1}^{n}\left(y^i\frac{1}{h_{\theta}(x^i)}\frac{\partial}{\partial\theta_j}h_{\theta}(x^i)+(1-y^i)\frac{1}{1-h_{\theta}(x^i)}\frac{\partial}{\partial\theta_j}\left(1-h_{\theta}(x^i)\right)\right)\\
&=-\sum_{i=1}^{n}\left(y^i\frac{1}{h_{\theta}(x^i)}-(1-y^i)\frac{1}{1-h_{\theta}(x^i)}\right)\frac{\partial}{\partial\theta_j}h_{\theta}(x^i)\\
&=-\sum_{i=1}^{n}\left(y^i\frac{1}{h_{\theta}(x^i)}-(1-y^i)\frac{1}{1-h_{\theta}(x^i)}\right)h_{\theta}(x^i)\left(1-h_{\theta}(x^i)\right)\frac{\partial}{\partial\theta_j}\theta^Tx^i\\
&=-\sum_{i=1}^{n}\left(y^i\left(1-h_{\theta}(x^i)\right)-(1-y^i)h_{\theta}(x^i)\right)\frac{\partial}{\partial\theta_j}\theta^Tx^i\\
&=-\sum_{i=1}^{n}\left(y^i-h_{\theta}(x^i)\right)\frac{\partial}{\partial\theta_j}\theta^Tx^i\\
&=\sum_{i=1}^{n}\left(h_{\theta}(x^i)-y^i\right)x_j^i
\end{aligned}
$$

Putting it together, the parameter update rule for the logistic regression loss is:

$$
\theta^{t+1}_j=\theta^t_j-\alpha\sum_{i=1}^n\left(h_{\theta}(x^i)-y^i\right)x^i_j
$$

where $\alpha$ is the learning rate.
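
The update rule can be sketched in a few lines of NumPy. Everything below is an illustration under stated assumptions: the toy data is randomly generated, the learning rate `alpha=0.01` and the step count are arbitrary, and the function names are made up. The first column of `X` is a constant 1 so that `theta[0]` serves as the intercept.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y):
    """J(theta) = -sum(y ln h + (1 - y) ln(1 - h))."""
    h = sigmoid(X @ theta)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    """dJ/dtheta_j = sum_i (h(x^i) - y^i) * x_j^i, for all j at once."""
    return X.T @ (sigmoid(X @ theta) - y)

def gradient_descent(X, y, alpha=0.01, steps=500):
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta = theta - alpha * gradient(theta, X, y)   # the update rule
    return theta

# Toy data (made up): noisy labels that depend on the sum of two features.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
X = np.hstack([np.ones((200, 1)), features])
y = (features[:, 0] + features[:, 1] + 0.5 * rng.normal(size=200) > 0).astype(float)

theta = gradient_descent(X, y)
print("loss at theta = 0:  ", loss(np.zeros(3), X, y))
print("loss after training:", loss(theta, X, y))
```

Because the gradient is a sum over samples rather than a mean, a suitable learning rate shrinks as the dataset grows; dividing the gradient by the number of samples is a common alternative.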

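Below we fit a logistic regression to the first two features of the breast cancer dataset in sklearn, then evaluate the loss over a range of values around the learned weights and visualize the loss function, as shown in the figure below:

The code is as follows: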

```python
# Import the required packages
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # used by the 3D plot further below
from sklearn import datasets
from sklearn.preprocessing import scale
from sklearn.linear_model import LogisticRegression

# Load the breast cancer dataset
X, y = datasets.load_breast_cancer(return_X_y=True)
# Slice the dataset, keeping only the first two features
X = X[:, :2]

# Standardize the features (zero mean, unit variance)
X = scale(X)

# Create the logistic regression estimator and fit it
model = LogisticRegression()
model.fit(X, y)
# print('Model weights:', model.coef_)
# print('Model intercept:', model.intercept_)
w1 = model.coef_[0, 0]
w2 = model.coef_[0, 1]
b = model.intercept_[0]  # take the scalar out of the length-1 array

# Sigmoid of the linear model for a single sample x = (x1, x2)
def sigmoid(x, w1, w2, b):
    z = w1 * x[0] + w2 * x[1] + b
    return 1 / (1 + np.exp(-z))

# Loss function: negative log-likelihood summed over all samples
def loss_function(X, y, w1, w2, b):
    loss = 0
    for x_i, y_i in zip(X, y):
        p = sigmoid(x_i, w1, w2, b)
        p = np.clip(p, 0.001, 0.999)  # clip to avoid log(0)
        loss += -(y_i * np.log(p) + (1 - y_i) * np.log(1 - p))
    return loss

# Ranges of w1 and w2 to explore, centred on the fitted values
w1_space = np.linspace(w1 - 2, w1 + 2, 100)
w2_space = np.linspace(w2 - 2, w2 + 2, 100)
# display(w1_space, w2_space)

# Loss along each weight, holding the other weight fixed
loss1_ = np.array([loss_function(X, y, i, w2, b) for i in w1_space])
loss2_ = np.array([loss_function(X, y, w1, i, b) for i in w2_space])

# Visualization
figure1 = plt.figure(figsize=(12, 9))
# Subplot 1: loss as a function of w1
plt.subplot(2, 2, 1)
plt.plot(w1_space, loss1_, color='red')
plt.title('W1')
# Subplot 2: loss as a function of w2
plt.subplot(2, 2, 2)
plt.plot(w2_space, loss2_, color='green')
plt.title('W2')
# Subplot 3: contour lines (the grids broadcast through loss_function)
plt.subplot(2, 2, 3)
w1_grid, w2_grid = np.meshgrid(w1_space, w2_space)
loss_grid = loss_function(X, y, w1_grid, w2_grid, b)
plt.contour(w1_grid, w2_grid, loss_grid, colors='red')  # the keyword is colors, not color
# Subplot 4: filled contours
plt.subplot(2, 2, 4)
plt.contourf(w1_grid, w2_grid, loss_grid, cmap='Greens')
plt.show()
```
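
The loop inside `loss_function` evaluates the sigmoid one sample at a time. As a sketch of an alternative, the same summed loss can be computed with array operations. The names `loss_vectorized`, `loss_loop`, `X_demo` and `y_demo` are made up for this illustration, and the tiny dataset exists only to compare the two versions.

```python
import numpy as np

def loss_vectorized(X, y, w1, w2, b):
    """Summed negative log-likelihood for scalar w1, w2, b, without a Python loop."""
    z = w1 * X[:, 0] + w2 * X[:, 1] + b
    p = 1.0 / (1.0 + np.exp(-z))
    p = np.clip(p, 0.001, 0.999)           # same clipping as the loop version
    return float(np.sum(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def loss_loop(X, y, w1, w2, b):
    """Reference implementation: one sample at a time."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-(w1 * x_i[0] + w2 * x_i[1] + b)))
        p = np.clip(p, 0.001, 0.999)
        total += -(y_i * np.log(p) + (1 - y_i) * np.log(1 - p))
    return float(total)

# Small made-up dataset, only to compare the two versions.
X_demo = np.array([[0.5, -1.2], [1.5, 0.3], [-0.7, 0.8], [-2.0, -0.1]])
y_demo = np.array([0, 1, 1, 0])
print(loss_vectorized(X_demo, y_demo, 1.0, -0.5, 0.2))
print(loss_loop(X_demo, y_demo, 1.0, -0.5, 0.2))
```

Note that this vectorized form expects scalar weights. The loop version also accepts the meshgrid arrays used for the contour plots, because each sample is combined with the whole weight grid by broadcasting.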

The 3D visualization is shown below:

To generate it, run the following code after the code above:

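```python
# 3D visualization
fig2 = plt.figure(figsize=(12, 9))
# Axes3D(fig2) no longer attaches itself to the figure in recent Matplotlib,
# so create the 3D axes through the figure instead.
ax = fig2.add_subplot(projection='3d')
ax.plot_surface(w1_grid, w2_grid, loss_grid, cmap='viridis')
ax.set_xlabel('w1')
ax.set_ylabel('w2')
ax.view_init(30, -30)
plt.show()
```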

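Multi-class classification with logistic regression

One-VS-Rest

Idea: turn one multi-class problem into several binary problems. For each class in turn, treat that class as the positive class and all the other classes as the negative class.

Advantages: widely applicable, since it works with any classifier that outputs a score or a probability, and reasonably efficient, since the number of classifiers trained equals the number of classes.

Disadvantages: each binary problem tends to have an imbalanced training set (few positives, many negatives), which can bias the classifiers.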
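
The relabeling step and the imbalance it causes can be seen in a dependency-free sketch. The helper names, the balanced 3-class labels, and the classifier scores below are all made up for illustration; no model is actually trained here.

```python
from collections import Counter

def one_vs_rest_labels(y, positive_class):
    """Relabel a multi-class target: 1 for positive_class, 0 for every other class."""
    return [1 if label == positive_class else 0 for label in y]

def predict_one_vs_rest(scores_by_class):
    """Pick the class whose binary classifier gives the highest score."""
    return max(scores_by_class, key=scores_by_class.get)

# Made-up labels: three balanced classes with 50 samples each.
y = [0] * 50 + [1] * 50 + [2] * 50

for c in sorted(set(y)):
    counts = Counter(one_vs_rest_labels(y, c))
    print(f"class {c} vs rest: {counts[1]} positives, {counts[0]} negatives")

# Hypothetical scores from the three binary classifiers for one sample.
print(predict_one_vs_rest({0: 0.10, 1: 0.75, 2: 0.40}))
```

Even with perfectly balanced classes, every binary problem here has twice as many negatives as positives, and the ratio gets worse as the number of classes grows.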

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the data
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Build, train and predict. multi_class='ovr' selects one-vs-rest;
# left unset, the multi-class case uses softmax.
# Note: scikit-learn 1.5 and later deprecate the multi_class parameter and
# suggest OneVsRestClassifier(LogisticRegression()) for one-vs-rest instead.
model = LogisticRegression(multi_class='ovr')
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print('Predictions:', y_pred[:10])
print('Ground truth:', y_test[:10])
print('Accuracy:', model.score(X_test, y_test))
print('Accuracy:', accuracy_score(y_test, y_pred))
```

Softmax multi-class classification

Idea: map a vector of real numbers to a probability distribution, so that every element lies in (0, 1) and all elements sum to 1.

The formula is:

$$
S_i(x)=\frac{e^{x_i}}{\sum_{j=1}^{n}e^{x_j}}
$$

If the label y takes k possible values, the regression equation can be written as:

$$
h_{\theta}(x)=\begin{cases}
\dfrac{e^{\theta_1^Tx}}{\sum_{j=1}^{k}e^{\theta_j^Tx}} & y=1\\[2ex]
\dfrac{e^{\theta_2^Tx}}{\sum_{j=1}^{k}e^{\theta_j^Tx}} & y=2\\
\quad\vdots & \;\vdots\\
\dfrac{e^{\theta_k^Tx}}{\sum_{j=1}^{k}e^{\theta_j^Tx}} & y=k
\end{cases}
$$
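
A minimal sketch of the softmax mapping in plain Python. The scores stand in for hypothetical values of $\theta_j^Tx$, and subtracting the maximum score before exponentiating is a standard stability trick added here; it does not change the result.

```python
import math

def softmax(scores):
    """Map a list of real scores to probabilities that sum to 1."""
    m = max(scores)                               # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values of theta_j^T x for k = 3 classes.
scores = [2.0, 1.0, -1.0]
probs = softmax(scores)
print(probs, sum(probs))
print("predicted class:", probs.index(max(probs)) + 1)   # classes numbered 1..k
```

The predicted class is simply the one with the largest score, since the exponential preserves ordering.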

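Based on this formula, the figure below illustrates how softmax works in logistic regression:

From the diagram and the formula, when softmax is applied to a binary problem it reduces to ordinary logistic regression.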
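
This reduction can be checked from the softmax formula with $k=2$. In the short derivation below, $\theta=\theta_1-\theta_2$ is notation introduced here for convenience; the second step divides the numerator and denominator by $e^{\theta_1^Tx}$.

$$
\begin{aligned}
P(y=1\mid x)
&=\frac{e^{\theta_1^Tx}}{e^{\theta_1^Tx}+e^{\theta_2^Tx}}\\
&=\frac{1}{1+e^{-(\theta_1-\theta_2)^Tx}}\\
&=\frac{1}{1+e^{-\theta^Tx}},\qquad \theta=\theta_1-\theta_2
\end{aligned}
$$

The result has exactly the sigmoid form $h_{\theta}(x)$ used in the binary case, and $P(y=2\mid x)=1-P(y=1\mid x)$.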

```python
# Try out softmax
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the data
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1024)

# multi_class='multinomial' selects softmax regression.
# Note: scikit-learn 1.5 and later deprecate the multi_class parameter.
model = LogisticRegression(multi_class='multinomial')
model.fit(X_train, y_train)

print('Prediction accuracy:', model.score(X_test, y_test))

print('Predicted probabilities for the test data:', model.predict_proba(X_test)[:5])
```