【机器学习】集成学习算法之AdaBoost

文章目录

基本步骤
示例
- [生成第 1 棵决策树](#生成第 1 棵决策树)
- [生产第 2 棵决策树](#生产第 2 棵决策树)
- [生成第 T 棵决策树](#生成第 T 棵决策树)
- 加权投票
[sklearn 实现](#sklearn 实现)

基本步骤

首先，是初始化训练数据的权值分布 D 1 D_1 D1。假设有 m m m 个训练样本数据，则每一个训练样本最开始时，都被赋予相同的权值： w 1 = 1 m w_1 = \large \frac{1}{m} w1=m1，这样训练样本集的初始权值分布 D 1 ( i ) D_1(i) D1(i)：
D 1 ( i ) = w 1 = ( w 11 , ⋯ , w 1 m ) = ( 1 m , ⋯ , 1 m ) D_1(i) = w_1 = (w_{11}, \cdots, w_{1m}) = (\frac{1}{m}, \cdots, \frac{1}{m}) D1(i)=w1=(w11,⋯,w1m)=(m1,⋯,m1)

进行迭代 t = 1 , ⋯ , T t = 1, \cdots, T t=1,⋯,T；

选取一个当前误差最低的弱分类器 h t h_t ht 作为第 t t t 个基本分类器，并计算弱分类器 h t : X → { − 1 , 1 } h_t:X\rightarrow \{-1, 1\} ht:X→{−1,1}，该弱分类器在分布 D t D_t Dt 上的分类错误率为：
ϵ t = P ( h t ( x i ) ≠ y i ) = ∑ i = t n w t i I ( h t ( x i ) ≠ y i ) \epsilon_t = P(h_t(x_i) \neq y_i) = \sum ^n {i=t} w{ti} I(h_t(x_i) \neq y_i) ϵt=P(ht(xi)=yi)=i=t∑nwtiI(ht(xi)=yi) 其中，
I ( h t ( x i ) ≠ y i ) = { 1 h t ( x i ) ≠ y i 0 h t ( x i ) = y i I(h_t(x_i) \neq y_i) = \begin{cases} 1 & h_t(x_i) \neq y_i \\\\ 0 & h_t(x_i) = y_i \\ \end{cases} I(ht(xi)=yi)=⎩ ⎨ ⎧10ht(xi)=yiht(xi)=yi分类错误率应满足 0 < ϵ < 0.5 0 < \epsilon < 0.5 0<ϵ<0.5 ，

第 t t t 个弱分类器 h t h_t ht 的权重系数为：
α t = 1 2 l o g ( 1 − ϵ t ϵ t ) \alpha_t = \frac{1}{2} log\left(\frac{1 - \epsilon_t}{\epsilon_t}\right) αt=21log(ϵt1−ϵt)

并求出新权重 w t + 1 = ( w t + 1 , 1 , ⋯ , w t + 1 , m ) w_{t+1} = (w_{t+1,1}, \cdots, w_{t+1,m}) wt+1=(wt+1,1,⋯,wt+1,m)，其中：
w t + 1 , i = w t i e − α t y i h t ( x i ) = { w t i e α t h t ( x i ) ≠ y i w t i e − α t h t ( x i ) = y i w_{t+1,i} = w_{ti} e^{-\alpha_t y_i h_t(x_i)} = \begin{cases} w_{ti} e ^{\alpha_t} & h_t(x_i) \neq y_i \\\\ w_{ti} e ^{-\alpha_t} & h_t(x_i) = y_i \\ \end{cases} wt+1,i=wtie−αtyiht(xi)=⎩ ⎨ ⎧wtieαtwtie−αtht(xi)=yiht(xi)=yi

对新权重进行归一化 处理，其中 Z t Z_t Zt 为归一化常数 ，得出训练样本的权重分布 D t + 1 D_{t+1} Dt+1 为：

D t + 1 = w t + 1 Z t D_{t+1} = \frac{w_{t+1}}{Z_{t}} Dt+1=Ztwt+1 简化上述过程公式为：
D t + 1 = D t Z t × { e − α t h t ( x i ) ≠ y i e α t h t ( x i ) = y i = D t e − α t y h t ( x ) Z t \begin{aligned} D_{t+1} & = \frac{D_t}{Z_t} × \begin{cases} e^{-\alpha_t} & h_t(x_i) \neq y_i \\\\ e^{\alpha_t} & h_t(x_i) = y_i \\ \end{cases} \\\\ & = \frac{D_te^{{-\alpha_t y h_t(x)}}}{Z_t} \end{aligned} Dt+1=ZtDt×⎩ ⎨ ⎧e−αteαtht(xi)=yiht(xi)=yi=ZtDte−αtyht(x)

最后是集合策略。Adaboost分类采用的是加权表决法，构建基本分类器的线性组合：
f ( x ) = ∑ t = 1 T α t h t ( x ) f(x) = \sum ^T _{t=1} \alpha_t h_t(x) f(x)=t=1∑Tαtht(x)

通过符号函数 sign 的作用，得到一个最终的强分类器为：
H ( x ) = s i g n ( f ( x ) ) = s i g n ( ∑ t = 1 T α t h t ( x ) ) H(x) = sign(f(x)) = sign(\sum ^T _{t=1} \alpha_t h_t(x)) H(x)=sign(f(x))=sign(t=1∑Tαtht(x))

示例

考虑一个分类数据集

序号	X 1 X_1 X1	X 2 X_2 X2	Y Y Y
1	0	0	1
2	0.5	0.9	1
3	1	1.2	-1
4	1.2	0.7	-1
5	1.4	0.6	1
6	1.6	0.2	-1
7	1.7	0.4	1
8	2	0	1
9	2.2	0.1	-1
10	2.5	1	-1

生成第 1 棵决策树

(随机) 选择条件 x 2 ≤ 0.65 x_2 ≤ 0.65 x2≤0.65 生成第 1 棵决策树

在分布 D 1 = ( 0.1 , ⋅ ⋅ ⋅ , 0.1 ) T D_1 = (0.1, · · · , 0.1)^T D1=(0.1,⋅⋅⋅,0.1)T 下，计算分类错误率 ϵ = 0.3 ϵ = 0.3 ϵ=0.3，求出权重系数 α 1 \alpha_1 α1：
α 1 = 1 2 l o g ( 1 − ϵ ϵ ) = 0.184 α_1 = \frac{1}{2} log\left( \frac{1−ϵ} {ϵ} \right) = 0.184 α1=21log(ϵ1−ϵ)=0.184

再求出新权重 w 2 = ( w 2 , 1 , ⋯ , w 2 , 10 ) w_2 = (w_{2,1}, \cdots, w_{2,10}) w2=(w2,1,⋯,w2,10)，其中：
w 2 , i = { w 1 i e α 1 i f y ≠ y ^ w 1 i e − α 1 i f y = y ^ w_{2,i} = \begin{cases} w_{1i} e ^{\alpha_1} & if ~~ y \neq \hat y \\\\ w_{1i} e ^{-\alpha_1} & if ~~ y = \hat y \\ \end{cases} w2,i=⎩ ⎨ ⎧w1ieα1w1ie−α1if y=y^if y=y^

对求得的新权重进行归一化求出权重分布 D 2 D_2 D2：

X 1 X_1 X1	X 2 X_2 X2	Y Y Y	Y ^ \hat Y Y^	D 1 D_1 D1	w 2 w_2 w2	D 2 D_2 D2
0	0	1	1	0.1	0.083	0.088
0.5	0.9	1	-1	0.1	0.12	0.128
1	1.2	-1	-1	0.1	0.083	0.088
1.2	0.7	-1	-1	0.1	0.083	0.088
1.4	0.6	1	1	0.1	0.083	0.088
1.6	0.2	-1	1	0.1	0.12	0.128
1.7	0.4	1	1	0.1	0.083	0.088
2	0	1	1	0.1	0.083	0.088
2.2	0.1	-1	1	0.1	0.12	0.128
2.5	1	-1	-1	0.1	0.083	0.088

生产第 2 棵决策树

随机选择条件 x 1 ≤ 1.5 x_1 ≤ 1.5 x1≤1.5 生成第 2 棵决策树

在分布 D 2 = ( 0.088 , 0.128 , ⋅ ⋅ ⋅ , 0.088 ) T D_2 = (0.088, 0.128, · · · , 0.088)^T D2=(0.088,0.128,⋅⋅⋅,0.088)T 下，计算分类错误率 ϵ = 0.352 ϵ = 0.352 ϵ=0.352，求出权重系数 α 2 \alpha_2 α2：
α 2 = 1 2 l o g ( 1 − ϵ ϵ ) = 0.133 α_2 = \frac{1}{2} log\left( \frac{1−ϵ} {ϵ} \right) = 0.133 α2=21log(ϵ1−ϵ)=0.133

再求出新权重 w 3 w_3 w3，对 w 3 w_3 w3 进行归一化求出权重分布 D 3 D_3 D3：

X 1 X_1 X1	X 2 X_2 X2	Y Y Y	Y ^ \hat Y Y^	D 2 D_2 D2	w 3 w_3 w3	D 3 D_3 D3
0	0	1	1	0.088	0.077	0.079
0.5	0.9	1	1	0.128	0.112	0.115
1	1.2	-1	1	0.088	0.101	0.104
1.2	0.7	-1	1	0.088	0.101	0.104
1.4	0.6	1	1	0.088	0.077	0.079
1.6	0.2	-1	-1	0.128	0.112	0.115
1.7	0.4	1	-1	0.088	0.101	0.104
2	0	1	-1	0.088	0.101	0.104
2.2	0.1	-1	-1	0.128	0.112	0.115
2.5	1	-1	1	0.088	0.077	0.079

生成第 T 棵决策树

如此循环下去生成 T T T 棵决策树。

加权投票

通过加权投票的方式得到集成分类器：
F ( x ) = α 1 T r e e 1 + α 2 T r e e 2 + ⋯ + α t T r e e t = 0.184 I ( X 2 ≤ 0.65 ) + 0.133 I ( X 1 ≤ 1.5 ) + ⋯ + α t T r e e t \begin{aligned} F(x) & = α_1Tree_1 + α_2Tree_2 + \cdots + α_tTree_t \\\\ & = 0.184I(X_2 ≤ 0.65) + 0.133I(X_1 ≤ 1.5) + \cdots + α_tTree_t \end{aligned} F(x)=α1Tree1+α2Tree2+⋯+αtTreet=0.184I(X2≤0.65)+0.133I(X1≤1.5)+⋯+αtTreet

sklearn 实现

py 复制代码

import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor

# Create the dataset
X = np.array([[0, 0], [0.5, 0.9], [1, 1.2], [1.2, 0.7], [1.4, 0.6], [1.6, 0.2], [1.7, 0.4], [2, 0], [2.2, 0.1], [2.5, 1]])
y = np.array([1, 1, -1, -1, 1, -1, 1, 1, -1, -1])

# Fit the classifier
regr_1 = DecisionTreeRegressor(max_depth=3)
regr_2 = AdaBoostRegressor(regr_1, n_estimators=10, random_state=20)

regr_1.fit(X, y)
regr_2.fit(X, y)

# Score
core_1 = regr_1.score(X, y)
core_2 = regr_2.score(X, y)

print("Decision Tree score : %f" % core_1)
print("AdaBoost score : %f" % core_2)

# Predict
y_1 = regr_1.predict(X)
y_2 = regr_2.predict(X)

# Plot the results
x = range(10)
plt.figure()
plt.scatter(x, y, c="k", label="training samples")
plt.plot(x, y_1, c="g", label="n_estimators=1", linewidth=2)
plt.plot(x, y_2, c="r", label="n_estimators=20", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Boosted Decision Tree Regression")
plt.legend()
plt.show()

py 复制代码

# output
Decision Tree score : 0.733333
AdaBoost score : 1.000000

序号	X 1 X_1 X1	X 2 X_2 X2	Y Y Y
1	0	0	1
2	0.5	0.9	1
3	1	1.2	-1
4	1.2	0.7	-1
5	1.4	0.6	1
6	1.6	0.2	-1
7	1.7	0.4	1
8	2	0	1
9	2.2	0.1	-1
10	2.5	1	-1

序号	X 1 X_1 X1	X 2 X_2 X2	Y Y Y
1	0	0	1
2	0.5	0.9	1
3	1	1.2	-1
4	1.2	0.7	-1
5	1.4	0.6	1
6	1.6	0.2	-1
7	1.7	0.4	1
8	2	0	1
9	2.2	0.1	-1
10	2.5	1	-1

序号	X 1 X_1 X1	X 2 X_2 X2	Y Y Y
1	0	0	1
2	0.5	0.9	1
3	1	1.2	-1
4	1.2	0.7	-1
5	1.4	0.6	1
6	1.6	0.2	-1
7	1.7	0.4	1
8	2	0	1
9	2.2	0.1	-1
10	2.5	1	-1