Support Vector Machine

$$
\begin{aligned}
&\max_{w,b}\ \frac{2}{||w||}
\ \Leftrightarrow\ \max_{w,b}\ ||w||^{-1}
\ \Leftrightarrow\ \min_{w,b}\ ||w||
\ \Leftrightarrow\ \min_{w,b}\ \frac{1}{2}||w||^2 \\
&s.t.\quad y_i(w^Tx_i+b)\geq1,\quad i=1,2,3,\cdots,m.
\end{aligned}
$$
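As a quick numeric illustration of the objective and its constraint, the sketch below computes the margin $2/||w||$ for two hand-picked hyperplanes and checks the constraint $y_i(w^Tx_i+b)\geq1$ on a toy data set. The data and the helper names (`margin`, `is_feasible`) are made up for this example.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def margin(w):
    """Width of the margin, 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))

def is_feasible(w, b, X, y, tol=1e-12):
    """True when every sample satisfies y_i (w^T x_i + b) >= 1."""
    return all(yi * (dot(w, xi) + b) >= 1.0 - tol for xi, yi in zip(X, y))

# Toy, linearly separable data (hypothetical).
X = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
y = [1, 1, -1, -1]

w_a, b_a = (0.5, 0.0), 0.0    # feasible, wider margin
w_b, b_b = (1.0, 0.0), 0.0    # feasible, narrower margin
w_c, b_c = (0.25, 0.0), 0.0   # violates the constraint

for w, b in [(w_a, b_a), (w_b, b_b), (w_c, b_c)]:
    print(w, "margin =", margin(w), "feasible =", is_feasible(w, b, X, y))
```

Among the feasible candidates, the one with the smaller $||w||$ has the larger margin, which is why maximizing $2/||w||$ and minimizing $\frac{1}{2}||w||^2$ pick the same hyperplane.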
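Rather than solving this inequality-constrained minimization problem directly, we pass to its dual and solve the dual with the SMO algorithm.
First, derive the dual problem.
The optimization problem above is equivalent to a min-max problem on the Lagrangian: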
$$
\begin{aligned}
&\min_{w,b}\max_{\alpha\geq0}\ L(w,b,\alpha)=\frac{1}{2}||w||^2+\sum_{i=1}^m\alpha_i(1-y_i(w^Tx_i+b)),\\
&\alpha_i\geq0,\quad i=1,2,\cdots,m.
\end{aligned}
$$
Because the primal is a convex quadratic program, strong duality holds whenever the data are linearly separable, so the order of min and max can be exchanged and $L$ is first minimized over $w$ and $b$. Setting the partial derivatives to zero:

$$
\begin{aligned}
\frac{\partial L}{\partial w}&=w-\sum_{i=1}^m\alpha_iy_ix_i=0,\\
\frac{\partial L}{\partial b}&=-\sum_{i=1}^m\alpha_iy_i=0,
\end{aligned}
$$

that is:

$$
w=\sum_{i=1}^m\alpha_iy_ix_i,\qquad \sum_{i=1}^m\alpha_iy_i=0.
$$
Substituting these back into the Lagrangian gives:
$$
\begin{aligned}
L(w,b,\alpha)&=\frac{1}{2}||w||^2+\sum_{i=1}^m\alpha_i(1-y_i(w^Tx_i+b))\\
&=\frac{1}{2}w^Tw+\sum_{i=1}^m\alpha_i(1-y_i(w^Tx_i+b))\\
&=\frac{1}{2}\Big[\sum_{i=1}^m\alpha_iy_ix_i\Big]^T\sum_{j=1}^m\alpha_jy_jx_j+\sum_{i=1}^m\alpha_i(1-y_i(w^Tx_i+b))\\
&=\frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m\alpha_i\alpha_jy_iy_jx_i^Tx_j+\sum_{i=1}^m\alpha_i-\sum_{i=1}^m\alpha_iy_iw^Tx_i-b\sum_{i=1}^m\alpha_iy_i\\
&=\frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m\alpha_i\alpha_jy_iy_jx_i^Tx_j+\sum_{i=1}^m\alpha_i-\sum_{i=1}^m\alpha_iy_i\Big[\sum_{j=1}^m\alpha_jy_jx_j\Big]^Tx_i\\
&=\frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m\alpha_i\alpha_jy_iy_jx_i^Tx_j+\sum_{i=1}^m\alpha_i-\sum_{i=1}^m\sum_{j=1}^m\alpha_i\alpha_jy_iy_jx_i^Tx_j\\
&=\sum_{i=1}^m\alpha_i-\frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m\alpha_i\alpha_jy_iy_jx_i^Tx_j
\end{aligned}
$$

The term $b\sum_{i=1}^m\alpha_iy_i$ drops out in the fifth line because $\sum_{i=1}^m\alpha_iy_i=0$.
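The identity can be checked numerically. The sketch below evaluates the Lagrangian at $w=\sum_i\alpha_iy_ix_i$ and compares it with the closed form $\sum_i\alpha_i-\frac{1}{2}\sum_i\sum_j\alpha_i\alpha_jy_iy_jx_i^Tx_j$. The data, the multipliers, and the function names are assumptions chosen for illustration; the only requirement is that $\sum_i\alpha_iy_i=0$.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def w_from_alpha(alpha, X, y):
    """w = sum_i alpha_i y_i x_i."""
    dim = len(X[0])
    return [sum(a * yi * xi[k] for a, xi, yi in zip(alpha, X, y))
            for k in range(dim)]

def lagrangian(w, b, alpha, X, y):
    """L(w, b, alpha) = 1/2 ||w||^2 + sum_i alpha_i (1 - y_i (w^T x_i + b))."""
    penalty = sum(a * (1.0 - yi * (dot(w, xi) + b))
                  for a, xi, yi in zip(alpha, X, y))
    return 0.5 * dot(w, w) + penalty

def dual_objective(alpha, X, y):
    """sum_i alpha_i - 1/2 sum_i sum_j alpha_i alpha_j y_i y_j x_i^T x_j."""
    m = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(m) for j in range(m))
    return sum(alpha) - 0.5 * quad

# Hypothetical data; alpha is chosen so that sum_i alpha_i y_i = 0.
X = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
y = [1, 1, -1, -1]
alpha = [0.3, 0.1, 0.15, 0.25]

w = w_from_alpha(alpha, X, y)
for b in (0.7, -3.0):  # the value of b should not matter
    print(b, lagrangian(w, b, alpha, X, y), dual_objective(alpha, X, y))
```

The two printed values agree for any $b$, which reflects the fact that $b$ disappears from the dual.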
The original problem therefore reduces to the dual problem:
$$
\begin{aligned}
&\max_{\alpha\geq0}\ L(w,b,\alpha)=\sum_{i=1}^m\alpha_i-\frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m\alpha_i\alpha_jy_iy_jx_i^Tx_j\\
&s.t.\quad \sum_{i=1}^m\alpha_iy_i=0
\end{aligned}
$$
The next step is to solve for $\alpha_i$ with the SMO algorithm.
In addition, the KKT conditions require:
$$
\begin{aligned}
&\alpha_i\geq0\\
&y_if(x_i)-1\geq0\\
&\alpha_i(y_if(x_i)-1)=0
\end{aligned}
$$
Here $f(x_i)=w^Tx_i+b$. When $\alpha_i=0$, the sample has no effect on $w$; when $\alpha_i>0$, we must have $y_if(x_i)=1$, so the sample is a support vector.
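The three conditions can be checked mechanically for a candidate solution. The sketch below is a minimal illustration on a hand-solved toy problem; the data, the tolerance `tol`, and the helper name `kkt_report` are assumptions for this example.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def kkt_report(alpha, w, b, X, y, tol=1e-9):
    """Return (all_conditions_hold, indices_with_positive_alpha)."""
    ok = True
    support = []
    for i, (a, xi, yi) in enumerate(zip(alpha, X, y)):
        slack = yi * (dot(w, xi) + b) - 1.0   # y_i f(x_i) - 1
        # alpha_i >= 0, y_i f(x_i) - 1 >= 0, alpha_i (y_i f(x_i) - 1) = 0
        if a < -tol or slack < -tol or abs(a * slack) > tol:
            ok = False
        if a > tol:
            support.append(i)
    return ok, support

# Toy problem whose solution is w = (1, 0), b = 0 (hypothetical data).
X = [(1.0, 0.0), (3.0, 0.0), (-1.0, 0.0), (-3.0, 0.0)]
y = [1, 1, -1, -1]
w, b = (1.0, 0.0), 0.0

alpha = [0.5, 0.0, 0.5, 0.0]       # satisfies all three conditions
alpha_bad = [0.5, 0.2, 0.5, 0.2]   # positive alpha on samples off the margin

ok, support = kkt_report(alpha, w, b, X, y)
ok_bad, _ = kkt_report(alpha_bad, w, b, X, y)
print(ok, support, ok_bad)
```

Only the two samples lying on the margin carry a positive multiplier, which matches the statement that samples with $\alpha_i>0$ are support vectors.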
Then $w$ is obtained from

$$
w=\sum_{i=1}^m\alpha_iy_ix_i
$$

Once $w$ is known, $b$ follows from the fact that every support vector satisfies $y_if(x_i)=y_i(w^Tx_i+b)=1$.
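A minimal sketch of this recovery step, assuming the multipliers are already known. Since $y_s\in\{+1,-1\}$, the condition $y_s(w^Tx_s+b)=1$ gives $b=y_s-w^Tx_s$ for each support vector $s$. Averaging over all support vectors is a common numerical choice and is an assumption here, not something the text above prescribes. The data and function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def recover_w(alpha, X, y):
    """w = sum_i alpha_i y_i x_i."""
    dim = len(X[0])
    return [sum(a * yi * xi[k] for a, xi, yi in zip(alpha, X, y))
            for k in range(dim)]

def recover_b(alpha, w, X, y, tol=1e-9):
    """Average of y_s - w^T x_s over the support vectors (alpha_s > 0)."""
    sv = [i for i, a in enumerate(alpha) if a > tol]
    return sum(y[i] - dot(w, X[i]) for i in sv) / len(sv)

# Toy problem with known multipliers (hypothetical data).
X = [(1.0, 0.0), (3.0, 0.0), (-1.0, 0.0), (-3.0, 0.0)]
y = [1, 1, -1, -1]
alpha = [0.5, 0.0, 0.5, 0.0]

w = recover_w(alpha, X, y)
b = recover_b(alpha, w, X, y)
print(w, b)
```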
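Next, we describe how the SMO algorithm works.
Each iteration selects an $\alpha_j$ that violates the KKT conditions and the $\alpha_i$ that maximizes $|E_i-E_j|$, keeping all other multipliers fixed.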
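The selection rule can be sketched as two small helpers. This is only an illustration under stated assumptions: `violates_kkt` encodes the hard-margin KKT conditions with a tolerance `tol`, and `select_partner` scans all indices other than $j$. Both names, the tolerance, and the sample error values are hypothetical.

```python
def violates_kkt(alpha_j, y_j, f_j, tol=1e-3):
    """True if sample j breaks the hard-margin KKT conditions.

    Required: y_j f(x_j) >= 1, and y_j f(x_j) = 1 whenever alpha_j > 0.
    """
    r = y_j * f_j - 1.0
    if r < -tol:                       # inside the margin or misclassified
        return True
    if alpha_j > tol and r > tol:      # positive multiplier off the margin
        return True
    return False

def select_partner(E, j):
    """Index i != j that maximizes |E_i - E_j|."""
    candidates = [i for i in range(len(E)) if i != j]
    return max(candidates, key=lambda i: abs(E[i] - E[j]))

# Hypothetical error values E_k = f(x_k) - y_k.
E = [0.2, -1.5, 0.9, -0.1]
print(select_partner(E, 0), select_partner(E, 1))
```

A larger $|E_i-E_j|$ gives a larger numerator in the update of $\alpha_i$, so this choice tends to produce a bigger step per iteration.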
The update formulas are:

$$
\begin{aligned}
(1)\quad \alpha_i^{new}&=\alpha_i^{old}+\frac{y_i(E_j-E_i)}{\eta}\\
(2)\quad \alpha_j^{new}&=\alpha_j^{old}+y_iy_j(\alpha_i^{old}-\alpha_i^{new})
\end{aligned}
$$

where $E_i=f(x_i)-y_i$ and $\eta=K(x_i,x_i)+K(x_j,x_j)-2K(x_i,x_j)$. Here $K$ is the kernel function; for the linear SVM derived above, $K(x_i,x_j)=x_i^Tx_j$.
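One update of a pair $(\alpha_i,\alpha_j)$ can be sketched as follows, assuming a linear kernel. The clipping step is not part of the formulas above: it is added here as an assumption so that both multipliers stay non-negative, as the dual constraint $\alpha\geq0$ requires. All function names and the two-point data set are hypothetical.

```python
import math

def linear_kernel(u, v):
    """K(u, v) = u^T v."""
    return sum(a * b for a, b in zip(u, v))

def decision(x, alpha, b, X, y, kernel=linear_kernel):
    """f(x) = sum_k alpha_k y_k K(x_k, x) + b."""
    return sum(a * yk * kernel(xk, x) for a, xk, yk in zip(alpha, X, y)) + b

def smo_pair_update(i, j, alpha, b, X, y, kernel=linear_kernel):
    """Return (alpha_i_new, alpha_j_new) for one pair update."""
    E_i = decision(X[i], alpha, b, X, y, kernel) - y[i]
    E_j = decision(X[j], alpha, b, X, y, kernel) - y[j]
    eta = kernel(X[i], X[i]) + kernel(X[j], X[j]) - 2.0 * kernel(X[i], X[j])
    if eta <= 0.0:
        return alpha[i], alpha[j]          # degenerate pair: leave unchanged
    a_i = alpha[i] + y[i] * (E_j - E_i) / eta           # formula (1)
    # Clipping (assumption, not in the formulas above): keep both >= 0.
    if y[i] == y[j]:
        low, high = 0.0, alpha[i] + alpha[j]
    else:
        low, high = max(0.0, alpha[i] - alpha[j]), math.inf
    a_i = min(max(a_i, low), high)
    a_j = alpha[j] + y[i] * y[j] * (alpha[i] - a_i)     # formula (2)
    return a_i, a_j

# Two-point toy problem, starting from alpha = 0 and b = 0.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
alpha = [0.0, 0.0]

new_i, new_j = smo_pair_update(0, 1, alpha, 0.0, X, y)
print(new_i, new_j)
```

In this example $E_0=-1$, $E_1=1$ and $\eta=4$, so a single step moves both multipliers from 0 to 0.5. Note that $b$ cancels in $E_j-E_i$, so the step size does not depend on it.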
Formula (2) holds because:
$$
\begin{aligned}
&\sum_{k=1}^m\alpha_ky_k=\alpha_iy_i+\alpha_jy_j+\sum_{k\in S}\alpha_ky_k=\alpha_iy_i+\alpha_jy_j-C=0,\\
&S=\{1,2,\cdots,m\}\setminus\{i,j\},\qquad C=-\sum_{k\in S}\alpha_ky_k
\end{aligned}
$$

Since every $\alpha_k$ with $k\in S$ is held fixed, $C$ is a constant during the update.
Hence:

$$
\begin{aligned}
&\alpha_iy_i+\alpha_jy_j=C\\
&\text{that is,}\\
&(1)\quad \alpha_i^{old}y_i+\alpha_j^{old}y_j=C\\
&(2)\quad \alpha_i^{new}y_i+\alpha_j^{new}y_j=C\\
&y_j\,((1)-(2)):\quad (\alpha_i^{old}-\alpha_i^{new})y_jy_i+(\alpha_j^{old}-\alpha_j^{new})y_jy_j=0\\
&\text{and, since } y_jy_j=1:\quad \alpha_j^{new}=(\alpha_i^{old}-\alpha_i^{new})y_jy_i+\alpha_j^{old}
\end{aligned}
$$
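As a small sanity check of this derivation, the sketch below applies the update for $\alpha_j$ under all four label combinations and measures how much $\alpha_iy_i+\alpha_jy_j$ changes. The numeric values and the function name `update_alpha_j` are arbitrary choices for illustration.

```python
def update_alpha_j(alpha_i_old, alpha_i_new, alpha_j_old, y_i, y_j):
    """alpha_j_new = alpha_j_old + y_i y_j (alpha_i_old - alpha_i_new)."""
    return alpha_j_old + y_i * y_j * (alpha_i_old - alpha_i_new)

alpha_i_old, alpha_j_old, alpha_i_new = 0.4, 0.7, 0.1  # arbitrary values

drifts = []
for y_i in (1, -1):
    for y_j in (1, -1):
        alpha_j_new = update_alpha_j(alpha_i_old, alpha_i_new,
                                     alpha_j_old, y_i, y_j)
        before = alpha_i_old * y_i + alpha_j_old * y_j
        after = alpha_i_new * y_i + alpha_j_new * y_j
        drifts.append(abs(after - before))

# Largest change in alpha_i y_i + alpha_j y_j across the four cases.
print(max(drifts))
```

The printed drift is zero up to floating-point rounding, so the constraint $\sum_k\alpha_ky_k=0$ is preserved by the pair update.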