Support Vector Machine (SVM) Classification

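The support vector machine (SVM) is a classic supervised learning algorithm. It is used mainly for classification and can also be extended to regression, where it is called support vector regression (SVR). Its core idea is to find an optimal hyperplane that maximizes the margin between the classes and thereby classifies the data effectively.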

I. Core Idea

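An SVM looks for a decision boundary (a hyperplane) that separates the classes while keeping the distance from the boundary to the nearest data points (the support vectors) as large as possible. This margin-maximization strategy gives the model better generalization ability.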

Hyperplane:

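In an n-dimensional space, a hyperplane is an (n-1)-dimensional affine subspace. For two-dimensional data it is a straight line; for three-dimensional data it is a plane.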

Support vectors:

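The samples closest to the optimal hyperplane are called support vectors; they are the samples that determine where the hyperplane lies. The remaining samples have no influence on it: moving or removing them, as long as they stay outside the margin, leaves the hyperplane unchanged. This is where the name "SVM" comes from.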
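To make the role of support vectors concrete, here is a minimal sketch using scikit-learn's `SVC`. The six 2D points and the very large `C` (used to approximate a margin that allows no violations) are illustrative assumptions, not part of the original text. The sketch fits a linear SVM, lists the support vectors, then refits on the support vectors alone to check that the fitted weight vector and bias stay essentially the same.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: three negative and three positive points in 2D.
X = np.array([[0, 0], [1, 0], [0, 1], [2, 2], [3, 3], [3, 2]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C makes margin violations extremely costly on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("support vector indices:", clf.support_)
print("support vectors:\n", clf.support_vectors_)

# Refit using only the support vectors; all other samples are dropped.
sv = clf.support_
clf_sv = SVC(kernel="linear", C=1e6).fit(X[sv], y[sv])

print("w, b (all samples):         ", clf.coef_[0], clf.intercept_[0])
print("w, b (support vectors only):", clf_sv.coef_[0], clf_sv.intercept_[0])
```

The two fits should agree up to the solver's numerical tolerance, which is what "the other samples have no influence" means in practice.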

Margin:

The margin is the sum of the distances from the hyperplane to the nearest support vectors of the two classes, and the goal of an SVM is to maximize it. Let the hyperplane be $w\cdot x+b=0$, where $w$ is the weight vector and $b$ is the bias. The distance from a single sample $x_i$ to the hyperplane is then:

$$\text{distance}=\frac{\left|w\cdot x_i+b\right|}{\left\|w\right\|}$$
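The distance formula is easy to check numerically. The following is a small sketch with NumPy; the function name `distance_to_hyperplane` and the example hyperplane $3x_1+4x_2-5=0$ are made up for illustration.

```python
import numpy as np

def distance_to_hyperplane(w, b, x):
    """Return |w.x + b| / ||w||, the Euclidean distance from x to the hyperplane."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0, so ||w|| = 5.
w = np.array([3.0, 4.0])
b = -5.0

d_far = distance_to_hyperplane(w, b, [3.0, 4.0])  # (9 + 16 - 5) / 5
d_on = distance_to_hyperplane(w, b, [1.0, 0.5])   # (3 + 2 - 5) / 5, a point on the hyperplane
print(d_far, d_on)
```

Dividing by $\|w\|$ is what makes the result a geometric distance: rescaling $w$ and $b$ by the same factor describes the same hyperplane and leaves the distance unchanged.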

The optimal hyperplane must satisfy $w\cdot x_i+b\geq 1$ for positive samples ($y_i=+1$) and $w\cdot x_i+b\leq -1$ for negative samples ($y_i=-1$). The margin is then $\frac{2}{\left\|w\right\|}$, so maximizing the margin is equivalent to minimizing $\left\|w\right\|^{2}$.
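The margin value follows directly from the distance formula. Here is a short worked derivation, assuming the nearest sample of each class satisfies its constraint with equality; the symbols $x_+$ and $x_-$ for these samples are introduced here only for illustration.

```latex
\[
\begin{aligned}
w\cdot x_+ + b = 1 \;&\Rightarrow\; d_+ = \frac{|w\cdot x_+ + b|}{\|w\|} = \frac{1}{\|w\|},\\
w\cdot x_- + b = -1 \;&\Rightarrow\; d_- = \frac{|w\cdot x_- + b|}{\|w\|} = \frac{1}{\|w\|},\\
\text{margin} &= d_+ + d_- = \frac{2}{\|w\|}.
\end{aligned}
\]
```

Because $\frac{2}{\|w\|}$ shrinks as $\|w\|$ grows, maximizing it amounts to minimizing $\|w\|$, or equivalently $\frac{1}{2}\|w\|^{2}$, which is the smooth convex form used in the optimization problem.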

II. Linearly Separable Case (Hard-Margin SVM)

Assuming the data are linearly separable, the SVM optimization problem can be written as

$$\min_{w,b}\ \frac{1}{2}\left\|w\right\|^{2}\quad\text{s.t.}\quad y_i(w\cdot x_i+b)\geq 1\quad(\forall i)$$

Objective: minimize $\frac{1}{2}\left\|w\right\|^{2}$, which is equivalent to minimizing $\left\|w\right\|$ and therefore to maximizing the margin $\frac{2}{\left\|w\right\|}$.

Constraint: every sample is classified correctly and lies on or outside the margin boundary.
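The objective and constraint above can be checked on a tiny example. scikit-learn has no dedicated hard-margin mode, so this sketch assumes that `SVC` with a very large `C` is an acceptable approximation on separable data. The six points are hypothetical and were chosen so that the nearest opposite-class points, (1, 0.5) and (3, 0.5), are exactly 2 apart.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data with labels -1 / +1.
X = np.array([[0, 0], [0, 1], [1, 0.5], [3, 0.5], [4, 0], [4, 1]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

# Assumption: a huge C makes any margin violation so costly that the
# soft-margin solution approximates the hard-margin one.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
functional_margins = y * (X @ w + b)      # y_i (w . x_i + b), expected to be >= 1
margin_width = 2.0 / np.linalg.norm(w)    # the quantity being maximized

print("w =", w, "b =", b)
print("min y_i (w.x_i + b) =", functional_margins.min())
print("margin width 2/||w|| =", margin_width)
```

Up to solver tolerance, the smallest functional margin should sit at 1 (attained by the support vectors) and the margin width should be close to 2 for this data.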

III. Non-Linearly-Separable Case (Soft-Margin SVM)

When the samples cannot be separated by a linear hyperplane, SVM handles the problem in the following ways:

1. Introducing slack variables

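Some samples are allowed to violate the margin, or even fall on the wrong side of the hyperplane, at the cost of a penalty term in the objective. The penalty is weighted by the regularization parameter $C$, which balances maximizing the margin against minimizing classification errors: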

$$\min_{w,b,\xi}\ \frac{1}{2}\left\|w\right\|^{2}+C\sum_{i}\xi_i\quad\text{s.t.}\quad y_i(w\cdot x_i+b)\geq 1-\xi_i,\quad \xi_i\geq 0$$

Role of $C$: it controls how heavily classification errors are penalized. The larger $C$ is, the stricter the model (it may overfit); the smaller $C$ is, the more errors are tolerated (it may underfit).
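The effect of $C$ can be observed directly. The sketch below uses synthetic, overlapping Gaussian clouds (an assumption made purely for illustration) and fits a linear `SVC` for three values of `C`, reporting the margin width $2/\|w\|$, the number of support vectors, and the training accuracy.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical overlapping classes: two Gaussian clouds that cannot be separated perfectly.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.5, size=(100, 2)),
               rng.normal(+1.0, 1.5, size=(100, 2))])
y = np.array([-1] * 100 + [1] * 100)

results = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    width = 2.0 / np.linalg.norm(clf.coef_[0])   # margin width 2/||w||
    results[C] = (width, int(clf.n_support_.sum()), clf.score(X, y))
    print(f"C={C:<6} margin width={width:.3f} "
          f"support vectors={results[C][1]} train accuracy={results[C][2]:.3f}")
```

A small `C` should give a wider margin, typically with more support vectors, while a large `C` narrows the margin to reduce training errors. The exact numbers depend on the random data.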

2. Kernel trick

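For data that are not linearly separable, SVM uses a kernel function to map the original space implicitly into a high-dimensional feature space in which the data can become linearly separable. Common kernel functions are: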

Linear kernel: $K(x_i,x_j)=x_i\cdot x_j$

Polynomial kernel: $K(x_i,x_j)=(x_i\cdot x_j+c)^{d}$

Gaussian radial basis function (RBF) kernel: $K(x_i,x_j)=\exp\left(-\gamma\left\|x_i-x_j\right\|^{2}\right)$

Sigmoid kernel: $K(x_i,x_j)=\tanh(\alpha\, x_i\cdot x_j+c)$
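The kernels listed above can be written in a few lines of NumPy. This is a sketch: the helper names (`linear_k`, `poly_k`, `rbf_k`, `sigmoid_k`) and the default parameter values are chosen here for illustration. As a sanity check, the results are compared with the corresponding functions in `sklearn.metrics.pairwise`.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, sigmoid_kernel

def linear_k(X, Y):
    return X @ Y.T

def poly_k(X, Y, c=1.0, d=3):
    return (X @ Y.T + c) ** d

def rbf_k(X, Y, gamma=0.5):
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, computed for all pairs at once
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq_dist)

def sigmoid_k(X, Y, alpha=0.1, c=0.0):
    return np.tanh(alpha * (X @ Y.T) + c)

# Random illustrative inputs: 4 and 5 samples with 3 features each.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(5, 3))

poly_ok = np.allclose(poly_k(A, B), polynomial_kernel(A, B, degree=3, gamma=1.0, coef0=1.0))
rbf_ok = np.allclose(rbf_k(A, B), rbf_kernel(A, B, gamma=0.5))
sig_ok = np.allclose(sigmoid_k(A, B), sigmoid_kernel(A, B, gamma=0.1, coef0=0.0))
print(linear_k(A, B).shape, poly_ok, rbf_ok, sig_ok)
```

In `SVC` these correspond to `kernel="linear"`, `"poly"`, `"rbf"`, and `"sigmoid"`, configured through `degree`, `gamma`, and `coef0`. Note that scikit-learn's polynomial and sigmoid kernels also scale the inner product by `gamma`, which is why `gamma=1.0` is passed above to match the polynomial formula in the text.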

IV. Optimization and Solution

The SVM problem is usually converted into its dual, which is derived with the method of Lagrange multipliers:

$$\max_{\alpha}\ \sum_{i}\alpha_i-\frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j K(x_i,x_j)\quad\text{s.t.}\quad 0\leq\alpha_i\leq C,\quad \sum_{i}\alpha_i y_i=0$$

Working with the dual problem has two advantages:

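a) Inner products in the high-dimensional space are replaced by kernel evaluations, so the high-dimensional data never have to be handled directly;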

b) The solution depends only on the support vectors, which makes computation more efficient.
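Both advantages show up in a fitted model. In the dual form the decision function is $f(x)=\sum_i \alpha_i y_i K(x_i,x)+b$, where only the support vectors have $\alpha_i>0$. The sketch below assumes a binary problem (synthetic `make_moons` data, chosen for illustration) and rebuilds the decision function by hand from scikit-learn's `dual_coef_`, `support_vectors_`, and `intercept_` attributes.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Hypothetical binary data that is not linearly separable.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

gamma, C = 0.5, 1.0   # fixed explicitly so the kernel can be recomputed by hand
clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)

# dual_coef_ stores alpha_i * y_i, for the support vectors only.
alpha_y = clf.dual_coef_[0]
K = rbf_kernel(clf.support_vectors_, X, gamma=gamma)   # K(sv_i, x) for every sample x
manual = alpha_y @ K + clf.intercept_[0]

print("support vectors used:", len(alpha_y), "of", len(X), "samples")
print("matches decision_function:", np.allclose(manual, clf.decision_function(X)))
print("max |alpha_i y_i|:", np.abs(alpha_y).max(), "(bounded by C)")
print("sum of alpha_i y_i:", alpha_y.sum(), "(close to 0)")
```

Only kernel values between the support vectors and the query points are needed. No explicit high-dimensional feature vectors are ever formed, and the two dual constraints can be read off the fitted coefficients.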

V. Python Implementation Example


from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data  # features
y = iris.target  # labels

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create the SVM classifier
clf = SVC(kernel='linear')  # use a linear kernel

# Train the model
clf.fit(X_train, y_train)

# Predict on the test set
y_pred = clf.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy:.2f}")

# Predict new samples
new_samples = [[5.1, 3.5, 1.4, 0.2], [6.3, 3.3, 4.7, 1.6]]
predictions = clf.predict(new_samples)
print(f"Predictions for new samples: {[iris.target_names[p] for p in predictions]}")
