Learning AutoML

Introduction to AutoML

Video: The Technology Behind AutoGluon (bilibili): https://www.bilibili.com/video/BV1F84y1F7Ps/?from=search&seid=1218935720421021430&spm_id_from=333.788.comment.all.click&vd_source=5252d3cdd5246bf9326ccfc5acb90644

Paper: https://arxiv.org/abs/2003.06505

Docs: https://auto.gluon.ai/

Code: https://github.com/awslabs/autogluon

The earlier house price competition:

Competition page: https://www.kaggle.com/c/california-house-prices/overview

Ensemble learning

Bagging trains its models in parallel, and its main effect is to reduce variance. Example: random forest.
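The variance-reduction claim can be checked with a small simulation. The sketch below is dependency-free and illustrative only: the 1-nearest-neighbour base model, the synthetic target `y = x + noise`, and the helper names `fit_1nn`, `fit_bagged`, and `prediction_variance` are all assumptions made for this example, not part of any library. It measures how much the prediction at one fixed point changes when the training set is redrawn.

```python
import random
import statistics

def fit_1nn(xs, ys):
    """1-nearest-neighbour regressor: a high-variance base model, chosen for brevity."""
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[i]
    return predict

def fit_bagged(xs, ys, n_models, rng):
    """Train n_models base models on bootstrap resamples and average their predictions."""
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]  # sample rows with replacement
        models.append(fit_1nn([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: statistics.fmean(m(x) for m in models)

def prediction_variance(fit, n_repeats=200, seed=0):
    """Variance of the prediction at x = 0.5 across independently drawn training sets."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_repeats):
        xs = [rng.random() for _ in range(30)]
        ys = [x + rng.gauss(0, 1) for x in xs]  # true signal y = x plus unit noise
        preds.append(fit(xs, ys, rng)(0.5))
    return statistics.variance(preds)

var_single = prediction_variance(lambda xs, ys, rng: fit_1nn(xs, ys))
var_bagged = prediction_variance(lambda xs, ys, rng: fit_bagged(xs, ys, 25, rng))
print("single model variance:", var_single)
print("bagged model variance:", var_bagged)
```

Each bootstrap model inside `fit_bagged` is independent of the others, so the loop could be run in parallel.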

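It works best when the base model being bagged is unstable (as with the decision trees in a random forest). The models' outputs are then aggregated: by averaging for regression, and by soft voting (recommended) or hard voting for classification.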
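The difference between hard and soft voting is easiest to see on a single sample. In the sketch below the three probability vectors are made-up numbers, and `hard_vote` and `soft_vote` are hypothetical helper names used only for illustration.

```python
from collections import Counter

# Hypothetical class-probability outputs of three classifiers for one sample
# (two classes, 0 and 1). The numbers are invented for illustration.
probas = [
    [0.45, 0.55],  # model 1 leans slightly towards class 1
    [0.40, 0.60],  # model 2 leans slightly towards class 1
    [0.95, 0.05],  # model 3 is very confident in class 0
]

def hard_vote(probas):
    """Each model casts one vote for its most likely class; the majority wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    return Counter(votes).most_common(1)[0][0]

def soft_vote(probas):
    """Average the class probabilities, then pick the most likely class."""
    n_classes = len(probas[0])
    mean = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)

print("hard vote:", hard_vote(probas))
print("soft vote:", soft_vote(probas))
```

Hard voting discards how confident each model is, while soft voting keeps that information, which is one plausible reason for the recommendation above.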

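Boosting combines several weaker models (high bias) into a stronger one (lower bias). Its main goal is to reduce bias rather than variance. [Bagging, by contrast, combines several not-so-stable models into a relatively stable one.]

Boosting has to learn sequentially, with each model depending on the ones before it, and it reduces bias. [In bagging, every model is independent.]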
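The sequential dependence can be sketched with gradient boosting for squared error, where each new weak model is fitted to the residuals left by the ensemble so far. This is a minimal, dependency-free illustration; the stump learner, the target `y = x^2`, the learning rate, and the names `fit_stump`, `boost`, and `mse` are assumptions made for this example.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump (a weak, high-bias model) under squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_rounds=50, lr=0.5):
    """Each round fits a stump to what the current ensemble still gets wrong."""
    stumps = []
    preds = [0.0] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)  # depends on every previous round
        stumps.append(stump)
        preds = [p + lr * stump(x) for x, p in zip(xs, preds)]
    return lambda x: sum(lr * s(x) for s in stumps)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [i / 20 for i in range(40)]
ys = [x * x for x in xs]  # a smooth target that one stump cannot fit

one_round = boost(xs, ys, n_rounds=1, lr=1.0)
many_rounds = boost(xs, ys, n_rounds=50, lr=0.5)
print("training MSE, 1 stump  :", mse(one_round, xs, ys))
print("training MSE, 50 stumps:", mse(many_rounds, xs, ys))
```

Unlike the bootstrap models in bagging, the rounds here cannot be trained in parallel, because the residuals for round k only exist after round k-1 has finished.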

Stacking: somewhat similar to bagging; it reduces variance.

Multi-layer stacking (commonly used): reduces bias.

Bagging vs. stacking (key point)

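Layer 0 (base models): the three models run fully in parallel and do not affect one another
├── Model A (KNN) ──┐
├── Model B (SVM) ──┼── each produces out-of-fold predictions
└── Model C (DT)  ──┘          │
                               ▼  concatenate → $\hat{y}_A, \hat{y}_B, \hat{y}_C$  ← new features
Layer 1 (meta-model): train LR on the new features → final result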

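| | Bagging | Stacking |
|---|---|---|
| Structure | Flat, parallel | Layered; the layers run in sequence |
| Model types | Mostly homogeneous | Mostly heterogeneous |
| Aggregation | Hard voting / simple averaging | A meta-model learns how to weight and combine the base models |
| What it reduces | Mainly variance | Both bias and variance |
| Training cost | Low to medium, easy to parallelize | High; needs CV to prevent leakage |
| Risk | Single model type, so the performance ceiling is limited | Prone to overfitting; high engineering complexity |
| Classic example | Random Forest | A common recipe in Kaggle competitions |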
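The structural contrast in the table can be sketched with scikit-learn's built-in `BaggingRegressor` and `StackingRegressor`. This is only an illustration under assumptions: the data is synthetic (`make_regression`), the hyperparameters are arbitrary, and `SVR` and `LinearRegression` stand in for the "SVM" and "LR" of the diagram above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, used only so the sketch runs end to end
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: many copies of ONE model type, each on a bootstrap sample, then averaged
bagging = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, random_state=0)
bagging.fit(X_tr, y_tr)

# Stacking: DIFFERENT model types, with a meta-model that learns how to weight them.
# cv=5 makes the meta-model train on out-of-fold predictions of the base models.
stacking = StackingRegressor(
    estimators=[
        ("knn", KNeighborsRegressor()),
        ("svr", SVR()),
        ("dt", DecisionTreeRegressor(random_state=0)),
    ],
    final_estimator=LinearRegression(),
    cv=5,
)
stacking.fit(X_tr, y_tr)

print("bagging  R^2:", bagging.score(X_te, y_te))
print("stacking R^2:", stacking.score(X_te, y_te))
print("weights learned by the meta-model:", stacking.final_estimator_.coef_)
```

The meta-model ends up with one coefficient per base model, which is the "learned weighting" that distinguishes stacking from the fixed averaging used in bagging.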

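How is multi-layer stacking implemented?

Core principle (most important)

Whenever a layer generates predictions on the training set, it must do so out-of-fold, using cross-validation. You must not fit on the whole training set and then predict on that same training set: that is data leakage, and it turns multi-layer stacking into an overfitting disaster.

Example: suppose we have a training set X_train, y_train and a test set X_test.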
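Why in-sample predictions leak can be shown with a model that memorizes its training data. The sketch below is dependency-free and illustrative: the 1-nearest-neighbour model, the synthetic data, the interleaved fold assignment, and the names `fit_1nn` and `mse` are assumptions made for this example.

```python
import random

def fit_1nn(xs, ys):
    """1-nearest-neighbour regressor: it memorizes every training label."""
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[i]
    return predict

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

rng = random.Random(0)
xs = [rng.random() for _ in range(100)]
ys = [x + rng.gauss(0, 0.5) for x in xs]  # true signal y = x plus noise

# Leaky: fit on the whole training set, then predict the same rows.
# Every row finds itself as its nearest neighbour, so the labels are copied back.
model = fit_1nn(xs, ys)
leaky = [model(x) for x in xs]

# Out-of-fold: each row is predicted by a model that never saw that row.
n_folds = 5
oof = [0.0] * len(xs)
for k in range(n_folds):
    val_idx = list(range(k, len(xs), n_folds))  # simple interleaved folds
    held_out = set(val_idx)
    train_idx = [i for i in range(len(xs)) if i not in held_out]
    m = fit_1nn([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    for i in val_idx:
        oof[i] = m(xs[i])

print("in-sample MSE   :", mse(leaky, ys))
print("out-of-fold MSE :", mse(oof, ys))
```

If the in-sample column were fed to the next layer, the meta-model would see a feature that looks perfect on the training set but is far noisier on new data, and it would learn to trust it too much.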

import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# ---------- Helper: generate out-of-fold (OOF) predictions ----------
def get_oof_predictions(model, X_train, y_train, X_test, n_folds=5):
    """
    Returns:
      train_pred_oof : vector as long as y_train; each sample's prediction comes
                       from the fold model that never saw that sample
      test_pred      : predictions for X_test, averaged over the k fold models
    """
    # The positional indexing below needs NumPy arrays, not DataFrames or lists
    X_train, y_train, X_test = np.asarray(X_train), np.asarray(y_train), np.asarray(X_test)
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=42)

    train_pred_oof = np.zeros(len(X_train))
    test_pred_folds = np.zeros((n_folds, len(X_test)))

    for i, (train_idx, val_idx) in enumerate(kf.split(X_train)):
        m = clone(model)  # fresh, unfitted copy with the same hyperparameters
        m.fit(X_train[train_idx], y_train[train_idx])

        train_pred_oof[val_idx] = m.predict(X_train[val_idx])
        test_pred_folds[i] = m.predict(X_test)

    test_pred = test_pred_folds.mean(axis=0)
    return train_pred_oof, test_pred


# ---------- The three base models of Layer 0 ----------
models_l0 = [
    KNeighborsRegressor(n_neighbors=5),
    Ridge(alpha=1.0),
    DecisionTreeRegressor(max_depth=5, random_state=42),
]

# Collect each model's OOF predictions; together they form the new features from Layer 0
train_meta_0 = []
test_meta_0  = []

for m in models_l0:
    tr_p, te_p = get_oof_predictions(m, X_train, y_train, X_test)
    train_meta_0.append(tr_p)
    test_meta_0.append(te_p)

# Stack into the new feature matrix: shape (n_samples, 3)
X_train_l1 = np.column_stack(train_meta_0)
X_test_l1  = np.column_stack(test_meta_0)

Second layer: