Using mlxtend, it is easy to compute the bias-variance decomposition of a model's error. Below is the decomposition for a decision tree regressor.
```python
from mlxtend.evaluate import bias_variance_decomp
from mlxtend.data import boston_housing_data
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Load the Boston housing dataset bundled with mlxtend
X, y = boston_housing_data()
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=123,
                                                    shuffle=True)

tree = DecisionTreeRegressor(random_state=123)

# Decompose the average test-set MSE into a bias term and a variance term
avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
    tree, X_train, y_train, X_test, y_test,
    loss='mse',
    random_seed=123)

print('Average expected loss: %.3f' % avg_expected_loss)
print('Average bias: %.3f' % avg_bias)
print('Average variance: %.3f' % avg_var)
```
The output is:
```sh
Average expected loss: 31.536
Average bias: 14.096
Average variance: 17.440
```
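These three numbers are not independent: for squared-error loss the expected loss splits exactly into a squared-bias term and a variance term (mlxtend reports the squared bias as `avg_bias`, and treats the test labels as noise-free, so no irreducible-noise term appears):

$$\mathbb{E}\big[(\hat{f}(x) - y)^2\big] = \big(\mathbb{E}[\hat{f}(x)] - y\big)^2 + \mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]$$

The output confirms the identity: 14.096 + 17.440 = 31.536.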
For comparison, here is the bias-variance decomposition for bagging. Averaging many high-variance trees leaves the bias roughly unchanged but cancels much of the variance, and the numbers below bear this out.
```python
from sklearn.ensemble import BaggingRegressor

tree = DecisionTreeRegressor(random_state=123)
# Bag 100 trees (scikit-learn < 1.2 calls this parameter base_estimator)
bag = BaggingRegressor(estimator=tree,
                       n_estimators=100,
                       random_state=123)

avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
    bag, X_train, y_train, X_test, y_test,
    loss='mse',
    random_seed=123)

print('Average expected loss: %.3f' % avg_expected_loss)
print('Average bias: %.3f' % avg_bias)
print('Average variance: %.3f' % avg_var)
```
The output is:
```sh
Average expected loss: 18.620
Average bias: 15.460
Average variance: 3.159
```
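Under the hood, `bias_variance_decomp` estimates these quantities by repeatedly refitting the model on bootstrap resamples of the training set and averaging the test-set predictions. Below is a minimal sketch of that procedure for the MSE case; `bias_variance_sketch` is a hypothetical helper written for illustration, not part of mlxtend's API, and assumes NumPy-array inputs:

```python
import numpy as np

def bias_variance_sketch(estimator, X_train, y_train, X_test, y_test,
                         num_rounds=200, seed=123):
    # Collect test-set predictions from models fit on bootstrap resamples
    rng = np.random.RandomState(seed)
    n_train = X_train.shape[0]
    all_pred = np.zeros((num_rounds, X_test.shape[0]))
    for i in range(num_rounds):
        idx = rng.randint(0, n_train, size=n_train)  # bootstrap sample indices
        all_pred[i] = estimator.fit(X_train[idx], y_train[idx]).predict(X_test)

    main_pred = all_pred.mean(axis=0)               # mean prediction per test point
    avg_loss = ((all_pred - y_test) ** 2).mean()    # expected squared error
    avg_bias = ((main_pred - y_test) ** 2).mean()   # (squared) bias
    avg_var = ((all_pred - main_pred) ** 2).mean()  # variance
    return avg_loss, avg_bias, avg_var
```

Because the cross term between (prediction − mean prediction) and (mean prediction − label) averages to zero, `avg_loss` equals `avg_bias + avg_var` up to floating-point error, which is exactly the relationship seen in both outputs above.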