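These notes cover eight parameters of the decision tree (DecisionTreeClassifier): criterion; two randomness-related parameters (random_state, splitter); and five pruning parameters (max_depth, min_samples_split, min_samples_leaf, max_features, min_impurity_decrease)
One attribute: feature_importances_
Four interfaces: fit, score, apply, predict
Example: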
python
from sklearn import tree
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Load the wine dataset and hold out 30% of the samples for testing
wine = load_wine()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(wine.data, wine.target, test_size=0.3)

# Build the tree with the parameters described in the notes
clf = tree.DecisionTreeClassifier(
    criterion='entropy',
    random_state=30,
    splitter='random',
    min_samples_split=10,
    min_samples_leaf=10,
    max_depth=10,
)
clf = clf.fit(Xtrain, Ytrain)   # train on the training split
clf.score(Xtest, Ytest)         # mean accuracy on the test split
clf.feature_importances_        # importance of each feature
clf.apply(Xtest)                # index of the leaf each test sample lands in
clf.predict(Xtest)              # predicted class of each test sample
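I. Parameters
1. criterion
The default is "gini"; "entropy" is also available. The two usually give similar results, but entropy is more sensitive to impurity and tends to fit the training data more closely, so try it when the model underfits.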
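A minimal sketch for comparing the two criteria on the wine dataset used above. The split seed (random_state=0) and the `scores` dictionary are choices made for this illustration, and the numbers depend on the split, so no particular winner should be assumed.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()
# Fixed split seed (an assumption of this sketch) so both trees see the same data
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0)

scores = {}
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=30)
    clf.fit(Xtrain, Ytrain)
    # Store (training accuracy, test accuracy) for each criterion
    scores[criterion] = (clf.score(Xtrain, Ytrain), clf.score(Xtest, Ytest))
    print(criterion, scores[criterion])
```

A large gap between training and test accuracy suggests overfitting, whichever criterion is used.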
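2. random_state
Setting an integer fixes the seed of the random choices made while building the tree, so every run produces the same tree.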
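One way to check this, as a small sketch: fit two trees with the same seed on the same data and compare them. The variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()
X, y = wine.data, wine.target

# Same seed, same data: the two trees should come out identical
clf_a = DecisionTreeClassifier(splitter="random", random_state=30).fit(X, y)
clf_b = DecisionTreeClassifier(splitter="random", random_state=30).fit(X, y)

same_importances = np.array_equal(clf_a.feature_importances_,
                                  clf_b.feature_importances_)
same_leaves = np.array_equal(clf_a.apply(X), clf_b.apply(X))
print(same_importances, same_leaves)
```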
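3. splitter
The default is "best", which picks the best split at each node and fits the training data closely.
Use "random" when you are worried about overfitting or have a very large number of features; it picks the best among randomly drawn splits, which usually fits the training data less tightly.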
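A rough sketch for seeing how the two settings differ on the wine data. The split seed and the `results` dictionary are assumptions of this example; which setting scores better varies with the data and the seed.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0)

results = {}
for splitter in ("best", "random"):
    clf = DecisionTreeClassifier(splitter=splitter, random_state=30)
    clf.fit(Xtrain, Ytrain)
    results[splitter] = {
        "depth": clf.get_depth(),        # depth of the fitted tree
        "leaves": clf.get_n_leaves(),    # number of leaf nodes
        "train": clf.score(Xtrain, Ytrain),
        "test": clf.score(Xtest, Ytest),
    }
    print(splitter, results[splitter])
```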
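4. max_depth
The maximum depth the tree may grow to; branches beyond it are cut off. It is the usual first choice for limiting overfitting.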
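A common way to pick a value is to sweep max_depth and watch the test accuracy. The sketch below assumes a range of 1 to 10 and a fixed split seed, both chosen only for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0)

limits = list(range(1, 11))
actual_depths, test_scores = [], []
for max_depth in limits:
    clf = DecisionTreeClassifier(criterion="entropy",
                                 max_depth=max_depth,
                                 random_state=30)
    clf.fit(Xtrain, Ytrain)
    actual_depths.append(clf.get_depth())       # never exceeds max_depth
    test_scores.append(clf.score(Xtest, Ytest))

# Smallest depth limit that reaches the highest test accuracy
best_depth = limits[test_scores.index(max(test_scores))]
print(best_depth, max(test_scores))
```

A single train/test split is noisy; cross-validation would give a steadier estimate.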
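5. min_samples_split, min_samples_leaf
A node must contain at least min_samples_split samples before a split is considered, and a split is only allowed if each resulting child node keeps at least min_samples_leaf samples.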
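The leaf constraint can be checked directly on a fitted tree. This sketch counts how many training samples fall into each leaf; the value 10 mirrors the example at the top of these notes, and the split seed is an assumption.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(min_samples_split=10,
                             min_samples_leaf=10,
                             random_state=30)
clf.fit(Xtrain, Ytrain)

# apply() gives the leaf index for each training sample;
# counting the indices gives the number of samples per leaf
leaf_ids = clf.apply(Xtrain)
_, leaf_sizes = np.unique(leaf_ids, return_counts=True)
print(leaf_sizes)
```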
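6. max_features
The number of features considered when looking for the best split at each node. By default all features are used.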
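As a small sketch, the value the tree actually uses can be read back from the fitted attribute max_features_. The choice of 5 out of the 13 wine features is arbitrary.

```python
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()

# Consider only 5 of the 13 features at each split (arbitrary choice)
clf = DecisionTreeClassifier(max_features=5, random_state=30)
clf.fit(wine.data, wine.target)

print(clf.n_features_in_)   # number of features seen during fit
print(clf.max_features_)    # number of features tried at each split
```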
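II. Attributes
1. feature_importances_
The importance of each feature in the fitted tree. The values are normalized to sum to 1, and a feature the tree never splits on gets 0.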
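The raw array is easier to read when paired with the feature names. A minimal sketch, using the feature_names field that load_wine provides:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()
clf = DecisionTreeClassifier(criterion="entropy", random_state=30)
clf.fit(wine.data, wine.target)

# Pair each feature name with its importance, most important first
ranked = sorted(zip(wine.feature_names, clf.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")

total = float(np.sum(clf.feature_importances_))
```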
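III. Interfaces
1. fit
Trains the model on the training data and labels.
2. score
Returns the accuracy (fraction of correctly classified samples) on the given data and labels.
3. predict
Returns the predicted class for each sample.
4. apply
Returns the index of the leaf node that each sample ends up in.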
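The four interfaces can be tied together in one short sketch. It checks that score matches the accuracy computed by hand from predict, and that the indices returned by apply really are leaves (in the fitted tree_ structure, a leaf has no left child, marked by -1). The split seed and variable names are assumptions of this example.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(criterion="entropy", random_state=30)
clf.fit(Xtrain, Ytrain)                 # fit: train the tree

accuracy = clf.score(Xtest, Ytest)      # score: accuracy on the test split
predicted = clf.predict(Xtest)          # predict: class label per sample
leaves = clf.apply(Xtest)               # apply: leaf index per sample

# Accuracy recomputed by hand from the predictions
manual_accuracy = float(np.mean(predicted == Ytest))
# Every index returned by apply should point at a leaf node
all_leaves = bool(np.all(clf.tree_.children_left[leaves] == -1))
print(accuracy, manual_accuracy, all_leaves)
```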