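- 🍨 This post is a learning log from the 🔗365天深度学习训练营 (365-Day Deep Learning Training Camp)
- 🍖 Original author: K同学啊
I. Overview of the Decision Tree Algorithm
II. Code Implementation
Goal: train a decision tree model on the Iris dataset, then use that model to predict an iris's class from its four features.
1. Classification Tree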
python
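import pandas as pd
import numpy as np

# Load the Iris data from the UCI repository; the file has no header row, so column names are supplied here
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)
dataset  # display the DataFrame (notebook cell output)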
Output:
python
X = dataset.iloc[:, [0, 1, 2, 3]].values  # the four measurements used as features
Y = dataset.iloc[:, 4].values             # the class label used as the target
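If the UCI URL is unreachable, the same dataset also ships with scikit-learn. The sketch below loads it with `load_iris`; the variable names `X_local`, `y_local`, and `y_names` are illustrative and not part of the original post. Note that the bundled copy uses integer labels and short species names (`setosa`), whereas the UCI file uses strings such as `Iris-setosa`.

```python
from sklearn.datasets import load_iris

# Load the copy of the Iris data bundled with scikit-learn (no network access needed).
iris = load_iris()
X_local = iris.data    # shape (150, 4): sepal length/width, petal length/width in cm
y_local = iris.target  # integer labels 0, 1, 2

# Map the integer labels back to species names for readability.
y_names = iris.target_names[y_local]

print(X_local.shape, y_local.shape)
print(iris.feature_names)
print(y_names[:3])
```

Either source can be passed to `fit` in the steps that follow; only the label format differs.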
python
from sklearn import tree

clf = tree.DecisionTreeClassifier()  # scikit-learn's decision tree classifier
clf = clf.fit(X, Y)                  # build the tree by fitting it to the data
r = tree.export_text(clf)            # plain-text rendering of the learned rules
print(r)
Output:
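When the model is trained on a bare NumPy array, `export_text` labels the splits `feature_0`, `feature_1`, and so on. As a minimal sketch, passing the `feature_names` argument makes the rules easier to read. The example below uses the bundled `load_iris` data so that it runs on its own; `clf_named` and the `random_state=0` setting are illustrative choices, not taken from the original post.

```python
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()

# random_state is fixed so that ties between equally good splits resolve the same way each run.
clf_named = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Names are given in the same order as the columns of the training matrix.
feature_names = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
report = tree.export_text(clf_named, feature_names=feature_names)
print(report)
```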
python
test_x = X[[0, 1, 50, 51, 100, 101], :]       # two samples from each of the three classes
pred_target_prob = clf.predict_proba(test_x)  # predicted class probabilities
pred_target = clf.predict(test_x)             # predicted class labels
python
print("\n===Model======")
print(r)
print("\n===Test data:=====")
print(test_x)
print("\n===Predicted class probabilities:=====")
print(pred_target_prob)
print("\n===Predicted classes:======")
print(pred_target)
Output:
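For a decision tree, `predict_proba` returns the fraction of training samples of each class in the leaf that a sample lands in, so a fully grown tree evaluated on its own training rows usually reports probabilities of exactly 0 or 1. The sketch below caps the depth to show mixed leaves and checks that `predict` matches the most probable class. The `max_depth=2` setting and the names `shallow`, `rows`, and `pred_from_proba` are assumptions made for illustration.

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
X_demo, y_demo = iris.data, iris.target

# max_depth=2 is an illustrative choice: shallow leaves can contain more than one class.
shallow = tree.DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_demo, y_demo)

rows = X_demo[[0, 1, 50, 51, 100, 101]]
proba = shallow.predict_proba(rows)  # one row per sample, one column per class
pred = shallow.predict(rows)

# Each row of proba is the class mix of the leaf the sample falls into.
print(proba)

# predict returns the class with the largest leaf fraction.
pred_from_proba = shallow.classes_[np.argmax(proba, axis=1)]
print(pred, pred_from_proba)
```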
2. Regression Tree
python
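import pandas as pd
import numpy as np

# Reload the same Iris data; this time a numeric column will serve as the target
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)
dataset  # display the DataFrame (notebook cell output)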
Output:
python
X = dataset.iloc[:, [0, 1, 2]].values  # first three measurements as features
Y = dataset.iloc[:, 3].values          # fourth measurement (petal width) as the regression target
python
from sklearn import tree

clf = tree.DecisionTreeRegressor()  # scikit-learn's decision tree regressor
clf = clf.fit(X, Y)                 # build the tree by fitting it to the data
r = tree.export_text(clf)           # plain-text rendering of the learned rules
python
test_x = X[[0, 1, 50, 51, 100, 101], :]
test_y = Y[[0, 1, 50, 51, 100, 101]]
pred_target = clf.predict(test_x)  # predicted y

# Put actual and predicted values side by side for comparison
df = pd.DataFrame()
df["actual y"] = test_y
df["predicted y"] = pred_target
python
print("\n===Model======")
# print(r)  # the regression tree is large, so printing it is left disabled
print("\n===Predictions======")
print(df)
Output:
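The comparison above uses rows the tree was trained on, and an unpruned tree can nearly memorize those. As a rough sketch of how to check predictions on unseen rows, the example below holds out part of the data and reports mean squared error on both parts. The 70/30 split, `random_state=0`, and the names `X_reg`, `y_reg`, and `reg` are assumptions for illustration, and the bundled `load_iris` data stands in for the downloaded file.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

iris = load_iris()
X_reg = iris.data[:, :3]  # first three measurements as features
y_reg = iris.data[:, 3]   # petal width as the regression target

# Hold out 30% of the rows so the tree is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X_reg, y_reg, test_size=0.3, random_state=0
)
reg = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

train_mse = mean_squared_error(y_train, reg.predict(X_train))
test_mse = mean_squared_error(y_test, reg.predict(X_test))
print(f"train MSE: {train_mse:.4f}")
print(f"test MSE:  {test_mse:.4f}")
```

A large gap between the two errors would suggest the tree is overfitting; limiting `max_depth` or `min_samples_leaf` is one common way to narrow it.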
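III. Summary
When using a decision tree, first be clear about what is being predicted: a class label calls for a classification tree, while a continuous value calls for a regression tree. Missing values also need attention and should be handled deliberately, for example by imputing or dropping them, before training.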
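One way to handle missing values, sketched below, is to impute them before the tree sees the data. Whether a tree accepts `NaN` directly depends on the scikit-learn version, so this sketch assumes it does not. The median strategy, the positions of the blanked entries, and the names `X_missing` and `model` are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_missing = iris.data.copy()

# Artificially blank out three entries to simulate missing measurements.
X_missing[[3, 60, 120], [0, 2, 3]] = np.nan

# The imputer replaces each NaN with its column median, then the tree is fitted.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    DecisionTreeClassifier(random_state=0),
)
model.fit(X_missing, iris.target)

pred_missing = model.predict(X_missing[[3, 60, 120]])
print(pred_missing)
```

Keeping the imputer inside the pipeline means the same fill values learned during training are reused at prediction time.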