Machine Learning (7): Decision Trees

model 4 --- decision tree

1 decision tree

1. component

usage: classification

  1. root node
  2. decision node

2. choose a feature to split on at each node

maximize purity (minimize impurity)

3. stop splitting

  1. a node is 100% one class
  2. splitting a node will result in the tree exceeding a maximum depth
  3. the improvement in purity score is below a threshold
  4. number of examples in a node is below a threshold
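
The four stopping criteria can be combined into a single check. The following is a minimal sketch, not a definitive implementation: the function name `should_stop` and the parameters `max_depth`, `min_gain`, and `min_samples` (with their default values) are hypothetical names chosen for illustration, and labels are assumed to be a list of class labels at the node.

```python
def should_stop(labels, depth, best_gain,
                max_depth=5, min_gain=1e-3, min_samples=2):
    """Return True if the node should become a leaf.

    All thresholds are illustrative defaults (assumptions), not values
    taken from the notes.
    """
    # 1. the node is 100% one class (or empty)
    if len(set(labels)) <= 1:
        return True
    # 2. children would sit at depth + 1, which would exceed max_depth
    if depth >= max_depth:
        return True
    # 3. the best available improvement in purity is below a threshold
    if best_gain < min_gain:
        return True
    # 4. too few examples remain in the node
    if len(labels) < min_samples:
        return True
    return False
```

For example, `should_stop([1, 1, 1], depth=0, best_gain=0.5)` returns `True` because the node is already pure.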

2 measure of impurity

use entropy ($H$) as a measure of impurity, where $p$ is the fraction of examples in a node that belong to the positive class

$$H(p) = -p\log_2(p) - (1-p)\log_2(1-p)$$

note: $0\log_2(0) = 0$
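
A minimal sketch of the entropy formula in Python, assuming a binary classification problem where `p` is the fraction of positive examples in a node. The function name `entropy` is chosen here for illustration.

```python
import math


def entropy(p: float) -> float:
    """Binary entropy H(p) in bits, using the convention 0 * log2(0) = 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    # A pure node (all one class) has zero impurity.
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
```

Entropy peaks at `entropy(0.5) == 1.0` (an evenly mixed node) and falls to `0.0` for a pure node.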

3 information gain

1. definition

$$\text{information\_gain} = H(p^{root}) - \left(w^{left}H(p^{left}) + w^{right}H(p^{right})\right)$$
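
The definition can be sketched in Python as follows. This assumes binary labels encoded as 0/1, takes $p$ to be the fraction of positive examples in a node, and takes $w^{left}$ and $w^{right}$ to be the fractions of the root node's examples sent to each branch. The function names are illustrative.

```python
import math


def entropy(p):
    """Binary entropy in bits, with 0 * log2(0) treated as 0."""
    if p == 0 or p == 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def information_gain(root, left, right):
    """Reduction in entropy from splitting `root` into `left` and `right`.

    Each argument is a list of 0/1 labels; left + right should contain
    the same examples as root.
    """
    def frac_pos(labels):
        # Fraction of positive examples; an empty branch contributes nothing.
        return sum(labels) / len(labels) if labels else 0.0

    w_left = len(left) / len(root)
    w_right = len(right) / len(root)
    weighted_child_entropy = (w_left * entropy(frac_pos(left))
                              + w_right * entropy(frac_pos(right)))
    return entropy(frac_pos(root)) - weighted_child_entropy
```

A split that separates the classes perfectly, such as `information_gain([1, 1, 0, 0], [1, 1], [0, 0])`, recovers the full root entropy of 1 bit.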

2. usage

  1. measure the reduction in entropy
  2. a signal of stopping splitting

3. continuous

for a continuous feature, find the split threshold that gives the highest information gain
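
One way to search for such a threshold is sketched below. It assumes binary 0/1 labels and uses the midpoints between consecutive sorted feature values as candidate thresholds. The midpoint choice is a common convention assumed here, not something the notes specify, and `best_threshold` is a hypothetical name.

```python
import math


def entropy(p):
    """Binary entropy in bits, with 0 * log2(0) treated as 0."""
    if p == 0 or p == 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def best_threshold(values, labels):
    """Return (threshold, gain) for the split `value <= threshold`
    with the highest information gain, or (None, 0.0) if no split helps."""
    def node_entropy(ys):
        return entropy(sum(ys) / len(ys)) if ys else 0.0

    m = len(labels)
    root_entropy = node_entropy(labels)
    distinct = sorted(set(values))
    best_t, best_gain = None, 0.0
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = root_entropy - (len(left) / m * node_entropy(left)
                               + len(right) / m * node_entropy(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```

With `values = [8, 9, 11, 15, 18]` and `labels = [1, 1, 1, 0, 0]`, the search picks the midpoint between 11 and 15, since that split separates the two classes completely.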

4 random forest

  1. generating a tree sample
given training set of size m
for b = 1 to B:
	use sampling with replacement to create a new training set of size m
	train a decision tree on the new training set
  2. randomizing the feature choice: at each node, when choosing a feature to split on, if n features are available, pick a random subset of k < n features (usually $k = \sqrt{n}$) and allow the algorithm to choose only from that subset of features
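
The two sources of randomness in a random forest can be sketched with the standard library alone. This is an illustration under stated assumptions, not a full implementation: training the tree itself is left out, all function names are hypothetical, and $k$ is rounded down to an integer with `math.isqrt` (at least 1).

```python
import math
import random


def bootstrap_sample(X, y, rng):
    """Sample m examples with replacement from a training set of size m."""
    m = len(X)
    idx = [rng.randrange(m) for _ in range(m)]
    return [X[i] for i in idx], [y[i] for i in idx]


def random_feature_subset(n_features, rng, k=None):
    """Pick k distinct feature indices; defaults to floor(sqrt(n)), min 1."""
    if k is None:
        k = max(1, math.isqrt(n_features))
    return rng.sample(range(n_features), k)


def make_bootstrap_sets(X, y, B, seed=0):
    """Create B resampled training sets, one per tree in the forest."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    return [bootstrap_sample(X, y, rng) for _ in range(B)]
```

Each of the `B` resampled sets would be used to train one decision tree, and `random_feature_subset` would be called at every node to restrict which features that node may split on.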