How to grow a Decision Tree
source : [3](3.html)
LearnUnprunedTree(X, Y)
Input: X, a matrix of R rows and M columns, where X_ij is the value of the j'th attribute in the i'th input datapoint. Each column consists of either all real values or all categorical values.
Input: Y, a vector of R elements, where Y_i is the output class of the i'th datapoint. The Y_i values are categorical.
Output: an unpruned decision tree

If all records in X have identical values in all their attributes (this includes the case where R < 2), return a Leaf Node predicting the majority output, breaking ties randomly
If all values in Y are the same, return a Leaf Node predicting this value as the output
Else
    Select m variables at random out of the M variables
    For j = 1 .. m
        If the j'th attribute is categorical
            IG_j = IG(Y | X_j) (see Information Gain)
        Else (the j'th attribute is real-valued)
            IG_j = IG*(Y | X_j) (see Information Gain)
    Let j* = argmax_j IG_j (this is the splitting attribute we'll use)
    If j* is categorical then
        For each value v of the j*'th attribute
            Let X^v = subset of rows of X in which X_{i,j*} = v. Let Y^v = corresponding subset of Y
            Let Child^v = LearnUnprunedTree(X^v, Y^v)
        Return a decision tree node, splitting on the j*'th attribute. The number of children equals the number of values of the j*'th attribute, and the v'th child is Child^v
    Else j* is real-valued and let t be the best split threshold
        Let X^LO = subset of rows of X in which X_{i,j*} < t. Let Y^LO = corresponding subset of Y
        Let Child^LO = LearnUnprunedTree(X^LO, Y^LO)
        Let X^HI = subset of rows of X in which X_{i,j*} >= t. Let Y^HI = corresponding subset of Y
        Let Child^HI = LearnUnprunedTree(X^HI, Y^HI)
        Return a decision tree node, splitting on the j*'th attribute. It has two children, corresponding to whether the j*'th attribute is below the threshold (< t) or at or above it (>= t).
Note: There are alternatives to Information Gain for splitting nodes
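The procedure above can be sketched in Python roughly as follows. This is only an illustration under several assumptions that are not part of the source: rows are Python lists, categorical values are strings and real values are numbers, nodes are plain dicts, candidate thresholds are midpoints between consecutive distinct values, rows with a value below t go to the "LO" child, and a node becomes a leaf when none of the m sampled attributes varies. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """H(Y) in bits for a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_rows(X, j, t=None):
    """Group row indices by branch: attribute value (categorical) or 'LO'/'HI' (real)."""
    groups = {}
    for i, row in enumerate(X):
        key = row[j] if t is None else ("LO" if row[j] < t else "HI")
        groups.setdefault(key, []).append(i)
    return groups

def info_gain(X, Y, j, t=None):
    """IG(Y|X_j) for a categorical attribute, or IG(Y|X_j:t) when t is given."""
    cond = sum(len(idx) / len(Y) * entropy([Y[i] for i in idx])
               for idx in split_rows(X, j, t).values())
    return entropy(Y) - cond

def best_split(X, Y, j):
    """Return (gain, threshold) for attribute j, or None if the attribute is constant."""
    values = sorted({row[j] for row in X})
    if len(values) < 2:
        return None
    if isinstance(values[0], str):          # assumption: strings mark categorical columns
        return info_gain(X, Y, j), None
    # Assumption: only midpoints between consecutive distinct values are tried.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max((info_gain(X, Y, j, t), t) for t in candidates)

def majority(Y, rng):
    """Most frequent label, breaking ties randomly."""
    counts = Counter(Y)
    top = max(counts.values())
    return rng.choice(sorted(label for label, c in counts.items() if c == top))

def learn_unpruned_tree(X, Y, m, rng):
    """Grow a tree on non-empty X, Y, sampling m candidate attributes per node."""
    if len(set(Y)) == 1 or len({tuple(row) for row in X}) == 1:
        return {"leaf": majority(Y, rng)}
    scored = []
    for j in rng.sample(range(len(X[0])), m):
        split = best_split(X, Y, j)
        if split is not None:
            scored.append((split[0], j, split[1]))
    if not scored:                          # assumption: stop if no sampled attribute varies
        return {"leaf": majority(Y, rng)}
    _, j, t = max(scored, key=lambda s: s[0])
    children = {
        key: learn_unpruned_tree([X[i] for i in idx], [Y[i] for i in idx], m, rng)
        for key, idx in split_rows(X, j, t).items()
    }
    return {"attribute": j, "threshold": t, "children": children}

# Toy data: attribute 0 is categorical, attribute 1 is real-valued.
X = [["sunny", 30.0], ["sunny", 18.0], ["rainy", 25.0], ["rainy", 12.0]]
Y = ["no", "yes", "no", "yes"]
tree = learn_unpruned_tree(X, Y, m=2, rng=random.Random(0))
print(tree)
```

On this toy data the categorical attribute carries no information about Y, so the root splits on the real-valued attribute. Skipping constant attributes is what guarantees that every recursive call receives strictly fewer rows.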
Information gain
source : [3](3.html)
Nominal attributes
Suppose X can take one of m values $V_1, V_2, \ldots, V_m$, with $P(X=V_1)=p_1,\ P(X=V_2)=p_2,\ \ldots,\ P(X=V_m)=p_m$.
$H(X) = -\sum_{j=1}^{m} p_j \log_2 p_j$ (the entropy of X)
$H(Y \mid X=v)$ = the entropy of Y among only those records in which X has value v
$H(Y \mid X) = \sum_{j=1}^{m} p_j \, H(Y \mid X=V_j)$
$IG(Y \mid X) = H(Y) - H(Y \mid X)$
Real-valued attributes
Suppose X is real-valued.
Define $IG(Y \mid X:t) = H(Y) - H(Y \mid X:t)$
Define $H(Y \mid X:t) = H(Y \mid X<t)\,P(X<t) + H(Y \mid X \ge t)\,P(X \ge t)$
Define $IG^*(Y \mid X) = \max_t IG(Y \mid X:t)$
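As a small worked example of the nominal case, consider a hypothetical set of four records (not taken from the source): X = a for two records, both with Y = yes, and X = b for two records, one with Y = yes and one with Y = no.

```latex
\begin{align*}
H(Y) &= -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.811 \\
H(Y \mid X=a) &= 0, \qquad H(Y \mid X=b) = 1 \\
H(Y \mid X) &= \tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\cdot 1 = 0.5 \\
IG(Y \mid X) &= H(Y) - H(Y \mid X) \approx 0.811 - 0.5 = 0.311
\end{align*}
```

Knowing X therefore removes about 0.311 bits of uncertainty about Y in this toy case.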
How to grow a Random Forest
source : [1](1.html)
Each tree is grown as follows:
- If the number of cases in the training set is N, sample N cases at random, but with replacement, from the original data. This sample will be the training set for growing the tree.
- If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M, and the best split on these m is used to split the node. The value of m is held constant while the forest is grown.
- Each tree is grown to the largest extent possible. There is no pruning.
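The two sources of randomness described in this list can be sketched as follows. The function names are hypothetical, and cases and variables are represented only by their indices.

```python
import random

def bootstrap_sample(n_cases, rng):
    """Draw n_cases indices uniformly at random, with replacement."""
    return [rng.randrange(n_cases) for _ in range(n_cases)]

def candidate_attributes(n_attributes, m, rng):
    """Pick m distinct attribute indices out of n_attributes for one node."""
    return rng.sample(range(n_attributes), m)

rng = random.Random(42)
in_bag = bootstrap_sample(10, rng)                 # training cases for one tree
left_out = sorted(set(range(10)) - set(in_bag))    # cases this tree never sees
print(in_bag, left_out)
print(candidate_attributes(9, 3, rng))             # redrawn at every node
```

The bootstrap sample is drawn once per tree, whereas the attribute subset is drawn again at each node of that tree.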
Random Forest parameters
source : [2](2.html)
Random Forests are easy to use: the only two parameters a user of the technique has to determine are the number of trees to be used and the number of variables (m) to be randomly selected from the available set of variables. Breiman's recommendations are to pick a large number of trees and to set m to the square root of the number of variables.
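A minimal sketch of choosing these two parameters is shown below. The source does not say how large "a large number of trees" is or how to round the square root, so the default of 500 trees and rounding down are assumptions, and the names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class ForestParams:
    num_trees: int   # number of trees in the forest
    m: int           # variables sampled at each node

def suggested_params(n_variables, num_trees=500):
    """Set m to the square root of the number of variables (rounded down, at least 1)."""
    return ForestParams(num_trees=num_trees, m=max(1, math.isqrt(n_variables)))

print(suggested_params(16))
```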
How to predict the label of a case
Classify(node, V)
Input: node, a node of the decision tree; if node.attribute = j then the split is done on the j'th attribute
Input: V, a vector of M columns where V_j is the value of the j'th attribute
Output: the label of V

If node is a Leaf then
    Return the value predicted by node
Else
    Let j = node.attribute
    If j is categorical then
        Let v = V_j
        Let child^v = child node corresponding to the attribute's value v
        Return Classify(child^v, V)
    Else j is real-valued
        Let t = node.threshold (split threshold)
        If V_j < t then
            Let child^LO = child node corresponding to (< t)
            Return Classify(child^LO, V)
        Else
            Let child^HI = child node corresponding to (>= t)
            Return Classify(child^HI, V)
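The Classify procedure might look like this in Python. The node layout is an assumption made for the example: a leaf is `{"leaf": label}`, and an internal node is a dict with `"attribute"`, `"threshold"` (None for a categorical split) and `"children"` keyed by attribute value or by `"LO"`/`"HI"`. The tree below is built by hand purely for illustration.

```python
def classify(node, V):
    """Walk from node down to a leaf and return the predicted label for V."""
    if "leaf" in node:
        return node["leaf"]
    j = node["attribute"]
    t = node["threshold"]
    if t is None:
        # Categorical split: follow the child for this value.
        # A value never seen during training raises KeyError in this sketch.
        child = node["children"][V[j]]
    else:
        # Real-valued split: LO is (< t), HI is (>= t).
        child = node["children"]["LO" if V[j] < t else "HI"]
    return classify(child, V)

# Hand-built example tree: attribute 0 is categorical, attribute 1 is real-valued.
tree = {
    "attribute": 0, "threshold": None,
    "children": {
        "rainy": {"leaf": "no"},
        "sunny": {
            "attribute": 1, "threshold": 21.5,
            "children": {"LO": {"leaf": "yes"}, "HI": {"leaf": "no"}},
        },
    },
}
print(classify(tree, ["sunny", 18.0]))
```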
The out of bag (oob) error estimation
source : [1](1.html)
In random forests, there is no need for cross-validation or a separate test set to get an unbiased estimate of the test set error. It is estimated internally, during the run, as follows:
- Each tree is constructed using a different bootstrap sample from the original data. About one-third of the cases are left out of the bootstrap sample and not used in the construction of the kth tree.
- Put each case left out in the construction of the kth tree down the kth tree to get a classification. In this way, a test set classification is obtained for each case in about one-third of the trees. At the end of the run, take j to be the class that got most of the votes every time case n was oob. The proportion of times that j is not equal to the true class of n, averaged over all cases, is the oob error estimate. This has proven to be unbiased in many tests.
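The oob vote and error computation can be sketched as below, assuming the per-tree predictions and bootstrap membership are already available. The argument layout and function name are hypothetical. Cases that were never oob are skipped, and ties go to the class seen first; the source specifies neither.

```python
from collections import Counter

def oob_error(predictions, in_bag, y_true):
    """Out-of-bag error estimate.

    predictions[k][n] is the class tree k assigns to case n,
    in_bag[k] is the set of case indices in tree k's bootstrap sample,
    y_true[n] is the true class of case n.
    """
    wrong = counted = 0
    for n, truth in enumerate(y_true):
        # Only trees that did not train on case n may vote on it.
        votes = Counter(predictions[k][n]
                        for k in range(len(predictions)) if n not in in_bag[k])
        if not votes:
            continue                      # assumption: skip cases that were never oob
        counted += 1
        j = votes.most_common(1)[0][0]    # class with the most oob votes
        wrong += (j != truth)
    if counted == 0:
        raise ValueError("no case was out of bag for any tree")
    return wrong / counted

y_true = ["a", "b", "a", "b"]
in_bag = [{0, 1, 2}, {1, 2, 3}, {0, 3}]
predictions = [
    ["a", "b", "a", "a"],   # tree 0: case 3 is oob and misclassified
    ["a", "b", "a", "b"],   # tree 1: case 0 is oob and correct
    ["a", "b", "b", "b"],   # tree 2: cases 1 and 2 are oob; case 2 is misclassified
]
print(oob_error(predictions, in_bag, y_true))
```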
Other RF uses
source : [1](1.html)
- variable importance
- gini importance
- proximities
- scaling
- prototypes
- missing values replacement for the training set
- missing values replacement for the test set
- detecting mislabeled cases
- detecting outliers
- detecting novelties
- unsupervised learning
- balancing prediction error
Please refer to [1](1.html) for a detailed description.
References
[1](1.html) Random Forests - Classification Description
[2](2.html) B. Larivière & D. Van Den Poel, 2004. "Predicting Customer Retention and Profitability by Using Random Forests and Regression Forests Techniques," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 04/282, Ghent University, Faculty of Economics and Business Administration.
[3](3.html) Decision Trees - Andrew W. Moore, http://www.cs.cmu.edu/~awm/tutorials
[4](4.html) Information Gain - Andrew W. Moore, http://www.cs.cmu.edu/~awm/tutorials
Copyright © 2014-2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0.