How to grow a Decision Tree
source : [3](3.html)
LearnUnprunedTree(X, Y)

Input: X, a matrix of R rows and M columns, where X_ij = the value of the j'th attribute in the i'th input datapoint. Each column consists of either all real values or all categorical values.
Input: Y, a vector of R elements, where Y_i = the output class of the i'th datapoint. The Y_i values are categorical.
Output: an unpruned decision tree

If all records in X have identical values in all their attributes (this includes the case where R < 2), return a Leaf Node predicting the majority output, breaking ties randomly.
If all values in Y are the same, return a Leaf Node predicting this value as the output.
Else
    select m variables at random out of the M variables
    for j = 1 .. m
        if the j'th attribute is categorical
            IG_j = IG(Y | X_j) (see Information Gain)
        else (the j'th attribute is real-valued)
            IG_j = IG*(Y | X_j) (see Information Gain)
    let j* = argmax_j IG_j (this is the splitting attribute we'll use)
    if j* is categorical then
        for each value v of the j*'th attribute
            let X^v = the subset of rows of X in which X_ij* = v, and Y^v = the corresponding subset of Y
            let Child^v = LearnUnprunedTree(X^v, Y^v)
        return a decision tree node splitting on the j*'th attribute; the number of children equals the number of values of the j*'th attribute, and the v'th child is Child^v
    else (j* is real-valued); let t be the best split threshold
        let X^LO = the subset of rows of X in which X_ij* <= t, and Y^LO = the corresponding subset of Y
        let Child^LO = LearnUnprunedTree(X^LO, Y^LO)
        let X^HI = the subset of rows of X in which X_ij* > t, and Y^HI = the corresponding subset of Y
        let Child^HI = LearnUnprunedTree(X^HI, Y^HI)
        return a decision tree node splitting on the j*'th attribute; it has two children corresponding to whether the j*'th attribute is below or above the given threshold
Note: There are alternatives to Information Gain for splitting nodes
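To make the recursion concrete, here is a minimal Python sketch of the procedure above, simplified to real-valued attributes only and using the IG* criterion defined in the Information Gain section below. The names (`Node`, `entropy`, `best_threshold_split`, `learn_unpruned_tree`) are illustrative, not part of any Mahout API; categorical attributes would follow the same pattern, with one child per value instead of a threshold.

```python
import numpy as np

class Node:
    """A decision-tree node: either a leaf with a label, or an internal split."""
    def __init__(self, label=None, attribute=None, threshold=None, lo=None, hi=None):
        self.label = label            # majority class if this node is a leaf
        self.attribute = attribute    # index j* of the splitting attribute
        self.threshold = threshold    # split threshold t
        self.lo, self.hi = lo, hi     # children for X[:, j*] <= t and X[:, j*] > t

def entropy(y):
    """H(Y) = -sum_j p_j log2 p_j over the class frequencies of y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_threshold_split(x, y):
    """Return (IG*(Y|X), best threshold t) for one real-valued column x."""
    best_ig, best_t = 0.0, None
    for t in np.unique(x)[:-1]:                # candidate thresholds
        lo, hi = y[x <= t], y[x > t]
        h_cond = (len(lo) * entropy(lo) + len(hi) * entropy(hi)) / len(y)
        ig = entropy(y) - h_cond
        if ig > best_ig:
            best_ig, best_t = ig, t
    return best_ig, best_t

def learn_unpruned_tree(X, y, m, rng):
    """Grow an unpruned tree, examining m attributes chosen at random per node."""
    # Leaf cases: all labels identical, or all rows identical in every attribute.
    if len(np.unique(y)) == 1 or all(len(np.unique(X[:, j])) == 1 for j in range(X.shape[1])):
        values, counts = np.unique(y, return_counts=True)
        return Node(label=values[np.argmax(counts)])
    # Score a random subset of m attributes by information gain.
    candidates = rng.choice(X.shape[1], size=m, replace=False)
    j_star, _, t = max(((j, *best_threshold_split(X[:, j], y)) for j in candidates),
                       key=lambda s: s[1])
    if t is None:                              # none of the sampled attributes can split
        values, counts = np.unique(y, return_counts=True)
        return Node(label=values[np.argmax(counts)])
    lo_mask = X[:, j_star] <= t
    return Node(attribute=j_star, threshold=t,
                lo=learn_unpruned_tree(X[lo_mask], y[lo_mask], m, rng),
                hi=learn_unpruned_tree(X[~lo_mask], y[~lo_mask], m, rng))
```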
Information gain
source : [3](3.html)
Nominal attributes
Suppose X can take one of m values V_1, V_2, ..., V_m, with P(X=V_1) = p_1, P(X=V_2) = p_2, ..., P(X=V_m) = p_m.

H(X) = -sum_{j=1..m} p_j log2(p_j)  (the entropy of X)
H(Y|X=v) = the entropy of Y among only those records in which X has value v
H(Y|X) = sum_j p_j H(Y|X=v_j)
IG(Y|X) = H(Y) - H(Y|X)
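As an illustration, the nominal-attribute quantities above might be computed as follows in Python (the function names are ours, not from the references):

```python
import numpy as np

def entropy(y):
    """H(Y) = -sum_j p_j log2 p_j."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, x):
    """IG(Y|X) = H(Y) - sum_v P(X=v) H(Y|X=v) for a nominal attribute x."""
    h_cond = 0.0
    for v in np.unique(x):
        mask = (x == v)
        h_cond += mask.mean() * entropy(y[mask])   # P(X=v) * H(Y|X=v)
    return entropy(y) - h_cond
```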
Real-valued attributes
Suppose X is real-valued.

Define IG(Y|X:t) as H(Y) - H(Y|X:t)
Define H(Y|X:t) = H(Y|X < t) P(X < t) + H(Y|X >= t) P(X >= t)
Define IG*(Y|X) = max_t IG(Y|X:t)
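And a matching sketch for the real-valued case, scanning the observed values of X as candidate thresholds (again an illustrative implementation, not the reference one; `entropy` is the same helper as above):

```python
import numpy as np

def entropy(y):
    """H(Y) = -sum_j p_j log2 p_j (same helper as in the sketches above)."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def ig_threshold(y, x, t):
    """IG(Y|X:t) = H(Y) - [ H(Y|X<t) P(X<t) + H(Y|X>=t) P(X>=t) ]."""
    below = x < t
    h_cond = below.mean() * entropy(y[below]) + (~below).mean() * entropy(y[~below])
    return entropy(y) - h_cond

def ig_star(y, x):
    """IG*(Y|X) = max over candidate thresholds t of IG(Y|X:t)."""
    candidates = np.unique(x)[1:]          # any t strictly above the minimum value
    if len(candidates) == 0:               # constant attribute: no split possible
        return 0.0
    return max(ig_threshold(y, x, t) for t in candidates)
```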
How to grow a Random Forest
source : [1](1.html)
Each tree is grown as follows:
- if the number of cases in the training set is N, sample N cases at random, but with replacement, from the original data. This sample will be the training set for growing the tree.
- if there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M and the best split on these m is used to split the node. The value of m is held constant during the forest growing.
- each tree is grown to the largest extent possible. There is no pruning. (A minimal sketch of this procedure follows the list.)
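Under those three rules, the forest-growing loop might look as follows in Python; it assumes a `learn_unpruned_tree(X, y, m, rng)` tree learner such as the one sketched earlier, and the names are illustrative only:

```python
import numpy as np

def grow_forest(X, y, n_trees, m, seed=0):
    """Grow n_trees unpruned trees, each on a bootstrap sample of the N cases,
    with m attributes examined at random at every node."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    forest, oob_indices = [], []
    for _ in range(n_trees):
        # Sample N cases with replacement; roughly one-third stay out of bag.
        in_bag = rng.integers(0, N, size=N)
        oob = np.setdiff1d(np.arange(N), in_bag)
        forest.append(learn_unpruned_tree(X[in_bag], y[in_bag], m, rng))
        oob_indices.append(oob)            # kept for the oob error estimate described below
    return forest, oob_indices
```

Keeping the out-of-bag indices per tree is optional for prediction, but it is what makes the oob error estimate described further below possible.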
Random Forest parameters
source : [2](2.html) Random Forests are easy to use: the only two parameters a user of the technique has to determine are the number of trees to be used and the number of variables (m) to be randomly selected from the available set of variables. Breiman's recommendations are to pick a large number of trees, and to use the square root of the number of variables for m.
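Using the `grow_forest` sketch above, those two choices could be made like this (the toy data and the value 100 for the number of trees are purely illustrative):

```python
import math
import numpy as np

# Toy data just to illustrate the parameter choice (not taken from the references).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))               # 200 cases, M = 16 real-valued variables
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # synthetic binary labels

m = int(round(math.sqrt(X.shape[1])))        # Breiman's suggestion: m = sqrt(M) -> 4 here
forest, oob_indices = grow_forest(X, y, n_trees=100, m=m, seed=42)
```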
How to predict the label of a case
Classify(node, V)

Input: node, a node of the decision tree; if node.attribute = j then the split at this node is done on the j'th attribute.
Input: V, a vector of M columns where V_j = the value of the j'th attribute.
Output: the label of V

If node is a Leaf then
    return the value predicted by node
Else
    let j = node.attribute
    if the j'th attribute is categorical then
        let v = V_j
        let child^v = the child node corresponding to the attribute value v
        return Classify(child^v, V)
    else (the j'th attribute is real-valued)
        let t = node.threshold (the split threshold)
        if V_j <= t then
            let child^LO = the child node corresponding to (<= t)
            return Classify(child^LO, V)
        else
            let child^HI = the child node corresponding to (> t)
            return Classify(child^HI, V)
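A matching Python sketch of this prediction step, written for the real-valued `Node` structure used in the tree-growing sketch above (illustrative names only); a forest prediction is then just a majority vote over the trees:

```python
import numpy as np

def classify(node, v):
    """Walk the tree from node down to a leaf, following the split that v satisfies."""
    if node.label is not None:                 # leaf: return its predicted class
        return node.label
    if v[node.attribute] <= node.threshold:    # go to the child for (<= t)
        return classify(node.lo, v)
    return classify(node.hi, v)                # otherwise the child for (> t)

def predict_forest(forest, v):
    """Random-forest prediction: majority vote of the individual trees."""
    votes = [classify(tree, v) for tree in forest]
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]
```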
The out of bag (oob) error estimation
source : [1](1.html)
In random forests, there is no need for cross-validation or a separate test set to get an unbiased estimate of the test set error. It is estimated internally, during the run, as follows:
- each tree is constructed using a different bootstrap sample from the original data. About one-third of the cases are left out of the bootstrap sample and not used in the construction of the kth tree.
- put each case left out of the construction of the kth tree down the kth tree to get a classification. In this way, a test set classification is obtained for each case in about one-third of the trees. At the end of the run, take j to be the class that got most of the votes every time case n was oob. The proportion of times that j is not equal to the true class of n, averaged over all cases, is the oob error estimate. This has proven to be unbiased in many tests. (A minimal sketch of this estimate follows the list.)
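A minimal sketch of this estimate, reusing the `forest`, `oob_indices`, and `classify` names from the earlier sketches (illustrative, not a Mahout API):

```python
import numpy as np

def oob_error(forest, oob_indices, X, y):
    """For each case n, aggregate votes only from the trees where n was out of bag,
    then compare the majority vote with the true class of n."""
    votes = [[] for _ in range(len(y))]
    for tree, oob in zip(forest, oob_indices):
        for n in oob:
            votes[n].append(classify(tree, X[n]))
    errors, counted = 0, 0
    for n in range(len(y)):
        if votes[n]:                           # case n was oob for at least one tree
            values, counts = np.unique(votes[n], return_counts=True)
            if values[np.argmax(counts)] != y[n]:
                errors += 1
            counted += 1
    return errors / counted if counted else float("nan")
```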
Other RF uses
source : [1](1.html)
- variable importance
- gini importance
- proximities
- scaling
- prototypes
- missing values replacement for the training set
- missing values replacement for the test set
- detecting mislabeled cases
- detecting outliers
- detecting novelties
- unsupervised learning
- balancing prediction error

Please refer to [1](1.html) for a detailed description.
References
[1](1.html) Random Forests - Classification Description: [Random forests - classification description](http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm "Random forests - classification description")

[2](2.html) B. Larivière & D. Van Den Poel, 2004. "Predicting Customer Retention and Profitability by Using Random Forests and Regression Forests Techniques," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium, 04/282. Available online: [Predicting Customer Retention and Profitability by Using Random Forests and Regression Forests Techniques](http://ideas.repec.org/p/rug/rugwps/04-282.html "Predicting Customer Retention and Profitability by Using Random Forests and Regression Forests Techniques")

[3](3.html) Decision Trees - Andrew W. Moore: [http://www.cs.cmu.edu/~awm/tutorials](http://www.cs.cmu.edu/~awm/tutorials "http://www.cs.cmu.edu/~awm/tutorials")

[4](4.html) Information Gain - Andrew W. Moore: [http://www.cs.cmu.edu/~awm/tutorials](http://www.cs.cmu.edu/~awm/tutorials "http://www.cs.cmu.edu/~awm/tutorials")
Copyright © 2014-2024 [The Apache Software Foundation](http://www.apache.org/ "The Apache Software Foundation"), Licensed under the Apache License, Version 2.0.