Standard random forest: pick K random features, enumerate all of their values as candidate splits, and keep the best split.
LINKS:https://en.wikipedia.org/wiki/Random_forest
Extremely randomized trees: pick K random features, draw one random split value for each, and keep the best of those splits.
Ensemble of extremely randomized trees: every tree is trained on all the data (no bagging).
LINKS:http://docs.opencv.org/2.4/modules/ml/doc/ertrees.html
- Extremely randomized trees don’t apply the bagging procedure to construct a set of the training samples for each tree. The same input training set is used to train all trees.
- Extremely randomized trees pick a node split very extremely (both a variable index and variable splitting value are chosen randomly), whereas Random Forest finds the best split (optimal one by variable index and variable splitting value) among random subset of variables.
Summary: extremely randomized trees use all samples as the training set, and they pick a feature and a split value at random as the split criterion.
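The difference between the two split rules can be sketched in plain Python. This is a minimal illustration, not the OpenCV or scikit-learn implementation: the helper names (`sse`, `split_cost`, `random_forest_split`, `extra_trees_split`), the squared-error cost, and the toy data are all assumptions made for the example.

```python
import random

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def split_cost(X, y, feature, threshold):
    """Total SSE of the two children produced by x[feature] <= threshold."""
    left = [t for row, t in zip(X, y) if row[feature] <= threshold]
    right = [t for row, t in zip(X, y) if row[feature] > threshold]
    return sse(left) + sse(right)

def random_forest_split(X, y, k, rng):
    """Random-forest style: K random features, every observed value is tried."""
    features = rng.sample(range(len(X[0])), k)
    candidates = [(f, row[f]) for f in features for row in X]
    return min(candidates, key=lambda c: split_cost(X, y, c[0], c[1]))

def extra_trees_split(X, y, k, rng):
    """Extra-trees style: K random features, one uniformly drawn threshold each."""
    features = rng.sample(range(len(X[0])), k)
    candidates = []
    for f in features:
        lo = min(row[f] for row in X)
        hi = max(row[f] for row in X)
        candidates.append((f, rng.uniform(lo, hi)))
    return min(candidates, key=lambda c: split_cost(X, y, c[0], c[1]))

# Toy data (hypothetical): feature 0 separates the targets, feature 1 is noise.
rng = random.Random(0)
X = [[0.0, 5.0], [1.0, 3.0], [2.0, 8.0], [3.0, 1.0]]
y = [0.0, 0.0, 1.0, 1.0]
rf_feature, rf_threshold = random_forest_split(X, y, k=2, rng=rng)
et_feature, et_threshold = extra_trees_split(X, y, k=2, rng=rng)
```

The exhaustive search evaluates one candidate per (feature, observed value) pair, while the extra-trees rule evaluates only K candidates per node, which is where the speed-up and the extra randomness come from.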
LINKS:http://scikit-learn.org/stable/modules/generated/sklearn.tree.ExtraTreeRegressor.html#sklearn.tree.ExtraTreeRegressor
This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.
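The extra-trees ensemble uses bagging, then selects several features and draws one random value per feature as the split candidate for building each tree.
One way to implement it:
Bag the samples, draw n random features and k random values per feature, take the best split among them, and build the tree.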
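The recipe above can be sketched as a small regression ensemble in plain Python. This is only one possible reading of the note, under stated assumptions: squared error as the split cost, k thresholds drawn uniformly between each feature's min and max in the node, trees grown until a node is pure or cannot be split, and hypothetical names throughout (`best_random_split`, `build_tree`, `fit_forest`, `predict_forest`).

```python
import random

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_random_split(rows, n_features, k_values, rng):
    """Draw n features and k thresholds per feature; return the cheapest
    split that leaves both children non-empty, or None if there is none."""
    best = None
    for f in rng.sample(range(len(rows[0][0])), n_features):
        lo = min(x[f] for x, _target in rows)
        hi = max(x[f] for x, _target in rows)
        for _attempt in range(k_values):
            t = rng.uniform(lo, hi)
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            if not left or not right:
                continue
            cost = sse([r[1] for r in left]) + sse([r[1] for r in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    return best

def build_tree(rows, n_features, k_values, rng, min_size=2):
    """Return a leaf value (float) or a node tuple (feature, threshold, left, right)."""
    targets = [r[1] for r in rows]
    leaf = sum(targets) / len(targets)
    if len(rows) < min_size or sse(targets) == 0.0:
        return leaf
    split = best_random_split(rows, n_features, k_values, rng)
    if split is None:
        return leaf
    _cost, f, t, left, right = split
    return (f, t,
            build_tree(left, n_features, k_values, rng, min_size),
            build_tree(right, n_features, k_values, rng, min_size))

def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees, n_features, k_values, seed=0):
    """Train each tree on a bootstrap sample (bagging) of the data."""
    rng = random.Random(seed)
    data = list(zip(X, y))
    forest = []
    for _tree in range(n_trees):
        sample = [rng.choice(data) for _row in data]  # sample with replacement
        forest.append(build_tree(sample, n_features, k_values, rng))
    return forest

def predict_forest(forest, x):
    """Average the per-tree predictions."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)

# Toy data (hypothetical): the target steps from 0 to 1 at x[0] = 10.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [0.0 if i < 10 else 1.0 for i in range(20)]
forest = fit_forest(X, y, n_trees=25, n_features=2, k_values=3, seed=0)
pred_low = predict_forest(forest, [2.0, 0.0])
pred_high = predict_forest(forest, [17.0, 0.0])
```

Setting `k_values=1` gives the one-random-value-per-feature rule from the descriptions quoted earlier; replacing the bootstrap line with `sample = data` gives the variant where every tree sees the full training set.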