scikit-learn - Understanding max_features parameter in RandomForestRegressor
While constructing each tree in a random forest from bootstrapped samples, at each terminal node we select m variables at random from the p variables to find the best split (p is the total number of features in the data). My questions (for RandomForestRegressor) are:
1) What does max_features correspond to (m, p, or something else)?
2) Are the m variables selected at random from max_features variables (and if so, what is the value of m)?
3) If max_features corresponds to m, why would I want to set it equal to p for regression (the default)? Where is the randomness with this setting (i.e., how is it different from bagging)?
Thanks.
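The per-node step described in the question can be sketched from scratch. This is a minimal illustration under stated assumptions, not scikit-learn's actual (Cython) splitter: best_split_on_random_subset is a hypothetical helper, the split criterion is a plain sum of squared errors, and the data are synthetic. The scikit-learn docs also note that the split search does not stop until at least one valid partition is found, even if that means inspecting more than max_features features; this sketch ignores that detail.

```python
import numpy as np

def best_split_on_random_subset(X, y, m, rng):
    """Hypothetical helper: find the best (feature, threshold) split for one node,
    searching only m features drawn at random, without replacement, from all p."""
    _, p = X.shape
    candidates = rng.choice(p, size=m, replace=False)  # the random subset of size m
    best = (None, None, np.inf)  # (feature index, threshold, weighted child SSE)
    for j in candidates:
        values = np.unique(X[:, j])
        # candidate thresholds: midpoints between consecutive distinct values
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            right = ~left
            sse = ((y[left] - y[left].mean()) ** 2).sum() + ((y[right] - y[right].mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, t, sse)
    return candidates, best

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))  # p = 6 synthetic features
y = 3 * X[:, 2] + rng.normal(scale=0.1, size=50)  # only feature 2 is informative

# m < p: this node only gets to look at 2 of the 6 features
subset, (feature, threshold, sse) = best_split_on_random_subset(X, y, m=2, rng=rng)
print("m=2 searched", sorted(subset.tolist()), "-> split on feature", feature)

# m == p: the "subset" is every feature, so the split choice is not randomized;
# any randomness left in the ensemble comes from the bootstrap sample (plain bagging)
subset_all, (feature_all, _, _) = best_split_on_random_subset(X, y, m=6, rng=rng)
print("m=6 searched", sorted(subset_all.tolist()), "-> split on feature", feature_all)
```

With m = 2 the informative feature is often not even among the candidates, which is exactly what decorrelates the trees; with m = p every node can always pick it.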
Straight from the documentation:

[max_features] is the size of the random subsets of features to consider when splitting a node.
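One way to see how a max_features setting turns into m is to fit small forests and read the max_features_ attribute that each fitted DecisionTreeRegressor stores (the inferred number of features considered per split). This is a sketch on synthetic data with p = 16 chosen for illustration. It passes None and 1.0 rather than "auto", since newer scikit-learn releases deprecated and then removed "auto" for this estimator.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# p = 16 synthetic features; the resolved values of m below follow from this choice
X, y = make_regression(n_samples=200, n_features=16, random_state=0)

settings = [None, 1.0, "sqrt", "log2", 0.5, 3]
resolved = {}
for setting in settings:
    forest = RandomForestRegressor(n_estimators=5, max_features=setting, random_state=0)
    forest.fit(X, y)
    # each fitted tree records m, the number of features it draws per split
    resolved[repr(setting)] = forest.estimators_[0].max_features_

for setting, m in resolved.items():
    print(f"max_features={setting:>6} -> m = {m}")
```

None and 1.0 both give m = p = 16 (no feature subsampling); "sqrt" and "log2" give 4, the float 0.5 means half of the features (8), and an integer is used as m directly.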
So max_features is what you call m. When max_features="auto", m = p and no feature subset selection is performed in the trees, so the "random forest" is actually a bagged ensemble of ordinary regression trees. The docs go on to say that
empirical default values are max_features=n_features for regression problems, and max_features=sqrt(n_features) for classification tasks.
By setting max_features differently, you'll get a "true" random forest.
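To compare the two regimes on your own data, a minimal sketch is to fit one forest with m = p (max_features=None, i.e. bagging) and one with m < p (max_features="sqrt" here, an illustrative choice rather than a recommendation) and look at their out-of-bag R^2. The Friedman #1 synthetic dataset and the hyperparameters are assumptions for illustration only; which setting scores higher depends on the data, so treat max_features as something to tune.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor

# synthetic data with p = 10 features (only the first 5 are informative)
X, y = make_friedman1(n_samples=500, n_features=10, noise=1.0, random_state=0)

# m = p: every split searches all features, so this is a bagged ensemble of trees
bagged = RandomForestRegressor(n_estimators=200, max_features=None,
                               oob_score=True, random_state=0).fit(X, y)

# m < p: every split searches only int(sqrt(10)) = 3 randomly drawn features
true_rf = RandomForestRegressor(n_estimators=200, max_features="sqrt",
                                oob_score=True, random_state=0).fit(X, y)

print(f"m = p       (m={bagged.estimators_[0].max_features_}): OOB R^2 = {bagged.oob_score_:.3f}")
print(f"m = sqrt(p) (m={true_rf.estimators_[0].max_features_}): OOB R^2 = {true_rf.oob_score_:.3f}")
```

The OOB score reuses the bootstrap left-out samples, so no separate validation split is needed for this rough comparison.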