scikit-learn - Understanding the max_features parameter in RandomForestRegressor
While constructing each tree in a random forest using bootstrapped samples, for each terminal node we select m variables at random from the p variables to find the best split (p is the total number of features in the data). My questions (for RandomForestRegressor) are:
1) What does max_features correspond to (m, p, or something else)?
2) Are the m variables selected at random from max_features variables (and if so, what is the value of m)?
3) If max_features corresponds to m, why would I want to set it equal to p for regression (the default)? Where is the randomness with this setting (i.e., how is it different from bagging)?
Thanks.
Straight from the documentation:
[max_features] is the size of the random subsets of features to consider when splitting a node.
So max_features is what you call m. When max_features="auto", m = p and no feature subset selection is performed in the trees, so the "random forest" is actually a bagged ensemble of ordinary regression trees. The docs go on to say that
Empirical good default values are max_features=n_features for regression problems, and max_features=sqrt(n_features) for classification tasks.
By setting max_features differently, you'll get a "true" random forest.
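To make the relationship between max_features, m and p concrete, here is a minimal sketch in plain Python. The helpers `resolve_m` and `candidate_features` are hypothetical names made up for illustration. They follow the documented meaning of the parameter and are not scikit-learn's own code. The setting None is assumed to mean "use all p features".

```python
import math
import random


def resolve_m(max_features, p):
    """Translate a max_features setting into m, the number of
    candidate features examined at each split (hypothetical helper)."""
    if max_features is None:             # all p features: m = p
        return p
    if max_features == "sqrt":
        return max(1, int(math.sqrt(p)))
    if max_features == "log2":
        return max(1, int(math.log2(p)))
    if isinstance(max_features, int):    # absolute number of features
        return max_features
    if isinstance(max_features, float):  # fraction of the p features
        return max(1, int(max_features * p))
    raise ValueError(f"unsupported max_features: {max_features!r}")


def candidate_features(p, max_features, rng):
    """Draw the feature indices considered at one split (hypothetical helper)."""
    m = resolve_m(max_features, p)
    # Sample m of the p feature indices without replacement.
    return sorted(rng.sample(range(p), m))


rng = random.Random(0)
p = 16

# m = p: every split sees every feature, so only the bootstrap
# sampling of rows differs between trees (plain bagging).
print(candidate_features(p, None, rng))

# m = sqrt(p) = 4: each split sees a fresh random subset of 4 features.
print(candidate_features(p, "sqrt", rng))
```

With m = p the first call always returns all 16 indices, which is why that setting adds no randomness beyond the bootstrap. In scikit-learn itself, the equivalent choice is made by passing, for example, max_features="sqrt" or a fraction such as 0.5 to RandomForestRegressor. Note that the "auto" value mentioned above has been removed in recent scikit-learn releases, so check the documentation for your installed version.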