scikit learn - Understanding max_features parameter in RandomForestRegressor

- July 15, 2014

While constructing each tree in a random forest using bootstrapped samples, for each terminal node we select m variables at random from the p variables to find the best split (p is the total number of features in the data). My questions (for RandomForestRegressor) are:

1) What does max_features correspond to (m, p, or something else)?

2) Are the m variables selected at random from max_features variables (and if so, what is the value of m)?

3) If max_features corresponds to m, why would I want to set it equal to p for regression (the default)? Where is the randomness with this setting (i.e., how is it different from bagging)?

Thanks.

Straight from the documentation:

[max_features] is the size of the random subsets of features to consider when splitting a node.

So max_features is what you call m. When max_features="auto", m = p and no feature subset selection is performed in the trees, so the "random forest" is actually a bagged ensemble of ordinary regression trees. The docs go on to say that

Empirical good default values are max_features=n_features for regression problems, and max_features=sqrt(n_features) for classification tasks

By setting max_features differently, you'll get a "true" random forest.
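To make the difference concrete, here is a minimal sketch comparing the two settings. The synthetic dataset, the number of trees, and the choice of m = 3 are assumptions made purely for illustration. It also assumes a recent scikit-learn release, so it passes max_features=1.0 (all features) in place of the "auto" string, which newer versions no longer accept as far as I know.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data with p = 9 features (illustrative only).
X, y = make_regression(n_samples=200, n_features=9, n_informative=5,
                       noise=1.0, random_state=0)
p = X.shape[1]

# m = p: every split considers all features, so the ensemble is
# plain bagging of ordinary regression trees.
bagged = RandomForestRegressor(n_estimators=20, max_features=1.0,
                               random_state=0).fit(X, y)

# m = 3 < p: every split considers a random subset of 3 features,
# which is the "true" random forest behaviour.
subsampled = RandomForestRegressor(n_estimators=20, max_features=3,
                                   random_state=0).fit(X, y)

# Each fitted tree records the resolved value of m in max_features_.
m_bagged = bagged.estimators_[0].max_features_
m_subsampled = subsampled.estimators_[0].max_features_
print("p =", p, "| m (bagging) =", m_bagged, "| m (subsampled) =", m_subsampled)
```

Which value of m works best depends on the data, so it is usually worth treating max_features as a hyperparameter and tuning it by cross-validation rather than relying on the default.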
