Feature subsampling is a strategy for tree-based ensembles, and the characteristic that distinguishes random forest from bagging. At each split in a tree, a subset of features is selected, typically without replacement, and only those features are considered as candidates for the split. That is, each base learner has access to the entire set of features, but it draws a fresh subset to consider at each split. This behavior is different from these models’ use of bootstrap sampling, which occurs at the level of the entire base learner.
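To make the per-split mechanism concrete, the following is a minimal sketch of how a single node might choose its split when only a random subset of features is eligible. The function name, the variance-reduction criterion, and the synthetic data are illustrative assumptions, not part of any particular library; a real tree learner would wrap this logic in a recursive build procedure.

```python
import numpy as np

def best_split_with_subsampling(X, y, max_features, rng=None):
    """Pick a (feature, threshold) split at one node, considering only a
    random subset of features. Sketch only: uses a simple weighted-variance
    (regression) criterion and exhaustive threshold search."""
    rng = np.random.default_rng(rng)
    n_samples, n_features = X.shape

    # Sample a subset of features without replacement, for this split only.
    candidates = rng.choice(n_features, size=max_features, replace=False)

    best = (None, None, np.inf)  # (feature index, threshold, impurity)
    for j in candidates:
        for t in np.unique(X[:, j])[:-1]:  # exclude max so both sides are nonempty
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            # Weighted variance of the two children (sum of squared errors).
            score = len(left) * left.var() + len(right) * right.var()
            if score < best[2]:
                best = (j, t, score)
    return best[0], best[1]

# Example: only 2 of 5 features are eligible at this node's split.
X = np.random.default_rng(0).normal(size=(100, 5))
y = X[:, 3] + 0.1 * np.random.default_rng(1).normal(size=100)
print(best_split_with_subsampling(X, y, max_features=2, rng=0))
```

Note that the subset is redrawn at every split, so a feature excluded at one node may still be used elsewhere in the same tree.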
As parallel ensembles average across their base learners, this strategy reduces the model’s dependence on any one feature. It allows individual learners to discover the effects of features that might otherwise be dominated by the largest contributors to variation. It also has a regularizing effect, as it limits the impact of any single feature on the model’s behavior overall.
In the extreme limit where only one feature may be considered, the split feature is sampled uniformly at random, so for a large number of trees the model treats all features with approximately equal weight. At the other extreme, where all features may be considered, the model becomes ordinary bagging. The subsample fraction therefore determines how much flexibility the model has in determining feature importance.
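These two limits can be seen directly through scikit-learn's max_features parameter, where an integer value of 1 allows a single feature per split and a float value of 1.0 allows all of them (the latter reducing to bagged trees). The dataset below is synthetic and the settings are purely illustrative of the knob, not tuned values.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                       random_state=0)

# One feature per split: the split feature is chosen uniformly at random.
rf_one = RandomForestRegressor(n_estimators=200, max_features=1,
                               random_state=0)

# All features per split: equivalent to ordinary bagging of trees.
rf_all = RandomForestRegressor(n_estimators=200, max_features=1.0,
                               random_state=0)

for name, model in [("max_features=1", rf_one), ("max_features=1.0", rf_all)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
```

Intermediate values of the fraction interpolate between these behaviors, which is why it is commonly treated as a tuning parameter.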
Feature subsampling can be applied to tree-based sequential ensembles as well, where it has a similar effect, provided proportionally more base learners are employed.
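As one possible illustration of this, scikit-learn's gradient boosting estimators also expose a max_features parameter; the sketch below pairs a restricted per-split feature fraction with a larger number of boosting stages. The dataset and all hyperparameter values are illustrative assumptions rather than recommended settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                       random_state=0)

gbr = GradientBoostingRegressor(
    n_estimators=500,    # proportionally more base learners
    learning_rate=0.05,
    max_features=0.3,    # consider roughly 30% of features at each split
    random_state=0,
)
print(cross_val_score(gbr, X, y, cv=5).mean())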
In principle, this same technique could be used on any base learner, not just decision trees. However, decision trees are uniquely suited to this sort of sampling because each split can depend on only a single feature. Hence the only effect of providing a subset of features is to restrict which features the model can consider for splits. By comparison, feature sampling is likely to have a more systemic effect on other classes of base learner. This fact has made random forest distinctive enough to merit its own name.