Decision trees are trained by iteratively splitting the data at a particular value of a particular feature, chosen to greedily minimize loss. As a result, certain splits may push outliers to their own branches or leaves. By imposing a minimum number of samples per split or leaf, the model is less likely to isolate outliers, which has a regularizing effect. However, setting the value too high can prevent the model from discovering generalizable contours of the decision boundary, thereby inducing bias.
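
A minimal sketch of this effect, assuming scikit-learn's DecisionTreeClassifier (the same hyperparameter exists in most tree implementations); the synthetic dataset and the grid of values are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise (flip_y), so an unconstrained tree overfits.
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Larger min_samples_leaf prevents the tree from isolating outliers in tiny leaves.
for min_leaf in (1, 5, 20, 100):
    tree = DecisionTreeClassifier(min_samples_leaf=min_leaf, random_state=0)
    tree.fit(X_train, y_train)
    print(f"min_samples_leaf={min_leaf:>3}  "
          f"train acc={tree.score(X_train, y_train):.3f}  "
          f"test acc={tree.score(X_test, y_test):.3f}")
```

Typically the train accuracy falls as the constraint tightens while the test accuracy first improves (less variance) and then degrades (more bias), which is the trade-off described above.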

Whether to impose the minimum at the leaves or at the splits comes down to how much irreducible error you expect in the data relative to the volume of training data available:

  • If your data is extremely noisy, then even a substantial group may appear to split purely by chance, yielding splits that will not generalize to a larger population. Setting a minimum count for splits helps with this, at the risk of failing to discover small-but-meaningful sources of variation. Note that bootstrap sampling can also help here.
  • If your sample is small, small splits at the leaves can noticeably reduce training loss without generalizing to unseen data. (This is true even if the data are not particularly noisy.) In this case, setting a minimum count for leaves may be particularly beneficial. Both constraints, along with bagging, are compared in the sketch after this list.
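
A sketch of how you might compare the two constraints in practice, again assuming scikit-learn; the grid values and dataset are arbitrary, and bagging is included only to illustrate the bootstrap point from the first bullet:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.3, random_state=1)

# Cross-validate both minimum-count constraints to see which helps more here.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"min_samples_split": [2, 10, 50], "min_samples_leaf": [1, 5, 25]},
    cv=5,
)
search.fit(X, y)
print("best constraints:", search.best_params_)

# Bootstrap aggregation (bagging) as a complementary way to damp chance-driven splits.
bagged = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                           n_estimators=50, random_state=0)
print("bagged CV accuracy:", cross_val_score(bagged, X, y, cv=5).mean().round(3))
```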