For tree-based ensemble models, the maximum depth of each base-learner tree is a tunable hyperparameter used to prevent overfitting; as such, it is a form of regularization.
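For instance, in scikit-learn the depth limit can be tuned like any other hyperparameter via cross-validation. The sketch below is illustrative only (the synthetic dataset and candidate grid are assumptions, not recommendations); it simply treats `max_depth` as the regularization knob:

```python
# Minimal sketch: tuning max_depth by cross-validation (scikit-learn).
# The dataset and candidate grid are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Shallower trees = stronger regularization; deeper trees = more capacity.
search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=100, random_state=0),
    param_grid={"max_depth": [1, 2, 3, 5, 8]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
```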

Although trees are non-parametric, each split adds a degree of freedom and an opportunity to fit a nonlinearity, so an excessive number of splits lets the model memorize the training data. Sequential ensembles typically limit tree depth, because their adaptive nature makes them particularly susceptible to memorization: each new tree is fit to the errors (or residuals) of the current ensemble, so deep trees can end up chasing noise. Gradient boosting models, which use a relatively small number of trees, typically limit tree depth to 3-8; AdaBoost, which uses a large number of trees, usually limits tree depth to 1 (“decision stumps”).
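As a concrete illustration of these conventions, a scikit-learn sketch might look like the following (the estimator counts are arbitrary choices, and the AdaBoost argument name assumes a recent scikit-learn version):

```python
# Illustrative sketch of typical depth settings for sequential ensembles.
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

# Gradient boosting: relatively few trees, each moderately shallow (depth 3-8).
gbm = GradientBoostingClassifier(n_estimators=100, max_depth=3)

# AdaBoost: many depth-1 trees ("decision stumps").
# Note: the base-learner argument is `estimator` in scikit-learn >= 1.2
# (`base_estimator` in earlier versions).
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=500,
)
```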

In parallel ensembles that employ feature sampling (such as Random Forest), the risk of overfitting due to excessive splits is lower. Instead, these models depend on thoroughly modeling the limited data provided to each tree, and then aggregating these specialized learners by averaging (or majority vote). Hence these models often do not limit tree depth at all; when they do, the limit is typically generous (e.g., a maximum depth of 10-30).
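For comparison, a minimal Random Forest sketch in scikit-learn (the forest size and the particular depth cap shown are assumptions for illustration):

```python
# Illustrative sketch of Random Forest depth settings (scikit-learn).
from sklearn.ensemble import RandomForestClassifier

# Default: max_depth=None, so trees grow until leaves are pure (or another
# stopping criterion is hit), relying on feature sampling and averaging
# to control variance.
rf_unlimited = RandomForestClassifier(n_estimators=300, max_depth=None)

# When a depth limit is used at all, it is usually a generous one.
rf_capped = RandomForestClassifier(n_estimators=300, max_depth=20)
```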