The learning rate, typically denoted η (eta) or α (alpha), is a hyperparameter that controls the size of the parameter updates made at each step of an optimization algorithm.
The learning rate primarily determines the step size, and hence the resolution, of the gradient descent process. At each iteration, gradient descent takes a step opposite the direction of the gradient, with the learning rate scaling the size of that step.
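As a minimal sketch of that update (using an assumed one-dimensional quadratic loss purely for illustration), the learning rate `lr` scales each step taken opposite the gradient:

```python
# Gradient descent on the toy loss f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
def gradient_descent(lr, steps=50, x0=0.0):
    x = x0
    for _ in range(steps):
        grad = 2.0 * (x - 3.0)  # gradient of the loss at the current point
        x -= lr * grad          # step opposite the gradient, scaled by the learning rate
    return x

print(gradient_descent(lr=0.1))  # ends up close to the minimum at x = 3
print(gradient_descent(lr=1.1))  # step too large: the iterates diverge
```

With a small rate the iterates settle near the minimum; with too large a rate each step overshoots and the process diverges.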
The learning rate also plays an indirect role in regularization, albeit through competing tendencies. On the one hand, sharp minima that occupy a very small region of the loss surface often correspond to details of the training data, especially when they are surrounded by zones of much higher loss. A low learning rate can reduce the likelihood that the optimizer reaches such a minimum, and therefore reduces the risk of overfitting. On the other hand, a high learning rate can help the optimizer escape local minima, reducing the risk of underfitting. Learning rate scheduling can help balance these tendencies, in addition to its primary role of improving convergence.
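A minimal sketch of such a schedule, assuming a simple step decay (the halving factor and 10-epoch interval are illustrative choices, not prescribed by the text):

```python
def step_decay(initial_lr, epoch, drop=0.5, every=10):
    """Halve the learning rate every `every` epochs (illustrative values)."""
    return initial_lr * (drop ** (epoch // every))

for epoch in (0, 9, 10, 25, 40):
    print(epoch, step_decay(0.1, epoch))
```

Early epochs keep the large initial rate to make fast progress and escape poor regions; later epochs use a smaller rate so the optimizer can settle into a minimum.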
Gradient boosting fits each new regression model to the residuals of the current ensemble, which in effect resembles a step of gradient descent while making much more informed and adaptive improvements to the overall predictor. In this context, the term “learning rate” refers to a shrinkage factor applied to each component of the ensemble; that is, it determines the magnitude of the contribution each regressor makes to the final prediction.
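A minimal sketch of this shrinkage, assuming squared-error loss and shallow scikit-learn regression trees as base learners (the synthetic data, tree depth, and rate of 0.1 are illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1                      # shrinkage factor for each regressor
prediction = np.full_like(y, y.mean())   # initial prediction: the mean of the targets

for _ in range(100):
    residuals = y - prediction                     # what the current ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)                         # fit the next regressor to the residuals
    prediction += learning_rate * tree.predict(X)  # add its shrunken contribution

print("training MSE:", float(np.mean((y - prediction) ** 2)))
```

A smaller shrinkage factor makes each regressor contribute less, which usually requires more boosting rounds but tends to generalize better.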