At the other extreme from batch gradient descent, stochastic gradient estimation involves selecting observations at random, calculating the error on each one, and using that single-observation error as a weak, noisy estimator of the gradient.
When using this for gradient descent, each of the updates carries a significant probability of being well off the mark. Collectively, though, they pull the parameters towards the target through the law of large numbers.
Typically, the observations are shuffled so that you still make a full pass over all of them. This way you still incorporate all of the information in the training set, but you act on it more often than with batch gradient descent. Each such pass through the data is called an epoch.
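To make this concrete, here is a minimal sketch of stochastic gradient descent on a small linear regression problem. The synthetic data, learning rate, and epoch count are illustrative assumptions rather than anything from the text; the point is the structure: shuffle the observations, loop over them one at a time, and update the parameters after each one.

```python
# A minimal sketch of stochastic gradient descent for linear regression.
# The data, learning rate, and epoch count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3x + 1 plus a little noise (assumed for illustration).
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=200)

w, b = 0.0, 0.0          # parameters to learn
learning_rate = 0.05
n_epochs = 20

for epoch in range(n_epochs):
    # Shuffle so each epoch is a full pass over the data in a fresh random order.
    order = rng.permutation(len(X))
    for i in order:
        x_i, y_i = X[i, 0], y[i]
        # Error on a single observation: a noisy estimate of the true gradient.
        pred = w * x_i + b
        error = pred - y_i
        # Gradient of the squared error for this one observation.
        grad_w = error * x_i
        grad_b = error
        # Update immediately; each step may be off the mark,
        # but collectively the steps pull the parameters toward the minimum.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(f"learned w={w:.3f}, b={b:.3f} (true values 3 and 1)")
```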