Mini-batching is a generalization of stochastic gradient estimation: you take $m$ observations for each update. For basic SGD, $m = 1$; for batch gradient descent, $m = n$, the full dataset. The main benefit is that you can exploit the parallelism of modern computing hardware, especially GPUs.
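
To make the idea concrete, here is a minimal NumPy sketch of mini-batch SGD on a synthetic least-squares problem. The data, `batch_size`, `lr`, and `n_epochs` are illustrative choices, not values prescribed above; the point is only that each update uses the gradient averaged over $m$ sampled observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ w_true + noise (illustrative only)
n, d = 1_000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)      # parameters to learn
batch_size = 32      # m = 1 recovers basic SGD; m = n recovers batch gradient descent
lr = 0.05            # step size
n_epochs = 20

for epoch in range(n_epochs):
    perm = rng.permutation(n)                   # shuffle once per epoch
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]    # indices of this mini-batch
        Xb, yb = X[idx], y[idx]
        residual = Xb @ w - yb
        # Gradient of mean squared error over the mini-batch only
        grad = (2.0 / len(idx)) * Xb.T @ residual
        w -= lr * grad

print("parameter error:", np.linalg.norm(w - w_true))
```

The inner matrix products over each batch are exactly the operations that GPUs and vectorized CPU code execute efficiently, which is where the hardware benefit comes from.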

In practice, everyone calls this “stochastic gradient descent,” with the understanding that a batch size of 1 recovers the original meaning.