Introduction
Backpropagation of errors (“backpropagation,” or even just “backprop”) is a method for updating the parameters of every unit in a neural network based on the error observed at the output layer. Rumelhart, Hinton, and Williams (1986) introduced (or at least popularized) this concept.
This propagation is made possible by the fact that the weighted sum of inputs to a unit is a differentiable function of both the incoming weights and the outputs of the units feeding into it, so the gradient of the error can be passed backward from layer to layer.
Rumelhart et al. make the argument for backpropagation based on a sigmoid activation function, but the same argument works for any differentiable activation function.
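To make that concrete, here is a minimal Python sketch (the function names are hypothetical, not from the original paper) of two interchangeable differentiable activations; the derivation below only ever uses an activation through its value and its derivative, so either could be plugged in.

    import math

    # Sigmoid activation and its derivative, the case Rumelhart et al. work through.
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def sigmoid_prime(x):
        s = sigmoid(x)
        return s * (1.0 - s)

    # Any other differentiable activation works the same way, e.g. tanh.
    def tanh_prime(x):
        return 1.0 - math.tanh(x) ** 2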
Rationale
Consider a neuron $j$ whose output is $o_j = \varphi(\mathrm{net}_j)$, where $\varphi$ is a differentiable activation function and $\mathrm{net}_j = \sum_i w_{ij} o_i$ is the weighted sum of the outputs $o_i$ of the upstream nodes $i$, each scaled by the weight $w_{ij}$ connecting node $i$ to node $j$.
Suppose we also have some error function $E$ that measures how far the network's output is from the desired (target) output.
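As an illustrative sketch (the values and variable names here are made up for the example), a single neuron's forward pass and a squared-error measurement might look like this in Python:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Outputs o_i of the upstream nodes and the weights w_ij connecting them to node j.
    upstream_outputs = [0.5, -1.0, 0.25]
    weights = [0.1, 0.4, -0.2]

    # Weighted sum of inputs (net_j), then the activation output (o_j).
    net_j = sum(w * o for w, o in zip(weights, upstream_outputs))
    o_j = sigmoid(net_j)

    # One possible differentiable error function: squared error against a target t_j.
    t_j = 1.0
    error = 0.5 * (t_j - o_j) ** 2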
Given that we know the effect of a change in a weight $w_{ij}$ on the error, i.e. the partial derivative $\partial E / \partial w_{ij}$, we can reduce the error by nudging the weight a small step against that gradient: $\Delta w_{ij} = -\eta \, \partial E / \partial w_{ij}$, where $\eta$ is a learning rate.
That is, if we have a way to obtain $\partial E / \partial w_{ij}$ for every weight in the network, we have a way to train every weight in the network.
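In code, that update is ordinary gradient descent. This sketch assumes we already have dE_dw, the gradient we set out to compute below; the names are again hypothetical.

    learning_rate = 0.1  # eta, chosen by hand for this sketch

    def update_weight(w, dE_dw, eta=learning_rate):
        # Step the weight against the gradient of the error with respect to that weight.
        return w - eta * dE_dw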
Determining $\partial E / \partial w_{ij}$
We would like to know $\partial E / \partial w_{ij}$: how the error changes as we change a single weight $w_{ij}$.
To start, observe by the chain rule that

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial o_j} \, \frac{\partial o_j}{\partial \mathrm{net}_j} \, \frac{\partial \mathrm{net}_j}{\partial w_{ij}}.$$
So all we need to do is obtain each of the three factors on the right-hand side.
We can obtain $\partial E / \partial o_j$ from the error function itself. If we choose a differentiable error function, such as the squared error $E = \tfrac{1}{2}(t_j - o_j)^2$ for a target $t_j$, then for an output node this factor is simply $\partial E / \partial o_j = o_j - t_j$.
Meanwhile, since $o_j = \varphi(\mathrm{net}_j)$ and $\mathrm{net}_j = \sum_i w_{ij} o_i$, the other two factors are $\partial o_j / \partial \mathrm{net}_j = \varphi'(\mathrm{net}_j)$ and $\partial \mathrm{net}_j / \partial w_{ij} = o_i$.
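Continuing the hypothetical single-neuron example from above (sigmoid activation, squared error, made-up values), the three factors can be computed directly:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    upstream_outputs = [0.5, -1.0, 0.25]   # o_i
    weights = [0.1, 0.4, -0.2]             # w_ij
    t_j = 1.0                              # target output

    net_j = sum(w * o for w, o in zip(weights, upstream_outputs))
    o_j = sigmoid(net_j)

    dE_do = o_j - t_j                      # dE/do_j for squared error
    do_dnet = o_j * (1.0 - o_j)            # phi'(net_j) for the sigmoid
    dnet_dw = upstream_outputs[0]          # dnet_j/dw_ij is just o_i (here, i = 0)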
Therefore, we have

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial o_j} \, \varphi'(\mathrm{net}_j) \, o_i,$$

where $\partial E / \partial o_j$ represents the gradient of the error with respect to the output of node $j$; $\varphi'(\mathrm{net}_j)$ represents the gradient of the activation function with respect to the weighted input; and $o_i$ is the output of the upstream node $i$. For an output node, $\partial E / \partial o_j$ comes directly from the error function; for a hidden node, it is obtained by applying the chain rule again through the nodes downstream of $j$, which is what lets the error signal be propagated backward through the network.
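Putting the three factors together gives the gradient for a weight, and we can sanity-check the analytic result against a numerical finite-difference estimate. This is just an illustrative sketch of the formula above, reusing the hypothetical example values.

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def error(weights, upstream_outputs, t_j):
        net_j = sum(w * o for w, o in zip(weights, upstream_outputs))
        return 0.5 * (t_j - sigmoid(net_j)) ** 2

    upstream_outputs = [0.5, -1.0, 0.25]
    weights = [0.1, 0.4, -0.2]
    t_j = 1.0

    net_j = sum(w * o for w, o in zip(weights, upstream_outputs))
    o_j = sigmoid(net_j)

    # Analytic gradient for the first weight: (dE/do_j) * phi'(net_j) * o_i
    grad = (o_j - t_j) * o_j * (1.0 - o_j) * upstream_outputs[0]

    # Finite-difference estimate of the same quantity, for comparison.
    eps = 1e-6
    bumped = list(weights)
    bumped[0] += eps
    numeric = (error(bumped, upstream_outputs, t_j) - error(weights, upstream_outputs, t_j)) / eps

    print(grad, numeric)  # the two estimates should agree to several decimal places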
We’re done! We have everything we need to train every weight in our model.
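As a final illustration, repeatedly applying the update to every weight of the hypothetical single neuron drives its error down. This is only a sketch under the assumptions above (sigmoid activation, squared error, fixed example inputs), not a full multi-layer implementation.

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    upstream_outputs = [0.5, -1.0, 0.25]
    weights = [0.1, 0.4, -0.2]
    t_j = 1.0
    eta = 0.5

    for step in range(200):
        net_j = sum(w * o for w, o in zip(weights, upstream_outputs))
        o_j = sigmoid(net_j)
        # Gradient of the error with respect to each weight, per the formula above.
        common = (o_j - t_j) * o_j * (1.0 - o_j)
        weights = [w - eta * common * o_i for w, o_i in zip(weights, upstream_outputs)]

    net_j = sum(w * o for w, o in zip(weights, upstream_outputs))
    print(0.5 * (t_j - sigmoid(net_j)) ** 2)  # error should be far smaller than at the start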