Focal loss is an extension of cross-entropy loss that "focuses" the loss on misclassified observations.

Recall that 02 Binary cross-entropy loss was defined in terms of the predicted label $\hat{y}$ and the ground truth $y$ as

$$\mathrm{CE}(y, \hat{y}) = -\left[\, y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \,\right]$$
Notice that one of these terms will be zero in every case, since the ground truth $y \in \{0, 1\}$. So we can write this piecewise as

$$\mathrm{CE}(y, \hat{y}) = \begin{cases} -\log(\hat{y}) & \text{if } y = 1 \\ -\log(1 - \hat{y}) & \text{if } y = 0 \end{cases}$$
If we define $p_t$ as

$$p_t = \begin{cases} \hat{y} & \text{if } y = 1 \\ 1 - \hat{y} & \text{if } y = 0 \end{cases}$$
then we can write the binary cross-entropy loss as

$$\mathrm{CE}(p_t) = -\log(p_t)$$
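As a quick sanity check, the two-term definition and the $p_t$ form are the same loss; a minimal sketch (the helper names `bce` and `bce_pt` are just illustrative):

```python
import math

def bce(y, y_hat):
    # Two-term binary cross-entropy: -[y*log(y_hat) + (1-y)*log(1-y_hat)]
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def bce_pt(y, y_hat):
    # The same loss written via p_t: p_t = y_hat if y == 1, else 1 - y_hat
    p_t = y_hat if y == 1 else 1 - y_hat
    return -math.log(p_t)

# The two forms agree for both ground-truth values
assert abs(bce(1, 0.9) - bce_pt(1, 0.9)) < 1e-12
assert abs(bce(0, 0.9) - bce_pt(0, 0.9)) < 1e-12
```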
Notice that the quantity $1 - p_t$ is a measure of the deviation of the prediction from the ground truth: it is close to zero for confident, correct predictions and close to 1 for misclassified ones. Multiplying the loss by this factor raised to a power $\gamma > 0$ shrinks every term, but values close to zero shrink much faster than values close to 1. As such, we can "focus" the loss on incorrect labels. The common formulation is

$$\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma} \log(p_t)$$
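The focusing effect is easy to see numerically. The sketch below (with an illustrative helper `focal_loss` and $\gamma = 2$, a commonly used value) compares how much the modulating factor shrinks the loss for a well-classified versus a misclassified example:

```python
import math

def focal_loss(y, y_hat, gamma=2.0):
    # Binary focal loss: -(1 - p_t)**gamma * log(p_t)
    p_t = y_hat if y == 1 else 1 - y_hat
    return -((1 - p_t) ** gamma) * math.log(p_t)

# Well-classified example (p_t = 0.9): modulating factor (0.1)**2 = 0.01
easy_ce = -math.log(0.9)
easy_fl = focal_loss(1, 0.9)

# Misclassified example (p_t = 0.1): modulating factor (0.9)**2 = 0.81
hard_ce = -math.log(0.1)
hard_fl = focal_loss(1, 0.1)

print(easy_ce / easy_fl)  # ≈ 100: easy example's loss is cut ~100x
print(hard_ce / hard_fl)  # ≈ 1.23: hard example's loss barely shrinks
```

Relative to cross-entropy, the confidently correct example contributes about 100 times less loss, while the misclassified one is nearly untouched, so gradient updates are dominated by the hard examples.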
This can be generalized to multi-class classification over $C$ classes as the sum

$$\mathrm{FL}(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_{c=1}^{C} y_c \, (1 - \hat{y}_c)^{\gamma} \log(\hat{y}_c)$$
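A minimal sketch of this multi-class sum, assuming a one-hot ground-truth vector and a vector of predicted class probabilities (the helper name `multiclass_focal_loss` is illustrative):

```python
import numpy as np

def multiclass_focal_loss(y, y_hat, gamma=2.0):
    # y: one-hot ground-truth vector; y_hat: predicted class probabilities
    # FL = -sum_c y_c * (1 - y_hat_c)**gamma * log(y_hat_c)
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(-np.sum(y * (1 - y_hat) ** gamma * np.log(y_hat)))

# With gamma = 0 the modulating factor is 1 and the focal loss
# reduces to ordinary cross-entropy, -log of the true class probability
y = [0, 1, 0]
y_hat = [0.2, 0.7, 0.1]
ce = -np.log(0.7)
assert abs(multiclass_focal_loss(y, y_hat, gamma=0.0) - ce) < 1e-12
```

Since $y_c$ is one-hot, only the true class contributes to the sum, so this is exactly the binary $-(1 - p_t)^\gamma \log(p_t)$ applied to the true class probability.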