Focal loss is an extension of cross-entropy loss that down-weights well-classified observations, so that training “focuses” on misclassified ones.
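Recall that binary cross-entropy loss was defined in terms of the predicted label $\hat{y} \in (0, 1)$ and the ground truth $y \in \{0, 1\}$ as

$$
L_{\text{BCE}}(y, \hat{y}) = -\left[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\right].
$$

Notice that one of these terms will be zero in every case, since the ground truth $y$ is either 0 or 1.
If we define

$$
p_t = \begin{cases} \hat{y} & \text{if } y = 1, \\ 1 - \hat{y} & \text{otherwise,} \end{cases}
$$

then we can write the binary cross-entropy loss as

$$
L_{\text{BCE}} = -\log p_t.
$$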
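As a quick sanity check, the sketch below compares the two-term binary cross-entropy with the single-term form for one observation at a time. The function names `bce` and `p_t` are illustrative only, and `y_hat` is assumed to be a probability strictly between 0 and 1.

```python
import math


def bce(y, y_hat):
    """Two-term binary cross-entropy for a single observation."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))


def p_t(y, y_hat):
    """Probability the model assigns to the true class."""
    return y_hat if y == 1 else 1 - y_hat


# Both forms agree for every combination of label and prediction tried here.
for y, y_hat in [(1, 0.9), (1, 0.2), (0, 0.9), (0, 0.2)]:
    assert math.isclose(bce(y, y_hat), -math.log(p_t(y, y_hat)))
```

The single-term form is what makes the later re-weighting easy to express, since the loss depends on one quantity only.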
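Focal loss scales the cross-entropy term $-\log p_t$, where $p_t$ is the probability the model assigns to the true class, by a modulating factor with a focusing parameter $\gamma \ge 0$:

$$
L_{\text{FL}} = -(1 - p_t)^{\gamma} \log p_t.
$$

Notice that the left factor $(1 - p_t)^{\gamma}$ approaches zero as $p_t \to 1$, so confidently correct predictions contribute very little to the loss. For misclassified observations, where $p_t$ is small, the factor stays close to one and the loss is nearly the ordinary cross-entropy. Setting $\gamma = 0$ recovers cross-entropy exactly.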
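A minimal sketch of the binary case follows, assuming the standard form of focal loss, $-(1 - p_t)^{\gamma} \log p_t$, where $p_t$ is the probability assigned to the true class. The function name `focal_loss` and the default `gamma=2.0` are choices made for illustration, not something fixed by these notes.

```python
import math


def focal_loss(y, y_hat, gamma=2.0):
    """Binary focal loss for one observation.

    y      -- ground truth, 0 or 1
    y_hat  -- predicted probability of class 1, assumed strictly in (0, 1)
    gamma  -- focusing parameter (assumed default); gamma = 0 gives cross-entropy
    """
    p_t = y_hat if y == 1 else 1.0 - y_hat
    modulating_factor = (1.0 - p_t) ** gamma
    return -modulating_factor * math.log(p_t)


easy = focal_loss(1, 0.95)  # confident and correct: factor is small
hard = focal_loss(1, 0.10)  # confident and wrong: factor is close to one

# The easy example is down-weighted relative to plain cross-entropy,
# and the hard example dominates the easy one.
assert easy < -math.log(0.95)
assert hard > easy
```

Larger values of `gamma` shrink the contribution of easy observations more aggressively, which is the sense in which the loss "focuses" on the hard ones.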
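This can be generalized to multi-class classification as the sum over the $C$ classes

$$
L_{\text{FL}} = -\sum_{c=1}^{C} y_c \, (1 - \hat{y}_c)^{\gamma} \log \hat{y}_c,
$$

where $y_c$ is the one-hot ground truth, $\hat{y}_c$ is the predicted probability for class $c$, and $\gamma \ge 0$ is the focusing parameter. Because $y_c$ is nonzero only for the true class, a single term of the sum survives for each observation.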
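The multi-class sum can be sketched as below, assuming a one-hot ground-truth vector and a vector of predicted class probabilities that are all strictly positive (as softmax outputs are). The name `multiclass_focal_loss` and the default `gamma=2.0` are illustrative assumptions.

```python
import math


def multiclass_focal_loss(y_onehot, probs, gamma=2.0):
    """Focal loss for one observation, summed over classes.

    y_onehot -- one-hot ground truth, e.g. [0, 1, 0]
    probs    -- predicted class probabilities, each assumed > 0
    gamma    -- focusing parameter (assumed default)
    """
    return -sum(
        y_c * (1.0 - p_c) ** gamma * math.log(p_c)
        for y_c, p_c in zip(y_onehot, probs)
    )


loss = multiclass_focal_loss([1, 0, 0], [0.7, 0.2, 0.1])

# Only the true-class term contributes, so the sum collapses to one term.
assert math.isclose(loss, -((1.0 - 0.7) ** 2.0) * math.log(0.7))
```

In practice the terms for the other classes are multiplied by zero, so an implementation can index the true-class probability directly instead of summing.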