Hinge loss (also known as SVM loss) is a loss function that penalizes misclassifications as well as correct classifications that fall within some margin of the decision boundary.
Let $y \in \{-1, +1\}$ be the true label and $\hat{y}$ the model's raw score. The hinge loss is

$$\ell(y, \hat{y}) = \max(0,\ 1 - y\hat{y}).$$

It can be shown that, for multiclass classification, this is equivalent to

$$L_i = \sum_{j \neq y_i} \max(0,\ s_j - s_{y_i} + 1),$$

where $s_j$ is the logit for class $j$ and $y_i$ is the index of the correct class for example $i$.
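As a concrete sketch of the multiclass form above (assuming NumPy; the function name and the `margin` parameter, defaulting to the margin of 1 used here, are mine):

```python
import numpy as np

def multiclass_hinge_loss(logits, y, margin=1.0):
    """Average multiclass hinge (SVM) loss over a batch.

    logits: (N, C) array of class scores
    y:      (N,) array of correct class indices
    """
    n = logits.shape[0]
    correct = logits[np.arange(n), y][:, None]        # (N, 1) correct-class logits
    margins = np.maximum(0.0, logits - correct + margin)
    margins[np.arange(n), y] = 0.0                    # the correct class never penalizes itself
    return margins.sum(axis=1).mean()
```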
For hinge/SVM loss, the loss is zero whenever the correct label's logit exceeds every incorrect label's logit by at least 1; each incorrect class that falls short of that margin contributes to the loss. In other words, the loss is zero in "easy" cases (those classified beyond the margin), and since the loss is identically zero there, its gradient is zeroed out for all of these cases as well.
This means that all parameter adjustments are driven by cases that are either misclassified or classified with low confidence (within the margin). This may be a good thing or a bad thing, but it's usually bad: examples beyond the margin supply no learning signal at all, whereas a loss like cross-entropy keeps a small gradient even on well-classified examples.
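To make the zeroed-out gradient concrete, here is a sketch of the subgradient of the loss with respect to the logits, under the same assumptions as the snippet above; any example classified beyond the margin produces an all-zero gradient row and so contributes nothing to the update:

```python
def multiclass_hinge_grad(logits, y, margin=1.0):
    """Subgradient of multiclass_hinge_loss w.r.t. the logits."""
    n = logits.shape[0]
    correct = logits[np.arange(n), y][:, None]
    # 1 where an incorrect class violates the margin, 0 elsewhere
    active = (logits - correct + margin > 0).astype(float)
    active[np.arange(n), y] = 0.0
    grad = active.copy()
    # The correct class is pushed up once per violating class
    grad[np.arange(n), y] = -active.sum(axis=1)
    return grad / n   # rows for "easy" examples are entirely zero
```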
Situations where it’s potentially useful:
- When you really need strong separation of classes
- When you have relatively little training data