Hinge loss (also known as SVM loss) is a loss function that penalizes classifications that fall within some margin (typically 1) of the decision boundary, but otherwise treats the loss as zero. Unlike most other loss functions used in deep learning, it operates on the raw scores (logits), i.e., before applying softmax. It is so named because, when you plot the loss as a function of the model's score, it looks like a hinged door.

Let $y \in \{-1, +1\}$ be our ground truth label, and $\hat{y}$ be our raw logit score. Then the hinge loss is

$$\ell(y, \hat{y}) = \max(0,\; 1 - y \cdot \hat{y}).$$
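To make the formula concrete, here is a minimal NumPy sketch of the binary hinge loss; the labels and scores are made-up values for illustration.

```python
import numpy as np

def hinge_loss(y, y_hat):
    """Binary hinge loss for labels y in {-1, +1} and raw scores y_hat."""
    return np.maximum(0.0, 1.0 - y * y_hat)

# A confidently correct score incurs zero loss; a score inside the margin
# (or on the wrong side of the boundary) is penalized linearly.
y = np.array([+1, +1, -1])
y_hat = np.array([2.3, 0.4, -0.1])
print(hinge_loss(y, y_hat))   # [0.  0.6 0.9]
```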
It can be shown that, for multiclass classification, this is equivalent to

$$L = \sum_{j \neq y} \max(0,\; s_j - s_y + 1),$$

where $y$ is the index of the ground truth label, and $s$ is the vector of predicted logits.
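Here is a minimal NumPy sketch of the multiclass form for a single example, assuming one logit per class; the logit values are illustrative.

```python
import numpy as np

def multiclass_hinge_loss(s, y):
    """Multiclass hinge loss for one example: s is the logit vector, y the true class index."""
    margins = np.maximum(0.0, s - s[y] + 1.0)
    margins[y] = 0.0   # the correct class does not compete against itself
    return margins.sum()

s = np.array([3.2, 5.1, -1.7])         # illustrative logits for 3 classes
print(multiclass_hinge_loss(s, y=0))   # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9
```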

For hinge/SVM loss, the loss is zero when the correct label’s logit exceeds every incorrect label’s logit by at least 1. Whenever that condition is not met, the shortfall contributes to the loss. In other words, the loss is zero for “easy” cases (those classified beyond the margin), and the gradient of the loss is zero for those cases as well.

This means that all parameter adjustments are based on cases that are either misclassified or classified with low confidence (within the margin). This may be a good thing or a bad thing, but it’s usually bad.
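To see this in practice, here is a small PyTorch sketch using torch.nn.MultiMarginLoss (PyTorch's built-in multiclass hinge loss, which additionally averages each example's loss over the number of classes); the batch values are made up, with the first example sitting well beyond the margin.

```python
import torch

# Hypothetical batch of logits (3 examples, 3 classes); class 0 is correct for all.
logits = torch.tensor([[4.0, 1.0, 0.0],   # well beyond the margin ("easy")
                       [1.2, 1.0, 0.0],   # correct, but inside the margin
                       [0.0, 2.0, 0.0]],  # misclassified
                      requires_grad=True)
labels = torch.tensor([0, 0, 0])

# MultiMarginLoss is PyTorch's multiclass hinge loss (scaled by 1/num_classes per example).
loss = torch.nn.MultiMarginLoss(reduction="sum")(logits, labels)
loss.backward()

# The first row of the gradient is all zeros: the "easy" example
# contributes nothing to the parameter update.
print(logits.grad)
```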

Situations where it’s potentially useful:

  • When you really need strong separation of classes
  • When you have relatively little training data