Triplet loss is a contrastive loss function. Owing to its generality and empirical success, it is widely used in ranking tasks. It can also be used to align embedding models, e.g. in multimodal models or two-tower collaborative filtering.

The loss is given as

$$\mathcal{L}(a, p, n) = \max\{0,\; d(a, p) - d(a, n) + m\}$$

where

  • $d(x, y)$ is the distance between $x$ and $y$,
  • $a$ is the anchor example’s prediction,
  • $n$ is the negative example’s prediction,
  • $p$ is the positive example’s prediction, and
  • $m$ is a tunable margin between positive and negative examples.
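As a concrete reference, here is a minimal NumPy sketch of the loss above; the Euclidean distance and the default margin of 1.0 are illustrative choices, not part of the definition.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss for a single (anchor, positive, negative) triple.

    Zero when the negative is already at least `margin` farther from
    the anchor than the positive is; positive otherwise.
    """
    d_ap = np.linalg.norm(anchor - positive)  # d(a, p)
    d_an = np.linalg.norm(anchor - negative)  # d(a, n)
    return max(0.0, d_ap - d_an + margin)
```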

Notice the similarity between the definition of the triplet loss and that of the hinge loss for a single example pair $\hat{y}$ and $y$:

$$\ell(\hat{y}, y) = \max\{0,\; 1 - y\,\hat{y}\}$$
Triplet loss can be seen as an extension of hinge loss in which labels are replaced with distances. This helps explain its behavior: it tends to push the negative example at least $m$ farther from the anchor than the positive example.
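To make the correspondence concrete, here is the same $\max\{0, \cdot\}$ shape written both ways; the function names are chosen here for illustration.

```python
def hinge_loss(label, score):
    """Standard hinge loss: `label` in {-1, +1}, `score` a raw prediction."""
    return max(0.0, 1.0 - label * score)

def triplet_loss_from_distances(d_ap, d_an, margin=1.0):
    """Triplet loss in hinge form: the fixed margin of 1 becomes the
    tunable m, and the label-score product becomes the distance gap
    d(a, n) - d(a, p)."""
    return max(0.0, margin - (d_an - d_ap))
```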

Hinge loss can be favorable when training data is sparse, which is certainly the case with contrastive losses: the set of all possible training triplets is the Cartesian product of the instances with themselves, taken once per slot of the triple, and any realistic training run samples only a vanishingly small fraction of it.
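In practice this means triplets are usually sampled on the fly rather than enumerated. The sketch below shows plain uniform sampling from labeled data; the data layout and function name are assumptions for illustration, and real systems often prefer hard- or semi-hard-negative mining.

```python
import random

def sample_triplet(instances_by_label):
    """Draw one (anchor, positive, negative) triple uniformly at random.

    `instances_by_label` maps each class label to a list of instances.
    """
    # The anchor and positive must share a label, so that label
    # needs at least two instances.
    pos_labels = [l for l, xs in instances_by_label.items() if len(xs) >= 2]
    pos_label = random.choice(pos_labels)
    neg_label = random.choice([l for l in instances_by_label if l != pos_label])
    anchor, positive = random.sample(instances_by_label[pos_label], 2)
    negative = random.choice(instances_by_label[neg_label])
    return anchor, positive, negative
```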

The distance metric $d$ is situational. Euclidean distance is a common default, but for text embeddings, for example, we might choose cosine distance (one minus cosine similarity).
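Either choice plugs into $d$ above unchanged; a small sketch, assuming non-zero embedding vectors:

```python
import numpy as np

def euclidean_distance(x, y):
    """L2 distance; sensitive to vector magnitude."""
    return np.linalg.norm(x - y)

def cosine_distance(x, y):
    """One minus cosine similarity; depends only on direction."""
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
```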