There are three kinds of learning-to-rank (L2R) models: pointwise, pairwise, and listwise.

Pointwise learning-to-rank

The model learns to predict some score or metric for each example in isolation. As an illustrative (if not especially common) example, consider a model that predicts how many stars a user will give a product. The star rating is implicitly relative to the rest of the population, but the model is never trained to make any explicit comparisons.

A more typical example is predicting the probability that a user will advance a product through a sales funnel (say, from impression to click to purchase). Such a model may well be sufficient to select a ranked set of top candidates for recommendation, but those candidates are, in effect, selected independently.
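To make that independence concrete, here is a minimal sketch of pointwise ranking. The `model.predict(user, candidate)` scorer is a hypothetical stand-in, not a real API; the point is that each candidate is scored with no knowledge of the others.

```python
# Hypothetical pointwise ranker: `model.predict` is a placeholder scorer.
# Each candidate is scored independently; the "ranking" is nothing more
# than a descending sort of those independent scores.
def rank_pointwise(model, user, candidates, k=10):
    scored = [(model.predict(user, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]
```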

Pointwise L2R models typically employ binary cross-entropy loss (for classification) or mean-squared-error (MSE) loss (for regression).
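As a rough illustration, a pointwise training step with binary cross-entropy might look like the sketch below. The small MLP, the 32-dimensional (user, item) feature vectors, and the hyperparameters are all placeholder assumptions, not recommendations.

```python
import torch
import torch.nn as nn

# Placeholder model: a small MLP over precomputed (user, item) features.
model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()  # binary cross-entropy on raw logits
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

features = torch.randn(64, 32)                 # one row per example
labels = torch.randint(0, 2, (64, 1)).float()  # 1 = advanced in the funnel

logits = model(features)
loss = loss_fn(logits, labels)  # every example contributes independently
optimizer.zero_grad()
loss.backward()
optimizer.step()
```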

Pairwise learning-to-rank

The model is trained with paired examples: one positive, one negative. The objective is to learn a distance metric over a latent space such that relevant examples sit closer to the query than irrelevant ones. Pairwise learning-to-rank can be very useful when the sample space is sparse, and especially when examples fall into distant, non-overlapping clusters. Examples include web search and product recommendations on a general-goods platform like Amazon.

Pairwise L2R models are a form of contrastive learning. Common loss functions include hinge loss and triplet loss. Their spatial properties make them an excellent choice for training embedding models that will be used for vector search.
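For instance, a triplet-loss training step in PyTorch might look like the following sketch. The linear encoder, the dimensions, and the margin are placeholder assumptions; what matters is the (anchor, positive, negative) structure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder encoder; in practice this might be a text or item-embedding model.
encoder = nn.Linear(32, 8)
triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)

query = torch.randn(64, 32)       # anchors: query features
relevant = torch.randn(64, 32)    # positives: known-relevant examples
irrelevant = torch.randn(64, 32)  # negatives: known-irrelevant examples

# L2 normalization is a common (though not universal) choice when the
# embeddings will be used for vector search.
a = F.normalize(encoder(query), dim=-1)
p = F.normalize(encoder(relevant), dim=-1)
n = F.normalize(encoder(irrelevant), dim=-1)

# Pushes each positive closer to its anchor than the corresponding
# negative, by at least the margin.
loss = triplet_loss(a, p, n)
loss.backward()
```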

Listwise learning-to-rank

The model is trained to directly optimize a ranking metric such as NDCG, or a listwise surrogate for one. This requires specialized loss functions like ListMLE, and its practice is outside my direct experience. My understanding is that it can be very effective in situations where an item's relevance depends on what else is in the list. For example, neither pointwise nor pairwise L2R can directly handle redundancy or introduce diversity. That said, there are established ways of dealing with both of these problems that don't resort to specialized models: multi-stage rankers routinely handle both.
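For the curious, ListMLE itself is compact: it treats the predicted scores as the parameters of a Plackett-Luce distribution and minimizes the negative log-likelihood of the ground-truth ordering. Here is a sketch for a single list; the toy tensors at the bottom are placeholders.

```python
import torch

def listmle_loss(scores, relevance):
    """ListMLE for one list: negative log-likelihood of the ground-truth
    ordering under a Plackett-Luce model over the predicted scores."""
    # Arrange the predicted scores in the true order (most relevant first).
    ordered = scores[torch.argsort(relevance, descending=True)]
    # suffix_lse[i] = log sum_{j >= i} exp(ordered[j]): the normalizer for
    # "item i is picked first among the items still remaining".
    suffix_lse = torch.logcumsumexp(ordered.flip(0), dim=0).flip(0)
    return -(ordered - suffix_lse).sum()

scores = torch.tensor([0.2, 1.5, -0.3], requires_grad=True)
relevance = torch.tensor([1.0, 3.0, 0.0])  # graded relevance labels
listmle_loss(scores, relevance).backward()
```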

For companies with the scale (or niche) to justify a specialized approach, listwise L2R could provide a powerful boost to performance. This boost comes at the cost of a model with many assumptions baked into its training. In most cases, you will be better served by creating smaller, more modular retrieval models and combining them through re-ranking and continuous experimentation.