Modern retrieval systems often operate on millions or even billions of candidates. At that scale, a single ranking pass cannot deliver both high-quality results and acceptable computational cost, so systems typically chain together several passes of different kinds.
Ranking cascades
As with any machine learning system, information retrieval must balance statistical performance against latency. In large-scale retrieval systems, such as those run by major search and recommendation providers, the candidate set can easily number in the billions. It is therefore necessary to iteratively narrow the pool of candidates with methods that are increasingly precise but increasingly expensive. This is the essence of cascade ranking.
The first pass is typically a technique that can be computed in sub-linear time:
- For recommendations, the first pass might involve collaborative filtering via matrix factorization. This process is inexpensive and yields embeddings for each user and each product in a shared vector space. These embeddings can be indexed and queried with a fast approximate strategy such as defeatist search, a greedy tree descent with no backtracking (sketched after this list).
- For full-text search, the first pass might use an inverted index to filter candidates by keyword, or to score them for keyword relevance using a weighting scheme such as TF-IDF (also sketched below).
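To make the first bullet concrete, here is a minimal sketch of defeatist search over a random projection tree, assuming item embeddings have already been produced by matrix factorization. All names are illustrative; a production system would reach for a tuned approximate-nearest-neighbor library instead.

```python
import numpy as np

class Node:
    """Either a leaf holding item indices or an internal split node."""
    def __init__(self, items=None, direction=None, threshold=None,
                 left=None, right=None):
        self.items = items          # leaf only: array of item indices
        self.direction = direction  # internal only: random split direction
        self.threshold = threshold  # internal only: split point along direction
        self.left, self.right = left, right

def build_tree(embeddings, indices, leaf_size, rng):
    """Recursively split items along random directions (a random projection tree)."""
    if len(indices) <= leaf_size:
        return Node(items=indices)
    direction = rng.normal(size=embeddings.shape[1])
    direction /= np.linalg.norm(direction)
    projections = embeddings[indices] @ direction
    threshold = np.median(projections)
    left = indices[projections <= threshold]
    right = indices[projections > threshold]
    if len(left) == 0 or len(right) == 0:  # degenerate split; stop here
        return Node(items=indices)
    return Node(direction=direction, threshold=threshold,
                left=build_tree(embeddings, left, leaf_size, rng),
                right=build_tree(embeddings, right, leaf_size, rng))

def defeatist_search(root, query, embeddings, k):
    """Greedily descend to a single leaf with no backtracking, then score exactly."""
    node = root
    while node.items is None:
        projection = query @ node.direction
        node = node.left if projection <= node.threshold else node.right
    scores = embeddings[node.items] @ query
    top = np.argsort(-scores)[:k]
    return node.items[top], scores[top]

# Example: 10k items with 32-dimensional embeddings from matrix factorization.
rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(10_000, 32))
tree = build_tree(item_embeddings, np.arange(10_000), leaf_size=64, rng=rng)
ids, scores = defeatist_search(tree, rng.normal(size=32), item_embeddings, k=10)
```

Because the descent never backtracks, the search visits exactly one leaf, giving the sub-linear query time of the first pass at the cost of occasionally missing true neighbors that fell on the other side of a split.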
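For the second bullet, here is a toy inverted index with one common TF-IDF weighting. Real engines add analyzers, stemming, and compressed postings lists; the whitespace tokenizer here is a deliberate simplification.

```python
import math
from collections import Counter, defaultdict

class InvertedIndex:
    def __init__(self, docs):
        self.docs = docs
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        for doc_id, text in enumerate(docs):
            for term, tf in Counter(text.lower().split()).items():
                self.postings[term][doc_id] = tf

    def search(self, query, k=10):
        """Score only documents that share at least one query term."""
        scores = defaultdict(float)
        n_docs = len(self.docs)
        for term in query.lower().split():
            posting = self.postings.get(term, {})
            if not posting:
                continue
            idf = math.log(n_docs / len(posting))
            for doc_id, tf in posting.items():
                scores[doc_id] += tf * idf
        return sorted(scores.items(), key=lambda x: -x[1])[:k]

index = InvertedIndex(["red shoes on sale", "blue running shoes", "garden hose"])
print(index.search("red shoes"))  # [(0, ...), (1, ...)]
```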
Typically, the next step is to prune the resulting candidates before re-ranking them. Since the re-ranker is often computationally expensive, the goal is to prune the list with a strategy that is relatively cheap and recall-focused. The re-ranker then scores and sorts the remaining candidates, and the top results are returned to the user.
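Putting the stages together, the whole cascade can be sketched as a list of (scorer, budget) pairs applied in order, cheapest first. The scorer names in the usage comment are hypothetical placeholders.

```python
def cascade_rank(query, candidates, stages):
    """stages: list of (score_fn, keep_n) pairs, ordered cheapest to most expensive."""
    for score_fn, keep_n in stages:
        # Score the surviving candidates, keep only the stage's budget.
        candidates = sorted(candidates,
                            key=lambda c: score_fn(query, c),
                            reverse=True)[:keep_n]
    return candidates

# Hypothetical wiring: a keyword pass, a tree-model pass, then a neural re-ranker.
# results = cascade_rank(query, corpus,
#                        [(tfidf_score, 1_000), (gbdt_score, 100), (neural_score, 10)])
```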
The term “cascade ranking system” appears to have been introduced by Wang, Lin, and Metzler (2011), but the concept is much older. Relational databases have employed sequential filtering and ranking operations since the 1970s. Likewise, full-text search engines have supported these kinds of operations essentially since their inception.
Post-rank injection
It is often desirable to inject candidates after a ranking stage, for reasons such as exploration, advertising, or diversity. Depending on the use case, the injection may happen before or after the candidate set has been fully scored.
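As a sketch, injection after the final ranking stage might splice extra candidates into fixed slots. The slot positions and item names here are arbitrary assumptions.

```python
def inject(ranked, extras, slots=(2, 7)):
    """Splice exploration/ad/diversity candidates into fixed positions."""
    result = list(ranked)
    for slot, extra in zip(slots, extras):
        result.insert(min(slot, len(result)), extra)
    return result

# e.g. inject(top_10, [exploration_item, sponsored_item]) splices the
# exploration candidate in at index 2 and the sponsored one at index 7
# of the running list.
```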
Ensembling
It can sometimes be beneficial to build multiple independent ranking models and then apply some strategy to mix their results. The mixing strategy may itself be another learning-to-rank model, in which case the set of models can be viewed as a stacked ensemble.
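A minimal sketch of score-level mixing, assuming each base ranker emits one score per candidate and that a second-stage model has already learned linear mixing weights offline:

```python
import numpy as np

def ensemble_rank(candidates, model_scores, weights, k=10):
    """model_scores: (n_models, n_candidates); weights: (n_models,) learned offline."""
    mixed = np.asarray(weights) @ np.asarray(model_scores)  # weighted score sum
    order = np.argsort(-mixed)[:k]
    return [candidates[i] for i in order]
```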
Combining multiple models can be particularly useful when a company is experimenting to determine which factors drive certain business metrics. In that case, it can employ a multi-armed bandit approach to continuously refine the optimal mix of factors, including the introduction of new models. This approach makes it easy to explore the effectiveness of new mechanisms for serendipity.
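One common formulation treats each candidate mix as an arm of a bandit. Here is a hedged epsilon-greedy sketch, with the reward standing in for whatever business metric the experiment tracks:

```python
import random

class EpsilonGreedy:
    """Epsilon-greedy bandit over a discrete set of ensemble configurations."""
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms

    def select(self):
        # Explore a random arm with probability epsilon, else exploit the best.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean of the observed rewards for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

Each serving decision calls select() to pick a mix, and update() feeds back the observed metric, so better-performing mixes are chosen more often over time.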