Information retrieval is the selection of candidate items from a larger set of items. Examples include exact or approximate similarity search (as in a full-text or vector store) and ranking (as in a recommender system).

Information retrieval techniques fall roughly into two broad classes, though the classes overlap considerably:

  • index-based information retrieval, in which some distilled representation of each candidate item is precomputed and subsequently queried; and
  • learning-to-rank, in which a statistical model predicts a relevance score or ordering each time a new query is given.
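
The contrast between the two classes can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy corpus `DOCS`, the helper names, and especially the `score` function, whose fixed hand-picked weights merely stand in for a model that would normally be trained on relevance data.

```python
from collections import defaultdict

# Hypothetical toy corpus; document IDs and text are illustrative only.
DOCS = {
    1: "vector search with embeddings",
    2: "full text search engine",
    3: "ranking recommendations with a learned model",
}

def build_inverted_index(docs):
    """Index-based: precompute a term -> set of doc IDs mapping once."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def query_index(index, query):
    """Return the IDs of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

def score(query, text, weights):
    """Stand-in for a learned model: weighted sum of two simple features."""
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    overlap = len(q_terms & d_terms)   # feature 1: shared terms
    doc_length = len(d_terms)          # feature 2: distinct terms in the document
    return weights[0] * overlap + weights[1] * doc_length

def rank(query, docs, weights=(1.0, -0.1)):
    """Learning-to-rank style: score every document at query time, then sort."""
    return sorted(docs, key=lambda d: score(query, docs[d], weights), reverse=True)

# The index is built once, ahead of any query.
INDEX = build_inverted_index(DOCS)
```

In this sketch `query_index(INDEX, "full text")` consults only the precomputed mapping, whereas `rank("learned model", DOCS)` evaluates the scoring function against every document for each new query.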

In practice, the two approaches are often combined sequentially, as in a cascade ranking system. Some procedures also depend inherently on both, as when items are ranked by selecting the nearest neighbors of a learned embedding.
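
A sequential combination of this kind might look like the following sketch, under stated assumptions. The two-dimensional embeddings, the popularity feature, and the 0.5/0.5 weights are all hypothetical. A brute-force nearest-neighbor scan stands in for a real vector index, and a fixed formula stands in for a trained ranking model.

```python
import math

# Hypothetical 2-D embeddings; in practice a trained model would produce these.
ITEM_EMBEDDINGS = {
    "a": (1.0, 0.0),
    "b": (0.8, 0.6),
    "c": (0.0, 1.0),
    "d": (-1.0, 0.0),
}
# Hypothetical per-item feature seen only by the second stage.
ITEM_POPULARITY = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 1.0}

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, k):
    """Stage 1: keep the k nearest neighbors by cosine similarity (brute force)."""
    ranked = sorted(
        ITEM_EMBEDDINGS,
        key=lambda item: cosine(query_vec, ITEM_EMBEDDINGS[item]),
        reverse=True,
    )
    return ranked[:k]

def rerank(query_vec, candidates):
    """Stage 2: reorder the candidates with a stand-in for a learned ranker."""
    def model_score(item):
        similarity = cosine(query_vec, ITEM_EMBEDDINGS[item])
        return 0.5 * similarity + 0.5 * ITEM_POPULARITY[item]
    return sorted(candidates, key=model_score, reverse=True)

def cascade(query_vec, k=2):
    """Run cheap retrieval first, then the costlier ranker on the survivors."""
    return rerank(query_vec, retrieve(query_vec, k))
```

With the query vector `(1.0, 0.0)` and `k=2`, the first stage keeps items `a` and `b`, and the second stage then places `b` ahead of `a` because of its higher popularity. Item `d` has the highest popularity but never reaches the ranker, which shows how the first stage bounds what the second stage can return.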