Precision and recall at $k$ (“p@k” and “r@k,” respectively) are performance metrics for ranking tasks. Traditional precision and recall are classification metrics; p@k and r@k adapt them to a setting in which items are ordered rather than labeled.

Technically, the definitions of these metrics depend implicitly on a classification of items as “relevant” or “non-relevant.” However, if a ground-truth ranking exists, then the top-ranked items may be treated as the ground-truth relevant set. In many modern applications, though, obtaining an unbiased ground-truth ranking is impracticable.

Recall at $k$ (r@k)

Let $n_{\text{rel}}$ be the total number of items with a ground-truth label of “relevant.” Recall that

$$\text{recall} = \frac{\text{TP}}{\text{TP} + \text{FN}},$$

where TP and FN denote the counts of true positives and false negatives.

Then r@k is the fraction of $n_{\text{rel}}$ represented among the $k$ highest-ranked items, given as

$$\text{r@}k = \frac{\left|\{\text{relevant items among the top } k\}\right|}{n_{\text{rel}}}.$$

If we choose a value of $k$, then clearly the maximum value for r@k is $\min(1, k / n_{\text{rel}})$ for that $k$. For most applications, $k \ll n_{\text{rel}}$. Hence, for many applications, the dynamic range of r@k is too small to be useful on its own.
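
To make the definition concrete, here is a minimal sketch of r@k in Python. The function name `recall_at_k`, the list-of-IDs ranking format, and the toy document IDs are illustrative assumptions, not a standard API.

```python
def recall_at_k(ranked_items, relevant, k):
    """r@k: fraction of all relevant items that appear among the top k.

    `ranked_items` is a list of item IDs ordered from highest to lowest score;
    `relevant` is the set of ground-truth relevant item IDs (n_rel = len(relevant)).
    """
    if not relevant:
        raise ValueError("r@k is undefined when there are no relevant items")
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)


# Toy example: n_rel = 4, so r@3 can never exceed 3/4 (the dynamic-range issue above).
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4", "d8"}
print(recall_at_k(ranked, relevant, k=3))  # 1 relevant item ("d1") in the top 3 -> 0.25
```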

Precision at $k$ (p@k)

Recall that

$$\text{precision} = \frac{\text{TP}}{\text{TP} + \text{FP}},$$

where FP denotes the count of false positives.

Then p@k is the proportion of items, among the $k$ highest-ranked, that have a ground-truth label of “relevant,” given as

$$\text{p@}k = \frac{\left|\{\text{relevant items among the top } k\}\right|}{k}.$$

When we treat the top $k$ as our “positive” predictions, p@k reduces to the fraction of the first $k$ rankings that are true positives:

$$\text{p@}k = \frac{\text{TP}}{k},$$

since every item in the top $k$ is a predicted positive and therefore $\text{TP} + \text{FP} = k$.
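
Analogously, here is a minimal Python sketch of p@k, under the same illustrative assumptions as the r@k sketch above (hypothetical function and item names).

```python
def precision_at_k(ranked_items, relevant, k):
    """p@k: fraction of the k highest-ranked items that are relevant (TP / k)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    # Dividing by k matches p@k = TP / k, since every item in the top k
    # counts as a "positive" prediction.
    return hits / k


# Same toy ranking as before: 1 relevant item ("d1") in the top 3 -> p@3 = 1/3.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4", "d8"}
print(precision_at_k(ranked, relevant, k=3))  # 0.333...
```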

As with r@k, p@k does not directly capture information about the relative ordering of the top $k$ items.

Average precision

Although p@k and (especially) r@k have somewhat limited applicability, they can be combined into an order-aware metric called average precision, which is often much more useful.