Technically, the definitions of these metrics depend implicitly on a classification of items as “relevant” or “non-relevant.” However, if a ground-truth ranking exists, then the top-ranked items may be treated as the ground-truth relevant set. In many modern applications, though, obtaining an unbiased ground-truth ranking is impracticable.
Recall at $k$ (r@k)
Let $n_+$ be the total number of items with a ground-truth label of “relevant.” Recall that

$$\text{recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}.$$

Then r@k is the fraction of the $n_+$ relevant items represented among the $k$ highest-ranked items, given as

$$\text{r@}k = \frac{\bigl|\{\text{relevant items among the top } k\}\bigr|}{n_+}.$$

If we choose a value of $k < n_+$, then clearly the maximum value for r@k is $k / n_+$ for that $k$. For most applications, $k \ll n_+$. Hence, for many applications, the dynamic range of r@k is too small to be useful on its own.
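To make the definition concrete, here is a minimal Python sketch (the function name `recall_at_k` and the toy data are illustrative, not taken from the text); it assumes the ranking is a list of item ids ordered best-first and that the ground-truth relevant set is known. It also illustrates the dynamic-range issue: with a small $k$, even a perfect ranking cannot push r@k above $k / n_+$.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant items that appear among the top-k ranked items."""
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

# Toy example: 1,000 ranked items, 100 of which are relevant.
ranking = list(range(1000))   # item ids, best first
relevant = set(range(100))    # ground-truth relevant set (here, ids 0-99)
print(recall_at_k(ranking, relevant, 10))   # 0.1 -- capped at k / n_+ = 10 / 100
```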
Precision at $k$ (p@k)
Recall that

$$\text{precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}.$$

Then p@k is the proportion of items, among the $k$ highest-ranked, that have a ground-truth label of “relevant,” given as

$$\text{p@}k = \frac{\bigl|\{\text{relevant items among the top } k\}\bigr|}{k}.$$
When we treat the top $k$ as our “positive” predictions, we have $\text{TP} + \text{FP} = k$, so p@k reduces to the fraction of the first $k$ rankings that are true positives:

$$\text{p@}k = \frac{\text{TP}}{k}.$$
As with r@k, p@k does not directly capture information about the relative ordering of the top $k$ items.
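A corresponding sketch for p@k (again with hypothetical names and toy data) shows both the computation and the order-insensitivity: permuting the top $k$ leaves p@k unchanged.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked items that are relevant (true positives)."""
    top_k = ranked_ids[:k]
    return sum(1 for item in top_k if item in relevant_ids) / k

ranking = ["d3", "d7", "d1", "d9", "d2"]
relevant = {"d3", "d1", "d2", "d8"}
print(precision_at_k(ranking, relevant, 5))                  # 3 / 5 = 0.6
print(precision_at_k(list(reversed(ranking)), relevant, 5))  # still 0.6: order within the top k is ignored
```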
Average precision
Although p@k and (especially) r@k have somewhat limited applicability, they can be combined into an order-aware metric called average precision, which is often much more useful.
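One common formulation can be sketched in Python as follows (the name `average_precision` and the example data are illustrative, not taken from the text): p@k is evaluated at each rank $k$ where a relevant item appears, and those values are averaged over the $n_+$ relevant items, so the score rises when relevant items are ranked earlier.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of p@k evaluated at each rank k where a relevant item appears."""
    hits, precisions = 0, []
    for k, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            hits += 1
            precisions.append(hits / k)   # p@k at this relevant hit
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

ranking = ["d3", "d7", "d1", "d9", "d2"]
relevant = {"d3", "d1", "d2"}
print(average_precision(ranking, relevant))   # (1/1 + 2/3 + 3/5) / 3 ≈ 0.76
```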