Average precision (AP) is an evaluation metric for ranking tasks. It combines precision and recall at $k$ (p@k and r@k) in a way that provides information on how closely the predicted rank ordering tracks relevance. The average precision provides a single value associating precision and recall at all possible values of $k$. In this sense, AP is used for ranking problems in a manner analogous to how AUROC is used for binary classification.

  • $\mathrm{AP} = 1$ implies that the model only returns relevant results, so that precision is perfect at all values of $k$.
  • $\mathrm{AP} = 0$ implies that the model never returns relevant results, so that precision is zero at all values of $k$.
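
To make these two extremes concrete, here is a minimal sketch on a toy ranking, assuming scikit-learn's average_precision_score is available; the labels and scores are invented for illustration. Note that a full ranking that merely buries the relevant documents at the bottom still recovers them eventually, so its AP is low but not exactly zero; AP = 0 corresponds to a result list that contains no relevant documents at all.

```python
# Toy sketch of the two extremes, assuming scikit-learn is installed.
# The labels and scores are invented purely for illustration.
from sklearn.metrics import average_precision_score

y_true = [1, 1, 1, 0, 0, 0]   # three relevant documents, three irrelevant

# Ranking every relevant document above every irrelevant one gives perfect
# precision at each relevant rank, so AP = 1.0.
perfect_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
print(average_precision_score(y_true, perfect_scores))   # 1.0

# Ranking the relevant documents last drives precision down at the ranks
# where they appear; AP drops to about 0.38 for this full ranking.
reversed_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
print(average_precision_score(y_true, reversed_scores))  # ~0.38
```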

Derivation

To derive average precision, we start by defining precision and recall at a specific rank $k$. Let $n_k$ be the number of relevant documents in the top $k$ documents. Then r@k can be defined as

$$r@k = \frac{n_k}{R},$$

where $R$ is the total number of relevant documents, and p@k can be defined as

$$p@k = \frac{n_k}{k}.$$
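
As a quick illustration of these definitions, the sketch below computes p@k and r@k at every cutoff for a small ranked list of relevance labels. The variable names are ours, and we assume every relevant document appears somewhere in the list.

```python
# Illustrative sketch of p@k and r@k for a toy ranked list. We assume all
# relevant documents appear in the list, so R = sum(rels); in general R is
# the number of relevant documents in the whole collection.
rels = [1, 0, 1, 1, 0]   # relevance of each document, in ranked order
R = sum(rels)            # R: total number of relevant documents

n_k = 0
for k, rel in enumerate(rels, start=1):
    n_k += rel           # number of relevant documents in the top k
    print(f"k={k}: p@k={n_k / k:.3f}, r@k={n_k / R:.3f}")
```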
Now we can express precision in terms of recall. Since $n_k = R \cdot r@k$, we can write

$$p@k = \frac{n_k}{k} = \frac{R \cdot r@k}{k}.$$

Notice that, since $k$ is discrete-valued, there is a single well-defined precision value for every value of $r@k$. Hence we can finally define the function

$$p(r@k) = p@k,$$

which we can just write as $p(r)$. Notice that both $p(r)$ and $r$ can take non-zero values only on the interval $[0, 1]$. Hence we can take the area under the curve $p(r)$ as the average precision over the recall:

$$\mathrm{AP} = \int_0^1 p(r)\, dr.$$
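
To see the curve whose area is being taken, the sketch below uses scikit-learn's precision_recall_curve to trace the $(r, p)$ points of the step function, and average_precision_score for its area. The scores are arbitrary values chosen only to reproduce the ranked order rels = [1, 0, 1, 1, 0] used earlier.

```python
# Sketch of the p(r) step curve whose area defines AP, using scikit-learn
# as one convenient way to trace it.
from sklearn.metrics import precision_recall_curve, average_precision_score

y_true = [1, 0, 1, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5]   # decreasing scores = ranked order

precision, recall, _ = precision_recall_curve(y_true, y_score)
for r, p in zip(recall, precision):
    print(f"r={r:.3f}  p={p:.3f}")

# The area under this step function is the average precision (~0.806 here).
print(average_precision_score(y_true, y_score))
```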
Discrete approximation

In practice, we have only $K$ ranked documents, where $K$ is a finite integer greater than zero. Therefore, the integral over $p(r)$ with respect to $r$ must be approximated by a sum over these discrete instances. Recall that $p(r@k)$ is an alias for $p@k$ when evaluating $p$ and $r$ at the same $k$. Hence we can write our approximation as

$$\mathrm{AP} \approx \sum_{k=1}^{K} p@k \, \Delta r@k, \qquad \Delta r@k = r@k - r@(k-1),$$

taking $r@0 = 0$ and $n_0 = 0$. Since $n_k$ and $r@k$ increase with discrete steps of $1$ and $\tfrac{1}{R}$, respectively, exactly when the document at rank $k$ is relevant, we know that

$$\Delta r@k = \frac{n_k - n_{k-1}}{R} =
\begin{cases}
\tfrac{1}{R} & \text{if the document at rank } k \text{ is relevant,} \\
0 & \text{otherwise.}
\end{cases}$$

Hence we can express average precision as

$$\mathrm{AP} \approx \frac{1}{R} \sum_{k=1}^{K} p@k \, (n_k - n_{k-1}) = \frac{1}{R} \sum_{k \,:\, \text{document } k \text{ relevant}} p@k.$$
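
Putting the pieces together, here is a minimal sketch of this discrete formula, with a cross-check against scikit-learn. The function name average_precision and its relevance-list interface are our own illustrative choices, not a standard API.

```python
# Minimal sketch of the discrete AP formula derived above.
from sklearn.metrics import average_precision_score

def average_precision(rels, total_relevant=None):
    """AP from a ranked list of binary relevance labels.

    rels[k-1] is 1 if the document at rank k is relevant, else 0.
    total_relevant is R; if omitted, we assume every relevant document
    appears somewhere in rels.
    """
    R = total_relevant if total_relevant is not None else sum(rels)
    ap, n_k = 0.0, 0
    for k, rel in enumerate(rels, start=1):
        n_prev = n_k
        n_k += rel                              # relevant docs in the top k
        ap += (n_k / k) * (n_k - n_prev) / R    # p@k * delta r@k
    return ap

rels = [1, 0, 1, 1, 0]
print(average_precision(rels))                                   # ~0.806
print(average_precision_score(rels, [0.9, 0.8, 0.7, 0.6, 0.5]))  # ~0.806 (cross-check)
```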