In a generalized attention model consisting of key-value pairs $(k_j, v_j)$ and a query $q_i$, a context vector $c_i$ is a weighted mean of the value vectors $v_j$. The weight is determined by a measure of relevance between the query vector $q_i$ and the key vector $k_j$ corresponding to each value vector $v_j$:
$$c_i = \sum_{j=1}^{n} \alpha(q_i, k_j)\, v_j$$
where $n$ is the length of the input sequence and $\alpha$ is a scalar-valued function whose values sum to 1 over $j$. Note that in early formulations of attention, $k_j = v_j$.
As the value vectors represent a set of features that are latent in the input, such a mean represents the expected value of the feature vector for the $i$-th position in the input sequence.
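The weighted mean described here can be illustrated with a minimal sketch. It assumes one common choice for the relevance function, a softmax over dot products between the query and each key; the text does not fix a particular function, and the names `softmax` and `context_vector` and the toy vectors are hypothetical, chosen only for illustration.

```python
import math


def softmax(scores):
    """Map raw scores to non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(query, keys, values):
    """Return (weights, context) for a single query.

    Assumption: relevance is the dot product of the query with each key,
    normalized with a softmax. Other relevance functions are possible.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted mean of the value vectors, computed one component at a time.
    context = [
        sum(w * value[d] for w, value in zip(weights, values))
        for d in range(dim)
    ]
    return weights, context


# Toy data: three input positions, two-dimensional keys and values.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

weights, context = context_vector(query, keys, values)
print(weights)
print(context)
```

Because the weights are non-negative and sum to 1, the context vector always lies within the convex hull of the value vectors, which is what allows it to be read as an expected value.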