Next-word prediction

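Next-word prediction is a task in natural language processing. While it has gained newfound prominence due to its role in generative pre-training, it dates back to Claude Shannon’s seminal papers on information theory. It is defined in terms of the likelihood of a corpus $U = (u_1, \dots, u_n)$ with $u_i \in V$:

$$L(\theta) = \prod_{i=1}^{n} P(u_i \mid u_{i-k}, \dots, u_{i-1}; \theta)$$

where

$V$ is the set of all tokens;
$k$ is the context length; and
$\theta$ is a parameter vector.

This can be expressed as a log loss, the negative log-likelihood:

$$\ell(\theta) = -\sum_{i=1}^{n} \log P(u_i \mid u_{i-k}, \dots, u_{i-1}; \theta)$$

Note that this representation is independent of the model architecture that is used to predict the probability of $u_i$.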
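The architecture independence can be made concrete with a small sketch. It assumes a model is nothing more than a function returning the probability of a token given its context, and it computes the log loss as the negative sum of log-probabilities of each token given the preceding tokens. The names `ProbFn`, `log_loss`, `make_uniform_model` and `make_bigram_model` are hypothetical and chosen only for illustration, as are the two toy models; the same loss function scores both without knowing how either one works.

```python
import math
from collections import Counter
from typing import Callable, Sequence

# Hypothetical interface: a model is any function returning P(token | context).
# The loss below never inspects how that probability is produced.
ProbFn = Callable[[Sequence[str], str], float]


def log_loss(corpus: Sequence[str], prob_fn: ProbFn, k: int) -> float:
    """Negative sum of log P(token | up to k preceding tokens) over the corpus."""
    total = 0.0
    for i, token in enumerate(corpus):
        context = corpus[max(0, i - k):i]  # shorter than k near the start
        total -= math.log(prob_fn(context, token))
    return total


def make_uniform_model(vocab: Sequence[str]) -> ProbFn:
    """Toy model: every token in the vocabulary is equally likely."""
    p = 1.0 / len(vocab)
    return lambda context, token: p


def make_bigram_model(train: Sequence[str], vocab: Sequence[str]) -> ProbFn:
    """Toy model: add-one smoothed bigram counts estimated from `train`."""
    pair_counts = Counter(zip(train, train[1:]))
    prev_counts = Counter(train[:-1])
    unigram_counts = Counter(train)
    v = len(vocab)

    def prob(context: Sequence[str], token: str) -> float:
        if not context:
            # No history available: fall back to smoothed unigram counts.
            return (unigram_counts[token] + 1) / (len(train) + v)
        prev = context[-1]  # a bigram model only looks at the last token
        return (pair_counts[(prev, token)] + 1) / (prev_counts[prev] + v)

    return prob


corpus = "the cat sat on the mat".split()
vocab = sorted(set(corpus))
uniform_loss = log_loss(corpus, make_uniform_model(vocab), k=2)
bigram_loss = log_loss(corpus, make_bigram_model(corpus, vocab), k=2)
print(f"uniform: {uniform_loss:.3f}, bigram: {bigram_loss:.3f}")
```

Swapping in a neural network would only mean supplying a different `prob_fn`; `log_loss` itself would stay unchanged. Note that the bigram model here is fit and scored on the same tiny corpus, which is fine for illustration but would overstate quality in practice.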

David's raw ML reference notes
