Next-sentence prediction (NSP) is a self-supervised learning task employed by certain language models, most notably BERT.

Recall that BERT’s input sequences consist of one or two “sentences”, which are in fact arbitrary-length spans of contiguous text rather than linguistic sentences. In NSP, the model receives two such sentences and must predict whether the second one immediately follows the first in the original text, i.e. whether the two sentences are contiguous.
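As a concrete illustration, the sketch below scores a sentence pair with BERT’s NSP classification head. It assumes the Hugging Face `transformers` library and the pretrained `bert-base-uncased` checkpoint; the two sentences are made-up examples.

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

sentence_a = "The cat sat by the window."
sentence_b = "It watched the birds outside for hours."

# The pair is encoded as a single sequence: [CLS] A [SEP] B [SEP]
inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2)

# For this head, index 0 means "B follows A" and index 1 means "B is random".
probs = torch.softmax(logits, dim=-1)
print(f"P(is next) = {probs[0, 0]:.3f}, P(not next) = {probs[0, 1]:.3f}")
```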

Training examples are constructed through a form of negative sampling: in roughly half of the examples (the positives), the second sentence genuinely follows the first; in the remaining (negative) examples, the second sentence is replaced by a sentence drawn at random from elsewhere in the corpus.
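To make the sampling scheme concrete, here is a minimal, hypothetical sketch. The function name `make_nsp_examples`, the corpus format (a list of documents, each an ordered list of sentences), and the exact 50/50 split are illustrative assumptions, not a fixed recipe.

```python
import random

def make_nsp_examples(documents, seed=0):
    """Build (sentence_a, sentence_b, label) triples for NSP.

    documents: a list of documents, each an ordered list of sentences.
    label 0 = sentence_b really follows sentence_a ("IsNext");
    label 1 = sentence_b was sampled at random ("NotNext").
    """
    rng = random.Random(seed)
    examples = []
    for doc_index, doc in enumerate(documents):
        for i in range(len(doc) - 1):
            sentence_a = doc[i]
            if rng.random() < 0.5:
                # Positive pair: the genuine next sentence.
                examples.append((sentence_a, doc[i + 1], 0))
            else:
                # Negative pair: a random sentence from another document.
                other = rng.randrange(len(documents))
                while len(documents) > 1 and other == doc_index:
                    other = rng.randrange(len(documents))
                examples.append((sentence_a, rng.choice(documents[other]), 1))
    return examples
```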

NSP helps the model learn about sentence continuity, which can indirectly support learning about entailment, causality, and order.