Language models represent the input sequence (and often the output sequence) as indices into a vector of known tokens called a vocabulary. Often, there are important positional concepts that are not captured as explicit words, such as the end of the sequence. In these cases, the concept is encoded as a special token in the vocabulary. Examples of common special tokens include:
- <PAD>: indicates a meaning-free token that was added only to facilitate computation.
- <START> or <BOS>: indicates the beginning of a sequence.
- <STOP> or <EOS>: indicates the end of a sequence.
- <UNK>: indicates an unknown token.
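To make the role of these tokens concrete, here is a minimal sketch (the class and method names are illustrative, not a specific library's API) of a vocabulary that reserves indices for the special tokens above, maps out-of-vocabulary words to <UNK>, and pads sequences to a fixed length:

```python
PAD, BOS, EOS, UNK = "<PAD>", "<BOS>", "<EOS>", "<UNK>"

class Vocabulary:
    def __init__(self, words):
        # Special tokens occupy the first indices, followed by the known words.
        self.tokens = [PAD, BOS, EOS, UNK] + list(words)
        self.index = {tok: i for i, tok in enumerate(self.tokens)}

    def encode(self, words, max_len):
        # <BOS>/<EOS> mark the sequence boundaries, unknown words map
        # to <UNK>, and <PAD> fills the sequence out to max_len so that
        # sequences of different lengths can be batched together.
        ids = [self.index[BOS]]
        ids += [self.index.get(w, self.index[UNK]) for w in words]
        ids.append(self.index[EOS])
        ids += [self.index[PAD]] * (max_len - len(ids))
        return ids

    def decode(self, ids):
        return [self.tokens[i] for i in ids]

vocab = Vocabulary(["the", "cat", "sat"])
ids = vocab.encode(["the", "dog", "sat"], max_len=8)
# "dog" is not in the vocabulary, so it becomes <UNK>:
# ids decodes to <BOS> the <UNK> sat <EOS> <PAD> <PAD> <PAD>
```

Reserving the special tokens at the start of the index space is a common convention; in particular, giving <PAD> index 0 makes padded positions easy to mask out during loss computation.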