In the context of autoregressive language models, beam search can be used to generate a sequence of tokens, each conditioned on the tokens before it. In this setting,
- The root of the search tree is the start-of-sequence token;
- The successor states of a node are the tokens of the vocabulary, each weighted by the model's predicted probability conditioned on the preceding tokens; and
- The global score of a candidate sequence is the product of the probabilities of its token selections (or, more commonly, the sum of their log probabilities).
As a defense against infinite sequences, any candidate that reaches a maximum length without producing the end-of-sequence token is pruned.
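The procedure can be made concrete with a minimal Python sketch. The interface is an illustrative assumption: `log_prob_fn` is a hypothetical stand-in for the model, returning log probabilities over the vocabulary for a given prefix, and `bos_id`/`eos_id` are the start- and end-of-sequence token ids.

```python
from typing import Callable, Sequence

def beam_search(
    log_prob_fn: Callable[[Sequence[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    beam_width: int = 4,
    max_len: int = 50,
) -> list[int]:
    """Return the highest-scoring sequence found by beam search.

    A candidate's score is the sum of its per-token log probabilities.
    """
    # The root of the search tree is the start-of-sequence token.
    beams = [(0.0, [bos_id])]
    finished = []

    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            # Successor states: every vocabulary token, weighted by the
            # model's predicted log probability given the prefix.
            for token_id, lp in enumerate(log_prob_fn(tokens)):
                candidates.append((score + lp, tokens + [token_id]))
        # Keep only the beam_width highest-scoring candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == eos_id:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:  # every candidate has emitted end-of-sequence
            break

    # Candidates that hit max_len without the end-of-sequence token are
    # pruned; fall back to them only if nothing finished in time.
    pool = finished if finished else beams
    return max(pool, key=lambda c: c[0])[1]
```

A production decoder would differ in the details: it would batch the model calls, apply length normalization so longer sequences are not unfairly penalized, and usually expand more than `beam_width` candidates per step so the beam stays full as candidates finish.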
There are several possible termination conditions, which can be combined:
- Terminate when all of the candidates contain the end-of-sequence token;
- Terminate when at least one candidate containing the end-of-sequence token exceeds a certain score;
- Terminate when the top scores cease to increase significantly after several iterations; and
- Terminate when the top sequences are too similar based on some measure of sequence diversity.
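These conditions might be combined into a single stopping check, as in the sketch below. The thresholds, the patience window, and the token-set overlap used as a diversity measure are all illustrative assumptions, not a standard recipe.

```python
def should_terminate(
    beams: list[tuple[float, list[int]]],      # active (score, tokens) pairs
    finished: list[tuple[float, list[int]]],   # candidates that emitted EOS
    best_history: list[float],                 # best score after each iteration
    score_threshold: float = -5.0,             # illustrative score cutoff
    patience: int = 3,                         # iterations without improvement
    min_improvement: float = 1e-3,
    max_overlap: float = 0.9,                  # illustrative similarity cutoff
) -> bool:
    # 1. All candidates contain the end-of-sequence token.
    if not beams:
        return True
    # 2. At least one finished candidate exceeds a certain score.
    if any(score > score_threshold for score, _ in finished):
        return True
    # 3. The top score has ceased to increase significantly.
    if len(best_history) > patience:
        if best_history[-1] - best_history[-patience - 1] < min_improvement:
            return True
    # 4. The top two sequences are too similar; token-set overlap
    #    (Jaccard) stands in for a real diversity measure here.
    if len(beams) >= 2:
        a, b = set(beams[0][1]), set(beams[1][1])
        if len(a & b) / max(len(a | b), 1) > max_overlap:
            return True
    return False
```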