In the context of autoregressive language models, beam search can be used to decode a sequence of tokens, selecting each token conditioned on the tokens before it. In this setting,

  • The root of the search tree is the start-of-sequence token;
  • The successor states of a node are the tokens of the vocabulary, each scored by the model's predicted probability conditioned on the preceding tokens; and
  • The global score of a candidate sequence is the product of the probabilities of all of its token selections (or, more commonly, the sum of their log probabilities).
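
The preference for log probabilities is numerical: a product of many per-token probabilities shrinks geometrically and eventually underflows, while the equivalent sum of logs stays representable. A minimal illustration in Python, where the per-token probabilities are made up:

    import math

    # Hypothetical per-token probabilities for a 40-token candidate sequence.
    token_probs = [0.05] * 40

    product = 1.0
    for p in token_probs:
        product *= p  # shrinks geometrically with sequence length

    log_score = sum(math.log(p) for p in token_probs)  # equivalent, but safe

    print(product)    # ~9.1e-53 already; a few hundred tokens would underflow to 0.0
    print(log_score)  # ~-119.8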

As a defense against unbounded sequences, any candidate that reaches a maximum length without producing the end-of-sequence token is pruned, as in the sketch below.
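
Putting these pieces together gives a minimal beam search decoder. The following Python sketch is illustrative: next_log_probs is an assumed stand-in for a real model's interface (mapping a token prefix to a log probability for every vocabulary token), and BOS/EOS are placeholder special tokens.

    from typing import Callable

    BOS, EOS = "<s>", "</s>"

    def beam_search(
        next_log_probs: Callable[[list[str]], dict[str, float]],
        beam_width: int = 4,
        max_length: int = 50,
    ) -> list[tuple[list[str], float]]:
        """Return finished (sequence, log-score) candidates, best first."""
        # The root of the search tree is the start-of-sequence token.
        beam = [([BOS], 0.0)]
        finished = []

        while beam:
            candidates = []
            for seq, score in beam:
                # Successor states: every vocabulary token, scored by the
                # model's log probability conditioned on the preceding tokens.
                for token, logp in next_log_probs(seq).items():
                    new_seq, new_score = seq + [token], score + logp
                    if token == EOS:
                        finished.append((new_seq, new_score))
                    elif len(new_seq) < max_length:
                        candidates.append((new_seq, new_score))
                    # Candidates reaching max_length without EOS are dropped.
            # Keep only the beam_width highest-scoring partial sequences.
            beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]

        return sorted(finished, key=lambda c: c[1], reverse=True)

In practice the inner loop would not enumerate the whole vocabulary: taking only the beam_width most probable tokens for each candidate yields the same surviving expansions at a fraction of the cost.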

There are several possible termination conditions, which can be combined, as in the sketch following this list:

  • Terminate when all of the candidates contain the end-of-sequence token
  • Terminate when at least one candidate containing the end-of-sequence token exceeds a certain score
  • Terminate when the top scores cease to increase significantly after several iterations
  • Terminate if the top sequences are too similar based on some measure of sequence diversity
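
Each of these conditions can be written as a small predicate over the current search state and combined as a disjunction. In the following sketch, the state fields and every threshold (the score cutoff, the patience window, the similarity bound) are illustrative choices rather than standard values:

    from dataclasses import dataclass, field

    @dataclass
    class SearchState:
        """Illustrative snapshot of a beam search between iterations."""
        beam: list[tuple[list[str], float]]      # unfinished candidates
        finished: list[tuple[list[str], float]]  # candidates that emitted EOS
        best_score_history: list[float] = field(default_factory=list)

    def all_finished(state: SearchState) -> bool:
        # All candidates contain the end-of-sequence token.
        return not state.beam

    def good_enough(state: SearchState, threshold: float = -5.0) -> bool:
        # At least one finished candidate exceeds the score threshold.
        return any(score > threshold for _, score in state.finished)

    def plateaued(state: SearchState, patience: int = 3, epsilon: float = 1e-3) -> bool:
        # The best score has stopped increasing significantly.
        history = state.best_score_history
        return (len(history) > patience
                and history[-1] - history[-1 - patience] < epsilon)

    def too_similar(state: SearchState, threshold: float = 0.9) -> bool:
        # The two best sequences overlap too much (token-set Jaccard similarity).
        if len(state.beam) < 2:
            return False
        (a, _), (b, _) = state.beam[:2]
        sa, sb = set(a), set(b)
        return len(sa & sb) / len(sa | sb) > threshold

    def should_terminate(state: SearchState) -> bool:
        # The conditions combine naturally as a disjunction.
        return any(check(state)
                   for check in (all_finished, good_enough, plateaued, too_similar))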