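After applying softmax to the final layer's output logits, sort the vocabulary in descending order of probability. Keep the highest-probability tokens until their cumulative probability reaches or exceeds a threshold $p$. Discard all other candidates, renormalize the probabilities of the kept tokens so they sum to 1, and sample from that reduced distribution. This behaves like top-$k$ sampling but is much easier to tune. Top-$k$ fixes the number of candidates, while $p$ fixes the probability mass they must cover. As a result, the candidate set shrinks when the model is confident and grows when the distribution is flat.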
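The procedure can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation. It assumes the final-layer output for a single position is a list of logits. The names `softmax`, `top_p_filter`, and `sample_top_p` are hypothetical helpers, not a library API.

```python
import math
import random


def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(logits, p):
    """Return {token_index: renormalized_prob} for the smallest top set with mass >= p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    probs = softmax(logits)
    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        # Stop as soon as the kept tokens cover at least p of the mass.
        if cumulative >= p:
            break
    # Discard all other tokens and renormalize the survivors to sum to 1.
    return {i: probs[i] / cumulative for i in kept}


def sample_top_p(logits, p, rng=random):
    # Draw one token index from the renormalized nucleus.
    nucleus = top_p_filter(logits, p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

With the same $p = 0.7$, a flat distribution over four tokens (logits `[0, 0, 0, 0]`) keeps three candidates. A sharply peaked one (logits `[10, 0, 0, 0]`) keeps only one. A fixed top-$k$ cannot adapt this way. If floating-point rounding leaves the total just below $p = 1.0$, the loop simply runs to the end and keeps every token.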