Prior to applying softmax to the final layer output, divide the logits by a constant . When , this is equivalent to greedy decoding; when , this is equivalent to sampling from a uniform distribution. This strategy is used in APIs for most commercial LLMs, since it is easy to reason about.