The softmax function is a generalization of the logistic sigmoid function for multi-class classification problems. It takes a vector of real numbers as input and normalizes it into a probability distribution, where each element is in the range (0, 1) and the sum of all elements is equal to 1.
Given an input vector z = (z_1, ..., z_K) of K real numbers, softmax maps it to a vector whose i-th element is softmax(z)_i = exp(z_i) / (exp(z_1) + ... + exp(z_K)).
Here’s what the softmax function does:
- It exponentiates each element of the input vector, which maps the real numbers to positive values.
- It normalizes the exponentiated values by dividing each one by the sum of all exponentiated values. This ensures that the output values sum to 1.
The output of the softmax function can be interpreted as a probability distribution over the K classes.
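As a sketch, the two steps above can be written in a few lines of NumPy. Subtracting the maximum input before exponentiating is a standard numerical-stability trick (it does not change the result, since the shift cancels in the ratio, but it prevents overflow for large inputs):

```python
import numpy as np

def softmax(z):
    # Step 1: exponentiate (shifted by the max for numerical stability).
    e = np.exp(z - np.max(z))
    # Step 2: normalize so the outputs sum to 1.
    return e / e.sum()

probs = softmax(np.array([1.0, 2.0, 3.0]))
# Each output lies in (0, 1), the values sum to 1, and larger
# inputs receive larger probabilities.
print(probs, probs.sum())
```

Because exponentiation is strictly increasing, softmax preserves the ordering of the inputs: the largest input always gets the largest probability.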