Why does softmax use odds and logits?


I’d like to ask a question about softmax today. As far as I know, softmax converts the model's raw output values into probabilities that sum to 1.

Many resources that explain softmax describe only what the equation does (conversion to probabilities), and some of them interpret the output of the previous layer, which is passed to softmax, as a logit. As I try to understand softmax through the concept of the logit, several questions come to mind.

If you look at the softmax equation in the picture below, exp(y) is used. If y is a logit, and a logit is the natural logarithm of the odds, then exp(y) is the odds. Softmax would then take exp of each logit to recover the odds, and divide each odds value by the sum of all the odds to turn it into a probability. But I don’t understand why the concept of odds is used here.
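To make the steps in this reading concrete, here is a minimal sketch of how I understand the computation. It assumes NumPy, and the logit values, the variable names, and the `softmax` helper are made up for illustration. As far as I can tell, exp(y) is exactly the odds p/(1-p) only in the two-class case where the other logit is fixed at 0; with more classes, exp(y) seems to be an unnormalized score whose ratios between classes match the ratios of the final probabilities.

```python
import numpy as np

def softmax(y):
    # Subtracting the max is a common numerical-stability trick;
    # it cancels in the ratio and does not change the result.
    e = np.exp(y - np.max(y))
    return e / e.sum()

# Made-up logits for three classes (illustrative values only).
y = np.array([2.0, 1.0, 0.1])

# Step 1: exponentiate each logit to get a positive, unnormalized score.
exp_y = np.exp(y)

# Step 2: divide each score by the sum of all scores.
p = exp_y / exp_y.sum()

# Ratios between classes are preserved by the normalization:
# exp(y_i) / exp(y_j) == p_i / p_j.
ratio_scores = exp_y[0] / exp_y[1]
ratio_probs = p[0] / p[1]

# Two-class case with the second logit fixed at 0:
# here exp(z) equals the odds p / (1 - p) of the first class.
z = 1.5
p2 = softmax(np.array([z, 0.0]))
odds_first_class = p2[0] / p2[1]
```

In this sketch `odds_first_class` matches `np.exp(z)`, which is the sense in which the logit/odds reading holds for two classes.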

So this is how I understood it. Although the logit maps odds to the range (-inf, +inf), softmax does not really rely on the connection to odds. Instead, "logit" here simply means the raw value of the linear output, which can lie anywhere in (-inf, +inf), and exp is used because it widens the gap between the per-class inference values and makes the derivative easier to compute. Is it appropriate to interpret it like this?
(For reference, the definition of logits on tensorflow.org: Per-label activations, typically a linear output. These activation energies are interpreted as unnormalized log probabilities.)
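Here is a small check of how I read "unnormalized log probabilities" in that definition. It is only a sketch under my own assumptions: it uses NumPy, and the logit values and variable names are made up. The idea is that each logit differs from the log of its softmax probability by the same constant for every class, so adding a constant to all logits leaves the probabilities unchanged.

```python
import numpy as np

# Made-up logits for three classes (illustrative values only).
y = np.array([2.0, 1.0, 0.1])

# Softmax probabilities and their logarithms.
p = np.exp(y) / np.exp(y).sum()
log_p = np.log(p)

# The gap between each logit and its log-probability is one shared
# constant, log(sum(exp(y))), which is the normalization term.
offset = y - log_p
log_normalizer = np.log(np.exp(y).sum())

# Shifting every logit by the same amount does not change the output,
# so the logits are only defined up to an additive constant.
shifted = y + 5.0
p_shifted = np.exp(shifted) / np.exp(shifted).sum()
```

If this reading is right, the logits carry the log probabilities up to that shared constant, which would match the wording of the definition without needing the odds interpretation.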

Thank you for reading, and I would appreciate any answers or opinions.