Softmax implementation

HmmRfa · April 13, 2021, 2:21pm

I want to reimplement Softmax so I can customize it.

I followed this post by ptrblck.

Two questions:
There is a lot of discussion about numerical stability (see here for example). Does the provided solution handle it?
Why is it necessary to subtract the max of x?

Thanks for your help!

googlebot · April 13, 2021, 8:43pm

This is exactly for numerical stability. After subtracting the max, x is non-positive, so exp(x) never overflows. The max element contributes exp(0) = 1 to the denominator, so the denominator is always at least 1. Underflow is also the safer failure mode: the worst-case output is 0 instead of Infinity.
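
For anyone who wants to see this concretely, here is a minimal sketch in plain Python (standard library only, not the code from the linked post). The function names `naive_softmax` and `stable_softmax` are made up for illustration. One caveat: `math.exp` raises `OverflowError` on overflow, whereas tensor libraries typically return inf, which then turns into nan after the division.

```python
import math

def naive_softmax(xs):
    # Direct translation of exp(x) / sum(exp(x)); overflows for large inputs.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(xs):
    # Shift by the max so every exponent is <= 0 and exp() stays in (0, 1].
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    # The max element contributes exp(0) = 1, so total is at least 1.
    total = sum(exps)
    return [e / total for e in exps]

# On small inputs both versions agree up to rounding.
small = [1.0, 2.0, 3.0]
agree = all(
    math.isclose(a, b, rel_tol=1e-9)
    for a, b in zip(naive_softmax(small), stable_softmax(small))
)

# On large inputs the naive version fails, the shifted one does not.
large = [1.0, 10.0, 1000.0]
try:
    naive_softmax(large)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

probs = stable_softmax(large)  # the small entries underflow to 0.0
```

The underflowed entries come out as exactly 0, which is the "worst case is 0" behaviour described above.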

Eta_C · April 14, 2021, 1:18pm

Note that softmax(x) = exp(x) / sum(exp(x)). If we have x = [1, 10, 1000, 10000, 10000000], exp(x) is too large for the computer to store and may come back as inf. After subtracting the max of x, every entry of x lies in the interval (-inf, 0], so exp(x) lies in the interval (0, 1].
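
It may also help to spell out why the shift does not change the result. A short derivation, where the constant c and the indices i, j are my own notation rather than anything from the posts above:

```latex
\[
\operatorname{softmax}(x - c)_i
  = \frac{e^{x_i - c}}{\sum_j e^{x_j - c}}
  = \frac{e^{-c}\, e^{x_i}}{e^{-c} \sum_j e^{x_j}}
  = \frac{e^{x_i}}{\sum_j e^{x_j}}
  = \operatorname{softmax}(x)_i
\]
```

The factor e^{-c} cancels for any constant c, so choosing c = max(x) is purely a numerical convenience: the output is mathematically identical, but the exponents are all non-positive.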