Softmax implementation

I want to reimplement Softmax so I can customize it.

I followed this post by ptrblck.

Two questions:
1. There is a lot of discussion about numerical stability (see here, for example). Is the provided solution numerically stable?
2. Why is it necessary to subtract the max of x?

Thanks for your help!

This is exactly for numerical stability. After subtracting the max, every entry of x is non-positive, so exp(x) never overflows. The max entry becomes 0, so exp(0) = 1 is always included in the denominator, which is therefore at least 1 and can never be 0. Underflow is also harmless: the worst-case output is 0 instead of inf.
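To make the three points above concrete, here is a minimal pure-Python sketch that compares a direct exp(x) / sum(exp(x)) with the max-shifted version. It is framework-agnostic and is not the code from the linked post; the function names `softmax_naive` and `softmax_stable` are hypothetical. Note that Python's `math.exp` raises `OverflowError` on overflow, whereas PyTorch and NumPy return inf instead.

```python
import math
from typing import List


def softmax_naive(x: List[float]) -> List[float]:
    # Direct translation of exp(x) / sum(exp(x)).
    # math.exp raises OverflowError once an input exceeds about 709.78 (float64 limit).
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_stable(x: List[float]) -> List[float]:
    # Shift by the max so every shifted entry is <= 0 and exp() stays in (0, 1].
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    # The max entry contributes exp(0) = 1, so the denominator is >= 1 and never 0.
    total = sum(exps)
    return [e / total for e in exps]


print(softmax_stable([1000.0, 1000.0]))  # [0.5, 0.5]
# Very negative shifted entries underflow to 0.0 instead of producing inf or NaN.
print(softmax_stable([0.0, -2000.0]))    # [1.0, 0.0]
try:
    softmax_naive([1000.0, 1000.0])
except OverflowError:
    print("naive version overflowed")
```

For small inputs both versions agree up to rounding. In a PyTorch reimplementation the same shift is applied along the softmax dimension, e.g. by subtracting `x.max(dim=dim, keepdim=True).values` before calling `torch.exp`.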

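Note that softmax(x) = exp(x) / sum(exp(x)). If x = [1, 10, 1000, 10000, 10000000], exp(x) is too large to be stored as a floating-point number: in PyTorch or NumPy it evaluates to inf, and the division then produces NaN. After subtracting max(x), every entry of x - max(x) lies in the interval (-inf, 0], so exp(x - max(x)) lies in (0, 1]. The result does not change, because the common factor exp(-max(x)) cancels between the numerator and the denominator.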
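Written out for a constant c, the shift leaves every output unchanged; applying it with c = max(x) to the example x gives values that are representable (the tiny entries underflow to 0 in floating point):

```latex
\mathrm{softmax}(x - c)_i
  = \frac{e^{x_i - c}}{\sum_j e^{x_j - c}}
  = \frac{e^{-c}\, e^{x_i}}{e^{-c} \sum_j e^{x_j}}
  = \mathrm{softmax}(x)_i

x = [1,\ 10,\ 1000,\ 10^4,\ 10^7], \qquad c = \max(x) = 10^7

x - c = [1 - 10^7,\ 10 - 10^7,\ 1000 - 10^7,\ 10^4 - 10^7,\ 0]

e^{x - c} \approx [0,\ 0,\ 0,\ 0,\ 1]
\quad\Rightarrow\quad
\mathrm{softmax}(x) \approx [0,\ 0,\ 0,\ 0,\ 1]
```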