I am curious how altering the temperature of softmax affects differentiability. Mathematically, softmax is always differentiable, but with finite numerical precision, how small can the temperature get (how many digits, as in 1e-N) before gradients stop flowing through softmax?
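To make the question concrete, here is a minimal NumPy sketch of what I mean. The helper names (`softmax_with_temperature`, `softmax_jacobian`, `max_abs_gradient`) are hypothetical, and the two-logit input `[1.0, 0.0]` and float32 default are assumptions chosen only for illustration. It uses the analytic Jacobian `(diag(p) - p p^T) / T` and prints the largest gradient entry as the temperature shrinks.

```python
import numpy as np


def softmax_with_temperature(logits, temperature, dtype=np.float32):
    """Softmax of logits / temperature, computed in the given dtype."""
    z = np.asarray(logits, dtype=dtype) / dtype(temperature)
    z = z - z.max()  # subtract the max so exp() cannot overflow
    e = np.exp(z)
    return e / e.sum()


def softmax_jacobian(logits, temperature, dtype=np.float32):
    """Analytic Jacobian dp_i/dz_j = (diag(p) - p p^T) / T."""
    p = softmax_with_temperature(logits, temperature, dtype)
    return (np.diag(p) - np.outer(p, p)) / dtype(temperature)


def max_abs_gradient(logits, temperature, dtype=np.float32):
    """Largest absolute Jacobian entry; 0.0 means no gradient can flow."""
    return float(np.abs(softmax_jacobian(logits, temperature, dtype)).max())


if __name__ == "__main__":
    logits = [1.0, 0.0]  # assumed example: a logit gap of 1
    for n in range(5):
        t = 10.0 ** (-n)
        print(f"T=1e-{n}: max |dp/dz| = {max_abs_gradient(logits, t):.3e}")
```

In this sketch the answer seems to depend on the ratio of the logit gap to the temperature rather than on the temperature alone: the smaller probability is about `exp(-gap / T)`, which underflows to exactly zero in float32 once `gap / T` passes roughly 87 to 103 (and roughly 708 to 744 in float64). Past that point every Jacobian entry is zero and nothing flows back.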