What is the difference between log_softmax and softmax?

What is the difference between log_softmax and softmax?
How to explain them in mathematics?
Thank you!


log_softmax applies the logarithm after softmax.

softmax:

exp(x_i) / exp(x).sum()

log_softmax:

log( exp(x_i) / exp(x).sum() )
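The two formulas can be checked in plain Python. This is just a sketch using the `math` module to make the definitions concrete; in PyTorch the corresponding calls would be `F.softmax` and `F.log_softmax`:

```python
import math

def softmax(xs):
    # exp(x_i) / sum_j exp(x_j)
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(xs):
    # log(exp(x_i) / sum_j exp(x_j)) simplifies to x_i - log(sum_j exp(x_j))
    log_total = math.log(sum(math.exp(x) for x in xs))
    return [x - log_total for x in xs]

xs = [1.0, 2.0, 3.0]
print(softmax(xs))      # probabilities that sum to 1
print(log_softmax(xs))  # the (negative) logs of those probabilities
```

For well-scaled inputs like these, taking `math.log` of each softmax output gives the same numbers as `log_softmax` directly.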

log_softmax essentially does log(softmax(x)), but the practical implementation is different and more efficient while performing the same operation. You might want to have a look at http://pytorch.org/docs/master/nn.html?highlight=log_softmax#torch.nn.LogSoftmax and the source code.


Can you please link to that implementation?
Is it calculated as x_i - log( exp(x).sum() )?

The implementation is done in torch.nn.functional where the function is called from c code: http://pytorch.org/docs/master/_modules/torch/nn/functional.html#log_softmax.

Is there any way to see the c code?


@KaiyangZhou’s answer may have been correct once, but does not match the current documentation, which reads:

“While mathematically equivalent to log(softmax(x)), doing these two
operations separately is slower, and numerically unstable. This function
uses an alternative formulation to compute the output and gradient correctly.”

And unfortunately the linked-to source for log_softmax merely includes a call to another .log_softmax() method which is defined somewhere else, but I have been unable to find it, even after running `grep -r 'def log_softmax'` on the pytorch directory.

EDIT: Regarding the source, a similar post, “Understanding code organization: where is `log_softmax` really implemented?”, was answered by @ptrblck, who pointed to the source code here: https://github.com/pytorch/pytorch/blob/420b37f3c67950ed93cd8aa7a12e673fcfc5567b/aten/src/ATen/native/SoftMax.cpp#L146 …And yet all that does is call still other functions, log_softmax_lastdim_kernel() or host_softmax. Still trying to find where the actual implementation is, not just calls-to-calls-to-calls.

You are right. There are two more dispatches involved and eventually _vec_log_softmax_lastdim is called for the log_softmax with a non-scalar input.


Is taking F.softmax and then applying torch.log the same as F.log_softmax?

In theory these methods are equivalent; in practice F.log_softmax is numerically more stable, as it uses the log-sum-exp trick internally.
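Here is a sketch of why the two-step version breaks and the trick fixes it, in plain Python (the max-subtraction below is the standard log-sum-exp stabilization; the function names are mine, not PyTorch's):

```python
import math

def log_softmax_naive(xs):
    # log(softmax(x)) computed in two separate steps:
    # exp(x_i) overflows for large inputs like x_i = 1000
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [math.log(e / total) for e in exps]

def log_softmax_stable(xs):
    # log-sum-exp trick: subtract m = max(x) before exponentiating, so
    # log_softmax(x)_i = (x_i - m) - log(sum_j exp(x_j - m))
    # every exp argument is <= 0, so nothing can overflow
    m = max(xs)
    log_total = math.log(sum(math.exp(x - m) for x in xs))
    return [x - m - log_total for x in xs]

xs = [1000.0, 1001.0, 1002.0]
try:
    log_softmax_naive(xs)       # math.exp(1000.0) raises OverflowError
except OverflowError:
    print("naive log(softmax(x)) overflowed")
print(log_softmax_stable(xs))   # finite log-probabilities
```

Exponentiating the stable outputs recovers probabilities that sum to 1, which is exactly what the two-step float computation can no longer deliver at this scale.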