Internal implementation of softmax gradient calculation

Hi, can anyone please point me to where I can find the internal implementation? I have implemented my own backward for experimentation purposes, but its gradient values go to NaN after some rounds.


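You would normally want to shift the softmax inputs by a constant, max(x), so that the maximum of each vector is 0. This keeps exp() from overflowing and does not change the result, since softmax(x) = softmax(x - c) for any constant c. As to your question, see aten/src/ATen/native/cpu/SoftMaxKernel.cpp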
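
To illustrate the max-shift, here is a minimal sketch in plain Python for a single 1-D vector. It is not the ATen kernel; the function names `softmax` and `softmax_backward` are made up for this example. The backward uses the standard identity grad_x[i] = y[i] * (g[i] - sum_j g[j] * y[j]), where y is the softmax output and g is the incoming gradient.

```python
import math

def softmax(x):
    """Softmax of a 1-D list, shifted by max(x) for numerical stability."""
    m = max(x)                            # after the shift the largest input is 0
    exps = [math.exp(v - m) for v in x]   # every exponent is <= 0, so no overflow
    total = sum(exps)                     # >= 1, because exp(0) = 1 is included
    return [e / total for e in exps]

def softmax_backward(grad_output, y):
    """Gradient w.r.t. the softmax input, given the softmax output y."""
    dot = sum(g * p for g, p in zip(grad_output, y))
    return [p * (g - dot) for g, p in zip(grad_output, y)]

# Inputs this large would overflow exp() without the shift.
y = softmax([1000.0, 1001.0, 1002.0])
grad_x = softmax_backward([1.0, 0.0, 0.0], y)
```

Computing the backward from the saved output y, rather than recomputing exp() on the raw inputs, avoids a second place where overflow could creep in. If your custom backward still produces NaNs with the shift in place, it may be worth checking for inf or NaN values already present in the inputs or in the incoming gradient.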