However, after implementing it I found that the results are not as good as those of the original `F.softmax`. So I am asking here: what is the difference between my implementation and the built-in function?
The output of your `own_softmax` is slightly different from that of `torch.nn.functional.softmax`.
This may be why your `own_softmax` degrades performance.
import torch

x = torch.randn(2, 10)
h_own = own_softmax(x)
h = torch.nn.functional.softmax(x, dim=1)
print(h - h_own)  # the element-wise difference is small but nonzero
Thank you so much. It seems that the choice of centering method for the network's output scores influences the softmax output a lot. I have tested centering with the max value and with the mean value, and their outputs are quite different. I am wondering whether the mean one is more stable?
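For what it's worth, subtracting the per-row max is the standard numerically stable formulation. Below is a minimal sketch of that form (the body of `own_softmax` here is my own guess for illustration, not necessarily identical to the original implementation):

```python
import torch

def own_softmax(x):
    # Subtract the per-row maximum before exponentiating. Softmax is
    # shift-invariant, so this does not change the mathematical result,
    # but it guarantees every exponent is <= 0, so exp() cannot overflow.
    x_shifted = x - x.max(dim=1, keepdim=True).values
    exp_x = torch.exp(x_shifted)
    return exp_x / exp_x.sum(dim=1, keepdim=True)

x = torch.randn(2, 10)
h_own = own_softmax(x)
h = torch.nn.functional.softmax(x, dim=1)
print((h - h_own).abs().max())
```

Subtracting the mean does not give the same guarantee: scores far above the mean can still produce large positive exponents and overflow, which is why max-subtraction is the usual choice.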
Hi Junwu!
For some unique optimization I need to re-implement softmax.
I would be worried about the accuracy degradation. Do you see the performance drop during training, or do you use your softmax for inference only? I think training is more sensitive to numerical problems.