However, after implementing it I found that the results are not as good as with the original (F.softmax), so I am asking what the difference is between my implementation and the built-in function.
The output of your own_softmax is slightly different from that of torch.nn.functional.softmax.
This may be why your own_softmax degrades the performance.
import torch

x = torch.randn(2, 10)
h_own = own_softmax(x)                     # your implementation
h = torch.nn.functional.softmax(x, dim=1)  # built-in
print(h - h_own)                           # element-wise difference
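Since the original own_softmax is not shown in the thread, here is a minimal sketch of how it is typically written so that it matches the built-in function; the max subtraction is a standard numerical-stability trick, not necessarily what the poster's code does.

```python
import torch

def own_softmax(x, dim=1):
    # Subtract the row-wise max before exponentiating; softmax is
    # invariant to a constant shift, so this changes nothing
    # mathematically but prevents overflow in exp() for large logits.
    x_shifted = x - x.max(dim=dim, keepdim=True).values
    exps = torch.exp(x_shifted)
    return exps / exps.sum(dim=dim, keepdim=True)

x = torch.randn(2, 10)
diff = own_softmax(x) - torch.nn.functional.softmax(x, dim=1)
print(diff.abs().max())  # should be at floating-point noise level
```

Written this way, the remaining difference from the built-in is only floating-point rounding, not a systematic discrepancy.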
Thank you so much. It seems that the centralization method applied to the network's output scores influences the softmax output a lot. I have tested centering with the max value and with the mean value, and their outputs are quite different. I am wondering whether the mean-based one is more stable?
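One point worth checking: softmax is invariant to any constant shift of its input, so max-centering and mean-centering should give identical probabilities up to floating-point error. If they differ noticeably, something else in the implementation is likely wrong. A small sketch (the helper `softmax_center` and its `mode` argument are hypothetical, for illustration only):

```python
import torch

def softmax_center(x, mode="max", dim=1):
    # Hypothetical helper: center the logits by either their max or
    # their mean before exponentiating. Because softmax is shift-
    # invariant, both choices yield the same probabilities in exact
    # arithmetic; they differ only in numerical robustness.
    if mode == "max":
        shift = x.max(dim=dim, keepdim=True).values
    else:
        shift = x.mean(dim=dim, keepdim=True)
    exps = torch.exp(x - shift)
    return exps / exps.sum(dim=dim, keepdim=True)

x = torch.randn(2, 10)
# For moderate logits the two agree to floating-point precision:
print((softmax_center(x, "max") - softmax_center(x, "mean")).abs().max())

# With extreme logits, mean-centering can still leave large positive
# values, so exp() overflows; max-centering keeps every exponent <= 0.
big = torch.tensor([[1000.0, 0.0, -1000.0]])
print(softmax_center(big, "max"))   # finite probabilities
print(softmax_center(big, "mean"))  # overflow produces non-finite values
```

This is why max subtraction is the conventional choice: it bounds the shifted logits above by zero, so `exp` can never overflow, whereas the mean gives no such guarantee.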