How is softmax implemented in PyTorch? The time complexity of the softmax function depends on the number of classes N, i.e. it is O(N). However, in my task the number of classes is huge (on the order of a hundred thousand). Should I go with the built-in softmax function in PyTorch? If not, what are the other options?
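For reference, here is a minimal sketch of the standard numerically stable softmax formulation. It is not PyTorch's actual CPU/CUDA kernel, which is not shown here, and `stable_softmax` is a hypothetical helper name. It illustrates why the cost is linear in the number of classes: each row needs one pass to find the maximum, one to exponentiate, and one to sum and normalize. The result is compared against the built-in `torch.softmax`.

```python
import torch


def stable_softmax(logits: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Hypothetical reference implementation, not PyTorch's internal kernel.
    # Subtracting the per-row maximum keeps exp() from overflowing and
    # leaves the result mathematically unchanged.
    shifted = logits - logits.amax(dim=dim, keepdim=True)
    exps = shifted.exp()
    # Normalizing by the row sum is another O(N) pass over the classes.
    return exps / exps.sum(dim=dim, keepdim=True)


torch.manual_seed(0)
num_classes = 100_000  # roughly the size mentioned in the question
logits = torch.randn(4, num_classes)

ours = stable_softmax(logits)
builtin = torch.softmax(logits, dim=-1)
print(torch.allclose(ours, builtin, atol=1e-6))
```

Even at a hundred thousand classes this is only a few linear passes over each row, which is why the built-in function is usually a reasonable default.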
I would first check whether this method is actually the bottleneck in your current approach, and only then look at alternatives.
While a large number of classes might slow down the softmax, it isn't necessarily the bottleneck, so optimizing it might not help.
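One way to check this is to time the softmax against the layer that produces the logits. The sketch below uses `torch.utils.benchmark.Timer` on CPU. The sizes (`batch_size`, `hidden_dim`, `num_classes`) and the single `nn.Linear` classifier are assumptions for illustration; substitute your own model and shapes. For a full model, `torch.profiler` gives a per-operator breakdown.

```python
import torch
import torch.utils.benchmark as benchmark

# Hypothetical sizes: adjust these to match your model.
batch_size, hidden_dim, num_classes = 32, 512, 100_000

features = torch.randn(batch_size, hidden_dim)
classifier = torch.nn.Linear(hidden_dim, num_classes)

with torch.no_grad():
    logits = classifier(features)

    # Time the projection to the class logits: O(batch * hidden * classes).
    t_linear = benchmark.Timer(
        stmt="classifier(features)",
        globals={"classifier": classifier, "features": features},
    ).blocked_autorange()

    # Time the softmax itself: O(batch * classes).
    t_softmax = benchmark.Timer(
        stmt="torch.softmax(logits, dim=-1)",
        globals={"torch": torch, "logits": logits},
    ).blocked_autorange()

print(f"linear:  {t_linear.median * 1e3:.2f} ms")
print(f"softmax: {t_softmax.median * 1e3:.2f} ms")
```

Because the projection scales with `hidden_dim * num_classes` while the softmax scales only with `num_classes`, the matrix multiplication will often dominate; the measured numbers on your hardware are what should decide whether the softmax is worth replacing.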