Currently you need to specify the softmax dimension via the `dim` argument. This is nice, and it will complain if you don't set `dim`. But what are the dimensions -1 and -2 used for?
I expected `dim=-1` to apply softmax over the whole tensor. Is that possible?
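For concreteness, here is a minimal sketch of what I mean, assuming a 2-D input; the flatten-and-reshape part is just my guess at how a whole-tensor softmax might be done:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3)

# dim=-1: softmax over the last dimension; each row sums to 1
print(F.softmax(x, dim=-1).sum(dim=-1))

# dim=-2: softmax over the second-to-last dimension; each column sums to 1
print(F.softmax(x, dim=-2).sum(dim=-2))

# What I expected from a "whole tensor" softmax:
# flatten, apply softmax over the single dimension, then reshape back
whole = F.softmax(x.flatten(), dim=0).view_as(x)
print(whole.sum())  # all elements together sum to 1
```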
It would also be interesting to know the answer to the same question for `log_softmax`.
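Presumably the same flatten-and-reshape guess would carry over to `log_softmax`, e.g.:

```python
# Same idea for log_softmax: flatten, apply, reshape back
log_whole = F.log_softmax(x.flatten(), dim=0).view_as(x)
print(log_whole.exp().sum())  # exponentiated values sum to 1
```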