Hello!
I was wondering whether anyone knows if the implementation of F.softmax is numerically stable. That is, does it implement the exp-normalise trick (see here)?
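To be clear about what I mean by the exp-normalise trick: subtract the maximum input from every element before exponentiating, so the largest exponent is exactly 0 and nothing overflows. Below is a minimal pure-Python sketch of the idea. The function names `naive_softmax` and `stable_softmax` are my own, chosen for illustration, and this says nothing about how F.softmax is actually implemented.

```python
import math

def naive_softmax(xs):
    # Direct definition: exp(x_i) / sum_j exp(x_j).
    # math.exp raises OverflowError for large inputs (e.g. 1000.0).
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(xs):
    # Exp-normalise trick: shift by the maximum so every exponent is <= 0.
    # The shift cancels in the ratio, so the result is mathematically unchanged.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

With inputs like `[1000.0, 1001.0, 1002.0]` the naive version fails with an overflow, while the shifted version should give the same probabilities as `[0.0, 1.0, 2.0]`, since softmax is invariant to adding a constant to every input.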