As the title suggests, I created a tensor with `a = torch.zeros((3, 4)).fill_(-np.inf)`. Theoretically, every element of `a` is an extremely small (negative) value, so `nn.Softmax(a)` should produce near-zero output. However, the output is NaN. Is there something I missed or misunderstood? Any help is appreciated!

Currently, I'm training a Transformer, and the loss becomes NaN after the constant value in the mask is set to -np.inf, but it works well with the constant being -1e20.
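For reference, the failure is easy to reproduce in a few lines; this is a minimal sketch (assuming a recent PyTorch) showing that an all-`-inf` row yields NaN while a large finite constant like `-1e20` does not:

```python
import torch

# A row that is entirely -inf makes softmax return NaN:
# exp(-inf) = 0 for every entry, and 0/0 is undefined.
a = torch.full((3, 4), float('-inf'))
print(torch.softmax(a, dim=-1))  # all NaN

# A large-but-finite constant keeps the row well-defined: every logit
# is equal, so softmax returns a uniform distribution instead of NaN.
b = torch.full((3, 4), -1e20)
print(torch.softmax(b, dim=-1))  # 0.25 everywhere
```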

I think using `torch.nn.LogSoftmax` might work for you.

LogSoftmax is numerically more stable, but in your test code it still returns `nan`.
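That is expected: `log_softmax(x)` is computed as `x - logsumexp(x)`, and with every logit at -inf the result is `(-inf) - (-inf)`, which is NaN. A minimal check:

```python
import torch

# logsumexp of an all -inf vector is -inf, so log_softmax computes
# (-inf) - (-inf) = NaN for every element.
a = torch.full((4,), float('-inf'))
print(torch.log_softmax(a, dim=-1))  # all NaN
```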

Hello,

From my understanding of Softmax, it's inconsistent to have all your logits equal to -inf; you should have at least one logit whose value is greater than -inf. Ex:

```
>>> a = torch.zeros(4)
>>> a[0:3] = float('-inf')
>>> a
tensor([-inf, -inf, -inf, 0.])
>>> torch.nn.functional.softmax(a, dim=0)
tensor([0., 0., 0., 1.])
>>> a[0:4] = float('-inf')
>>> torch.nn.functional.softmax(a, dim=0)
tensor([nan, nan, nan, nan])
```

Also, about your statement:

> Theoretically, every element of a is a super small negative value, and `nn.Softmax(a)` should produce near zero output.

It is true that the softmax output probability for a logit tending to -inf tends to 0. But your output probability distribution cannot be close to 0 everywhere, as all the probabilities must sum to 1.
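In practice this means -inf masking is safe only when every row keeps at least one finite logit, which is exactly what happens in a typical attention mask. A sketch with a hypothetical causal mask (the variable names here are illustrative, not from the original post):

```python
import torch

# Hypothetical causal attention mask: position i may attend to 0..i,
# so every row keeps at least one finite (unmasked) logit.
scores = torch.randn(4, 4)
mask = torch.triu(torch.ones(4, 4, dtype=torch.bool), diagonal=1)
masked = scores.masked_fill(mask, float('-inf'))
attn = torch.softmax(masked, dim=-1)
print(torch.isnan(attn).any())  # False: no row is entirely -inf
```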

Thanks for your reply. Sadly it doesn't work either. This issue doesn't stop me from writing a Transformer, because I set the constant in the mask to -1e20 and it works pretty well. I just want to know why `-np.inf` fails. According to some reference code, it seems that -np.inf once worked, so I guess some version change led to this pitfall. Weird.