Getting NaN from nn.Softmax when the input is filled with -np.inf

As the title suggests, I created a tensor with a = torch.zeros((3, 4)).fill_(-np.inf). Theoretically, every element of a is an extremely negative value, and nn.Softmax(a) should produce near-zero output. However, the output is NaN. Is there something I missed or misunderstood? Any help is appreciated!
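A minimal reproduction of what I'm seeing (the shape and fill value match the snippet above; the rest is just a plain softmax call):

```python
import numpy as np
import torch

# Fill a tensor entirely with -inf, as described above.
a = torch.zeros((3, 4)).fill_(-np.inf)

# Apply softmax over the last dimension.
out = torch.softmax(a, dim=-1)
print(out)  # every entry is nan
```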

Currently, I’m training a Transformer, and the loss becomes NaN after the constant value in the mask is set to -np.inf, but it works well when the constant is -1e20.

I think using torch.nn.LogSoftmax might work for you.
LogSoftmax is numerically more stable, but in your test code it still returns nan.
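A quick check of that suggestion (a sketch, using torch.log_softmax on an all -inf vector): the log-softmax path is more stable in general, but with every logit at -inf it also yields nan.

```python
import torch

# All logits set to -inf, as in the original test case.
a = torch.full((4,), float('-inf'))

# log_softmax is the numerically stable path, but it still
# cannot produce a valid distribution from all -inf inputs.
out = torch.log_softmax(a, dim=0)
print(out)  # every entry is nan
```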

Hello,

From my understanding of Softmax, it’s inconsistent to have all your logits equal to -inf; you should have at least one logit whose value is greater than -inf. Ex:

>>>
>>> a = torch.zeros(4)
>>> a[0:3] = float('-inf')
>>> a
tensor([-inf, -inf, -inf, 0.])
>>> torch.nn.functional.softmax(a)
__main__:1: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
tensor([0., 0., 0., 1.])
>>> 
>>> 
>>> a[0:4] = float('-inf')
>>> torch.nn.functional.softmax(a)
tensor([nan, nan, nan, nan])

Also, about your statement:

Theoretically, every element of a is an extremely negative value, and nn.Softmax(a) should produce near-zero output.

It is true that the output probability of a softmax for a logit tending to -inf should tend to 0. But your output probability distribution cannot be close to 0 everywhere, as all the probabilities must sum to 1.
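To make the arithmetic behind the nan explicit, here is a sketch of the softmax formula evaluated by hand: every exp(-inf) is 0, so the denominator sum(exp(x)) is 0, and each entry becomes 0/0, which is nan in floating point.

```python
import torch

# All logits at -inf.
a = torch.full((4,), float('-inf'))

# Softmax numerator and denominator, computed manually.
num = torch.exp(a)   # exp(-inf) == 0 for every entry
den = num.sum()      # sum of zeros is 0

probs = num / den    # 0/0 is nan in floating point
print(den)           # tensor(0.)
print(probs)         # tensor([nan, nan, nan, nan])
```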

Thanks for your reply. Sadly, it doesn’t work either. This issue doesn’t prevent me from writing a Transformer, because setting the mask constant to -1e20 works pretty well. I just want to know why -np.inf fails. According to some reference code, it seems that -np.inf once worked, so I guess some version change led to this pitfall. Weird.
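For anyone hitting the same thing with attention masks, here is a sketch of the difference between the two constants on a row where every position happens to be masked (the mask and score shapes here are illustrative, not from the original model): with -inf the row degenerates to nan, while with -1e20 every logit is equal and finite, so softmax falls back to a uniform distribution.

```python
import torch

# One attention row where every position is masked (illustrative).
scores = torch.zeros(1, 4)
mask = torch.ones(1, 4, dtype=torch.bool)

inf_masked = scores.masked_fill(mask, float('-inf'))
big_masked = scores.masked_fill(mask, -1e20)

inf_probs = torch.softmax(inf_masked, dim=-1)
big_probs = torch.softmax(big_masked, dim=-1)

print(inf_probs)  # tensor([[nan, nan, nan, nan]])
print(big_probs)  # uniform: tensor([[0.2500, 0.2500, 0.2500, 0.2500]])
```

This is why -1e20 "works": a fully masked row silently gets uniform attention instead of poisoning the loss with nan.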