As the title suggests, I created a tensor with `a = torch.zeros((3, 4)).fill_(-np.inf)`. Theoretically, every element of `a` is an extremely small (negative) value, so `nn.Softmax(a)` should produce near-zero output. However, the output is NaN. Is there something I missed or misunderstood? Any help is appreciated!

Currently, I'm training a Transformer, and the loss becomes NaN after the constant value in the mask is set to -np.inf, but it works well with the constant being -1e20.
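For reference, the failure is easy to reproduce in a few lines; this is a minimal sketch (assuming a recent PyTorch) showing that an all-`-inf` row yields NaN while a large finite constant like `-1e20` does not:

```python
import torch

# A row that is entirely -inf makes softmax return NaN:
# exp(-inf) = 0 for every entry, and 0/0 is undefined.
a = torch.full((3, 4), float('-inf'))
print(torch.softmax(a, dim=-1))  # all NaN

# A large-but-finite constant keeps the row well-defined: every logit
# is equal, so softmax returns a uniform distribution instead of NaN.
b = torch.full((3, 4), -1e20)
print(torch.softmax(b, dim=-1))  # 0.25 everywhere
```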

I think using `torch.nn.LogSoftmax` might work for you.

LogSoftmax is numerically more stable, but in your test code it still returns `nan`.
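That is expected: `log_softmax(x)` is computed as `x - logsumexp(x)`, and with every logit at -inf the result is `(-inf) - (-inf)`, which is NaN. A minimal check:

```python
import torch

# logsumexp of an all -inf vector is -inf, so log_softmax computes
# (-inf) - (-inf) = NaN for every element.
a = torch.full((4,), float('-inf'))
print(torch.log_softmax(a, dim=-1))  # all NaN
```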

Hello,

From my understanding of Softmax, it's inconsistent to have all your logits equal to -inf; you should have at least one logit whose value is greater than -inf. Ex:

```
>>> a = torch.zeros(4)
>>> a[0:3] = float('-inf')
>>> a
tensor([-inf, -inf, -inf, 0.])
>>> torch.nn.functional.softmax(a, dim=0)
tensor([0., 0., 0., 1.])
>>> a[0:4] = float('-inf')
>>> torch.nn.functional.softmax(a, dim=0)
tensor([nan, nan, nan, nan])
```

Also, about your statement:

> Theoretically, every element of a is a super small negative value, and `nn.Softmax(a)` should produce near zero output.

It is true that the softmax output probability for a logit tending to -inf tends to 0. But your output probability distribution cannot be close to 0 everywhere, as all the probabilities must sum to 1.
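In practice this means -inf masking is safe only when every row keeps at least one finite logit, which is exactly what happens in a typical attention mask. A sketch with a hypothetical causal mask (the variable names here are illustrative, not from the original post):

```python
import torch

# Hypothetical causal attention mask: position i may attend to 0..i,
# so every row keeps at least one finite (unmasked) logit.
scores = torch.randn(4, 4)
mask = torch.triu(torch.ones(4, 4, dtype=torch.bool), diagonal=1)
masked = scores.masked_fill(mask, float('-inf'))
attn = torch.softmax(masked, dim=-1)
print(torch.isnan(attn).any())  # False: no row is entirely -inf
```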

Thanks for your reply. Sadly it doesn't work either. This issue doesn't stop me from writing a Transformer, because I set the constant in the mask to -1e20 and it works pretty well. I just want to know why `-np.inf` fails. According to some reference code, it seems that -np.inf once worked, so I guess some version change led to this pitfall. Weird.