import torch

r1 = torch.randn(5)
r2 = torch.tensor([-float('inf') for k in range(5)])
dp = 0.
for i in range(r1.shape[0]):
    dp += r1[i] * r2[i]
dp
Output
tensor(nan)
So it seems adding -inf multiple times gives a nan. This creates a problem when implementing masked attention.
Is there a way in which adding -inf multiple times still gives -inf? I am not looking for a hack involving a conditional expression. Perhaps I am looking for a solution that makes -inf idempotent with respect to addition.
It is likely that not all of the elements of r1 have the same sign.
Try replacing your original first line with:
r1 = torch.abs(torch.randn(5))
Let’s say that r1[0] is positive. Then after the first iteration, dp will be equal to r1[0] * (-inf) = -inf. If r1[1] happens to be negative, then r1[1] * r2[1] = +inf, so in the second iteration you will be calculating (-inf) + inf = nan, as it should be. (And once dp becomes nan, it stays nan.)
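You can check the underlying float arithmetic directly: -inf really is stable under addition with itself, and only opposite infinities produce nan. A quick sketch:

import torch

neg_inf = torch.tensor(-float('inf'))
pos_inf = torch.tensor(float('inf'))
print(neg_inf + neg_inf)            # tensor(-inf) -- adding -inf to -inf stays -inf
print(neg_inf + pos_inf)            # tensor(nan)  -- opposite infinities give nan
print(neg_inf + pos_inf + neg_inf)  # tensor(nan)  -- and nan then propagates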
Note, if you run your test many times (with different random values for r1), you will occasionally get -inf or inf for the result, instead of nan, namely whenever all the elements of r1 happen to have the same sign.
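If you want to see how often each outcome occurs, here is a small experiment (my own sketch; with five standard-normal draws, all signs agree in roughly 1 run in 16):

import torch

counts = {'-inf': 0, 'inf': 0, 'nan': 0}
for _ in range(10000):
    r1 = torch.randn(5)
    dp = (r1 * torch.full((5,), -float('inf'))).sum()
    if torch.isnan(dp):
        counts['nan'] += 1    # mixed signs in r1: (-inf) + inf appeared
    elif dp > 0:
        counts['inf'] += 1    # all elements of r1 were negative
    else:
        counts['-inf'] += 1   # all elements of r1 were positive
print(counts)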
The short answer is that you are not adding -inf multiple times.
You are usually adding -inf to inf somewhere in your loop.
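As for the masked-attention use case: the usual way to avoid this entirely is not to sum products with -inf at all, but to write -inf into the masked score positions with masked_fill (or an additive mask) before the softmax, so opposite infinities are never added together. A minimal sketch, with illustrative names (scores and mask are my assumptions, not from your code):

import torch
import torch.nn.functional as F

scores = torch.randn(4, 4)                  # raw attention scores
mask = torch.tril(torch.ones(4, 4)).bool()  # causal mask: True = attend
masked = scores.masked_fill(~mask, float('-inf'))
attn = F.softmax(masked, dim=-1)            # masked positions get weight 0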