Example 1 below is a 4x4 matmul, z = torch.matmul(x, y). Every intermediate product x[i][k] * y[k][j] overflows float32 to +/-inf, so wherever the accumulation adds both +inf and -inf the result should be nan under IEEE 754 rules. But the output matrix z contains no nan at all; it is composed entirely of +/-inf (a small reproduction sketch follows the example).
example 1)
x = tensor([[ 1.2208e+38, 1.5069e+38, 3.4027e+37, 7.5471e+37],
[ 6.7193e+37, -9.0691e+37, -1.7984e+38, 1.5571e+38],
[-1.9089e+38, -1.7563e+37, 2.3544e+37, -9.8721e+37],
[-9.5343e+37, -5.0244e+37, -7.0133e+37, 2.2247e+37]])
y = Parameter containing:
tensor([[-9.2652e+36, -5.3796e+37, -7.5069e+37, 6.6170e+37],
[ 6.2397e+37, 1.1426e+38, -1.6466e+38, 2.9774e+38],
[-3.9087e+37, -5.1765e+36, 5.3396e+37, -2.4418e+38],
[-6.0303e+37, 7.8149e+37, 4.1132e+37, 1.4304e+37]],
requires_grad=True)
z = tensor([[-inf, -inf, -inf, inf],
[-inf, -inf, -inf, inf],
[inf, inf, inf, -inf],
[inf, inf, inf, -inf]], grad_fn=<MmBackward0>)
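For comparison, here is a small reproduction sketch. The tensors are hypothetical (randomly generated float32 values near the type's maximum, not the exact values above), but the products overflow in the same way. The explicit product-then-sum follows IEEE 754 element by element and does produce nan wherever +inf and -inf meet, while torch.matmul gives me only +/-inf, as in example 1:

import torch

torch.manual_seed(0)
# random float32 values up to about 1e38; the largest finite float32
# is about 3.4e38, so every product x[i][k] * y[k][j] overflows
x = (torch.rand(4, 4) - 0.5) * 2e38
y = (torch.rand(4, 4) - 0.5) * 2e38

# in my runs this contains only +/-inf, as in example 1
print(torch.matmul(x, y))

# explicit product-then-sum over k: each product overflows to +/-inf
# first, and IEEE 754 addition then turns inf + (-inf) into nan
print((x[:, :, None] * y[None, :, :]).sum(dim=1))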
If you look at example 2 below, you can see that nan is produced only when +/-inf values are entered directly as input.
example 2)
x = tensor([[inf, -inf, inf, -inf],
[-inf, inf, inf, -inf],
[-inf, -inf, -inf, inf],
[inf, -inf, inf, inf]])
y = Parameter containing:
tensor([[inf, -inf, -inf, inf],
[-inf, -inf, -inf, inf],
[-inf, -inf, -inf, inf],
[-inf, inf, -inf, inf]], requires_grad=True)
z = tensor([[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, inf, nan, nan],
[nan, nan, nan, nan]], grad_fn=<MmBackward0>)
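For reference, the scalar behavior I would expect is easy to check; 2e38 is an arbitrary value I chose near the largest finite float32 (about 3.4e38):

import torch

a = torch.tensor(2e38)    # float32 by default
b = torch.tensor(-2e38)

print(a * a)          # overflows to inf
print(a * b)          # overflows to -inf
print(a * a + a * b)  # inf + (-inf) = nan under IEEE 754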
The matmul result is inconsistent with the IEEE 754 floating-point behavior I would expect, where inf + (-inf) = nan. I would like to hear from anyone who knows what causes this.