Example 1 below is a 4x4 matmul, z = torch.matmul(x, y). The inputs are finite float32 values on the order of 1e38, so each elementwise product overflows to +/-inf, and accumulating +inf and -inf should output nan.

But the output matrix z consists entirely of +/-inf.

example 1)

x = tensor([[ 1.2208e+38,  1.5069e+38,  3.4027e+37,  7.5471e+37],
        [ 6.7193e+37, -9.0691e+37, -1.7984e+38,  1.5571e+38],
        [-1.9089e+38, -1.7563e+37,  2.3544e+37, -9.8721e+37],
        [-9.5343e+37, -5.0244e+37, -7.0133e+37,  2.2247e+37]])

y = Parameter containing:
tensor([[-9.2652e+36, -5.3796e+37, -7.5069e+37,  6.6170e+37],
        [ 6.2397e+37,  1.1426e+38, -1.6466e+38,  2.9774e+38],
        [-3.9087e+37, -5.1765e+36,  5.3396e+37, -2.4418e+38],
        [-6.0303e+37,  7.8149e+37,  4.1132e+37,  1.4304e+37]],
       requires_grad=True)

z = tensor([[-inf, -inf, -inf,  inf],
        [-inf, -inf, -inf,  inf],
        [ inf,  inf,  inf, -inf],
        [ inf,  inf,  inf, -inf]], grad_fn=)

In example 2 below, you can see that nan is output only when +/-inf is entered directly as input.

example 2)

x = tensor([[ inf, -inf,  inf, -inf],
        [-inf,  inf,  inf, -inf],
        [-inf, -inf, -inf,  inf],
        [ inf, -inf,  inf,  inf]])

y = Parameter containing:
tensor([[ inf, -inf, -inf,  inf],
        [-inf, -inf, -inf,  inf],
        [-inf, -inf, -inf,  inf],
        [-inf,  inf, -inf,  inf]], requires_grad=True)

z = tensor([[nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, inf, nan, nan],
        [nan, nan, nan, nan]], grad_fn=)

This result differs from the IEEE-754 floating-point arithmetic we know, where accumulating +inf and -inf yields nan.
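For reference, these are the rules I am assuming, as exposed by plain Python (IEEE-754 double) arithmetic:

```python
import math

inf = float('inf')

print(inf + inf)      # inf: same-sign infinities accumulate to inf
print(inf + (-inf))   # nan: opposite-sign infinities give nan
print(0.0 * inf)      # nan: zero times infinity is also nan

print(math.isnan(inf + (-inf)))   # True
```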

I would like to hear from anyone who knows about this problem.