Matrix multiplication and elementwise multiplication, getting unequal outputs

Hi, I am a beginner in PyTorch and recently ran into an issue. I have thought about it a lot but couldn't figure it out.

Suppose we have two matrices:

import torch

c = 500
x = torch.randn(c, c)
y = torch.randn(c, c)

And we compute the row-wise sum of their elementwise product:

r1 = torch.sum(x * y, dim=1)

Alternatively, if we transpose the second matrix, perform a matrix multiplication, and keep only the diagonal elements, we should get the same answer.
So, we do this:

r2 = torch.diag(torch.mm(x, y.transpose(0, 1)))
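(For reference, the identity itself is easy to confirm with exact integer arithmetic, where no rounding occurs; the small example values here are made up for illustration:)

```python
import torch

# Sanity check of the identity diag(X @ Y.T)[i] == sum_j X[i, j] * Y[i, j]
# using integer tensors, where arithmetic is exact.
x = torch.arange(6, dtype=torch.long).reshape(2, 3)        # [[0, 1, 2], [3, 4, 5]]
y = torch.arange(6, 12, dtype=torch.long).reshape(2, 3)    # [[6, 7, 8], [9, 10, 11]]

r1 = torch.sum(x * y, dim=1)
r2 = torch.diag(torch.mm(x, y.transpose(0, 1)))
print(torch.equal(r1, r2))  # True: no rounding, so the results match exactly
```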

If we now compare the two for equality:

print(r1 == r2)

Ideally we should get all True because they compute the same thing, but I get some False.
What exactly is going wrong? Can someone point out the mistake?

The mistake is comparing floating point tensors with == instead of torch.allclose.
What you are seeing is that floating point numbers are not as well behaved as actual real numbers. For example, addition of multiple values is not associative due to rounding: try print(1e20 + 1 - 1e20), which prints 0.0 instead of 1.0, because 1e20 + 1 rounds back to 1e20.
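A minimal illustration of both points (the tensor values below are made up for demonstration; torch.allclose uses its default tolerances):

```python
import torch

# Grouping matters: (1e20 + 1) rounds back to 1e20, so the 1 is lost.
print(1e20 + 1 - 1e20)  # 0.0
print(1e20 - 1e20 + 1)  # 1.0

# Because of such rounding, compare floats with a tolerance, not with ==.
a = torch.tensor([0.1], dtype=torch.double) + torch.tensor([0.2], dtype=torch.double)
b = torch.tensor([0.3], dtype=torch.double)
print((a == b).item())       # False: 0.1 + 0.2 is 0.30000000000000004
print(torch.allclose(a, b))  # True: equal within default tolerances
```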

One of my recommendations for getting a first impression of whether you have hit numerical accuracy limits is to repeat the same computation with dtype=torch.double tensors. If the difference ((r1 - r2).abs().max()) goes down dramatically (say from around 1e-5 to around 1e-13 in your example), this points to numerical accuracy issues. If it stays the same, this might point to a different issue, if only an approximation somewhere in the computation.
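To make that diagnostic concrete, here is a sketch using the setup from the question (the exact magnitudes will vary from run to run):

```python
import torch

c = 500
x = torch.randn(c, c)
y = torch.randn(c, c)

# Single precision: the two mathematically equivalent computations
# accumulate rounding error in different orders and differ slightly.
r1 = torch.sum(x * y, dim=1)
r2 = torch.diag(torch.mm(x, y.transpose(0, 1)))
diff_float = (r1 - r2).abs().max()

# Same computation in double precision: if the difference drops
# dramatically, numerical accuracy is the culprit.
xd, yd = x.double(), y.double()
r1d = torch.sum(xd * yd, dim=1)
r2d = torch.diag(torch.mm(xd, yd.transpose(0, 1)))
diff_double = (r1d - r2d).abs().max()

print(diff_float.item())   # typically around 1e-5
print(diff_double.item())  # typically around 1e-13
```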

Best regards