Small difference between the results of sparse and dense calculation

I do a matrix multiplication r2 = x @ w2, where w2 is a dense tensor.

Then I create a sparse copy of w2, w1 = w2.to_sparse(), and compute r1 from it.

Now r1 and r2 differ slightly.


import torch

x = torch.randn(700, 500, dtype=torch.float32)

w2 = torch.rand(500, 600, dtype=torch.float32)
w1 = w2.to_sparse()

print(torch.abs(w2 - w1).sum())
>>>tensor(0.)

r1 = torch.sparse.mm(w1.t(), x.t()).t()
r2 = torch.mm(x, w2)

print(torch.abs(r2 - r1).sum())
>>>tensor(1.4281)

When I changed the dtype to float64, the difference became much smaller:

x = torch.randn(700, 500, dtype=torch.float64)

w2 = torch.rand(500, 600, dtype=torch.float64)
w1 = w2.to_sparse()

print(torch.abs(w2 - w1).sum())
>>>tensor(0., dtype=torch.float64)

r1 = torch.sparse.mm(w1.t(), x.t()).t()
r2 = torch.mm(x, w2)

print(torch.abs(r2 - r1).sum())
>>>tensor(2.6097e-09, dtype=torch.float64)

Why is there such a difference?
Is it a significant problem?
How does it affect training a network with sparse tensors?

This is to be expected. On an item-by-item basis, you have a typical floating-point difference for fp32:

(r1 - r2).abs().max()
>>>tensor(6.4850e-05)

As you have 4.2e5 items, you end up with the difference you see. So everything is as expected here, but if you depend on the difference, you probably want fp64 for at least part of your computation.
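For illustration, here is a minimal sketch (a manual seed is added for reproducibility; the exact numbers will vary) that puts the per-element error next to float32 machine epsilon:

import torch

torch.manual_seed(0)
x = torch.randn(700, 500, dtype=torch.float32)
w2 = torch.rand(500, 600, dtype=torch.float32)
w1 = w2.to_sparse()

r1 = torch.sparse.mm(w1.t(), x.t()).t()
r2 = torch.mm(x, w2)

diff = (r1 - r2).abs()
eps = torch.finfo(torch.float32).eps      # ~1.19e-07

print(diff.max())                     # worst per-element error, comparable to the 6.5e-05 above
print(diff.max() / r2.abs().max())    # a small multiple of eps relative to the scale of the results
print(diff.sum())                     # 700 * 600 = 4.2e5 per-element errors added together

The switch to float64 helps by roughly the ratio of the two machine epsilons (about 1.2e-07 for float32 vs 2.2e-16 for float64), which matches the drop from ~1.4 to ~2.6e-09 seen above.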

Best regards

Thomas

I get that this is due to round-off error,
but in the following code the error is 0.


x = torch.randn(700, 500, dtype=torch.float32)

w2 = torch.rand(500, 600, dtype=torch.float32)
w1 = w2.to_sparse()
w1 = w1.to_dense()
w1 = w1.clone()

print(torch.abs(w2 - w1).sum())
>>>tensor(0.)

r1 = torch.mm(x, w1)
r2 = torch.mm(x, w2)

print(torch.abs(r2 - r1).sum())
>>>tensor(0.)

What I want to know is: what is the underlying mechanism in torch.sparse.mm(), compared to torch.mm(), that causes this error?

As far as I know, if we have a1 = a2 (dtype = float32) with all 32 bits of a1 identical to a2,
and likewise b1 = b2 (dtype = float32) with all 32 bits of b1 identical to b2, and we perform the identical floating-point operation on an FPU to get c1 = a1*b1 and c2 = a2*b2, we expect c1 = c2 with all 32 bits of c1 identical to c2.
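That expectation does hold for a single multiplication: IEEE 754 arithmetic is deterministic, so two bitwise-identical products stay bitwise identical. A minimal check of this, for illustration:

import torch

a1 = torch.rand(1, dtype=torch.float32)
a2 = a1.clone()
b1 = torch.rand(1, dtype=torch.float32)
b2 = b1.clone()

c1 = a1 * b1
c2 = a2 * b2

# compare the raw 32-bit patterns, not just the printed values
print(torch.equal(c1.view(torch.int32), c2.view(torch.int32)))
>>>True

The question, then, is what happens when many such products are summed into one output element.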

Well, summation results depend on the order of summation for floating point. The guarantees that you get the same order are very limited, and non-existent between different implementations (or different numbers of available cores, or …).
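For illustration, a minimal sketch of that order dependence, using the non-associativity of float32 addition:

import torch

a, b, c = torch.tensor([1e8, -1e8, 1.0], dtype=torch.float32)

# the same three numbers, summed in two different orders
print((a + b) + c)
>>>tensor(1.)
print(a + (b + c))
>>>tensor(0.)

A dense mm and a sparse mm reduce over the inner dimension in different orders (and possibly with different blocking or threading), so the last bits of each output element can differ even though every individual product is rounded the same way.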

Best regards

Thomas