torch.add() gives different results

#1

In the docs, there are torch.add(input, value, out=None) and torch.add(input, value, other, out=None).
I expect 'b' and 'c' in the following code to have the same value, but they do not:

import torch

lr = 1e-2
torch.manual_seed(666)
a = torch.randn(3,3)
a.grad = torch.randn(3,3)

b = torch.add(a, -lr, a.grad)
c = torch.add(a, -lr*a.grad)
print(torch.abs(b-c).sum())

So, can you please help me on why they are different?

(Jerin Philip) #2
>>> import torch
>>>
>>> lr = 1e-2
>>> torch.manual_seed(666)
<torch._C.Generator object at 0x7f278fb0dc10>
>>> a = torch.randn(3,3)
>>> a.grad = torch.randn(3,3)
>>>
>>> b = torch.add(a, -lr, a.grad)
>>> c = torch.add(a, -lr*a.grad)
>>> print(torch.abs(b-c).sum())
tensor(1.8626e-09)

Aren't they the same? 1e-9 ~= 0.
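Since tiny discrepancies like this are expected with floating point, comparisons are usually done with a tolerance rather than exact equality. In PyTorch one would typically reach for torch.allclose; here is a minimal sketch of the same idea in plain Python:

```python
import math

b_val = 1.234567
c_val = b_val + 2e-9  # tiny discrepancy, like the 1.8626e-9 above

# Exact equality fails for floating point...
print(b_val == c_val)                             # False
# ...so compare with a tolerance instead.
print(math.isclose(b_val, c_val, rel_tol=1e-6))   # True
```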

#3

Yes, the final value is 1.8626e-9, but it should be exactly 0.0. There may be precision problems in floating-point computation, but the two results should still be identical.

(Thomas V) #4

What happens is that those two calls are implemented differently, and for floating-point operations this difference in the order of operations matters.
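The order-dependence of floating-point arithmetic can be seen even without PyTorch. A minimal sketch in plain Python:

```python
# Floating-point addition is not associative: each intermediate
# result is rounded, so different groupings can differ in the
# last bits even though they are mathematically equal.
x = (0.1 + 0.2) + 0.3
y = 0.1 + (0.2 + 0.3)

print(x == y)       # False
print(abs(x - y))   # on the order of 1e-16
```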

Best regards

Thomas

#5

Thanks for your reply.

So you mean these two operations are implemented differently in the C code, and that is what causes the different results? Could you give me a little more information about the differences?
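One plausible source of such a discrepancy (an assumption about the kernels, not a statement about PyTorch's actual C code): torch.add(a, -lr, a.grad) can evaluate a + (-lr)*grad elementwise in a single expression, which a compiler may turn into a fused multiply-add with one rounding, whereas -lr*a.grad first materializes a rounded intermediate tensor before the add. The effect of one rounding versus two can be demonstrated in pure Python:

```python
import struct

def round32(x):
    """Round a Python float (a double) to the nearest float32."""
    return struct.unpack('f', struct.pack('f', x))[0]

a = 1.0 + 2.0**-12      # exactly representable in float32
c = -(1.0 + 2.0**-11)   # exactly representable in float32

# Two roundings: the product a*a is rounded to float32 first
# (discarding its lowest bit), then the sum is rounded again.
two_roundings = round32(round32(a * a) + c)

# One rounding: a*a + c is evaluated in higher precision (here,
# Python's double) and rounded to float32 only once -- the same
# effect a fused multiply-add has.
one_rounding = round32(a * a + c)

print(two_roundings)  # 0.0
print(one_rounding)   # 5.960464477539063e-08 (= 2**-24)
```

The two answers differ by one unit in the last place of the intermediate product, which is exactly the kind of ~1e-9 residue seen above.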