Adding up intermediate losses results in a slightly different value than the total loss?

I take the individual loss values and add them together before calling backward(retain_graph=True).

For example:

loss = 0
for value in value_list:
    loss += value.lossA
for value in value_list:
    loss += value.lossB

loss.backward(retain_graph=True)

After running backward(), the loss variable then holds a single float value. For this example, let’s say that value is 8088996.0.

Now, because I used retain_graph=True, I can collect the individual loss values and add them together again, just like above; the only difference is that .backward() has already been called through them (via the single .backward() above):

totalLoss = 0
for value in value_list:
    totalLoss += value.lossA
for value in value_list:
    totalLoss += value.lossB

Adding these loss values together then gives me a result like 8088995.81122.

Now I would expect loss to equal totalLoss, but they are slightly different. I graphed the difference between the two totals over time here: https://i.imgur.com/kiFrWVY.png

Here are some more examples:

  totalloss: 3512584.05008
  loss: 3512584.25

  totalloss: 1081956.37198
  loss: 1081956.375

  totalloss: 1028641.08926
  loss: 1028641.0625

Is this expected behavior for PyTorch, or is there some sort of rounding going on somewhere?

This occurred with both the Adam and L-BFGS optimizers. I have tested my code on PyTorch v0.3.1 and v0.4, and the issue remains in both.

I’m not sure if this is the reason, since the difference is quite large, but with floating-point numbers the order of operations can yield small perturbations in the results:

import torch

a = torch.randn(10, 10, 10)
a_sum = a.sum()                     # sum every element in a single reduction
a_sum_sep = a.sum(0).sum(0).sum(0)  # sum the same elements one dimension at a time
err = a_sum - a_sum_sep             # usually a tiny non-zero difference
print(err)
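
Even with ordinary Python floats (which are 64-bit doubles), just regrouping the same sum can change the last bits of the result; here is a minimal sketch:

x = (0.1 + 0.2) + 0.3   # summed left to right, like a running +=
y = 0.1 + (0.2 + 0.3)   # same numbers, different grouping
print(x == y)           # False: x is 0.6000000000000001, y is 0.6

Floating-point addition is not associative, so any change in summation order (including how a reduction like sum() is implemented internally) can shift the result slightly.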

I think that I’ve figured it out.

Using this:

totalLoss += test_module.loss

Gives me the “rounded” result.

While using:

totalLoss += test_module.loss.item()

Or:

totalLoss += float(test_module.loss)

Gives me the correct result.

But I can only use .item() or float() for printing the values, since I still need the loss tensors themselves for backward(). I don’t know how to add my loss variables together in a way that doesn’t result in this “rounding”.
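
To make that concrete, here is a small self-contained sketch of what I mean (the losses below are random stand-ins for my real test_module.loss values): one running total keeps the tensors, the other keeps plain Python floats purely for printing, and the two totals usually come out slightly different:

import torch

# Random stand-in scalar losses (placeholders for test_module.loss).
losses = [torch.rand(1) * 1e6 for _ in range(20)]

tensorTotal = 0     # tensor running sum, accumulated in float32
printedTotal = 0.0  # Python-float running sum, accumulated in 64-bit floats
for l in losses:
    tensorTotal += l
    printedTotal += l.item()

print(float(tensorTotal))   # the "rounded" float32 total
print(printedTotal)         # the float64 total; usually differs slightly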

This code reproduces the issue, I think:

import torch
import torch.nn as nn
from torch.autograd import Variable  # needed on v0.3.1; on v0.4+ plain tensors work too

crit = nn.MSELoss()

####################
torch.manual_seed(876)
a = Variable(torch.randn(3,64,64), requires_grad=True)

torch.manual_seed(875)
b = Variable(torch.randn(3,64,64))
c = crit(a,b)
#####################
torch.manual_seed(874)
d = Variable(torch.randn(3,64,64), requires_grad=True)

torch.manual_seed(873)
e = Variable(torch.randn(3,64,64))
f = crit(d,e)

####################
torch.manual_seed(876)
h = Variable(torch.randn(3,64,64), requires_grad=True)

torch.manual_seed(875)
i = Variable(torch.randn(3,64,64))
j = crit(h,i)
#####################

test = (j, f, c)

g = 0
for loss in test:
    g += loss
g.backward(retain_graph=True)


print("Rounded Results:")
print("Using +=: " + str(float(g)))
print("Using a single float(): " + str( float(j + f + c)  ))
print("Using a single item(): " + str( (j + f + c).item()  ))

print("Working Methods:")
print("Adding j, f, c, separately using float(): " + str(float(j) + float(f) + float(c) ))
print("Adding j, f, c, separately using item(): " + str(j.item() + f.item() + c.item() ))

print("The difference Between Adding Methods:")
print("Difference: " + str( (float(j) + float(f) + float(c)) - float(g) ))

Should I report this as a bug? This “rounding” seems to only occur with loss Variables (and maybe other Variables?), and I can’t recreate it with regular Python floats, so I assume it’s PyTorch and not Python that has the issue?

Running the example script above produces this output in the terminal:

Rounded Results:
Using +=: 6.0663561821
Using a single float(): 6.0663561821
Using a single item(): 6.0663561821
Working Methods:
Adding j, f, c, separately using float(): 6.06635594368
Adding j, f, c, separately using item(): 6.06635594368
The difference Between Adding Methods:
Difference: -2.38418579102e-07

The loss variables j, f, and c are equal to:

j = 2.03060102463

f = 2.00515389442

c = 2.03060102463

When I add them together manually in Python, I don’t get the rounded result:

>>> a = 2.03060102463 + 2.00515389442 + 2.03060102463
>>> print(a)
6.06635594368
>>>
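
For comparison, here is a minimal sketch that sums those same three printed values of j, f, and c in single precision versus double precision (numpy.float32 is just a stand-in here for PyTorch’s default 32-bit tensors, so this part is my own guess at the cause):

import numpy as np

# The printed values of j, f, and c from above.
vals = [2.03060102463, 2.00515389442, 2.03060102463]

single = np.float32(0.0)   # running sum in 32-bit floats (PyTorch's default tensor dtype)
for v in vals:
    single += np.float32(v)

double = sum(vals)         # running sum in Python's 64-bit floats

print(single)                  # single-precision total
print(double)                  # double-precision total
print(float(single) - double)  # typically a small non-zero difference, around float32 precision

If that is what is happening, the differences above would just be single-precision rounding from accumulating the sum in float32, rather than a PyTorch bug.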