I take the loss values and add them together before running backward(retain_graph=True).
For example:
loss = 0
for value in value_list:
    loss += value.lossA
for value in value_list:
    loss += value.lossB
loss.backward(retain_graph=True)
The loss variable, after running backward(), will then be a float value. For the example, let’s say that value is 8088996.0.
Now, because I used retain_graph=True, I can collect the individual loss values and add them together like I did above, only this time .backward() has already been called on them (because of the single .backward() above):
totalLoss = 0
for value in value_list:
    totalLoss += value.lossA
for value in value_list:
    totalLoss += value.lossB
Adding these loss variables together would then give me a result like 8088995.81122.
Now I would expect loss to be equal to totalLoss, but they are slightly different. I graphed the difference between these two totals over time, here: https://i.imgur.com/kiFrWVY.png
Here are some more examples:
totalloss: 3512584.05008
loss: 3512584.25
totalloss: 1081956.37198
loss: 1081956.375
totalloss: 1028641.08926
loss: 1028641.0625
Is this expected behavior for PyTorch? Or is there some sort of rounding going on somewhere?
This occurred with both the Adam and L-BFGS optimizers. I have tested my code on PyTorch v0.3.1, and v0.4, but the issue remains.
I’m not sure if this is the reason, since the difference is quite large, but with floating-point numbers the order of operations can yield small perturbations in the results:
a = torch.randn(10, 10, 10)
a_sum = a.sum()
a_sum_sep = a.sum(0).sum(0).sum(0)[0]
err = a_sum - a_sum_sep
print(err)
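To see that this is specifically a single-precision effect, here is a sketch of the same comparison run in both float32 and float64 (my own addition, written against the 0.4-style API where full reductions return 0-dim tensors): after casting to double, the two summation orders agree to far more digits.

```python
import torch

torch.manual_seed(0)
a = torch.randn(10, 10, 10)

# float32: summing everything at once vs. one dimension at a time
err32 = (a.sum() - a.sum(0).sum(0).sum(0)).abs().item()

# float64: the same two orders, now in double precision;
# typically err64 is many orders of magnitude below err32
a64 = a.double()
err64 = (a64.sum() - a64.sum(0).sum(0).sum(0)).abs().item()

print(err32, err64)
```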
I think that I’ve figured it out.
Using this:
totalLoss += test_module.loss
gives me the “rounded” result, while using:
totalLoss += test_module.loss.item()
or:
totalLoss += float(test_module.loss)
gives me the correct result.
But I can only use .item() or float() for printing the values. I don’t know how to add my loss variables together in a way that doesn’t result in this “rounding”.
This code reproduces the issue, I think:
import torch
import torch.nn as nn
from torch.autograd import Variable
crit = nn.MSELoss()
####################
torch.manual_seed(876)
a = Variable(torch.randn(3,64,64), requires_grad=True)
torch.manual_seed(875)
b = Variable(torch.randn(3,64,64))
c = crit(a,b)
#####################
torch.manual_seed(874)
d = Variable(torch.randn(3,64,64), requires_grad=True)
torch.manual_seed(873)
e = Variable(torch.randn(3,64,64))
f = crit(d,e)
####################
torch.manual_seed(876)
h = Variable(torch.randn(3,64,64), requires_grad=True)
torch.manual_seed(875)
i = Variable(torch.randn(3,64,64))
j = crit(h,i)
#####################
test = (j, f, c)
g = 0
for loss in test:
    g += loss
g.backward(retain_graph=True)
print("Rounded Results:")
print("Using +=: " + str(float(g)))
print("Using a single float(): " + str( float(j + f + c) ))
print("Using a single item(): " + str( (j + f + c).item() ))
print("Working Methods:")
print("Adding j, f, c, separately using float(): " + str(float(j) + float(f) + float(c) ))
print("Adding j, f, c, separately using item(): " + str(j.item() + f.item() + c.item() ))
print("The difference Between Adding Methods:")
print("Difference: " + str( (float(j) + float(f) + float(c)) - float(g) ))
Should I report this as a bug? This “rounding” seems to only occur with loss Variables (and maybe other Variables?), and I can’t recreate it with just regular float numbers. So I assume it’s PyTorch and not Python which has this issue?
The above example script results in this output to the terminal:
Rounded Results:
Using +=: 6.0663561821
Using a single float(): 6.0663561821
Using a single item(): 6.0663561821
Working Methods:
Adding j, f, c, separately using float(): 6.06635594368
Adding j, f, c, separately using item(): 6.06635594368
The difference Between Adding Methods:
Difference: -2.38418579102e-07
The loss variables j, f, and c are equal to:
j = 2.03060102463
f = 2.00515389442
c = 2.03060102463
When I add them together in Python manually, I don’t get the rounded result:
>>> a = 2.03060102463 + 2.00515389442 + 2.03060102463
>>> print(a)
6.06635594368
>>>
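For what it’s worth, the exact numbers above can be reproduced without PyTorch by emulating float32 arithmetic with the standard library (a pure-Python sketch, assuming the losses are float32 tensors): each += on the tensors rounds the running sum to single precision, while the manual addition above happens entirely in Python’s float64.

```python
import struct

def f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

# The printed float32 values of j, f, and c from the script above.
vals = [2.03060102463, 2.00515389442, 2.03060102463]

# Emulate `g += loss`: every intermediate sum is rounded to float32.
acc32 = 0.0
for v in vals:
    acc32 = f32(acc32 + f32(v))

# Emulate `float(j) + float(f) + float(c)`: plain float64 additions.
acc64 = sum(vals)

print(acc32)  # ≈ 6.0663561821  (the "rounded" result)
print(acc64)  # ≈ 6.06635594368 (the float64 result)
```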