After noticing unexpected gradient values during model training, I ran the following experiment. I expected to get the same gradient values in both scenarios, but that was not the case. Below you find ready-to-run code. The first scenario was to run `loss1.backward(retain_graph=True)`

then `loss2.backward()`

The second scenario was the other way around (run `loss2.backward(retain_graph=True)` and then `loss1.backward()`).

The values were not the same.

```
import torch
import torch.nn as nn
import torch.nn.functional as F

dtype = torch.float32
X = torch.tensor([[1, 2, 3, 4, 5, 6]], dtype=dtype)
Y = torch.tensor([[1, 4, 9, 16, 25, 36]], dtype=dtype)

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        torch.manual_seed(3)
        self.base_l1 = torch.nn.Linear(6, 6, bias=True)
        self.base_l2 = torch.nn.Linear(6, 6, bias=True)
        self.l3 = torch.nn.Linear(6, 6, bias=True)
        self.l4 = torch.nn.Linear(6, 6, bias=True)

    def forward(self, x):
        # Shared trunk: both outputs depend on base_l1 and base_l2
        x1 = self.base_l1(x)
        x1 = F.relu(x1)
        x1 = self.base_l2(x1)
        # Head: x2 continues from x1
        x2 = F.relu(x1)
        x2 = self.l3(x2)
        x2 = F.relu(x2)
        x2 = self.l4(x2)
        return x2, x1

model = Model()
Loss = nn.MSELoss()
y_pred2, y_pred1 = model(X)
print('grad0', model.base_l1.weight.grad)  # None before any backward()
loss1 = Loss(y_pred1, Y)
loss2 = Loss(y_pred2, Y)

# first scenario
#### comment this out and uncomment the second scenario, then rerun
#'''
loss1.backward(retain_graph=True)
print('grad1', model.base_l1.weight.grad)
loss2.backward()
print('grad2', model.base_l1.weight.grad)
####
# second scenario: uncomment after running the first scenario
'''
loss2.backward(retain_graph=True)
print('grad2', model.base_l1.weight.grad)
loss1.backward()
print('grad1', model.base_l1.weight.grad)
'''
```

Here we can clearly see that the second `backward()` call does not compute a fresh gradient: **the new gradient is added to the grad values that are already stored!!!** Note that this accumulation is not caused by `retain_graph=True` itself. `retain_graph=True` only keeps the intermediate buffers of the graph alive so that a second backward pass through it is possible; accumulating into `.grad` is the default behavior of every `backward()` call.

I do not think this is what is wished when we want to calculate a brand-new gradient, so the gradients should be zeroed (e.g. with `model.zero_grad()`) between the two backward passes.
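A minimal sketch of the accumulation behavior, using a toy `nn.Linear` instead of the model above (each `backward()` here runs on a fresh forward pass, so no `retain_graph` is needed — which shows the summing happens independently of that flag):

```
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(2, 1)
x = torch.ones(1, 2)

# First backward: .grad starts out as None, so it simply holds the new gradient.
lin(x).sum().backward()
g1 = lin.weight.grad.clone()

# Second backward without zeroing: the new gradient is summed into .grad.
lin(x).sum().backward()
assert torch.allclose(lin.weight.grad, 2 * g1)

# Zeroing first gives a brand-new gradient.
lin.zero_grad()
lin(x).sum().backward()
assert torch.allclose(lin.weight.grad, g1)
```

In a training loop this is the same reason `optimizer.zero_grad()` is called before each `loss.backward()`.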