After noticing unexpected gradient values during model training, I ran the experiment below. I expected to get the same gradient values in both cases, but that was not what happened. The code is ready to run. In the first scenario I call loss1.backward(retain_graph=True) and then loss2.backward(); in the second scenario I do it the other way around (loss2.backward(retain_graph=True) first, then loss1.backward()).
The printed gradient values were not the same.
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
dtype = torch.float32
X = torch.tensor([[1, 2, 3, 4, 5, 6]], dtype=dtype)
Y = torch.tensor([[1, 4, 9, 16, 25, 36]], dtype=dtype)
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        torch.manual_seed(3)
        self.base_l1 = torch.nn.Linear(6, 6, bias=True)
        self.base_l2 = torch.nn.Linear(6, 6, bias=True)
        self.l3 = torch.nn.Linear(6, 6, bias=True)
        self.l4 = torch.nn.Linear(6, 6, bias=True)

    def forward(self, x):
        # shared trunk: base_l1 -> relu -> base_l2 produces x1 (used for loss1)
        x1 = self.base_l1(x)
        x1 = F.relu(x1)
        x1 = self.base_l2(x1)
        # head: relu -> l3 -> relu -> l4 produces x2 (used for loss2)
        x2 = x1
        x2 = F.relu(x2)
        x2 = self.l3(x2)
        x2 = F.relu(x2)
        x2 = self.l4(x2)
        return x2, x1
model = Model()
Loss = nn.MSELoss()
y_pred2, y_pred1 = model(X)
print('grad0', model.base_l1.weight.grad)
loss1 = Loss(y_pred1, Y)
loss2 = Loss(y_pred2, Y)
# first scenario
# (comment this block out and uncomment the second scenario below, then rerun)
loss1.backward(retain_graph=True)
print('grad1', model.base_l1.weight.grad)
loss2.backward()
print('grad2', model.base_l1.weight.grad)

# second scenario: uncomment after running the first scenario, and comment the first scenario out
# loss2.backward(retain_graph=True)
# print('grad2', model.base_l1.weight.grad)
# loss1.backward()
# print('grad1', model.base_l1.weight.grad)
From this we can clearly see that retain_graph=True keeps all the information needed to compute the gradient again, but the existing .grad values are also preserved: the second backward() adds the new gradient to the old one instead of replacing it.
I do not think this is what we want when the goal is to compute a brand new gradient.
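To make concrete what I mean by a brand new gradient: if I zero the gradients between the two backward calls, each print shows only that loss's own gradient. This is just a minimal sketch reusing the model, tensors, and Loss defined above (the print labels are mine), not part of the original experiment.

model.zero_grad()                  # clear any previously accumulated gradients
y_pred2, y_pred1 = model(X)
loss1 = Loss(y_pred1, Y)
loss2 = Loss(y_pred2, Y)

loss1.backward(retain_graph=True)  # retain the shared graph so loss2 can still backpropagate
print('grad of loss1 alone', model.base_l1.weight.grad)

model.zero_grad()                  # drop loss1's gradient before backpropagating loss2
loss2.backward()
print('grad of loss2 alone', model.base_l1.weight.grad)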