All parameter gradients are zero except in the first forward-backward loop

Hello!
Here is my net:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplestNet(nn.Module):
    # Let's make a simple net
    def __init__(self):
        super(SimplestNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 8, 3, padding=3)  # in_channels, out_channels, kernel_size
        self.conv2 = nn.Conv2d(8, 8, 1)
        self.conv3 = nn.Conv2d(8, 1, 5)

    def forward(self, x):
        print("start forwarding")
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.conv3(x)
        x = F.relu(x)
        return x
```

And here I’m trying to run 3 forward-backward loops:

```python
import sys

learning_rate = 0.01

print("p is : ", p)
print(p.shape)
print("r is : ", r)
print(r.shape)
for i in [0, 1, 2]:
    print("-------------", i)
    out = net.forward(p)
    loss = criterion(out, r)
    print("loss requires grad: ", loss.requires_grad)
    print("loss=", loss.item())
    loss.backward(retain_graph=True)
    print("Before changing weights")
    print("Grad: ", net.conv1.bias.grad)
    print("Data: ", net.conv1.bias.data)
    print("All gradients are zero: ", all((y.grad == 0).all().item() for y in net.parameters()))
    for y in net.parameters():
        y.data -= y.grad * learning_rate
    print("After changing weights")
    print("Grad: ", net.conv1.bias.grad)
    print("Data: ", net.conv1.bias.data)
sys.exit(0)
```

I’m getting the following output (see below).
As far as I can see, the gradients of the model parameters are not updated after the first iteration. Does anybody know why?

```
Output
p is :  tensor([[[[0.0000e+00, 3.6788e+06, 1.3534e+06, 4.9787e+05, 1.8316e+05],
[0.0000e+00, 0.0000e+00, 3.6788e+06, 1.3534e+06, 4.9787e+05],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 3.6788e+06, 1.3534e+06],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 3.6788e+06],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]]]],
torch.Size([1, 1, 5, 5])
r is :  tensor([[[[0.0000, 0.3303, 0.1162, 0.0371, 0.0203],
[0.0000, 0.0000, 0.3741, 0.1136, 0.0239],
[0.0000, 0.0000, 0.0000, 0.0812, 0.0208],
[0.0000, 0.0000, 0.0000, 0.0000, 0.3005],
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000]]]],
torch.Size([1, 1, 5, 5])
------------- 0
start forwarding
loss requires grad:  True
loss= 157155147776.0
Before changing weights
Grad:  tensor([ 102787.3516, -237806.8750,   19547.1562,  138869.3438, -156843.2500,
-1003.6752,  -68224.1328,  233138.7344])
Data:  tensor([ 0.0917, -0.2098,  0.0329,  0.1649,  0.1793,  0.2022, -0.2823, -0.0442])
All gradients are zero:  False
After changing weights
Grad:  tensor([ 102787.3516, -237806.8750,   19547.1562,  138869.3438, -156843.2500,
-1003.6752,  -68224.1328,  233138.7344])
Data:  tensor([-1027.7817,  2377.8589,  -195.4386, -1388.5284,  1568.6118,    10.2390,
681.9590, -2331.4314])
------------- 1
start forwarding
loss requires grad:  True
loss= 0.3751196265220642
Before changing weights
Grad:  tensor([0., 0., 0., 0., 0., 0., 0., 0.])
Data:  tensor([-1027.7817,  2377.8589,  -195.4386, -1388.5284,  1568.6118,    10.2390,
681.9590, -2331.4314])
All gradients are zero:  True
After changing weights
Grad:  tensor([0., 0., 0., 0., 0., 0., 0., 0.])
Data:  tensor([-1027.7817,  2377.8589,  -195.4386, -1388.5284,  1568.6118,    10.2390,
681.9590, -2331.4314])
------------- 2
start forwarding
loss requires grad:  True
loss= 0.3751196265220642
Before changing weights
Grad:  tensor([0., 0., 0., 0., 0., 0., 0., 0.])
Data:  tensor([-1027.7817,  2377.8589,  -195.4386, -1388.5284,  1568.6118,    10.2390,
681.9590, -2331.4314])
All gradients are zero:  True
After changing weights
Grad:  tensor([0., 0., 0., 0., 0., 0., 0., 0.])
Data:  tensor([-1027.7817,  2377.8589,  -195.4386, -1388.5284,  1568.6118,    10.2390,
681.9590, -2331.4314])
```

• Why do you use `retain_graph=True`? It should not be needed here.
• Do not use `.data` for your parameter update; wrap the update in a `with torch.no_grad():` context manager and just do `y -= y.grad * learning_rate`.
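The second point can be sketched as follows. This is a minimal sketch, not the exact setup above: it uses a stand-in single-conv net and random 5x5 tensors instead of `SimplestNet`, `p`, and `r` from the question. It also calls `zero_grad()` each iteration, since PyTorch accumulates gradients across `backward()` calls:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins (assumed, for illustration only): one conv layer, MSE loss,
# and random input/target tensors shaped like the ones in the question.
net = nn.Conv2d(1, 1, 3, padding=1)
criterion = nn.MSELoss()
p = torch.rand(1, 1, 5, 5)
r = torch.rand(1, 1, 5, 5)
learning_rate = 0.01

losses = []
for i in range(3):
    net.zero_grad()              # clear gradients left over from the previous step
    loss = criterion(net(p), r)
    losses.append(loss.item())
    loss.backward()              # no retain_graph=True needed here
    with torch.no_grad():        # update parameters without autograd tracking
        for y in net.parameters():
            y -= y.grad * learning_rate
print(losses)
```

Updating inside `torch.no_grad()` keeps the SGD step itself out of the autograd graph, which is what the `.data` trick was (unsafely) emulating.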