All parameter gradients are zero except in the first forward-backward loop

Hello!
Here is my net:

    import torch.nn as nn
    import torch.nn.functional as F

    class SimplestNet(nn.Module):
        # Let's make a simple Net, 1
        def __init__(self):
            super(SimplestNet, self).__init__()
            self.conv1 = nn.Conv2d(1, 8, 3, padding=3)  # in_channels, out_channels, kernel_size
            self.conv2 = nn.Conv2d(8, 8, 1)
            self.conv3 = nn.Conv2d(8, 1, 5)

        def forward(self, x):
            print("start forwarding")
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = self.conv3(x)
            x = F.relu(x)
            return x

And here I’m trying to run 3 forward-backward loops:

    learning_rate = 0.01
    net.zero_grad()

    print ("p is : ",p)
    print (p.shape)
    print ("r is : ",r)
    print (r.shape)
    for i in [0,1,2]:
        print ("-------------",i)
        out = net.forward(p)
        loss = criterion(out, r)
        print ("loss requers grad: ",loss.requires_grad)
        print ("loss=",loss.item())
        loss.backward(retain_graph=True)
        print ("Before canging weights")
        print ("Grad: ",net.conv1.bias.grad)
        print ("Data: ",net.conv1.bias.data)
        all_grads = []
        for y in net.parameters():
            y.data.sub_(y.grad.data * learning_rate)
            y.requires_grad = True
            all_grads.append(y.grad.numpy().flatten())
        print ("All gradients are zero: ",np.all(np.concatenate(tuple(all_grads))==0))
        print ("After changing weights")
        print ("Grad: ",net.conv1.bias.grad)
        print ("Data: ",net.conv1.bias.data)
        net.zero_grad()
        print ("net.conv1.bias.requires_grad=",net.conv1.bias.requires_grad)
    sys.exit(0)

I’m getting the following output (see below).
As far as I can see, the gradients of the model parameters are all zero after the first iteration. Does anybody know why?

Output
p is :  tensor([[[[0.0000e+00, 3.6788e+06, 1.3534e+06, 4.9787e+05, 1.8316e+05],
          [0.0000e+00, 0.0000e+00, 3.6788e+06, 1.3534e+06, 4.9787e+05],
          [0.0000e+00, 0.0000e+00, 0.0000e+00, 3.6788e+06, 1.3534e+06],
          [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 3.6788e+06],
          [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]]]],
       grad_fn=<UnsqueezeBackward0>)
torch.Size([1, 1, 5, 5])
r is :  tensor([[[[0.0000, 0.3303, 0.1162, 0.0371, 0.0203],
          [0.0000, 0.0000, 0.3741, 0.1136, 0.0239],
          [0.0000, 0.0000, 0.0000, 0.0812, 0.0208],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.3005],
          [0.0000, 0.0000, 0.0000, 0.0000, 0.0000]]]],
       grad_fn=<UnsqueezeBackward0>)
torch.Size([1, 1, 5, 5])
------------- 0
start forwarding
loss requires grad:  True
loss= 157155147776.0
Before changing weights
Grad:  tensor([ 102787.3516, -237806.8750,   19547.1562,  138869.3438, -156843.2500,
          -1003.6752,  -68224.1328,  233138.7344])
Data:  tensor([ 0.0917, -0.2098,  0.0329,  0.1649,  0.1793,  0.2022, -0.2823, -0.0442])
All gradients are zero:  False
After changing weights
Grad:  tensor([ 102787.3516, -237806.8750,   19547.1562,  138869.3438, -156843.2500,
          -1003.6752,  -68224.1328,  233138.7344])
Data:  tensor([-1027.7817,  2377.8589,  -195.4386, -1388.5284,  1568.6118,    10.2390,
          681.9590, -2331.4314])
net.conv1.bias.requires_grad= True
------------- 1
start forwarding
loss requires grad:  True
loss= 0.3751196265220642
Before changing weights
Grad:  tensor([0., 0., 0., 0., 0., 0., 0., 0.])
Data:  tensor([-1027.7817,  2377.8589,  -195.4386, -1388.5284,  1568.6118,    10.2390,
          681.9590, -2331.4314])
All gradients are zero:  True
After changing weights
Grad:  tensor([0., 0., 0., 0., 0., 0., 0., 0.])
Data:  tensor([-1027.7817,  2377.8589,  -195.4386, -1388.5284,  1568.6118,    10.2390,
          681.9590, -2331.4314])
net.conv1.bias.requires_grad= True
------------- 2
start forwarding
loss requires grad:  True
loss= 0.3751196265220642
Before changing weights
Grad:  tensor([0., 0., 0., 0., 0., 0., 0., 0.])
Data:  tensor([-1027.7817,  2377.8589,  -195.4386, -1388.5284,  1568.6118,    10.2390,
          681.9590, -2331.4314])
All gradients are zero:  True
After changing weights
Grad:  tensor([0., 0., 0., 0., 0., 0., 0., 0.])
Data:  tensor([-1027.7817,  2377.8589,  -195.4386, -1388.5284,  1568.6118,    10.2390,
          681.9590, -2331.4314])
net.conv1.bias.requires_grad= True

Hi,

  • Why do you use retain_graph=True? It should not be needed here.
  • Do not use .data for your parameter update; wrap the update in a with torch.no_grad(): context manager and just do y -= y.grad * learning_rate (see the sketch after this list).
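For reference, here is a minimal sketch of the loop rewritten along those lines. It assumes the same net, criterion, p and r as in your snippet; since your p and r carry a grad_fn, the sketch detaches them first so that backward() does not try to reuse the graph that created them (which is presumably why you needed retain_graph=True):

    import torch

    learning_rate = 0.01
    p, r = p.detach(), r.detach()     # inputs/targets should not drag old graphs into backward()

    for i in range(3):
        net.zero_grad()               # clear gradients from the previous iteration
        out = net(p)                  # call the module directly instead of net.forward(p)
        loss = criterion(out, r)
        print("loss =", loss.item())
        loss.backward()               # a fresh graph is built each iteration, so no retain_graph

        # update the parameters in place without recording the update in autograd
        with torch.no_grad():
            for y in net.parameters():
                y -= y.grad * learning_rate

Calling net.zero_grad() at the top of each iteration keeps gradients from accumulating, and the torch.no_grad() block lets you update the parameters in place without going through .data or re-setting requires_grad.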