The gradients of the first few conv layers are None in a custom model

I have defined a custom layer that takes in features from the CONV layers and produces both a loss and an output computed from those CONV features. The output of this layer then goes into the FC layers for classification. The forward method of the model is as follows:

def forward(self, x):
    x = F.relu(F.max_pool2d(self.conv1(x), 2))
    x = F.relu(F.max_pool2d(self.conv2_bn(self.conv2(x)), 2))
    x = x.view(-1, 500)
    loss, x = self.custom_layer(x)
    x = F.relu(self.fc1(x))
    return loss, F.log_softmax(x, dim=1)

Two different SGD optimizers are used: one for the parameters of the custom layer, and the other for the CONV and FC layers. During training, the loss from the custom layer and the cross-entropy loss from the model are added together and backpropagated:

optimizer1.zero_grad()
optimizer2.zero_grad()
loss1, outputs = model(inputs)
loss2 = criterion(outputs, labels)
total_loss = loss1 + loss2
total_loss.backward()
optimizer1.step()
optimizer2.step()
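For completeness, the two optimizers are set up roughly like this (a sketch assuming the custom layer is an attribute called custom_layer as in the forward above; the learning rates and momentum are placeholders, not my exact values):

import torch.optim as optim

# Illustrative setup: optimizer1 covers only the custom layer's parameters,
# optimizer2 covers everything else (conv, batch-norm, and fc layers).
custom_params = list(model.custom_layer.parameters())
custom_param_ids = {id(p) for p in custom_params}
other_params = [p for p in model.parameters() if id(p) not in custom_param_ids]

optimizer1 = optim.SGD(custom_params, lr=0.01, momentum=0.9)
optimizer2 = optim.SGD(other_params, lr=0.01, momentum=0.9)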

When I try to visualize the gradients, the gradients of the CONV layers are None, while the gradients of the FC and custom layers are being updated, even though requires_grad is True for all the layers (I check them with a loop like the one below). Is there anything else I need to define for the gradient to flow through the entire network?
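This is roughly how I inspect the gradients after the backward pass (a minimal check; the exact printing is illustrative):

# After total_loss.backward(): print which parameters received a gradient
# and which are still None.
for name, param in model.named_parameters():
    grad_status = "None" if param.grad is None else f"norm = {param.grad.norm().item():.4f}"
    print(f"{name}: requires_grad={param.requires_grad}, grad: {grad_status}")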

You need to make sure that your custom layer does not break the flow of gradients to its inputs.
In particular, you should not .detach() the tensor that is given as input, or unwrap/rewrap it into a new tensor (or use .data, but that is a general rule everywhere).
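For example, a forward along these lines (purely a hypothetical sketch, since I haven't seen your layer) would leave the conv layers with None gradients, while the non-detached version keeps the graph intact:

import torch
import torch.nn as nn

class CustomLayer(nn.Module):
    """Hypothetical sketch: one way a custom layer can cut the graph."""
    def __init__(self, in_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_features))

    def forward(self, x):
        # BAD: detaching the input (or using x.data) cuts the autograd graph,
        # so every conv layer that produced x ends up with grad = None.
        x_detached = x.detach()
        bad_loss = (x_detached * self.weight).pow(2).mean()

        # GOOD: compute everything from x directly, so autograd can
        # backpropagate through it to the earlier layers.
        loss = (x * self.weight).pow(2).mean()
        return loss, x * self.weight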

Can you share the forward code for your custom_layer?

Yes, you are right, I was detaching a variable at some point in the forward code of the custom_layer. Thank you!
