I have defined a custom layer that takes the features from the CONV layers and returns both a loss and an output computed from those features. The output of this layer then goes into the FC layers for classification. The forward method is as follows:
def forward(self, x):
    x = F.relu(F.max_pool2d(self.conv1(x), 2))
    x = F.relu(F.max_pool2d(self.conv2_bn(self.conv2(x)), 2))
    x = x.view(-1, 500)
    loss, x = self.custom_layer(x)
    x = F.relu(self.fc1(x))
    return loss, F.log_softmax(x, dim=1)
Two different optimizers (both SGD) are used: one for the parameters of the custom layer and the other for the CONV and FC layers. During training, however, the loss from the custom layer and the cross-entropy loss from the model are summed and backpropagated together.
optimizer1.zero_grad()
optimizer2.zero_grad()
loss1, outputs = model(inputs)
loss2 = criterion(outputs, labels)
total_loss = loss1 + loss2
total_loss.backward()
optimizer1.step()
optimizer2.step()
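A minimal sketch of how such a split between two SGD optimizers could be set up. Everything here is an assumption for illustration: `TinyNet`, the layer sizes, and the learning rates are hypothetical, a plain `nn.Linear` stands in for the custom layer, and which optimizer gets which group is a guess.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the real model (not the original network)."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size=3)
        self.custom_layer = nn.Linear(16, 16)  # placeholder for the custom layer
        self.fc1 = nn.Linear(16, 10)


model = TinyNet()

# Parameters owned by the custom layer.
custom_params = list(model.custom_layer.parameters())
custom_ids = {id(p) for p in custom_params}

# Everything else (CONV and FC), selected by identity so no parameter is shared.
other_params = [p for p in model.parameters() if id(p) not in custom_ids]

# Assumed assignment: optimizer1 -> custom layer, optimizer2 -> CONV and FC.
optimizer1 = torch.optim.SGD(custom_params, lr=0.01)
optimizer2 = torch.optim.SGD(other_params, lr=0.01)
```

Selecting by `id` keeps the two groups disjoint, so each parameter is stepped by exactly one optimizer.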
When I visualize the gradients, those of the CONV layers are None, while the FC and custom layers do receive gradients, even though requires_grad is True for all the layers. Is there anything else I need to define for the gradient to flow through the entire network?
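A minimal sketch of one way to inspect per-parameter gradients after `backward()`. It assumes a small stand-in model: `TinyNet`, its layer sizes, the random inputs, and the helper `grad_report` are all hypothetical and not part of the original network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):
    """Hypothetical stand-in: one CONV layer followed by one FC layer."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size=3)
        self.fc1 = nn.Linear(4 * 3 * 3, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))  # 8x8 -> 6x6 -> 3x3
        x = x.view(x.size(0), -1)
        return F.log_softmax(self.fc1(x), dim=1)


def grad_report(model):
    """Map each parameter name to its gradient norm, or None if .grad is unset."""
    return {
        name: (None if p.grad is None else p.grad.norm().item())
        for name, p in model.named_parameters()
    }


torch.manual_seed(0)
model = TinyNet()
inputs = torch.randn(2, 1, 8, 8)  # made-up batch of two 8x8 images
labels = torch.tensor([1, 3])

loss = F.nll_loss(model(inputs), labels)
loss.backward()

report = grad_report(model)
for name, norm in report.items():
    print(name, norm)
```

In this stand-in all four parameters report a gradient norm, whereas the network described above reports None for its CONV parameters.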