Gradient flow not seen for the segmentation network

I have a segmentation network which looks like the class below. I pre-train it (the forward pass is called with preTrain=True) on the VOC dataset, but during pretraining only the last layer, conv1_1_2, has gradients flowing and gets its weights updated. The network class and the gradient flow plot are below.

import copy
from collections import OrderedDict

import torch
import torch.nn as nn
import torch.nn.functional as F


class Network(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # pre-trained features (custom VGG16 backbone)
        backbone = vgg16(is_caffe=True)

        l7 = [('fc7', nn.Conv2d(512, 1, 1))]
        l8 = [('fc8', nn.Conv2d(3, 3, 1))]
        self.encoder = copy.deepcopy(backbone)
        self.conv1_1_1 = nn.Sequential(OrderedDict(l7))
        self.conv1_1_2 = nn.Sequential(OrderedDict(l8))

    def forward(self, x, weights, preTrain=True):
        x = self.encoder(x)
        x = x / x.max()

        if preTrain == False:
            # weights come from another network branch, only used in main training
            suppQueryFusion = [torch.add(x, weight.repeat(1, 1, x.size(2), x.size(3)))
                               for weight in weights]
        else:
            suppQueryFusion = [x for weight in weights]

        weightedFeats = [self.conv1_1_1(feat) for feat in suppQueryFusion]
        concatedFeats = torch.cat(weightedFeats, dim=1)

        normClassFeats = F.normalize(concatedFeats, p=2, dim=1)
        Fusion = self.conv1_1_2(normClassFeats)
        x = self.conv1_1_2(Fusion)  # fc8 applied a second time
        return x

The method for getting the plot can be found in [Check gradient flow in network - #7 by RoshanRane].
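For reference, here is a minimal sketch of that kind of gradient-flow check (my paraphrase of the linked helper, assuming matplotlib is available; call it after loss.backward()):

import matplotlib.pyplot as plt

def plot_grad_flow(named_parameters):
    # collect the mean absolute gradient of every weight after loss.backward()
    ave_grads, layers = [], []
    for name, param in named_parameters:
        if param.requires_grad and "bias" not in name:
            layers.append(name)
            ave_grads.append(0.0 if param.grad is None else param.grad.abs().mean().item())
    plt.plot(ave_grads, alpha=0.3, color="b")
    plt.hlines(0, 0, len(ave_grads) + 1, linewidth=1, color="k")
    plt.xticks(range(len(ave_grads)), layers, rotation="vertical")
    plt.xlabel("Layers")
    plt.ylabel("average gradient")
    plt.title("Gradient flow")
    plt.grid(True)
    plt.show()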

[Gradient flow plot: every layer except conv1_1_2 shows an average gradient of zero]

Could someone help me understand why the gradient flow is zero and how I could fix it?

I’m not sure I understand your forward method completely, but it looks like you are just using weights if preTrain is set to True. The output of your encoder won’t be used anymore, which might explain the missing gradients.
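As a toy illustration of that point (made-up modules, not your network): if a module’s output never reaches the loss, backward() leaves its parameters without gradients:

import torch
import torch.nn as nn

encoder = nn.Linear(4, 4)
head = nn.Linear(4, 1)

inp = torch.randn(2, 4)
feats = encoder(inp)        # computed, but never used below
out = head(inp)             # the loss only depends on head
out.sum().backward()

print(encoder.weight.grad)              # None - no path from the loss to the encoder
print(head.weight.grad.abs().mean())    # non-zero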

The current gradient plot is from when preTrain==True, and it does not show any gradient flow through the network except for the last layer.
suppQueryFusion after the if-else construct looks like [x, x, x], where x is the output of the encoder, so I would expect the encoder to be able to learn.
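At least in isolation that expectation seems right; in this small sketch (toy layers, not my actual network) the same tensor reused three times still produces encoder gradients:

import torch
import torch.nn as nn

encoder = nn.Linear(4, 4)
head = nn.Linear(4, 1)

x = encoder(torch.randn(2, 4))
suppQueryFusion = [x, x, x]                    # same tensor reused, like [x, x, x] above
out = torch.cat([head(f) for f in suppQueryFusion], dim=1)
out.sum().backward()

print(encoder.weight.grad.abs().mean())        # non-zero: gradients reach the encoder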

The penultimate layer, conv1_1_1, which is not part of the encoder, is also not showing any gradient flow.

Also, in the case where preTrain==False (not used until now), the output of the encoder is added to the weights, and I would expect gradients to flow through the encoder as well.

If preTrain=True, suppQueryFusion won’t be the output of the encoder, but will be provided via weights. If weights wasn’t computed by the encoder in the previous run, the encoder output won’t be used in this scenario and thus you won’t see any gradients for it.

I’m wondering why the penultimate layer doesn’t have gradients. Could you check them via print(model.conv1_1_1.weight.grad.sum()) after calling loss.backward()?
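Something along these lines would check all layers at once (just a sketch; model, criterion, inputs, weights, and targets stand in for your actual objects):

output = model(inputs, weights, preTrain=True)
loss = criterion(output, targets)
loss.backward()

for name, param in model.named_parameters():
    if param.grad is None:
        print(f"{name}: grad is None (never reached by backward)")
    else:
        print(f"{name}: grad sum = {param.grad.sum().item():.6f}")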