[SOLVED] CNN+LSTM weights not updating when using a custom layer and loss library (CORAL)

I’m trying to train a model composed of a CNN and an LSTM, but during the training phase the weights are not updated. I have been reading posts on this forum for days but cannot figure out what is wrong with my code.

Here is my model:

import torch
import torch.nn as nn
from torchvision import models
from coral_pytorch.layers import CoralLayer

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        dr_rate= 0.2
        pretrained = True
        rnn_hidden_size = 30
        rnn_num_layers = 2
        # get a pretrained VGG19 model (take only the CNN feature layers and fine-tune the last ones)
        baseModel = models.vgg19(pretrained=pretrained).features  
        i = 0
        for child in baseModel.children():
            if i < 28:
                for param in child.parameters():
                    param.requires_grad = False
            else:
                for param in child.parameters():
                    param.requires_grad = True
            i +=1

        num_features = 25088  # flattened VGG19 feature map: 512 * 7 * 7 for a 224x224 input
        self.baseModel = baseModel
        self.dropout= nn.Dropout(dr_rate)
        self.rnn = nn.LSTM(num_features, rnn_hidden_size, rnn_num_layers , batch_first=True)
        self.fc1 = CoralLayer(size_in=30, num_classes=5)  # size_in matches rnn_hidden_size
        
    def forward(self, x):
        batch_size, time_steps, C, H, W = x.size()
        # reshape input  to be (batch_size * timesteps, input_size)
        x = x.contiguous().view(batch_size * time_steps, C, H, W)
        x = self.baseModel(x)
        x = x.view(x.size(0), -1)
        # reshape back to (batch_size, time_steps, num_features)
        x = x.contiguous().view(batch_size , time_steps , x.size(-1))
        x , (hn, cn) = self.rnn(x)
        ##### Use CORAL layer #####
        logits1 =  self.fc1(x[:, -1, :])
        probas1 = torch.sigmoid(logits1)
        ###--------------------------------------------------------------------###

        return  logits1, probas1

And that’s the training code:

import torch.optim as optim
from coral_pytorch.dataset import levels_from_labelbatch
from coral_pytorch.losses import coral_loss

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)

num_epochs = 10
for epoch in range(num_epochs):

    model = model.train()
    for batch_idx, (features, label1) in enumerate(train_dataloader):

        ##### Convert class labels for CORAL
        levels1 = levels_from_labelbatch(label1, num_classes=5).to(device)
        features = features.to(device)
        logits1, probas1 = model(features)

        #### CORAL loss 
        loss = coral_loss(logits1, levels1)
        
        
        #print(logits1.dtype, levels1.dtype, loss.dtype) # they are all tensors
        optimizer.zero_grad()
        
        a = list(model.parameters())[1].clone()
        loss.backward(retain_graph=True)
        optimizer.step()
        b = list(model.parameters())[1].clone()
        print(torch.equal(a,b))
        print(list(model.parameters())[0].grad)

        ### LOGGING
        if not batch_idx % batch_size:
            print ('Epoch: %02d/%02d | Batch %02d/%02d | Loss: %.4f' 
                   %(epoch+1, num_epochs, batch_idx, 
                     len(train_dataloader), loss))

A partial output:

Equals :  True
Grad :  None
Epoch: 10/10 | Batch 00/03 | Loss: 2.4990

The input tensor has a shape of (2, 10, 3, 224, 224), corresponding to batch_size × frames × channels × height × width. I’m also using a custom loss and layer from this work: GitHub - Raschka-research-group/coral-pytorch: CORAL and CORN implementations for ordinal regression with deep neural networks.
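
As a quick sanity check of the shapes, running a random tensor of that size through the model looks like this (just a sketch; the expected output shape assumes the CORAL layer emits num_classes - 1 logits per sample):

x = torch.randn(2, 10, 3, 224, 224)  # batch_size x frames x channels x height x width
check_model = Net()
with torch.no_grad():
    logits1, probas1 = check_model(x)
print(logits1.shape)  # expected: torch.Size([2, 4]), i.e. num_classes - 1 logits per sample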

I would appreciate any advice you can give me!

Additional info: I have used CORAL with an MLP using the same training loop and it updates the weights, so I think it is a problem with the model.
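
For reference, a minimal version of such an MLP check could look like this (layer sizes and dummy data are made up for illustration):

mlp = nn.Sequential(
    nn.Linear(100, 30),
    nn.ReLU(),
    CoralLayer(size_in=30, num_classes=5),
).to(device)
mlp_optimizer = optim.Adam(mlp.parameters(), lr=0.001)

dummy_features = torch.randn(8, 100).to(device)
dummy_labels = torch.randint(0, 5, (8,))
dummy_levels = levels_from_labelbatch(dummy_labels, num_classes=5).to(device)

logits = mlp(dummy_features)
loss = coral_loss(logits, dummy_levels)
mlp_optimizer.zero_grad()
loss.backward()
mlp_optimizer.step()
# every parameter of this small model is trainable, so all gradients are populated
print(all(p.grad is not None for p in mlp.parameters()))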

I couldn’t figure out a definite answer as of now, but could you please replace model = model.train() with just model.train()?
Also, move model.train() outside the loops, perhaps right before defining the optimizer instance.
Let me know how it goes after you do this.
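
For example, a sketch of that placement (the body of the loop stays the same as in your current code):

model = Net().to(device)
model.train()  # call it for its side effect; the return value does not need to be reassigned
optimizer = optim.Adam(model.parameters(), lr=0.001)

num_epochs = 10
for epoch in range(num_epochs):
    for batch_idx, (features, label1) in enumerate(train_dataloader):
        ...  # same steps as in your current loop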

Meanwhile, I’m going through your code thoroughly to figure out the error.

Thank you for your help!
I have done what you suggested but unfortunately I got the same output.

In your current code you are checking the parameter at index 1 and comparing its updates:

        a = list(model.parameters())[1].clone()
        loss.backward(retain_graph=True)
        optimizer.step()
        b = list(model.parameters())[1].clone()
        print(torch.equal(a,b))

Later you are also checking if a valid gradient was calculated:

        print(list(model.parameters())[0].grad)

which doesn’t seem to be the case based on your output.

If you check the model initialization:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        dr_rate= 0.2
        pretrained = True
        rnn_hidden_size = 30
        rnn_num_layers = 2
        # get a pretrained VGG19 model (take only the CNN feature layers and fine-tune the last ones)
        baseModel = models.vgg19(pretrained=pretrained).features  
        i = 0
        for child in baseModel.children():
            if i < 28:
                for param in child.parameters():
                    param.requires_grad = False
            else:
                for param in child.parameters():
                    param.requires_grad = True
            i +=1

you would see that all parameters for i < 28 are frozen and are thus not expected to get any valid gradient or any updates.
Based on this, I would assume the outputs are expected, and you should either not freeze these parameters if you expect them to be trained, or check another parameter which is trainable for its updates.
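
As a sketch, repeating your check on a parameter that actually requires gradients could look like this (the filter keeps only the trainable parameters, e.g. the LSTM and CORAL ones in your model):

trainable_params = [p for p in model.parameters() if p.requires_grad]
a = trainable_params[-1].clone()
loss.backward()
optimizer.step()
b = trainable_params[-1].clone()
print(torch.equal(a, b))          # should print False once the parameter is updated
print(trainable_params[-1].grad)  # should no longer be None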

Ohhh thank you!
I thought that parameters() contained only the trainable parameters, but I was wrong.
So tell me if I have understood correctly: from parameters[0] to parameters[27] there is no update because they are frozen, and the others are updated?

Lastly, I was trying to overfit on only a few balanced examples, but I can’t achieve that. If I understand correctly, I have to work either on the model itself, since there are only a few trainable layers, or on the hyperparameters?

No, not necessarily, since you are iterating baseModel.children() and then freezing all parameters of the first 28 child modules, which might contain more than 28 parameters.
Use named_children() and named_parameters() and print the name of each module and parameter that will be frozen to better understand what exactly stays trainable.
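
Something along these lines would print that overview (just a sketch):

for child_name, child in model.baseModel.named_children():
    for param_name, param in child.named_parameters():
        state = 'trainable' if param.requires_grad else 'frozen'
        print(f'baseModel.{child_name}.{param_name}: {state}')

# or, for the whole model at once:
for name, param in model.named_parameters():
    print(name, param.requires_grad)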

I will try that! Thank you so much, that really helped me!