3D CNN regularisation

I’m currently using a simple 3D network for binary classification. The model I’m using is:

import torch
import torch.nn as nn

class ConvModel(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(ConvModel, self).__init__()
        # Conv3d expects input of shape (N, C, D, H, W)
        self.conv1 = nn.Conv3d(in_channels=in_channels, out_channels=64, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm3d(num_features=64)
        self.relu = nn.ReLU()
        # 3x3x3 pooling window, stride 2: halves each spatial dimension
        self.mpool = nn.MaxPool3d(kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv3d(in_channels=64, out_channels=64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm3d(num_features=64)
        self.conv3 = nn.Conv3d(in_channels=64, out_channels=64, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm3d(num_features=64)
        self.conv4 = nn.Conv3d(in_channels=64, out_channels=128, kernel_size=3, padding=1)
        self.bn4 = nn.BatchNorm3d(num_features=128)
        self.conv5 = nn.Conv3d(in_channels=128, out_channels=256, kernel_size=3, padding=1)
        self.bn5 = nn.BatchNorm3d(num_features=256)
        
        # flattened conv output; 8*8*2 depends on the input volume dimensions
        self.fc = nn.Linear(8*8*2*256, 512)
        self.dropout = nn.Dropout(0.4)
        self.dropout2 = nn.Dropout(0.3)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, num_classes)
        self.flatten = nn.Flatten()
        
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.mpool(x)
        
        
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.mpool(x)
        
        x = self.conv3(x)
        x = self.bn3(x)
        x = self.relu(x)
        x = self.mpool(x)

        x = self.conv4(x)
        x = self.bn4(x)
        x = self.relu(x)
        x = self.mpool(x)
        
        x = self.conv5(x)
        x = self.bn5(x)
        x = self.relu(x)
        x = self.mpool(x)
       
        x = self.flatten(x)
        
        x = self.fc(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.dropout2(x)
        x = self.fc3(x)
        return x

When training this with Adam (lr=1e-6), even after 50 epochs the model doesn’t generalise well, and the training loss with BCE is around 0.5-0.6. However, if I remove the Dropout layers, the model overfits the training data within 15 epochs with a loss of 0.01.

I’m not sure what kind of changes I should introduce to get my model to generalise. Is using nn.Dropout3d something I should consider? I’ve been trying different dropout probabilities, but the best validation AUC ROC I get is 0.6, which could just be down to randomness.
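
For reference, this is roughly where I imagined Dropout3d would go (just a sketch of one block; the p=0.2 is an arbitrary value I’d still need to tune):

import torch.nn as nn

# Sketch only: one conv block with spatial dropout after the activation.
# nn.Dropout3d zeroes entire feature maps rather than individual voxels;
# the probability here is arbitrary, not something I've validated.
block = nn.Sequential(
    nn.Conv3d(64, 64, kernel_size=3, padding=1),
    nn.BatchNorm3d(64),
    nn.ReLU(),
    nn.Dropout3d(p=0.2),
    nn.MaxPool3d(kernel_size=3, stride=2, padding=1),
)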

Any help will be appreciated.

Cheers

A couple of things you could try:

  • Apply ReLU before batch normalization, not after it as you do now.
  • Have a separate nn.ReLU() instance for each application of ReLU. Your current code shares the same instance across invocations. Ideally this should not matter, but perhaps there is some sharing of state across these calls? Since you are stuck, you might as well try this and see if it helps.

The first point above may actually make a significant difference. The second is just worth trying because it only takes a couple of extra lines of code; it may well not help much.
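
Here is a minimal sketch of what I mean (layer sizes copied from your first block; only the ordering and the per-block ReLU instances change):

import torch.nn as nn

# Sketch: ReLU applied before BatchNorm, and each block owning its own
# nn.ReLU() instead of sharing one instance across the whole network.
class ConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU()   # dedicated instance for this block
        self.bn = nn.BatchNorm3d(out_ch)
        self.pool = nn.MaxPool3d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)   # activation first ...
        x = self.bn(x)     # ... then batch norm
        return self.pool(x)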

Edited to add: @arya47: do give an update here on whether either of these helped.


Mate, this really helped in making progress.
The best result I’ve gotten after making these changes is a validation loss of 0.62 (BCE) and an AUC of 0.69, which is much higher than what I was getting earlier on validation.

Happy to have been of help!