STILL overfitting image classification for CheXpert dataset

Hi there,
I am trying to classify images from the CheXpert dataset on only one of the observations (Atelectasis) as a two-class classification problem (1 = true, 0 = false).
I preprocess the images by resizing them to 224x224 and normalizing them. I used 30,000 pictures for training (10% of them for validation) and 7,500 test images.
As a model I am using a ResNet34 pretrained on ImageNet.
When running the model it overfits: the training loss decreases to 0.043 whereas the validation loss rises to 2.199, which leads to a test accuracy of 55.56%.
I tried the following attempts to prevent the overfitting:

#Attempt 1: I used a classifier with dropout layers
> Validation loss decreased in the beginning, but after 6 epochs it started rising again

#Attempt 2: I added dropout layers throughout the whole model
> Validation loss decreased in the beginning, but the model converged very slowly. After some epochs, the validation loss increased and the model also converged faster

#Attempt 3: I froze all nonlinear layers and trained only the last linear layer
> The network did not seem to converge at all, not even after 50 epochs

Reformulating the task as a binary classification problem also changed nothing.

import torch
import torch.nn as nn
import torchvision

# FitModule is the custom base class that provides the fit() loop shown further below
class ResNet(FitModule):
    def __init__(self, num_classes=2):
        super(ResNet, self).__init__()
        self.net = torchvision.models.resnet34(pretrained=True)
        # Change classifier
        kernel_count = self.net.fc.in_features
        self.net.fc = nn.Sequential(nn.Linear(kernel_count, 500), nn.Linear(500, num_classes))
        self.dropout = nn.Dropout(p=0.5)
        # Attempt 1: use classifier with dropout layers
        '''
        self.net.fc = nn.Sequential(
            nn.BatchNorm1d(kernel_count),
            nn.Dropout(p=0.5),
            nn.Linear(in_features=kernel_count, out_features=500),
            nn.ReLU(),
            nn.BatchNorm1d(500),
            nn.Dropout(p=0.5),
            nn.Linear(in_features=500, out_features=num_classes))
        '''


    def freeze_nonlinear_layers(self):
        self._freeze_layer(self.net.conv1)
        self._freeze_layer(self.net.bn1)
        self._freeze_layer(self.net.relu)
        self._freeze_layer(self.net.maxpool)
        self._freeze_layer(self.net.layer1)
        self._freeze_layer(self.net.layer2)
        self._freeze_layer(self.net.layer3)
        self._freeze_layer(self.net.layer4)
        self._freeze_layer(self.net.avgpool)


    def _freeze_layer(self, layer, freeze=True):
        if freeze:                              
            for p in layer.parameters():
                p.requires_grad = False
        else:                                   
            for p in layer.parameters():
                p.requires_grad = True

    def forward(self, inputs):
        # Attempt 2: build whole network with dropout layers
        '''
        out = self.net.conv1(inputs)
        out = self.net.bn1(out)
        out = self.net.relu(out)
        out = self.net.maxpool(out)
        out = self.dropout(out)
        out = self.net.layer1(out)
        out = self.dropout(out)
        out = self.net.layer2(out)
        out = self.dropout(out)
        out = self.net.layer3(out)
        out = self.dropout(out)
        out = self.net.layer4(out)
        out = self.dropout(out)
        out = self.net.avgpool(out)
        out = out.view(out.size(0), -1)
        out = self.net.fc(out)
        '''
        # Attempt 3: freeze nonlinear layers and only train the last linear layer
        # (call this once before training, with parentheses, rather than inside forward):
        '''
        self.freeze_nonlinear_layers()
        '''
        return self.net(inputs)  # attempts 1 and 3; for attempt 2, return out instead
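
For reference, attempt 3 is wired up roughly like this (the optimizer setup below is only an illustrative sketch, not my exact code):

# Freeze the backbone once before training and give the optimizer only the
# parameters that still require gradients (i.e. the new fc head).
# Optimizer choice and learning rate are placeholders.
model = ResNet(num_classes=2).to(device)
model.freeze_nonlinear_layers()
trainable_params = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.Adam(trainable_params, lr=1e-3)
loss = nn.CrossEntropyLoss()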

To sum things up: Either the network does not converge or the validation loss rises and the test accuracy is poor.

Help is much appreciated. Thanks in advance!

Your approaches sound reasonable, and it’s hard to tell what could work.

I would generally recommend checking the complete data loading and preprocessing pipeline.
Sometimes bugs creep in, such as mixed-up labels, which are really nasty.

Once this is done, you could try to increase the augmentation of the training set, in the hope that it better captures the underlying data distribution.
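
For example, something along these lines for the training transform only (the exact transforms and values are just a starting point):

from torchvision import transforms

# Train-time augmentation only; keep the plain resize + normalize pipeline
# for validation and test.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])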

If that all doesn’t help, run a quick test by swapping the validation Dataset for the training set.
This should give you a good validation loss. If that’s not the case, recheck the validation pipeline, as there might be an issue.


Thanks for the response. I am loading the images and labels as NumPy ndarrays, and this process seems to be fine.
In the PyTorch FitModule I convert the ndarrays back to tensors and create DataLoaders:

train_data = DataLoader(TensorDataset(X_train, y_train), batch_size, shuffle)

Afterwards I run my epochs and go through the data in batches.

I am not quite sure about the way I am currently creating the batch data.

        for t in range(initial_epoch, epochs):
            
            # TRAINING
            self.train()
            train_epoch_loss = 0.0
            train_epoch_acc = 0
            # Run batches
            for batch_i, batch_data in enumerate(train_data):       
                # Get batch data
                X_train_batch = Variable(batch_data[0], requires_grad=True).float()     
                y_train_batch = Variable(batch_data[1], requires_grad=True).long()       
                # GPU access
                X_train_batch, y_train_batch = X_train_batch.to(device), y_train_batch.to(device)
                # Backprop
                opt.zero_grad()
                y_train_pred = self(X_train_batch)
                train_loss = loss(y_train_pred, y_train_batch)       
                train_loss.backward()
                opt.step()
                # Update status
                train_epoch_loss += train_loss.item()
                for param in self.parameters():
                    param.requires_grad = True
                # acc
                _, train_preds = torch.max(y_train_pred.data, dim=1)
                train_epoch_acc += torch.sum(train_preds == y_train_batch.data)

            # VALIDATION                                                                                
            val_epoch_loss = 0.0
            val_epoch_acc = 0
            if X_val is not None and y_val is not None:
                val_data = DataLoader(TensorDataset(X_val, y_val), batch_size, shuffle)
                self.eval()
                with torch.no_grad():

                    for batch_i, batch_data in enumerate(val_data):
                        # Get batch data
                        X_val_batch = Variable(batch_data[0], requires_grad=False).float()  
                        y_val_batch = Variable(batch_data[1], requires_grad=False).long()  
                        # GPU access
                        X_val_batch, y_val_batch = X_val_batch.to(device), y_val_batch.to(device)
                        y_val_pred = self(X_val_batch)
                        # loss
                        val_loss = loss(y_val_pred, y_val_batch)        
                        val_epoch_loss += val_loss.item()
                        # acc
                        _, val_preds = torch.max(y_val_pred.data, dim=1)
                        val_epoch_acc += torch.sum(val_preds == y_val_batch.data)
                        

Does this seem alright to you?

The code looks generally alright. Some minor issues (a cleaned-up sketch of the batch handling follows the list):

  • Variables are deprecated since PyTorch 0.4, so you can use tensors now
  • most likely you don’t need gradients in the input data and target, so you could skip the requires_grad=True attribute during training
  • you don’t need to reset the requires_grad attribute for all parameters in each iteration
  • don’t use the .data attribute, as it might have unwanted side effects.
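
Putting these points together, the training part of the batch loop could look roughly like this (a sketch keeping your variable names, not a drop-in replacement):

for batch_i, (X_train_batch, y_train_batch) in enumerate(train_data):
    # plain tensors: no Variable wrapper, no requires_grad on inputs or targets
    X_train_batch = X_train_batch.float().to(device)
    y_train_batch = y_train_batch.long().to(device)

    opt.zero_grad()
    y_train_pred = self(X_train_batch)
    train_loss = loss(y_train_pred, y_train_batch)
    train_loss.backward()
    opt.step()

    train_epoch_loss += train_loss.item()
    # accuracy without .data: argmax on the detached predictions is enough
    train_preds = torch.argmax(y_train_pred.detach(), dim=1)
    train_epoch_acc += (train_preds == y_train_batch).sum().item()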

Did you run the check by swapping the training set for the validation set?

Thanks for your advice. I changed it, but it didn’t help with the overfitting problem of my network. What do you mean by swapping the training set for the validation set?

I also tried transforms.RandomRotation(degrees=5) and transforms.RandomHorizontalFlip(p=0.5) on my training images. But after 4 epochs, the validation loss started rising again and the resulting test accuracy was poor.

From the previous post:

If that all doesn’t help, run a quick test by swapping the validation Dataset for the training set.
This should give you a good validation loss. If that’s not the case, recheck the validation pipeline, as there might be an issue.

I swapped the validation dataset for the training set, but the problem was still the same. So you think there is a mistake in the validation pipeline and that I am generating the validation data incorrectly?
I am loading my data as follows:


from PIL import Image
from torchvision import transforms
import numpy as np
import pandas as pd

# Preprocessing of the pictures
IMAGE_SIZE = 224                              # Image size (224x224)
IMAGENET_MEAN = [0.485, 0.456, 0.406]         # Mean of ImageNet dataset (used for normalization)
IMAGENET_STD = [0.229, 0.224, 0.225]          # Std of ImageNet dataset (used for normalization)

def load_and_format_image(path, type, normalization=True):
    image_transformation = [
        transforms.Resize((IMAGE_SIZE, IMAGE_SIZE))
    ]
    if type == 'train':
        #image_transformation.append(transforms.RandomHorizontalFlip(p=0.5))
        image_transformation.append(transforms.ToTensor())
    else:
        image_transformation.append(transforms.ToTensor())
    if normalization:
        # Normalization with mean and std from ImageNet
        image_transformation.append(transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD))

    image_transformation = transforms.Compose(image_transformation)
    img = Image.open(path).convert("RGB")        # PIL image, H x W x 3
    img = image_transformation(img)              # tensor, 3 x 224 x 224
    img_array = np.array(img)
    return img_array

def load_label(value):
    # Note: values other than 1 and -1 (e.g. 0 or NaN) fall through and return None
    if value == 1:
        return 1
    elif value == -1:
        return 0

def image_train_gen(path_to_csv, length, type):
    i = 0
    df_train = pd.read_csv(path_to_csv)
    df_train = df_train.sample(frac=1).reset_index(drop=True)  # shuffle the data frame (sample returns a copy)
    while True:
        X = []
        y = []
        for b in range(length):
            data_point = df_train.iloc[i]
            i += 1

            X.append(load_and_format_image(config["path_to_image_data"] + str(data_point["Path"]), type))
            y.append(load_label(data_point["Atelectasis"]))

        return np.array(X), np.array(y)

When checking labels and pictures randomly, there does not seem to be a mistake.
I create the training data with 40000 images, validation data with 4000 images and test data with 7500 images.

If you’ve used the training dataset instead of the validation dataset in your validation method and still get bad results, I would assume that the validation method is wrong, no?

Just to make sure we are talking about the same use cases:
Initial use case:

train(train_dataset) # good training loss
validate(val_dataset) # bad validation loss

Your current test, replacing the val_dataset:

train(train_dataset) # good training loss
validate(train_dataset) # bad "fake" validation loss

If that’s the case, the error would point towards validate, not the val_dataset.

Yes, we are talking about the same use case. It does not matter whether I take the training or the validation dataset, the validation loss is always bad whereas the training loss is good.

        for t in range(initial_epoch, epochs):
            
            # TRAINING
            self.train()
            train_epoch_loss = 0.0
            train_epoch_acc = 0
            # Run batches
            for batch_i, batch_data in enumerate(train_data):  
                # Get batch data
                X_train_batch = batch_data[0].float()       
                y_train_batch = batch_data[1].long()        
                # GPU access
                X_train_batch, y_train_batch = X_train_batch.to(device), y_train_batch.to(device)
                # Backprop
                opt.zero_grad()
                y_train_pred = self(X_train_batch)
                train_loss = loss(y_train_pred, y_train_batch)     
                train_loss.backward()
                opt.step()
                # Update status
                train_epoch_loss += train_loss.item()
                for param in self.parameters():
                    param.requires_grad = True
                # acc
                _, train_preds = torch.max(y_train_pred, dim=1)
                train_epoch_acc += torch.sum(train_preds == y_train_batch)

            # VALIDATION
            val_epoch_loss = 0.0
            val_epoch_acc = 0
            if X_val is not None and y_val is not None:
                val_data = get_loader(X_val, y_val, batch_size, shuffle)
                self.eval()
                with torch.no_grad():

                    for batch_i, batch_data in enumerate(val_data):
                        # Get batch data
                        X_val_batch = batch_data[0].float()  
                        y_val_batch = batch_data[1].long()  
                        # GPU access
                        X_val_batch, y_val_batch = X_val_batch.to(device), y_val_batch.to(device)
                        y_val_pred = self(X_val_batch)
                        # loss
                        val_loss = loss(y_val_pred, y_val_batch)
                        val_epoch_loss += val_loss.item()
                        # acc
                        _, val_preds = torch.max(y_val_pred, dim=1)
                        val_epoch_acc += torch.sum(val_preds == y_val_batch)

This is the part of my fit method with training and validation.

Thanks for the update.
Based on these observations my best guess is that the batchnorm running statistics might not reflect the underlying dataset stats, which would explain why even the training data produces a high loss in model.eval().
How large is your batch size, and could you drop the (potentially smaller) last batch via drop_last=True in the DataLoader?
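
For example (batch_size=32 is just a placeholder; the TensorDataset construction is taken from your earlier snippet):

from torch.utils.data import DataLoader, TensorDataset

# Drop the last, possibly much smaller batch so that the batchnorm running
# statistics are not skewed by a tiny final batch.
train_data = DataLoader(
    TensorDataset(X_train, y_train),
    batch_size=32,
    shuffle=True,
    drop_last=True,
)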