Model memorizing patterns not generalizing - resnet 50 transfer learning

szymonindy · May 18, 2021, 7:22pm

Hello everyone, I am training a model to distinguish pneumonia from normal chest X-rays, based on the following dataset.

I want to apply transfer learning to this problem. I am using a ResNet-50 network.

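import torch
import torch.nn as nn
import torchvision.models as models

# Load an ImageNet-pretrained ResNet-50 and freeze the whole backbone
model = models.resnet50(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier head; the new layers are trainable by default
model.fc = nn.Sequential(
    nn.Linear(2048, 128),
    nn.ReLU(inplace=True),
    nn.Linear(128, 2),
)
model = model.to(device)  # device is defined earlier in the notebook

criterion = nn.CrossEntropyLoss()
# Only the new head is optimized; learning_rate is defined earlier in the notebook
optimizer = torch.optim.Adam(model.fc.parameters(), lr=learning_rate)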

**23,770,562 total parameters.**
**262,530 training parameters.**

Here is my training process:

# Per-batch losses for the current epoch and per-epoch averages
train_losses, valid_losses = [], []
avg_train_losses, avg_valid_losses = [], []

# Initialize the early_stopping object
early_stopping = pytorchtools.EarlyStopping(patience=patience, verbose=True)
for epoch in range(num_epochs):
    ##########################
    #######TRAIN MODEL########
    ##########################
    model.train()
    for images, labels in train_dl:
        # Move tensors to the configured device
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backpropagation and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Record the batch training loss
        train_losses.append(loss.item())

    ##########################
    #####VALIDATE MODEL#######
    ##########################
    # Switch to eval mode and disable gradient tracking for validation
    model.eval()
    with torch.no_grad():
        for images, labels in val_dl:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            valid_losses.append(loss.item())
    
    # print training/validation statistics 
    # calculate average loss over an epoch
    train_loss = np.average(train_losses)
    valid_loss = np.average(valid_losses)
#     print(train_loss)
    avg_train_losses.append(train_loss)
    avg_valid_losses.append(valid_loss)
    
    print_msg = (f'train_loss: {train_loss:.5f} ' + f'valid_loss: {valid_loss:.5f}')
    
    print(print_msg)
    
    # clear lists to track next epoch
    train_losses = []
    valid_losses = []
    
    early_stopping(valid_loss, model)
    print(epoch)
        
    if early_stopping.early_stop:
        print("Early stopping")
        break

Here are the results. My model seems to be memorizing the training data instead of generalizing. This may be caused by overfitting, but I have no idea how to fix it or what is wrong with it.

train_loss: 0.59755 valid_loss: 0.82625
Validation loss decreased (inf → 0.826249). Saving model …
0
train_loss: 0.52524 valid_loss: 0.83933
EarlyStopping counter: 1 out of 5
1
train_loss: 0.48533 valid_loss: 0.89458
EarlyStopping counter: 2 out of 5
2
train_loss: 0.43887 valid_loss: 0.97882
EarlyStopping counter: 3 out of 5
3
train_loss: 0.40483 valid_loss: 1.03101
EarlyStopping counter: 4 out of 5
4
train_loss: 0.37320 valid_loss: 1.04973
EarlyStopping counter: 5 out of 5

I would be really grateful if you could tell me what’s wrong with my code or with the logic behind this experiment.

github.com

szymonrucinski/x-ray/blob/master/pneumonia.ipynb

AleAyotte · May 18, 2021, 10:00pm

To reduce overfitting, you could add a dropout layer before your first fully connected layer and see if it helps. If it does not help, you should try unfreezing a couple of layers in your model. In my experience, this can help a lot by giving the model enough capacity to learn deeper patterns that generalize.

pascal_notsawo · May 18, 2021, 10:32pm

Just a remark.
If it is a binary classification, as the notebook suggests, why not use an output of dimension 1 (nn.Linear(128, 1)) together with the binary cross-entropy with logits loss (nn.BCEWithLogitsLoss)?
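A minimal sketch of that variant, shown on a stand-alone head with dummy inputs so it runs on its own. The 2048-dimensional features stand in for the ResNet-50 backbone output, and the label coding (0 = normal, 1 = pneumonia) is an assumption, not something confirmed by the notebook.

```python
import torch
import torch.nn as nn

# Classifier head from the original post, but with a single output logit.
head = nn.Sequential(
    nn.Linear(2048, 128),
    nn.ReLU(inplace=True),
    nn.Linear(128, 1),
)
criterion = nn.BCEWithLogitsLoss()

# Dummy batch: 4 feature vectors and integer class labels
# (0 = normal, 1 = pneumonia is an assumed coding).
features = torch.randn(4, 2048)
labels = torch.tensor([0, 1, 1, 0])

# BCEWithLogitsLoss expects float targets with the same shape as the logits.
logits = head(features).squeeze(1)        # shape (4,)
loss = criterion(logits, labels.float())

# The sigmoid is applied only when probabilities or hard predictions are needed.
probs = torch.sigmoid(logits)
preds = (probs >= 0.5).long()
print(loss.item(), preds.tolist())
```

In the full model this would mean ending `model.fc` with `nn.Linear(128, 1)` and converting the labels to float before computing the loss. The two-logit `nn.CrossEntropyLoss` setup from the original post is also valid for two classes, so this is a simplification rather than a fix for the overfitting itself.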