Model memorizing patterns, not generalizing - ResNet-50 transfer learning

Hello everyone, I am training a model to distinguish pneumonia from normal cases based on the following dataset.

I want to apply transfer learning to this problem, using a ResNet-50 network.

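import torch
import torch.nn as nn
from torchvision import models

# device and learning_rate are defined earlier in the notebook

# Load an ImageNet-pretrained ResNet-50 and freeze the whole backbone
model = models.resnet50(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier head; the new layers are trainable by default
model.fc = nn.Sequential(
    nn.Linear(2048, 128),
    nn.ReLU(inplace=True),
    nn.Linear(128, 2),
)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
# Only the parameters of the new head are passed to the optimizer
optimizer = torch.optim.Adam(model.fc.parameters(), lr=learning_rate)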

**23,770,562 total parameters.**
**262,530 trainable parameters.**

Here is my training process:

model.eval()
# Initialize the early_stopping object
early_stopping = pytorchtools.EarlyStopping(patience=patience, verbose=True)
for epoch in range(num_epochs):
    ##########################
    ###### TRAIN MODEL #######
    ##########################
    for i, (images, labels) in enumerate(train_dl):
        # Switch to training mode (called on every batch)
        model.train()
        # Move tensors to the configured device
        images = images.to(device)
        labels = labels.to(device)
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backpropagation and optimization
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # Record the training loss for this batch
        train_losses.append(loss.item())

    ##########################
    ##### VALIDATE MODEL #####
    ##########################
    # Note: the model is still in train mode here, since model.eval() is
    # only called once, before the epoch loop
    for images, labels in val_dl:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)
        valid_losses.append(loss.item())

    # Calculate the average loss over the epoch
    train_loss = np.average(train_losses)
    valid_loss = np.average(valid_losses)
    avg_train_losses.append(train_loss)
    avg_valid_losses.append(valid_loss)

    # Print training/validation statistics
    print_msg = f'train_loss: {train_loss:.5f} valid_loss: {valid_loss:.5f}'
    print(print_msg)

    # Clear the lists to track the next epoch
    train_losses = []
    valid_losses = []

    # Stop once the validation loss has not improved for `patience` epochs
    early_stopping(valid_loss, model)
    print(epoch)

    if early_stopping.early_stop:
        print("Early stopping")
        break

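Here are the results. The training loss keeps falling while the validation loss rises from the first epoch, so my model seems to be memorizing the training data. This may be caused by overfitting, but I have no idea what is wrong or how to fix it.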

train_loss: 0.59755 valid_loss: 0.82625
Validation loss decreased (inf → 0.826249). Saving model …
0
train_loss: 0.52524 valid_loss: 0.83933
EarlyStopping counter: 1 out of 5
1
train_loss: 0.48533 valid_loss: 0.89458
EarlyStopping counter: 2 out of 5
2
train_loss: 0.43887 valid_loss: 0.97882
EarlyStopping counter: 3 out of 5
3
train_loss: 0.40483 valid_loss: 1.03101
EarlyStopping counter: 4 out of 5
4
train_loss: 0.37320 valid_loss: 1.04973
EarlyStopping counter: 5 out of 5

I would be really grateful if you could tell me what’s wrong with my code or with the logic behind this experiment :smiley:

To reduce overfitting, you could add a dropout layer before your first fully connected layer and see if it helps. If it does not, try unfreezing a couple of layers in your model. In my experience, that can help a lot, because it gives the model enough capacity to learn deeper patterns that generalize.

Just a remark: if this is binary classification, as the notebook suggests, why not use a single output (nn.Linear(128, 1)) together with the binary cross-entropy with logits loss (nn.BCEWithLogitsLoss)?
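
A minimal sketch of what that change could look like, using random tensors as a stand-in for the 2048-dimensional ResNet-50 features and for the labels coming out of the dataloader (both are assumptions for illustration). The two things that differ from the `CrossEntropyLoss` setup are that the logits need to be squeezed to shape `(batch,)` and the targets must be floats.

```python
import torch
import torch.nn as nn

# Same head as in the question, but with a single output logit
head = nn.Sequential(
    nn.Linear(2048, 128),
    nn.ReLU(inplace=True),
    nn.Linear(128, 1),
)
criterion = nn.BCEWithLogitsLoss()  # applies the sigmoid internally

# Stand-ins for one batch: backbone features and integer class labels
features = torch.randn(4, 2048)
labels = torch.tensor([0, 1, 1, 0])

logits = head(features).squeeze(1)        # shape (4,), raw scores
loss = criterion(logits, labels.float())  # targets must be float

# At inference time, turn the logits into probabilities and hard predictions
probs = torch.sigmoid(logits)
preds = (probs > 0.5).long()
```

The 0.5 threshold is just the default choice; for a medical task you may want to pick it from the validation set instead.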