Feature extraction + fine-tuning in ResNet50

Hi! I’m trying to train ResNet50 for binary classification on a very small dataset (600 MRI images). I read this tutorial and have tried many different configurations of learning rate and custom FC layer. The code below is the configuration that gives the best results so far.

With Feature Extraction (FE) and a custom FC layer, the model reaches at most 75% accuracy.
With Fine-Tuning (FT) and a custom FC layer, it reaches at most 85%.
With FE followed by FT, it reaches at most 90%.

Feature Extraction followed by Fine-Tuning gives the best results so far, and I think that approach could do even better. This is what I have done:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load the model
model = models.resnet50(pretrained=True, progress=True)

# Freeze all parameters in the base model
for param in model.parameters():
    param.requires_grad = False

# Replace FC layer with custom layer (newly created layers are trainable by default)
model.fc = nn.Sequential(nn.Dropout(),
                         nn.Linear(model.fc.in_features, 1),
                         nn.Sigmoid())

# Train, optimize only FC layer (learning_rate is defined elsewhere)
model.train()
criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.fc.parameters(), lr=learning_rate)
# ... training loop ...

# Eval and save best model
model.eval()
torch.save(model.state_dict(), 'state.pth')
```

On the resulting model I then perform fine-tuning. I’ve tried this:

```python
import itertools

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load state dict of last model (model must already have the custom fc layer)
model.load_state_dict(torch.load('state.pth', map_location=device))
model.to(device)

# Unfreeze all layers first, then freeze only the fully connected layer
for param in model.parameters():
    param.requires_grad = True
for param in model.fc.parameters():
    param.requires_grad = False

# Train, optimize all layers except FC
model.train()  # the model was left in eval mode after saving
criterion = nn.BCELoss()
base_parameters = [model.conv1.parameters(),
                   model.bn1.parameters(),
                   model.layer1.parameters(),
                   model.layer2.parameters(),
                   model.layer3.parameters(),
                   model.layer4.parameters()]
optimizer = torch.optim.SGD(itertools.chain(*base_parameters), lr=0.01)
```

I have a few questions:

  1. Am I right in thinking that FE → FT is the better approach, but that I’m not implementing it correctly?
  2. How can I access parameters in groups (base model parameters vs. fully connected layer parameters)?
  3. What should I try to improve my model?

Thanks in advance for the help!