Hello dear PyTorch community,
I have two binary classification tasks that I would like to train a model on.
The model I am using is a normalization-free ResNet-26 architecture.
Task #1 is fairly easy: the model learns it within a couple of epochs, and both accuracy and loss look reasonable. Let's say that Task #1 is to differentiate between A and B.
Task #2, however, is much harder. Even after 50 epochs of training, the test accuracy does not surpass 70% and the model is clearly overfitting (i.e., 98% accuracy in training and 70% accuracy in test).
Task #2 is to differentiate between A and C.
Since both tasks use A as an input, I thought it would be a good idea to train a model on task 1 and then use the pretrained model to test it on task 2, since both tasks classify between A and some other input.
Now here is the point where I am a bit lost. When I load the model pretrained on task #1 and test it on task #2, it does not succeed, meaning chance-level accuracy. A colleague told me that I might need to fine-tune the model first.
So I did some googling and I now have the following setup.
I fine-tune the pretrained model for 30 epochs using this code:

import torch
import torch.optim as optim
from tqdm import tqdm

# model, criterion, train_loader and device are assumed to be defined earlier (not shown)
fine_tune_epochs = 30
optimizer = optim.Adam(model.parameters(), lr=0.0001)
for epoch in range(fine_tune_epochs):
    model.train()
    running_loss = 0.0
    train_correct = 0
    train_total = 0
    for inputs, labels in tqdm(train_loader, desc="Fine-tuning", leave=False):
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # Accumulate loss and accuracy statistics for this epoch
        running_loss += loss.item()
        predicted = outputs.argmax(dim=1)
        train_total += labels.size(0)
        train_correct += (predicted == labels).sum().item()
    print(f"Epoch [{epoch+1}/{fine_tune_epochs}], Loss: {running_loss/len(train_loader):.4f}, Accuracy: {100 * train_correct / train_total:.2f}%")

This actually gives me training accuracies of around 78% and, more importantly, when testing I get 75% accuracy on the test set.
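For completeness, the test accuracy is meant in the sense of an evaluation loop like the one below. This is only a minimal sketch: `evaluate` is a hypothetical helper name, and the toy model and data are stand-ins so that the snippet runs on its own. They are not the actual NF-ResNet-26 or the real dataset.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device):
    """Return the top-1 accuracy (in percent) of `model` over `loader`."""
    model.eval()  # switch layers such as dropout to inference behaviour
    correct, total = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            predicted = model(inputs).argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100.0 * correct / total


# Toy stand-ins (assumptions for illustration only).
device = torch.device("cpu")
toy_model = nn.Linear(4, 2).to(device)
toy_data = TensorDataset(torch.randn(16, 4), torch.randint(0, 2, (16,)))
toy_loader = DataLoader(toy_data, batch_size=8)

test_acc = evaluate(toy_model, toy_loader, device)
print(f"Test accuracy: {test_acc:.2f}%")
```

The important parts are `model.eval()` and `torch.no_grad()`, so that the test numbers are not affected by training-mode behaviour.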
To me this suggests that some transfer learning is happening, because I need far less training time and less data to achieve better accuracy (and loss) than when I train a model on the task from scratch.
But I am wondering whether I am merely re-training the model itself. I'm not sure this setup makes much sense, results aside.
Furthermore, I also tried freezing the fully connected layer, but this only worsened the performance. Is there maybe a way to re-train only the layers that are associated with input B?
As in, leave the layers that have learned the representations of input A untouched and re-train only the layers that are associated with the other task?
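To make the question concrete, here is a minimal sketch of the kind of selective freezing I have in mind, using `requires_grad`. Note that it freezes the feature layers and trains only the classifier head, which is the reverse of freezing the fully connected layer. `TinyNet`, `features`, and `fc` are hypothetical names for illustration; the real model's layer names will differ.

```python
import torch
from torch import nn, optim


class TinyNet(nn.Module):
    """Hypothetical stand-in for the pretrained network (not an NF-ResNet-26)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(self.features(x))


model = TinyNet()  # in practice, the task 1 weights would be loaded here

# Freeze every parameter, then unfreeze only the classifier head.
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True

# Pass only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.Adam(trainable, lr=1e-4)

# One dummy training step to check that the frozen layers stay unchanged.
frozen_before = model.features[0].weight.clone()
inputs = torch.randn(4, 3, 16, 16)
labels = torch.tensor([0, 1, 0, 1])
loss = nn.CrossEntropyLoss()(model(inputs), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
frozen_unchanged = torch.equal(frozen_before, model.features[0].weight)
print("Frozen conv weights unchanged:", frozen_unchanged)
```

The same pattern could be applied to any subset of layers, for example unfreezing only the last residual stage together with the head.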
I would appreciate any help and feedback a lot. Please let me know if I can provide more information to paint a clearer picture.
The main questions are: is the setup I'm using reasonable at all, and either way, how could I use more sophisticated freezing strategies?
All the best