Loading fine-tuned model

I am fine-tuning a pre-trained ResNet on adversarial examples like this:

model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, NUM_CLASSES)  # replace the 1000-class ImageNet head

model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=100)
torch.save(model_ft.state_dict(), 'fine_tuned_best_model.pt')

My confusion is: when I load fine_tuned_best_model.pt back for testing, should I just use the plain resnet18 definition like this:

classifier = resnet18()
model_para = torch.load(CLASSIFIER_PATH)
classifier.load_state_dict(model_para['net'])
classifier = nn.DataParallel(classifier) if torch.cuda.device_count() > 1 else classifier
classifier.to(device)

or do I need that last layer which I added for training?

You would need to restore the model in the same way, including the last layer change, to be able to properly load the state_dict. Otherwise you’ll get a shape mismatch for this layer in load_state_dict.
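
For example, something along these lines should work (a minimal sketch, assuming NUM_CLASSES matches the value used during fine-tuning and that the checkpoint stores the plain state_dict):

import torch
import torch.nn as nn
from torchvision import models

# recreate the model exactly as it was fine-tuned, then load the weights
classifier = models.resnet18()
classifier.fc = nn.Linear(classifier.fc.in_features, NUM_CLASSES)  # same last-layer change as in training
state_dict = torch.load(CLASSIFIER_PATH, map_location='cpu')
classifier.load_state_dict(state_dict)
classifier.to(device)
classifier.eval()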

Thanks @ptrblck. Could you show me a code snippet? I have a ResNet pre-trained on CIFAR10 and I load it for training with the same resnet18 definition as torchvision.models.resnet18, but the model doesn’t learn anything and only reaches 10% accuracy.

But when I print the model, all the layers have bias=False except for the last fc layer.
Also, when I train the whole pre-trained model on my new data, which is an adversarial version of CIFAR10, I get 76% test accuracy, but the original accuracy on CIFAR10 drops from 93.33% to 87%.

I am not sure where I am going wrong, and I am unsure which of the above approaches is right for me if I want the model to work well on both datasets (CIFAR10 and its adversarial version).

I’m not sure what code snippet you are looking for.

I would generally recommend trying to overfit a small data sample as a quick test to check for potential bugs in the code.
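
E.g. something like this (a rough sketch; the tiny CIFAR10 subset and the hyperparameters are just assumptions for the sake of the example):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
import torchvision
import torchvision.transforms as T

# tiny subset of CIFAR10 - the model should be able to drive the loss to ~0 on it
dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                       transform=T.ToTensor())
small_loader = DataLoader(Subset(dataset, range(64)), batch_size=64, shuffle=True)

model = torchvision.models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 10)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

model.train()
for epoch in range(100):
    for data, target in small_loader:
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()
    print(epoch, loss.item())  # if this doesn't approach zero, there is likely a bug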

This is expected: the bias parameters of the conv layers are deactivated, since each conv is directly followed by a batchnorm layer, which normalizes the output and applies its own learnable shift, so a conv bias would be redundant.
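
You can check it directly, e.g.:

from torchvision import models

model = models.resnet18()
# the conv layers are created with bias=False because the following BatchNorm2d
# layers already apply a learnable shift
print(model.conv1.bias)            # None
print(model.fc.bias is not None)   # True - only the final Linear layer keeps its bias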

How was the original accuracy on CIFAR10 calculated, if you can only achieve 10%?

Hello,

Regarding this point, as per SAVING AND LOADING MODELS:

If you only plan to keep the best performing model (according to the acquired validation loss), … You must serialize best_model_state or use best_model_state = deepcopy(model.state_dict()) otherwise your best best_model_state will keep getting updated by the subsequent training iterations. As a result, the final model state will be the state of the overfitted model.
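
i.e., if I understand it correctly, the recommended pattern is roughly this (train_one_epoch and validate are just placeholders for my own loops):

import copy

best_val_loss = float('inf')
best_model_state = copy.deepcopy(model.state_dict())

for epoch in range(num_epochs):
    train_one_epoch(model)       # placeholder for the actual training code
    val_loss = validate(model)   # placeholder for the validation pass
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        # deepcopy: otherwise best_model_state is only a reference that keeps
        # getting updated by the subsequent training iterations
        best_model_state = copy.deepcopy(model.state_dict())

model.load_state_dict(best_model_state)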

However, I have done something like this:

def train_model(model, ...):
    ...
    if validation_loss improves:
        # delete the previous best checkpoint and save the new best state
        torch.save(model.state_dict(), best_model_path)
    else:
        ...
    ...
    # the model returned here holds the weights from the last epoch,
    # not necessarily the best ones
    return model

def test_model(model, best_model_path, ...):
    model.load_state_dict(torch.load(best_model_path))
    model.eval()
    ...

...

my_model = train_model(my_model, ...)

test_model(my_model, my_path, ...)

In other words, the model returned by the training phase is the final one, which is likely to be overfitted. But since I saved the best state_dict during training, there is no problem at test/inference time, because I load it back before testing.

Is something wrong with this solution?

Thanks.