Save/reload torchvision model after I changed its inner structure

evgeniititov · December 3, 2019, 12:24pm

Hello folks,

I am pretty sure this question has been asked multiple times, but I failed to find an answer, so here we go.

Long story short, I use a pretrained ResNet18 to classify some custom objects of mine. I changed its classifier from the one that predicts 1k classes to one that predicts just 2. I trained the model with transfer learning (didn't freeze any layers) and saved its weights correctly, no dramas.

My question is, how do I load this model? I mean, I do know how to load a model in general, but during training I changed the default resnet18's classifier, so I cannot just import the default model from torchvision in another script, load my weights for 2 classes and hope it will work, right? Does this mean I need to somehow save the model itself as well, so that I can open it in another script and start using it?

I could probably find this online fairly easily, but since I am already asking, I have one more question. I want to train a model to classify defective (cracked etc.) and non-defective concrete poles. I've got a relatively big dataset of concrete and pavement with cracks that should get the job done, but I still want to perform data augmentation. Can it backfire if you include too many image modifications, like flips, angle changes, colour changes etc.? Should I be careful not to include too many? For this problem I am going to test the most popular models, like VGG, Inception etc. If you could also give any advice on what you think might work for this task, I'd be very grateful.
What transfer learning approach would you suggest? Since the problem is quite straightforward, I was thinking of unfreezing the last conv layers during training so that the model adapts better to the shapes it has to work with. Or do you think I should just train a new classifier without unfreezing any layers?

Thank you very much in advance guys. I do hope I managed to properly word my questions.

Regards,
Eugene

ptrblck · December 3, 2019, 2:58pm

1. Use the same workflow as was done before training. I.e. create the model, change the last layer, and load the state_dict. The most important part is to save the source code somehow to be able to get the same architecture, since the state_dict only stores the parameters and buffers without any information about e.g. the forward pass.
2. Use the validation data to check how much augmentation is necessary and when it hurts the model performance.
3. Same as above. As long as you don’t touch the test set for these experiments, you should be fine.

evgeniititov · December 3, 2019, 10:20pm

Hi ptrblck,

Thanks a lot for your answer. Very helpful.

All the best

Eugene