Pretrained model vs resume training

I'm having trouble understanding when to use resume training versus a pretrained model.
Imagine I have already trained my model on some data (everyday poses) to detect human body coordinates. One day, I want to train it on some new data (sports poses), for the same task, so that it learns more about sports poses.
Can I just treat my model as a pretrained model and do:

net = torch.load(PATH_MODEL, map_location=device)

for param in net.parameters():
    param.requires_grad = True

And then train the model on the new data (without losing what the model has already learned)?


Or should I train my model from scratch, save it the right way (model, optimizer, loss, epoch, …), and perform resume training?

Thank you!

Pretrained vs. resume is nothing but a convention. When you talk about a pretrained model, you usually mean weights for a specific architecture trained on a specific dataset, such that the results are the best possible for that dataset/architecture.

That's why people talk about models pretrained on ImageNet: it means your starting point is the best weights for your architecture based on the ImageNet dataset.

On the contrary, resume refers to a stage in the training of an architecture. While you are still training it, it does not yet perform as well as possible. If you stop that training you need to resume it, but you don't consider yourself to be training from a pretrained model, because the weights you pick up are not the best possible and still need more training.

Another thing to think about is that training a model usually takes many iterations/epochs. If you want to tell someone else how much time or how many iterations it took you to train an architecture, you need to say where you started from. If you start from a more-or-less trained network, that doesn't convey how good the initial performance was. If you say you started from a network pretrained on ImageNet, people know what the performance is at the beginning and exactly which weights you used, since that is a standard, reproducible starting point.
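To make the distinction concrete, here is a minimal checkpoint sketch of the "resume" workflow the paragraph describes, saving model, optimizer, epoch, and loss together (the tiny `nn.Linear` model and the `checkpoint.pt` filename are just placeholders, not the poster's actual network):

```python
import torch
import torch.nn as nn

# Hypothetical tiny model standing in for the body-joint network.
net = nn.Linear(10, 2)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

# To resume mid-training, save everything needed, not just the weights.
checkpoint = {
    "epoch": 5,
    "model_state": net.state_dict(),
    "optimizer_state": optimizer.state_dict(),
    "loss": 0.42,
}
torch.save(checkpoint, "checkpoint.pt")

# Resume: rebuild the objects, then restore their saved states.
net2 = nn.Linear(10, 2)
opt2 = torch.optim.SGD(net2.parameters(), lr=0.01)
ckpt = torch.load("checkpoint.pt")
net2.load_state_dict(ckpt["model_state"])
opt2.load_state_dict(ckpt["optimizer_state"])
start_epoch = ckpt["epoch"] + 1  # continue where training stopped
```

Restoring the optimizer state matters for resuming because optimizers like SGD with momentum or Adam keep per-parameter buffers that would otherwise restart from zero.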

Thank you for your quick reply @JuanFMontesinos. I have a better understanding now. In my case, I assume the model is at its best performance: it was trained on a big dataset for body-joint estimation. However, when I run inference on my own very specialized data (with unusual human body poses), it doesn't do well.
So, if I've understood what you said, I should load my model as a pretrained model and retrain it on my own data?

Can you confirm?
And if yes, should I set requires_grad = True on my model's parameters?

Yep, the act of retraining a network that has already been (fully) trained is called fine-tuning. The proper way to explain to others what you did is to say you took network X pretrained on dataset Y and fine-tuned it for task W.

You do have to set requires_grad = True, but fine-tuning usually calls for a small learning rate, since the pretrained weights are considered to already be close to the optimal ones.
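A minimal sketch of that fine-tuning setup, assuming the pretrained network has already been loaded (the `nn.Linear` stand-in and the `1e-4` learning rate are illustrative choices, not values from the thread):

```python
import torch
import torch.nn as nn

# Hypothetical pretrained model; in practice you would load it,
# e.g. net = torch.load(PATH_MODEL, map_location=device).
net = nn.Linear(10, 2)

# Unfreeze every parameter so all weights can adapt to the new data...
for param in net.parameters():
    param.requires_grad = True

# ...but fine-tune with a small learning rate, since the pretrained
# weights are assumed to already be close to optimal.
optimizer = torch.optim.SGD(net.parameters(), lr=1e-4)
```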


Hey @JuanFMontesinos, thanks a lot, I really appreciate all of your answers. One last question if I may: what will happen if I do not set requires_grad = True?

Well, those parameters with requires_grad = False won't be updated; they behave like frozen weights.
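A small sketch demonstrating that freezing behavior with a made-up two-layer model (not the poster's network): the first layer is frozen, and after one optimizer step only the second layer has changed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 1))

# Freeze the first layer; only the second layer will be updated.
for param in net[0].parameters():
    param.requires_grad = False

frozen_before = net[0].weight.clone()

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in net.parameters() if p.requires_grad], lr=0.1
)

loss = net(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()

# The frozen layer's weights are unchanged after the step.
assert torch.equal(net[0].weight, frozen_before)
```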

Thanks for everything @JuanFMontesinos