Finetuning a model on a single GPU if it was trained on multiple GPUs using DistributedDataParallel

Hi!
I have a question: if I train a model using multi-GPU DistributedDataParallel training (say, the ResNet-50 on ImageNet example), can I later use its frozen weights as-is for feature extraction, or finetune just some of its last layers, on a single GPU with no DataParallel or DistributedDataParallel? If so, will a plain torch.load() work, or will there be an issue?

TIA

Yes, that’s possible, and I would recommend storing the state_dict via:

torch.save(model.module.state_dict(), path)

as described here and here.
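
As a rough sketch (the checkpoint path and frozen layer choice are placeholders, and it assumes the torchvision ResNet-50 from the ImageNet example), saving on the DDP side and loading on a single GPU could look like this:

import torch
import torchvision.models as models

# After DDP training (on rank 0 only): unwrap the DDP wrapper before saving,
# so the keys are not prefixed with "module."
# torch.save(model.module.state_dict(), "resnet50_ddp.pth")  # hypothetical path

# Later, on a single GPU: create a plain (non-wrapped) model and load the weights
model = models.resnet50()
state_dict = torch.load("resnet50_ddp.pth", map_location="cuda:0")
model.load_state_dict(state_dict)
model.to("cuda:0")

# Freeze everything except the final classification layer for finetuning
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True

# Only pass the trainable parameters to the optimizer
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)

Since the state_dict was taken from model.module, the keys match the plain model directly, so no key renaming is needed when loading.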