Finetuning a model on a single GPU if it was trained on multi-GPU using DistributedDataParallel

Suppose I train a model with multi-GPU DistributedDataParallel training (say, the ResNet-50 on ImageNet example). Later, for a feature-extraction task, I want to use its frozen weights as-is, or finetune just some of its last layers, but on a single GPU with no DataParallel or DistributedDataParallel. Is that possible, and if so, will a plain torch.load() work, or will there be an issue?


Yes, that’s possible, and I would recommend storing the state_dict via:

    torch.save(model.module.state_dict(), path)

as described here and here.
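
To illustrate the two situations, here is a minimal sketch (model names and the dummy layers are made up for illustration). Saving `model.module.state_dict()` from the DDP wrapper gives a checkpoint that loads directly into a plain single-GPU model; if the checkpoint was instead saved from the wrapper itself, every key carries a `module.` prefix that has to be stripped before loading:

```python
import torch
from torch import nn

# Hypothetical small model standing in for e.g. a ResNet-50.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Recommended path: on the training side, save the unwrapped weights,
#   torch.save(ddp_model.module.state_dict(), "ckpt.pth")
# and on the single-GPU side simply do
#   model.load_state_dict(torch.load("ckpt.pth", map_location="cpu"))

# If the checkpoint was saved from the wrapper instead,
#   torch.save(ddp_model.state_dict(), "ckpt.pth")
# its keys look like "module.0.weight"; simulate that here:
ddp_style_state = {"module." + k: v for k, v in model.state_dict().items()}

# Strip the "module." prefix so the keys match the plain model again.
clean_state = {k.removeprefix("module."): v for k, v in ddp_style_state.items()}
model.load_state_dict(clean_state)  # loads without key-mismatch errors
```

After loading, freezing the backbone for feature extraction is just a matter of setting `requires_grad = False` on the parameters you do not want to finetune.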