How to fine-tune models trained with DistributedDataParallel (DDP)

I want to fine-tune a model by replacing only the last linear layer.
I could do that when I used the DataParallel module, as shown below:

model = nn.DataParallel(model)

...
model.load_state_dict(checkpoint['state_dict'])
...    

# replace only the classification head on the wrapped module
model.module.fc = torch.nn.Linear(model.module.fc.in_features,
                                  opt.n_finetune_classes)
model.module.fc = model.module.fc.cuda()

In the case of DDP, how can I do that?
According to the warning in the documentation, it seems that I cannot change the parameters after loading the checkpoint:

This module assumes all parameters are registered in the model by the time it is created. No parameters should be added nor removed later. Same applies to buffers.

Should I change the linear layer first and then load the parameters without the linear layer?
Is that the right way?

Yes, try modifying the module first and, once you're done, wrapping it in DistributedDataParallel.
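
For example, something along these lines (a minimal sketch, assuming the process group is already initialized; generate_model() and local_rank are placeholders for however you construct the model and pick the GPU for this process):

import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# build the plain, unwrapped model and replace the head first
model = generate_model(opt)          # placeholder for your model constructor
model.fc = nn.Linear(model.fc.in_features, opt.n_finetune_classes)
model = model.cuda(local_rank)       # one GPU per process

# only now wrap it; DDP registers whatever parameters exist at this point
model = DDP(model, device_ids=[local_rank])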

Thanks @pietern. How can I load only part of the parameters after wrapping with nn.parallel.DistributedDataParallel?

It is easier to load the parameters prior to wrapping with DDP. You can save/load the wrapped model as well, but then you can no longer use it without DDP.
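
A sketch of that, assuming the checkpoint was saved from a DataParallel/DDP-wrapped model (so its keys carry a module. prefix) and that the old fc weights should be discarded; generate_model(), opt.resume_path, and local_rank are placeholders:

import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

checkpoint = torch.load(opt.resume_path, map_location='cpu')

# strip the 'module.' prefix added by DataParallel/DDP and drop the old head
state_dict = {k.replace('module.', '', 1): v
              for k, v in checkpoint['state_dict'].items()
              if not k.startswith('module.fc')}

model = generate_model(opt)                      # plain, unwrapped model
model.fc = nn.Linear(model.fc.in_features, opt.n_finetune_classes)
model.load_state_dict(state_dict, strict=False)  # fc.* stays freshly initialized

model = model.cuda(local_rank)
model = DDP(model, device_ids=[local_rank])      # wrap only after loading

With strict=False, load_state_dict fills in only the keys that match, so the backbone weights come from the checkpoint while the new head keeps its random initialization.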