How to fine-tune models trained with DistributedDataParallel (DDP)

I want to fine-tune a model by replacing only the last linear layer.
I could do that when I used the DataParallel module, as shown below:

model = nn.DataParallel(model)

...
model.load_state_dict(checkpoint['state_dict'])
...    

# replace only the classification head on the wrapped module
model.module.fc = torch.nn.Linear(model.module.fc.in_features,
                                  opt.n_finetune_classes)
model.module.fc = model.module.fc.cuda()

In the case of DDP, how can I do that?
According to the warning in the documentation, it seems that I cannot change the parameters after loading the checkpoint:

This module assumes all parameters are registered in the model by the time it is created. No parameters should be added nor removed later. Same applies to buffers.

Should I change the linear layer first and then load the parameters without the linear layer?
Is that the right way?

Yes, try modifying the module first and, once you're done, wrapping it in DistributedDataParallel.
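
For example, something along these lines (a minimal sketch, assuming the process group is already initialized; generate_model() and local_rank are placeholders for however you construct the model and pick the GPU for this process):

import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# build the plain, unwrapped model and replace the head first
model = generate_model(opt)          # placeholder for your model constructor
model.fc = nn.Linear(model.fc.in_features, opt.n_finetune_classes)
model = model.cuda(local_rank)       # one GPU per process

# only now wrap it; DDP registers whatever parameters exist at this point
model = DDP(model, device_ids=[local_rank])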

Thanks @pietern. How can I load only part of the parameters after wrapping with nn.parallel.DistributedDataParallel?

It is easier to load the parameters prior to wrapping with DDP. You can save/load the wrapped model as well, but then you can no longer use it without DDP.
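
A sketch of that, assuming the checkpoint was saved from a DataParallel/DDP-wrapped model (so its keys carry a module. prefix) and that the old fc weights should be discarded; generate_model(), opt.resume_path, and local_rank are placeholders:

import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

checkpoint = torch.load(opt.resume_path, map_location='cpu')

# strip the 'module.' prefix added by DataParallel/DDP and drop the old head
state_dict = {k.replace('module.', '', 1): v
              for k, v in checkpoint['state_dict'].items()
              if not k.startswith('module.fc')}

model = generate_model(opt)                      # plain, unwrapped model
model.fc = nn.Linear(model.fc.in_features, opt.n_finetune_classes)
model.load_state_dict(state_dict, strict=False)  # fc.* stays freshly initialized

model = model.cuda(local_rank)
model = DDP(model, device_ids=[local_rank])      # wrap only after loading

With strict=False, load_state_dict fills in only the keys that match, so the backbone weights come from the checkpoint while the new head keeps its random initialization.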