How can I load a non-DDP checkpoint model into a DDP model?

Should doing:


be sufficient? Or does this have a hidden bug?

Edit: This does not work.

I would recommend finishing the model setup (in this use case, loading the state_dict) before wrapping the model in DDP, since the same parameter set can then be scattered to all GPUs.
If you delay the loading and go through the internal .module attribute instead, you need to make sure yourself that the models on all ranks end up with the same parameter set.
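
A minimal sketch of the "load first, wrap afterwards" order might look like the following. It assumes the process group is already initialized (for example by launching with torchrun and calling `dist.init_process_group`). The `nn.Linear` model and the `build_ddp_model` helper are hypothetical placeholders, not part of the original question, and everything stays on the CPU to keep the example short.

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def build_ddp_model(checkpoint_path: str) -> DDP:
    """Load a plain (non-DDP) state_dict first, then wrap the model in DDP.

    Assumes torch.distributed has already been initialized on this rank.
    """
    # Hypothetical stand-in for the real model architecture.
    model = nn.Linear(4, 2)

    # A checkpoint saved from a non-DDP model has keys without the
    # "module." prefix, so it loads directly into the unwrapped model.
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(state_dict)

    # Wrap only after the setup is complete, so every rank starts
    # from the loaded parameters.
    return DDP(model)
```

On GPUs you would typically move the model to the local device before wrapping and pass `device_ids=[local_rank]` to DDP. Note that the wrapped model's own `state_dict()` keys carry a `module.` prefix, which is why a non-DDP checkpoint does not load directly into the DDP wrapper.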