If you have a DistributedDataParallel model ddp_model, you can save the model parameters via sd = ddp_model.module.state_dict(). (Note the extra .module.) This saves the parameters as if they came from a local, non-DDP version of the model. You should then be able to load them via local_model.load_state_dict(sd) when running on a single CPU process / single GPU, without initializing a process group.
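For concreteness, here is the round trip above as a minimal single-process CPU sketch. The gloo group, the Linear layer, and the file name are placeholders, not part of the original advice:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process gloo group, only so DDP can be constructed for the demo.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Linear(4, 2)  # placeholder architecture
ddp_model = DDP(model)

# Save the *inner* module's state_dict -- plain tensors, no DDP wrapper.
sd = ddp_model.module.state_dict()
torch.save(sd, "weight_ddp.pth")
dist.destroy_process_group()

# Later, in a plain non-distributed process:
local_model = nn.Linear(4, 2)
local_model.load_state_dict(torch.load("weight_ddp.pth"))
local_model.eval()
```

Because only tensors are serialized, the load side never touches torch.distributed.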
@agu I saved the state dict with model_ddp.module.state_dict() and then called torch.save(dict_model, 'weight_ddp.pth'). When I run my inference.py, there is still a RuntimeError at this line:
```python
checkpoint_file = 'weight_ddp.pth'
checkpoint = torch.load(checkpoint_file)  # RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
model = checkpoint['model']
model.load_state_dict(checkpoint['state_dict'])
model.eval()
```
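A likely cause, assuming dict_model also stores the model object itself under 'model': torch.save pickles the DDP-wrapped model, and torch.load must then reconstruct that wrapper, which requires an initialized process group. Saving only the state_dict avoids this. A sketch of that checkpoint layout (nn.Linear stands in for the real architecture):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the real architecture

# Save only tensors -- never the (possibly DDP-wrapped) model object.
torch.save({"state_dict": model.state_dict()}, "weight_ddp.pth")

# Inference side: rebuild the architecture in code, then load the weights.
checkpoint = torch.load("weight_ddp.pth", map_location="cpu")
model = nn.Linear(4, 2)
model.load_state_dict(checkpoint["state_dict"])
model.eval()
```

With this layout there is no checkpoint['model'] entry; the class is instantiated in the inference script instead of being unpickled from the file.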