How to transfer a single-GPU training checkpoint to data-parallel multi-GPU training in PyTorch

I am training a speech enhancement model on a single GPU with the PyTorch framework, and the training stage has not finished. I stopped it because the speed is a little slow. Now I want to continue training from the saved checkpoint in data-parallel multi-GPU mode. What should I do?

You can refer to the example code in this reply.

First, thanks for your reply. Because a single-GPU model's state dict is different from a multi-GPU model's, the multi-GPU model can't load the state dict from a single-GPU checkpoint directly.

Hi @duo_ma, sorry, I should have explained in more detail.

DataParallel and DistributedDataParallel models store their state_dict keys with a `module.` prefix (e.g. `module.xxx`), while single-GPU models store theirs without it.
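You can see the key mismatch directly with a toy model (the `Linear` layer here is just an illustration; DataParallel can be constructed even without GPUs, so this runs anywhere):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # toy stand-in for your model
print(list(model.state_dict().keys()))    # ['weight', 'bias']

wrapped = nn.DataParallel(model)
print(list(wrapped.state_dict().keys()))  # ['module.weight', 'module.bias']
```

That `module.` prefix is why `load_state_dict` raises "Missing key(s)" / "Unexpected key(s)" errors when the save side and the load side are wrapped differently.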

What I would recommend in general is to always save the state_dict without the `module.` prefix, and then load it into the model itself (single-GPU) or into model.module (multi-GPU) accordingly.

import torch
import torch.nn as nn

# saving: always save the plain (unwrapped) state_dict, without the 'module.' prefix
if isinstance(model, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
    torch.save(model.module.state_dict(), model_save_name)
else:
    torch.save(model.state_dict(), model_save_name)

# loading
state_dict = torch.load(model_save_name)
model = nn.DataParallel(model, **gpu_device_arg)  # multi-GPU model
if isinstance(model, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
    model.module.load_state_dict(state_dict)  # parameters are loaded into the wrapped module
else:
    model.load_state_dict(state_dict)

If you have already saved the multi-GPU model's parameters with the `module.` prefix and want to load them into a single-GPU model, you can do:

model = nn.DataParallel(model, **gpu_device_arg)  # wrap, so keys match the 'module.'-prefixed checkpoint
model.load_state_dict(state_dict)

model = model.module  # unwrap back to the single-GPU model
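Alternatively, if you'd rather not wrap the model at all, you can strip the `module.` prefix from the checkpoint keys before loading. A minimal sketch (`strip_module_prefix` is a helper name I'm introducing here, and the `Linear` model is just a stand-in for yours):

```python
import torch
import torch.nn as nn
from collections import OrderedDict

def strip_module_prefix(state_dict):
    """Remove the 'module.' prefix that DataParallel adds to every key."""
    return OrderedDict(
        (k[len("module."):] if k.startswith("module.") else k, v)
        for k, v in state_dict.items()
    )

# usage sketch: load a multi-GPU checkpoint into a plain single-GPU model
model = nn.Linear(4, 2)                            # stand-in for your model
multi_gpu_sd = nn.DataParallel(model).state_dict() # keys: module.weight, module.bias
model.load_state_dict(strip_module_prefix(multi_gpu_sd))
```

This avoids constructing a DataParallel wrapper just for loading, which is handy when resuming on a machine with no GPUs.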