DataParallel optim and saving correctness

Are the two marked lines 1) and 2) equivalent when using DataParallel?

net = torch.nn.DataParallel(net).cuda()
optim = torch.optim.Adam(net.parameters(), LR)         # 1)
optim = torch.optim.Adam(net.module.parameters(), LR)  # 2)

Or does one of them create weird synchronization issues?
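A quick sanity check (just a sketch with a toy model, not my real net) seems to show both calls hand the optimizer the very same tensors, since the wrapper only holds the parameters of net.module:

import torch

net = torch.nn.Linear(10, 2)          # toy model, only for the comparison
net = torch.nn.DataParallel(net)
if torch.cuda.is_available():
    net = net.cuda()

params_1 = list(net.parameters())         # 1) through the DataParallel wrapper
params_2 = list(net.module.parameters())  # 2) through the wrapped module

# Both iterators should yield the very same tensor objects.
print(len(params_1) == len(params_2))                       # expected: True
print(all(p1 is p2 for p1, p2 in zip(params_1, params_2)))  # expected: True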


Also, when saving parameters to a file, which one is preferred with a DataParallel module?

net = torch.nn.DataParallel(net).cuda()
...
torch.save({
  'epoch': epoch,
  'args': args,
  'state_dict': net.state_dict(),           # 1) OR
  # 'state_dict': net.module.state_dict(),  # 2)
  'loss_history': loss_history,
}, model_save_filename)

In Torch, an analogue of method 2) was preferred (https://github.com/facebook/fb.resnet.torch/blob/master/checkpoints.lua#L45-L48).

#1 is preferred in both cases. Unlike (Lua)Torch, you don't need the workarounds.

I tried #1; however, I ran into this issue while loading the state_dict back:
KeyError: 'unexpected key "module.cnn.0.weight" in state_dict'
Essentially, the model is nested inside the .module attribute of DataParallel, so every key is saved with a "module." prefix.
I guess it might be better to use #2 for saving?
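For checkpoints already saved with #1, one workaround (only a sketch, reusing model_save_filename and the checkpoint layout from above) is to strip the "module." prefix before loading into the unwrapped model:

import torch

checkpoint = torch.load(model_save_filename)
saved_state = checkpoint['state_dict']

# Drop the "module." prefix that DataParallel adds to every key.
stripped_state = {
    (k[len('module.'):] if k.startswith('module.') else k): v
    for k, v in saved_state.items()
}

net.load_state_dict(stripped_state)  # here net is the bare (unwrapped) model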

Oh, I see. Yeah, probably #2 for saving via state_dict.
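To make that concrete, a minimal sketch of #2, reusing the names from the first post (net is the DataParallel-wrapped model; bare_net below is a hypothetical unwrapped model of the same architecture):

import torch

# Save the inner module's state_dict so keys carry no "module." prefix.
torch.save({
    'epoch': epoch,
    'args': args,
    'state_dict': net.module.state_dict(),
    'loss_history': loss_history,
}, model_save_filename)

# Loading back then needs no key renaming:
checkpoint = torch.load(model_save_filename)
net.module.load_state_dict(checkpoint['state_dict'])   # into the wrapped model
# bare_net.load_state_dict(checkpoint['state_dict'])   # or into an unwrapped one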
