Are the two lines marked 1) and 2) below equivalent when using DataParallel?
net = torch.nn.DataParallel(net).cuda()
optim = torch.optim.Adam(net.parameters(), LR)         # 1)
optim = torch.optim.Adam(net.module.parameters(), LR)  # 2)
Or does one of them create weird synchronization issues?
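For context, here is a quick sanity check I tried (a minimal sketch with a toy nn.Linear model; the variable names are mine), which suggests both calls yield the very same parameter tensors:

import torch

net = torch.nn.Linear(4, 2)                       # toy model, just for the check
net = torch.nn.DataParallel(net).cuda()

outer = list(net.parameters())                    # 1) via the DataParallel wrapper
inner = list(net.module.parameters())             # 2) via the wrapped module

# DataParallel registers the wrapped model as a submodule, so .parameters()
# simply delegates to it; both lists contain the same tensor objects.
print(len(outer) == len(inner))                   # True
print(all(a is b for a, b in zip(outer, inner)))  # True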
Also, when saving parameters to a file, which form is preferred for a DataParallel module?
net = torch.nn.DataParallel(net).cuda()
...
torch.save({
    'epoch': epoch,
    'args': args,
    'state_dict': net.state_dict(),         # 1) OR
    'state_dict': net.module.state_dict(),  # 2)
    'loss_history': loss_history,
}, model_save_filename)
In (Lua) Torch, the analogue of option 2) was preferred (https://github.com/facebook/fb.resnet.torch/blob/master/checkpoints.lua#L45-L48).
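For what it's worth, the concrete difference I can see is in the checkpoint keys: saving through the wrapper prefixes every key with 'module.', which matters when reloading into a plain, non-DataParallel model. A minimal sketch of what I mean (again a toy model with my own variable names):

import torch

net = torch.nn.Linear(4, 2)
net = torch.nn.DataParallel(net).cuda()

print(list(net.state_dict().keys()))          # ['module.weight', 'module.bias']
print(list(net.module.state_dict().keys()))   # ['weight', 'bias']

# A checkpoint saved via 2) loads directly into an unwrapped model:
plain = torch.nn.Linear(4, 2)
plain.load_state_dict(net.module.state_dict())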