I am trying to train my model on multiple GPUs, but I am having trouble with torch.distributions.Laplace, which I call in the forward pass.
I have uploaded a minimal working example that runs fine without torch.nn.DataParallel but fails when the model is wrapped in it.
Is there any way to make this code run on multiple GPUs?
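For illustration, here is a rough sketch of the pattern I mean (placeholder names, not the actual uploaded example): a module that builds a torch.distributions.Laplace inside forward() and is then wrapped in torch.nn.DataParallel.

```python
# Simplified sketch with placeholder names, not the uploaded example:
# a module that creates torch.distributions.Laplace in forward() and is
# wrapped in torch.nn.DataParallel for multi-GPU training.
import torch
import torch.nn as nn

class LaplaceNet(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.loc = nn.Linear(dim, dim)
        self.log_scale = nn.Linear(dim, dim)

    def forward(self, x):
        loc = self.loc(x)
        scale = torch.exp(self.log_scale(x))
        dist = torch.distributions.Laplace(loc, scale)  # distribution built in the forward pass
        return dist.rsample()  # reparameterized sample, keeps gradients

model = nn.DataParallel(LaplaceNet()).cuda()
out = model(torch.randn(8, 16).cuda())  # input gets scattered across the available GPUs
```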
Is there any specific reason for using DataParallel instead of DistributedDataParallel? I only have experience with single-GPU machines, so I don't know the details here.
No particular reason; I have just seen more examples using DataParallel.
But it could be worth trying out to see whether things look different with DistributedDataParallel.
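If I do try it, I imagine the swap would look roughly like the sketch below (assumptions on my part: launched with torchrun, one process per GPU, and the module is the same placeholder stand-in as above, not my real model).

```python
# Rough sketch of trying DistributedDataParallel instead of DataParallel.
# Assumptions: launched with `torchrun --nproc_per_node=<num_gpus> script.py`,
# one process per GPU; LaplaceNet is only a placeholder module.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class LaplaceNet(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.loc = nn.Linear(dim, dim)
        self.log_scale = nn.Linear(dim, dim)

    def forward(self, x):
        laplace = torch.distributions.Laplace(self.loc(x), torch.exp(self.log_scale(x)))
        return laplace.rsample()

def main():
    dist.init_process_group("nccl")            # torchrun sets the rank/world-size env vars
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(LaplaceNet().to(local_rank), device_ids=[local_rank])
    x = torch.randn(8, 16, device=local_rank)  # each process would load its own shard of data
    out = model(x)                             # forward pass runs independently in each process
    print(out.shape)

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```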