[DataParallel] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!

C_J · April 17, 2022, 9:02am

Hi,
I’m trying to use DataParallel, and I run into this runtime error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!

My model uses some custom tensors (some are trainable parameters, some are constants).
When I initialize the module, I move some of them to the GPU with .cuda(), while others are created with torch.nn.Parameter. The model was written before I tried to use DataParallel, so I suspect that when I wrap it, DataParallel gets confused because some of the tensors are fixed to a specific GPU.
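A likely cause matches this suspicion: DataParallel replicates a module's registered parameters and buffers onto each GPU, but a tensor stored as a plain attribute (for example `self.const = torch.eye(3).cuda()`) is only copied by reference, so every replica keeps pointing at the original device (typically cuda:0) and the replica on cuda:1 ends up mixing devices. The sketch below is a diagnostic under that assumption; `find_unregistered_tensors` and the `Example` class are hypothetical names made up for illustration. It relies on the fact that `nn.Parameter` objects live in `module._parameters` and registered buffers in `module._buffers`, so any tensor found directly in a module's instance `__dict__` is unregistered.

```python
import torch
import torch.nn as nn


def find_unregistered_tensors(model: nn.Module) -> list[str]:
    """Hypothetical helper: list tensors stored as plain module attributes.

    Parameters and registered buffers are kept in module._parameters and
    module._buffers, not in the instance __dict__, so any tensor found in
    vars(module) is invisible to .to(), state_dict() and DataParallel's
    per-GPU replication.
    """
    found = []
    for mod_name, module in model.named_modules():
        for attr, value in vars(module).items():
            if isinstance(value, torch.Tensor):
                found.append(f"{mod_name}.{attr}" if mod_name else attr)
    return found


class Example(nn.Module):
    """Hypothetical model mixing registered and unregistered tensors."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(3, 3))  # registered parameter
        self.register_buffer("mask", torch.ones(3, 3))  # registered buffer
        # Plain attribute: in the failing model this would be something like
        # torch.eye(3).cuda(), pinned to one GPU for every replica.
        self.const = torch.eye(3)


print(find_unregistered_tensors(Example()))  # ['const']
```

Any name this prints is a candidate for `register_buffer` (constants) or `nn.Parameter` (trainable tensors).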

Can I get some advice on how I should initialize my model so that it works with DataParallel?

Thanks for reading and have a nice day!

InnovArul · April 17, 2022, 9:11am

Try not to call .cuda() inside the model, and let DataParallel handle moving the tensors/parameters on its own.
If the issue still persists, please share a code snippet so we can debug it and point out any issues.
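A minimal sketch of that advice, assuming the model needs one trainable tensor and one constant tensor (`ScaledLinear`, `weight` and `scale` are hypothetical names): trainable tensors are wrapped in `nn.Parameter`, constants are registered with `register_buffer`, nothing calls `.cuda()` in `__init__`, and the wrapped model is moved once with `.to(device)`. Because both tensors are registered, DataParallel copies them to every GPU it uses. The device fallback to CPU is only there so the sketch also runs on a machine without a GPU, where DataParallel simply calls the wrapped module.

```python
import torch
import torch.nn as nn


class ScaledLinear(nn.Module):
    """Hypothetical model mixing a trainable parameter and a constant tensor."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Trainable tensor: wrap in nn.Parameter; no .cuda() here.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        # Constant tensor: register it as a buffer so .to(), state_dict()
        # and DataParallel replication all move it along with the model.
        self.register_buffer("scale", torch.full((out_features,), 2.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # On each replica, weight and scale live on the same device as x.
        return (x @ self.weight.t()) * self.scale


# Move the whole wrapped model once; DataParallel expects the module's
# parameters and buffers on the first device (cuda:0 by default).
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(ScaledLinear(4, 3)).to(device)

x = torch.randn(8, 4, device=device)
out = model(x)
print(out.shape)  # torch.Size([8, 3])
```

Registered buffers also show up in `model.state_dict()` (as `module.scale` here), so the constants are saved and restored together with the weights.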