Hi everyone, I'm having a hard time getting DataParallel to work in my training script. The model is wrapped in DataParallel like this:
model = torch.nn.DataParallel(model, device_ids=device_ids).cuda()
where, for experimentation purposes, device_ids = [1]. In the training loop, after the data is loaded, it is sent to the GPU like this:
data = data.cuda()
When device_ids = [0] this all works, but for any other setting I get an error saying:
RuntimeError: module must have its parameters and buffers on device cuda:1 (device_ids[0]) but found one of them on device: cuda:0
I've tried various combinations of changing .cuda() to .to(device), with device set in different ways, but to no avail. I've also looked through the tutorials and documentation. I guess my confusion is this: when using a model wrapped in DataParallel, which device should the actual data be sent to in the training loop?
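In case it helps, here is a minimal sketch of what I believe I'm doing. The model here is a made-up two-layer network standing in for my real one, and I'm picking the last available GPU index as a stand-in for device_ids = [1]; the part I'm unsure about is the commented rule that the inputs must go to device_ids[0] rather than plain .cuda():

```python
import torch

# Hypothetical stand-in for my real network.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 2),
)

if torch.cuda.is_available():
    # Stand-in for device_ids = [1]: use the last visible GPU so this
    # runs on machines with any number of GPUs.
    device_ids = [torch.cuda.device_count() - 1]
    # The rule I *think* applies: the module's parameters must live on
    # device_ids[0], so move the model there explicitly instead of .cuda(),
    # which defaults to cuda:0.
    device = torch.device(f"cuda:{device_ids[0]}")
    model = torch.nn.DataParallel(model, device_ids=device_ids).to(device)
else:
    device = torch.device("cpu")
    # With no GPUs, DataParallel just calls the wrapped module directly.
    model = torch.nn.DataParallel(model)

# Send the batch to the same primary device as the model, not plain .cuda().
data = torch.randn(4, 8).to(device)
out = model(data)
print(tuple(out.shape))
```

Running this prints (4, 2) on my machine either way; the question is whether the .to(device) pattern above is the intended way to place the data when device_ids[0] is not 0.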