Using DataParallel

Hi Everyone, I’m having a hard time getting DataParallel to work in my training script. The model is wrapped in DataParallel like this:

model = torch.nn.DataParallel(model, device_ids=device_ids).cuda()

where, for experimentation purposes, device_ids = [1]. In the training loop, after the data is loaded it is sent to the GPU like this:

data = data.cuda()

When device_ids = [0] this all works fine, but for any other setting I get an error saying:

RuntimeError: module must have its parameters and buffers on device cuda:1 (device_ids[0]) but found one of them on device: cuda:0

I’ve tried various combinations of changing .cuda() to .to(device), with device set in different ways, but to no avail. I’ve also looked at all the tutorials and documentation. I guess my confusion is this: when using a model wrapped in DataParallel, where should the actual data be sent in the training loop?
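For what it’s worth, here is a minimal sketch of what I believe the rule is (the model name and GPU indices below are made up for illustration): DataParallel expects the module’s parameters and buffers to live on device_ids[0], and a bare .cuda() puts them on cuda:0 instead, which seems to be exactly the mismatch the RuntimeError complains about. The batch then gets sent to that same primary device:

```python
import torch
import torch.nn as nn

# Assumed setup: device_ids[0] is the "primary" replica device.
device_ids = [1, 2]                      # hypothetical GPU indices
device = torch.device(f"cuda:{device_ids[0]}")

model = nn.Linear(16, 4)                 # stand-in for the real model

if torch.cuda.device_count() > max(device_ids):
    # The key point: the model must end up on cuda:1 (device_ids[0]),
    # not cuda:0, so use .to(device) rather than a bare .cuda().
    model = nn.DataParallel(model, device_ids=device_ids).to(device)

    # In the training loop, send each batch to the same primary device;
    # DataParallel scatters it across device_ids internally.
    data = torch.randn(32, 16).to(device)
    out = model(data)                    # output is gathered back on cuda:1
```

This is only how I understand the docs, not something I’ve verified beyond two GPUs.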

Well, one thing that seems to work is the following: suppose device_ids = [1, 2]. If I send both the model and the data to device ‘cuda:1’, then training runs on both GPUs. I’m not seeing the speedup I expected, but at least it’s running. Also, the utilization of GPU 2 is significantly lower than that of GPU 1. Is this correct? I.e., when using DataParallel, should both the model and the data be sent to device_ids[0]? Or am I still missing something?
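My working theory on the imbalance (happy to be corrected): DataParallel splits the input batch along dim 0 across device_ids, but the scatter, the gather of outputs, and typically the loss and backward bookkeeping all happen on device_ids[0], so the first listed GPU does extra work and shows higher utilization. A rough sketch of the split itself, mirroring how torch.chunk divides dim 0 (batch sizes here are made up):

```python
def scatter_sizes(batch_size, num_gpus):
    """Per-GPU chunk sizes, the way torch.chunk splits dim 0:
    ceil(batch/num_gpus) per chunk, with the last chunk taking the rest."""
    base = (batch_size + num_gpus - 1) // num_gpus   # ceiling division
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(base, remaining)
        sizes.append(take)
        remaining -= take
    return sizes

# With a batch of 32 on two GPUs, each replica sees 16 samples,
# but GPU device_ids[0] additionally pays the scatter/gather cost.
print(scatter_sizes(32, 2))   # [16, 16]
print(scatter_sizes(10, 4))   # [3, 3, 3, 1]
```

If that theory holds, the speedup should only show up once the per-GPU batch is large enough that compute dominates the scatter/gather overhead.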

Well, amend that: it runs, but only on two GPUs. The training machine has four GPUs, and when I try the above on three or four GPUs it crashes the machine a few steps after training starts.
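Since a full machine crash (as opposed to a Python error) with three or four GPUs can point at hardware or driver trouble rather than the script, one cheap diagnostic I’ve been running is checking whether every GPU pair reports peer-to-peer access, using torch.cuda.can_device_access_peer. This is just a sketch of a first sanity check, not a real diagnosis; watching power draw under load with nvidia-smi is probably the other thing to try:

```python
import torch

def peer_access_matrix():
    """List (src, dst, p2p_ok) for every ordered GPU pair.
    Missing peer access between some pairs can make multi-GPU runs
    unstable; returns an empty list on a CPU-only machine."""
    n = torch.cuda.device_count()
    pairs = []
    for i in range(n):
        for j in range(n):
            if i != j:
                pairs.append((i, j, torch.cuda.can_device_access_peer(i, j)))
    return pairs

print(peer_access_matrix())
```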