CUDA vs. DataParallel: Why the difference?

I have a simple neural network model, and I apply either cuda() or DataParallel() to the model like the following:

model = torch.nn.DataParallel(model).cuda()

OR,

model = model.cuda()

When I don’t use DataParallel and instead simply move my model to cuda(), I need to explicitly convert the batch inputs to cuda() before feeding them to the model; otherwise it returns an error. [torch.index_select received an invalid combination of arguments - got (torch.cuda.FloatTensor, int, torch.LongTensor)]
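For example, here is a minimal sketch of what I mean (the toy model and shapes are just for illustration):

import torch
import torch.nn as nn

model = nn.Linear(10, 2).cuda()   # plain cuda() model, no DataParallel

inputs = torch.randn(4, 10)       # batch starts out on the CPU
outputs = model(inputs.cuda())    # explicit transfer is required here
# calling model(inputs) without .cuda() raises the type-mismatch error above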

But with DataParallel, the code works fine. Everything else is the same. Why does this happen? Why don’t I need to explicitly move the batch inputs to cuda() when I use DataParallel?


DataParallel allows CPU inputs, as its first step is to transfer the inputs to the appropriate GPUs. That’s simply the reason why.
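Roughly (a minimal sketch with a toy single-layer model, assuming at least one GPU is available):

import torch
import torch.nn as nn

model = nn.DataParallel(nn.Linear(10, 2)).cuda()

inputs = torch.randn(4, 10)   # batch stays on the CPU
outputs = model(inputs)       # DataParallel moves the chunks to the GPUs itself
print(outputs.device)         # outputs are gathered back onto the default device (cuda:0)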


@smth Sorry to revive an old thread, but I’m looking for a bit of clarification. Are you saying that if we pass our model to DataParallel, then we don’t need to explicitly call .cuda() on our Variables (e.g. inputs.cuda(cuda_device))? I was under the impression that DataParallel was intended for when you are utilizing multiple GPUs. Am I misunderstanding?

Thanks

@achaiah If you use DataParallel and the input is on the CPU, then the input chunks are copied over to each GPU in parallel…
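Conceptually it does something like this (a rough sketch, assuming two visible GPUs; the real implementation uses scatter/gather primitives):

import torch

batch = torch.randn(8, 10)                          # CPU input, batch size 8
chunks = batch.chunk(2, dim=0)                      # split along the batch dimension
chunks = [c.cuda(i) for i, c in enumerate(chunks)]  # each chunk copied to its own GPU
# each model replica then runs forward on its own chunk,
# and the outputs are gathered back onto a single device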