DataParallel on a large network that takes up more than one GPU

Hello everyone

I would like to apply DataParallel to a self-defined network. However, my model is large enough that it spans more than one GPU (I have 4 in total), like below:

class net(nn.Module):
    def __init__(self):
        super().__init__()
        self.module1 = ...  # ends up on GPU 0
        self.module2 = ...  # ends up on GPU 3

    def forward(self, input):
        x = self.module1(input)
        return self.module2(x)

Now I want to apply DataParallel to my input. However, it reports the error “Expected tensor for argument #1 ‘input’ to have the same device as tensor for argument #2 ‘weight’; but device 1 does not equal 0”. I am assuming this error is caused by my model spanning two GPUs?

Please correct me if I am wrong, and let me know what I should do to achieve this.

Best

You cannot do that.
The closest thing you can do is to apply data parallelism to module1 across 2 GPUs and to module2 across the other 2.

If you hard-code GPUs inside the nn.Module, you cannot use DataParallel.
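
Something along these lines should work (an untested sketch assuming 4 GPUs, with small placeholder layers standing in for your real submodules):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # placeholders for your two halves; each DataParallel needs its
        # parameters on the first device in its device_ids list
        module1 = nn.Sequential(nn.Linear(256, 256), nn.ReLU()).to('cuda:0')
        module2 = nn.Linear(256, 10).to('cuda:2')
        self.module1 = nn.DataParallel(module1, device_ids=[0, 1])
        self.module2 = nn.DataParallel(module2, device_ids=[2, 3])

    def forward(self, x):
        x = self.module1(x)      # output is gathered back on cuda:0
        return self.module2(x)   # scattered again onto GPUs 2 and 3

net = Net()
out = net(torch.randn(8, 256, device='cuda:0'))  # batch split inside each half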

Thanks. Very helpful.

Actually, I did not hard-code GPU devices for each sub-module. I just applied torch.nn.DataParallel(net, device_ids=[0, 1, 2, 3]).cuda(). I checked module1.weight and module2.weight, and they show cuda:0 and cuda:3 respectively. (This has never happened before.)

I suspect something is wrong here. I doubt that a BNInception (module1) plus two residual blocks (module2) would go over 12 GB?
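
I will measure the actual usage per GPU to double-check, roughly like this:

import torch

# rough check of current and peak memory use on every visible GPU
for i in range(torch.cuda.device_count()):
    alloc = torch.cuda.memory_allocated(i) / 1024**3
    peak = torch.cuda.max_memory_allocated(i) / 1024**3
    print(f'cuda:{i}: {alloc:.2f} GB allocated, {peak:.2f} GB peak')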

I’m afraid that if you use DataParallel with 4 GPUs, everything gets replicated on each of them. Is it possible you are coding that wrongly?

Anyway, what’s the point of applying DataParallel to each submodule across all GPUs? If you can do that, you can probably apply DataParallel to the parent module.

I agree. DataParallel assumes that each GPU can hold a full replica of the model and take at least one data sample on its own.
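
For reference, the usual pattern looks roughly like this (a toy model, not your actual network), where every GPU gets a full replica plus a slice of the batch:

import torch
import torch.nn as nn

# toy model small enough to fit on a single GPU
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3]).cuda()

x = torch.randn(32, 512).cuda()  # 32 samples -> 8 per GPU
out = model(x)                   # outputs are gathered back on cuda:0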

I did apply DataParallel to the parent module, without assigning any GPU to any submodule… I found out module2 is assigned to cuda:3 after encountering the error…
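
This is roughly how I checked where each parameter ended up:

for name, p in net.named_parameters():
    print(name, p.device)  # module1.* shows cuda:0, module2.* shows cuda:3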