I tried to write a piece of code as follows:
```python
class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        self._coeffs = Variable(1e-3 * torch.randn(3).cuda(), requires_grad=True)
        self.convs = nn.ModuleList([nn.Conv2d(10, 10, 3) for i in range(3)])

    def forward(self, x):
        sum_output = 0
        for i in range(3):
            sum_output = sum_output + self._coeffs[i] * self.convs[i](x)
        return sum_output
```
Then I put the model on 4 GPUs with the DataParallel class:

```python
model = nn.DataParallel(Test(), device_ids=[0, 1, 2, 3]).cuda()
```
During the forward pass, it reports "RuntimeError: tensors are on different GPUs…", and I'm quite sure the problem is `self._coeffs`. Waiting for answers…
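In case it's useful context for answering: my guess is that a plain `Variable` attribute is not registered with the module, so `DataParallel` never replicates it to the other GPUs the way it does for registered parameters. A minimal sketch of what I think the fix might look like, using `nn.Parameter` instead of `Variable` (this is my assumption, not a confirmed solution):

```python
import torch
import torch.nn as nn

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        # nn.Parameter registers the tensor with the module, so it shows up
        # in model.parameters() and (I assume) gets replicated per-GPU by
        # DataParallel along with the conv weights. No explicit .cuda() here;
        # model.cuda() moves registered parameters for us.
        self._coeffs = nn.Parameter(1e-3 * torch.randn(3))
        self.convs = nn.ModuleList([nn.Conv2d(10, 10, 3) for i in range(3)])

    def forward(self, x):
        sum_output = 0
        for i in range(3):
            # Weighted sum of the three conv outputs.
            sum_output = sum_output + self._coeffs[i] * self.convs[i](x)
        return sum_output
```

At least on a single device this registers the coefficients properly: `_coeffs` appears in `model.named_parameters()`, so the optimizer sees it too. Whether this also resolves the multi-GPU replication error is what I'd like confirmed.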