DataParallel isn't working on submodules

I get a device mismatch error when using DataParallel to train on multiple GPUs.

After debugging, I found that DataParallel doesn't seem to work with submodules. The model is essentially Inception v1 (GoogLeNet). My model design template is below:

import torch
import torch.nn as nn

class Submodule(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Sequential(...)
        self.b = nn.Sequential(...)
        # ... more parallel branches ...

    def forward(self, x):
        # concatenate the branch outputs along the channel dimension
        return torch.cat([self.a(x), self.b(x), ...], dim=1)

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = conv
        self.sublayer1 = Submodule(...)
        self.sublayer2 = Submodule(...)
        # ...

    def forward(self, x):
        x = self.conv(x)
        x = self.sublayer1(x)
        x = self.sublayer2(x)
        # ...
        return x

model = Model()
model = nn.DataParallel(model).cuda()

DataParallel works with the plain conv layers but not with the submodules. It would be great if anyone could give me a pointer on debugging this. I couldn't find anything relevant in previous PyTorch forum threads.

It does work with submodules.
The problem is that you probably have a hardcoded device somewhere in the submodules:
something is calling .cuda() instead of using .to(device), with device taken from the input to forward.
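As a minimal sketch of what I mean (the mask tensor here is hypothetical, just to illustrate the pattern): DataParallel runs each replica on its own GPU, so a tensor created with .cuda() stays pinned to cuda:0 and mismatches the replicas on the other devices, while device=x.device follows the scattered input.

import torch
import torch.nn as nn

class Submodule(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1))

    def forward(self, x):
        # BAD: pinned to GPU 0, breaks replicas running on other GPUs
        # mask = torch.ones(x.shape[0], 1, 1, 1).cuda()
        # GOOD: created on whatever device the scattered input lives on
        mask = torch.ones(x.shape[0], 1, 1, 1, device=x.device)
        return self.a(x) * mask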

I tried changing it to .to(device) and still get the same error. I'm trying to use DataParallel, and I suspect the submodules are not being wrapped by it.

Can you post the model code?
Each submodule has to be held in an nn.Module, an nn.ModuleList, or an nn.ModuleDict.
Any other container (e.g. a plain Python list or dict) is not registered with the parent module, so DataParallel won't replicate it across GPUs.
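A minimal sketch of the difference (layer sizes are made up): submodules stored in a plain Python list are invisible to .parameters(), .cuda(), and DataParallel, so they stay on the CPU; nn.ModuleList registers them properly.

import torch.nn as nn

class Submodule(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Conv2d(8, 8, 3, padding=1)

    def forward(self, x):
        return self.a(x)

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # BAD: a plain list hides these layers from the module system
        # self.sublayers = [Submodule(), Submodule()]
        # GOOD: nn.ModuleList registers each submodule with the parent
        self.sublayers = nn.ModuleList([Submodule(), Submodule()])

    def forward(self, x):
        for layer in self.sublayers:
            x = layer(x)
        return x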