I implemented an architecture that handles multiple inputs, each processed by its own encoder. To speed things up, I want to train my model on multiple GPUs. This is my code:
```python
def forward(self, x):
    ''' x: list of input tensors '''
    h = list()
    for i, x_i in enumerate(x):
        h_i = self.encoders[i](x_i)
        h.append(h_i)
    z = torch.cat(h, dim=0)
    y_pred = self.classifier(z)
    return y_pred
```
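For context, the encoders and the classifier are set up roughly like this (the class and argument names here are just placeholders):

```python
import torch
import torch.nn as nn

class MultiInputModel(nn.Module):
    def __init__(self, encoders, classifier):
        super().__init__()
        # one encoder per input, kept in a ModuleList so their
        # parameters are registered with the model
        self.encoders = nn.ModuleList(encoders)
        self.classifier = classifier
```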
If I simply used the data_parallel class here, each encoder would get copied to every GPU. However, I think it would be faster if each encoder were trained on its own GPU. Is there any way to achieve this?
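What I have in mind is roughly the following sketch, where each encoder lives on its own GPU and the outputs are moved back to one device for the classifier (the device ids assume one GPU per encoder, and I don't know whether this is the right approach):

```python
def __init__(self, encoders, classifier):
    super().__init__()
    # place each encoder on its own GPU
    # (assumes len(encoders) <= torch.cuda.device_count())
    self.encoders = nn.ModuleList(
        enc.to(f'cuda:{i}') for i, enc in enumerate(encoders)
    )
    # keep the classifier on the first GPU
    self.classifier = classifier.to('cuda:0')

def forward(self, x):
    ''' x: list of input tensors '''
    h = list()
    for i, x_i in enumerate(x):
        # move each input to its encoder's GPU, then bring the
        # result back to cuda:0 where the classifier lives
        h_i = self.encoders[i](x_i.to(f'cuda:{i}'))
        h.append(h_i.to('cuda:0'))
    z = torch.cat(h, dim=0)
    y_pred = self.classifier(z)
    return y_pred
```

But with this the encoders run one after another instead of in parallel, which is why I am asking whether there is a proper way to do it.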