What does a parallel model (nn.DataParallel) do when calling the forward function?

I encountered a problem: data movement seems to take a long time in the parallel forward pass, but I can't find the reason. The main code is as follows:

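```python
import time

import torch
import torch.nn as nn


def main():
    model = nn.DataParallel(MyModel())
    model = model.cuda()

    # Illustrative random batch with the shape given in the post
    input = torch.randn(512, 3, 224, 224)
    before_time = time.time()
    logits = model(input)


class MyModel(nn.Module):
    def forward(self, input):
        start_time = time.time()
        out = input  # placeholder: the actual layers ("xxx") were elided in the post
        end_time = time.time()
        return out


if __name__ == "__main__":
    main()
```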
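One caveat about the timings in the snippet above: CUDA work is launched asynchronously, so a `time.time()` reading taken right after a GPU call can come back before the GPU has finished. Time spent on one step can therefore show up in a later interval. Below is a minimal sketch of a timing helper that calls an optional synchronize function before starting and before stopping the clock. The name `timed` is hypothetical and not part of PyTorch.

```python
import time
from typing import Any, Callable, Optional, Tuple


def timed(fn: Callable[[], Any],
          sync: Optional[Callable[[], None]] = None) -> Tuple[Any, float]:
    """Run fn() and return (result, elapsed_seconds).

    If `sync` is given (for example a function that calls
    torch.cuda.synchronize on each GPU), it runs before the clock starts
    and again before it stops, so GPU work launched by fn() is counted
    in this interval instead of a later one.
    """
    if sync is not None:
        sync()  # wait for work queued by earlier calls
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()  # wait for the work fn() launched to finish
    return result, time.perf_counter() - start
```

With 4 GPUs, a suitable `sync` is `lambda: [torch.cuda.synchronize(d) for d in range(torch.cuda.device_count())]`, used as `logits, seconds = timed(lambda: model(input), sync=...)`.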

I use 4 V100 GPUs in parallel, with num_workers set to 0.

start_time - before_time is 0.6 s, while the actual forward time, end_time - start_time, is only 0.15 s. The data batch has shape (512, 3, 224, 224).
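For scale, the size of that batch can be computed directly. This sketch assumes float32 inputs because the dtype is not stated. It also relies on `nn.DataParallel` splitting the batch along dim 0, so each of the 4 GPUs receives 128 samples.

```python
import math

SHAPE = (512, 3, 224, 224)  # batch shape from the post
NUM_GPUS = 4                # 4 x V100, as in the post
BYTES_PER_ELEMENT = 4       # assumption: float32 input

num_elements = math.prod(SHAPE)
total_bytes = num_elements * BYTES_PER_ELEMENT
per_gpu_bytes = total_bytes // NUM_GPUS  # 512 splits evenly into 4 x 128

print(num_elements)           # 77070336
print(total_bytes / 2**20)    # 294.0 (MiB in total)
print(per_gpu_bytes / 2**20)  # 73.5 (MiB per GPU)
```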

I am curious: the actual forward pass takes only 0.15 s, so which operations take the 0.6 s between before_time and start_time? Is it the data movement? I don't think moving so little data should cost 0.6 s. That is far too long for about 77M values (roughly 308 MB if they are float32). :thinking:
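To see where the 0.6 s goes, it helps to know what `nn.DataParallel.forward` does on every call. It scatters the inputs across `device_ids` along dim 0, replicates the wrapped module onto each of those devices, runs the replicas with `parallel_apply`, and gathers the outputs onto `output_device`. Only the third step enters `MyModel.forward`, so scattering and replication fall in the window between `before_time` and `start_time`. Below is a minimal sketch that replays these steps through the wrapper's own `scatter`, `replicate`, `parallel_apply` and `gather` methods and times each one. The function name is hypothetical, and the sketch omits the parameter-device check and single-GPU shortcut that the real `forward` performs.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


def time_data_parallel_stages(
    dp: Any, *inputs: Any, sync: Optional[Callable[[], None]] = None
) -> Tuple[Any, Dict[str, float]]:
    """Replay the steps of a DataParallel forward pass and time each one.

    `dp` is expected to be an nn.DataParallel instance, or anything exposing
    the same scatter/replicate/parallel_apply/gather methods and the
    module, device_ids and output_device attributes.
    """

    def now() -> float:
        if sync is not None:
            sync()  # make the reading reflect finished GPU work
        return time.perf_counter()

    t0 = now()
    # Split the batch along dim 0, one chunk per device
    scattered, kwargs = dp.scatter(inputs, {}, dp.device_ids)
    t1 = now()
    # Copy the module onto each device that received a chunk
    replicas = dp.replicate(dp.module, dp.device_ids[: len(scattered)])
    t2 = now()
    # Run each replica on its chunk (this is where MyModel.forward executes)
    outputs = dp.parallel_apply(replicas, scattered, kwargs)
    t3 = now()
    # Concatenate the per-device outputs on output_device
    result = dp.gather(outputs, dp.output_device)
    t4 = now()

    timings = {
        "scatter": t1 - t0,
        "replicate": t2 - t1,
        "parallel_apply": t3 - t2,
        "gather": t4 - t3,
    }
    return result, timings
```

For a 4-GPU run, `sync` should wait on every device, e.g. `lambda: [torch.cuda.synchronize(d) for d in range(torch.cuda.device_count())]`. Doing one untimed warm-up call first keeps one-time CUDA initialization out of the numbers.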

I suspect that calling model.cuda() on the nn.DataParallel wrapper is causing the trouble. Can you try creating the model first, then calling model.cuda(), and only then wrapping it in nn.DataParallel? I think that in this example the entire model is copied to the GPU every time you call forward.

Thank you! I will try it and report back. The original code style is inspired by DARTS.