I've run into a problem: the data movement seems to take a long time during a parallel forward pass, but I can't find the reason. The main code is as follows:
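import time
import torch.nn as nn

def main():
    model = nn.DataParallel(MyModel())
    model = model.cuda()
    before_time = time.time()  # taken just before the DataParallel call
    logits = model(input)      # input: the batch described below

class MyModel(nn.Module):
    def forward(self, input):
        start_time = time.time()  # taken on entry to the replica's forward
        xxx  # actual forward computation, elided
        end_time = time.time()
        return out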
I use 4 V100 GPUs in parallel, and num_workers is set to 0.
start_time - before_time is 0.6 s, while the actual forward time, end_time - start_time, is only 0.15 s. The input batch has shape (512, 3, 224, 224).
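One caveat about these numbers: CUDA kernels are launched asynchronously, so plain time.time() calls may attribute GPU work to the wrong interval. Below is a minimal sketch of a synchronized timer. The helper name `timed` and its `sync` parameter are hypothetical, not part of PyTorch; on a GPU machine one would pass `torch.cuda.synchronize` as `sync`.

```python
import time


def timed(fn, sync=None):
    """Run fn() and return (result, elapsed_seconds).

    sync is an optional zero-argument callable invoked before starting and
    before stopping the clock. Assumption: on GPU, sync=torch.cuda.synchronize
    so that pending kernels and copies are finished at both timestamps.
    """
    if sync is not None:
        sync()  # drain work queued before the measured region
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()  # wait for the work launched by fn() to complete
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # CPU-only demonstration with a stand-in workload.
    total, seconds = timed(lambda: sum(range(1_000_000)))
    print(f"result={total}, elapsed={seconds:.4f}s")
```

With the code above, this would look something like `logits, t = timed(lambda: model(input), sync=torch.cuda.synchronize)`, which measures the whole DataParallel call, including scatter, replication, forward and gather.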
What I'd like to understand: the actual forward pass takes only 0.15 seconds, so which operations account for the 0.6 seconds between before_time and start_time? Is it the data movement? I don't think so little data should cost 0.6 seconds; that seems far too long for moving about 77M elements.
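For reference, here is a rough back-of-the-envelope check of the batch size and the transfer rate that a 0.6 s gap would imply if it were spent purely on copying. The dtype is an assumption (float32, 4 bytes per element); I did not state it above, so the byte counts change if the input is stored differently.

```python
import math

# Batch shape from the post: (N, C, H, W)
batch_shape = (512, 3, 224, 224)

# Assumption: float32 input, i.e. 4 bytes per element.
bytes_per_element = 4

num_elements = math.prod(batch_shape)
total_bytes = num_elements * bytes_per_element

# Gap measured between before_time and start_time.
observed_gap_s = 0.6

# Transfer rate implied if the whole gap were a single copy of the batch.
implied_gb_per_s = total_bytes / observed_gap_s / 1e9

print(f"elements: {num_elements:,}")
print(f"size:     {total_bytes / 1e6:.1f} MB")
print(f"implied:  {implied_gb_per_s:.2f} GB/s over {observed_gap_s} s")
```

Under that assumption the batch is roughly 308 MB, so a pure copy taking 0.6 s would correspond to only about 0.5 GB/s, which is why the gap looks too large to me to be explained by the transfer alone.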