I’m currently using DataParallel (not DistributedDataParallel yet, though that’s my ultimate goal).
I’m using 3 GPUs, but I’m not sure the work is actually being done in parallel.
model = MyModel()  # my custom nn.Module
model = DataParallel(model)
model.cuda()
Inside the model:
def forward(self, input):
    output = self.custom_function(input)
    print('Job done in Device:', input.device)
    return output
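For completeness, here is a minimal self-contained version of what I’m running (MyModel and its custom_function are simplified stand-ins for my real code; it falls back to the CPU when no GPU is available):

```python
import torch
import torch.nn as nn
from torch.nn import DataParallel

class MyModel(nn.Module):
    # Stand-in for my real model; custom_function is simplified here.
    def custom_function(self, x):
        return x * 2

    def forward(self, input):
        output = self.custom_function(input)
        print('Job done in Device:', input.device)
        return output

model = MyModel()
model = DataParallel(model)  # on a CPU-only machine this just calls the module directly
if torch.cuda.is_available():
    model.cuda()

x = torch.ones(6, 4)
if torch.cuda.is_available():
    x = x.cuda()

# DataParallel splits the batch along dim 0 across the available GPUs,
# so with 3 GPUs the print above fires once per replica.
out = model(x)
print(out.shape)
```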
When I call the model’s forward pass, the following gets printed:
Job done in Device: cuda:0
Job done in Device: cuda:1
Job done in Device: cuda:2
The problem is that I expect those three lines to be printed at almost the same time (since the processing is parallel), but each is printed after a fairly long gap (>1 sec), which suggests they are not actually processed in parallel. For reference, the whole forward pass should take about 1 sec.
I suspect the forward pass may somehow be running on the CPU, but I’m not sure that’s even possible after wrapping the model with DataParallel.
Or is there any way to check whether the replicas actually run in parallel?
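One idea I had for checking this (a sketch only; TimedModel is a hypothetical instrumented module, and the timestamps are wall-clock, so asynchronous CUDA launches can blur them slightly) is to record a timestamp inside each replica’s forward and compare the spread between replicas:

```python
import time
import torch
import torch.nn as nn

class TimedModel(nn.Module):
    # Hypothetical instrumented model: records when each replica runs.
    def forward(self, input):
        # Replicas running in parallel should record nearly identical
        # timestamps; sequential replicas will be ~1 sec apart.
        stamp = time.perf_counter()
        print(f'device={input.device} t={stamp:.4f}')
        # Return the timestamp as a float64 tensor so DataParallel's
        # gather step collects one stamp per replica along dim 0.
        stamp_t = torch.full((1, 1), stamp,
                             dtype=torch.float64, device=input.device)
        return input * 2, stamp_t

model = nn.DataParallel(TimedModel())
if torch.cuda.is_available():
    model.cuda()

x = torch.ones(6, 4)
if torch.cuda.is_available():
    x = x.cuda()

out, stamps = model(x)
# With N GPUs, stamps holds N entries (one per replica); the max-min
# spread shows how far apart the replicas started. On CPU there is one.
spread = (stamps.max() - stamps.min()).item()
print('spread between replicas (sec):', spread)
```

Watching GPU utilization with nvidia-smi while the forward pass runs would be another rough way to see whether all three devices are busy at once.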
Thanks for reading!