Hi, I have a question about using multiple GPU devices.
I set up my model to use multiple devices as shown below:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # must be set before CUDA is initialized

model = MyModel().to(device)    # instantiate the model and move it to the default device
model = nn.DataParallel(model)  # replicate it across the visible GPUs
(...)
def train():
    (....)
    for epoch in range(start_epoch, 10001):
        (....)
        for ..... in training_generator:
            (.......)
            output = model(user_input)
My model is a 3D-CNN and my ‘user_input’ has shape (A, B, C, D, E). Here, if I set A = 4, does my model run on the 4 devices in parallel? If I use 4 devices, does A have to be a multiple of 4 for efficient computation?
Thanks.
Yes, nn.DataParallel splits the input along dim0 and scatters one chunk to each device, so with A=4 on 4 GPUs each device will get a batch containing a single sample. A doesn't have to be a multiple of 4 for it to work, but an even split keeps the workload balanced across the devices.
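If you want to see the split yourself, one quick check is a toy module that prints the shape of the chunk each replica receives. TinyModel here is just a hypothetical stand-in for your 3D-CNN, assuming 4 visible GPUs:

import torch
import torch.nn as nn

class TinyModel(nn.Module):  # hypothetical stand-in for the real 3D-CNN
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv3d(1, 8, kernel_size=3, padding=1)

    def forward(self, x):
        # each replica prints the chunk it received
        print("device:", x.device, "input shape:", x.shape)
        return self.conv(x)

model = nn.DataParallel(TinyModel()).to("cuda")
x = torch.randn(4, 1, 16, 16, 16)  # A=4 -> one sample per replica on 4 GPUs
out = model(x.to("cuda"))
print(out.shape)  # outputs gathered back on the default device: (4, 8, 16, 16, 16)

With 4 visible devices each forward pass should print four (1, 1, 16, 16, 16) chunks, one per GPU.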
DistributedDataParallel (DDP) should be faster, as it reduces the communication overhead compared to nn.DataParallel.
The details of the latter (including the scatter/gather calls) are described in this blog post.
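In case you want to try DDP, here is a minimal single-node sketch under a few assumptions: 4 GPUs, the NCCL backend, placeholder MASTER_ADDR/MASTER_PORT values, and a single Conv3d standing in for your model. Each process drives one GPU and works on its own local batch, so with a per-process batch of 2 the effective batch size is 8:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn

def run(rank, world_size):
    # one process per GPU; rendezvous via env vars (placeholder values)
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = nn.Conv3d(1, 8, kernel_size=3, padding=1).cuda(rank)
    model = nn.parallel.DistributedDataParallel(model, device_ids=[rank])

    # each process trains on its own local batch; gradients are all-reduced
    x = torch.randn(2, 1, 16, 16, 16, device=rank)
    out = model(x)
    out.mean().backward()

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(run, args=(4,), nprocs=4)  # assumes 4 GPUs

Unlike nn.DataParallel, DDP does not split a single batch for you; each process loads its own data (typically via a DistributedSampler), and only the gradients are communicated.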