Hi, I have a question about using multiple GPU devices.
I set up my model to use multiple devices as shown below:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # must be set before CUDA is initialized

model = MyModel().to(device)    # instantiate the model and move it to the default device
model = nn.DataParallel(model)  # replicate it across the visible GPUs
(...)
def train():
    (....)
    for epoch in range(start_epoch, 10001):
        (....)
        for ..... in training_generator:
            (.......)
            output = model(user_input)
My model is a 3D-CNN and my ‘user_input’ has shape (A, B, C, D, E). Here, if I set A = 4, does my model run on the 4 devices in parallel? If I use 4 devices, does A have to be a multiple of 4 for efficient computation?
Thanks.
Yes, nn.DataParallel splits the input along dim0 and scatters one chunk to each device, so with A=4 on 4 GPUs each device will get a batch containing a single sample. A doesn't have to be a multiple of 4 for it to work, but an even split keeps the workload balanced across the devices.
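If you want to see the split yourself, one quick check is a toy module that prints the shape of the chunk each replica receives. TinyModel here is just a hypothetical stand-in for your 3D-CNN, assuming 4 visible GPUs:

import torch
import torch.nn as nn

class TinyModel(nn.Module):  # hypothetical stand-in for the real 3D-CNN
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv3d(1, 8, kernel_size=3, padding=1)

    def forward(self, x):
        # each replica prints the chunk it received
        print("device:", x.device, "input shape:", x.shape)
        return self.conv(x)

model = nn.DataParallel(TinyModel()).to("cuda")
x = torch.randn(4, 1, 16, 16, 16)  # A=4 -> one sample per replica on 4 GPUs
out = model(x.to("cuda"))
print(out.shape)  # outputs gathered back on the default device: (4, 8, 16, 16, 16)

With 4 visible devices each forward pass should print four (1, 1, 16, 16, 16) chunks, one per GPU.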
DistributedDataParallel (DDP) should be faster, as it reduces the communication overhead compared to nn.DataParallel.
The details of the latter (including the scatter/gather calls) are described in this blog post.
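In case you want to try DDP, here is a minimal single-node sketch under a few assumptions: 4 GPUs, the NCCL backend, placeholder MASTER_ADDR/MASTER_PORT values, and a single Conv3d standing in for your model. Each process drives one GPU and works on its own local batch, so with a per-process batch of 2 the effective batch size is 8:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn

def run(rank, world_size):
    # one process per GPU; rendezvous via env vars (placeholder values)
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = nn.Conv3d(1, 8, kernel_size=3, padding=1).cuda(rank)
    model = nn.parallel.DistributedDataParallel(model, device_ids=[rank])

    # each process trains on its own local batch; gradients are all-reduced
    x = torch.randn(2, 1, 16, 16, 16, device=rank)
    out = model(x)
    out.mean().backward()

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(run, args=(4,), nprocs=4)  # assumes 4 GPUs

Unlike nn.DataParallel, DDP does not split a single batch for you; each process loads its own data (typically via a DistributedSampler), and only the gradients are communicated.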