Multi-GPU Syntax

Hi community.

I have some doubts about the steps required to make sure I am training on the available GPUs.

After some checks on the available resources:

import multiprocessing
import torch

cuda = torch.cuda.is_available()
n_workers = multiprocessing.cpu_count()
device = 'cuda' if torch.cuda.is_available() else 'cpu'

print('Cuda: ', str(cuda))
print('Device: ', str(device))
print('Cores: ', str(n_workers))
print('GPUs available', str(torch.cuda.device_count()))

Output:

Cuda:  True
Device:  cuda
Cores:  24
GPUs available 8

Now, I can move both the data and the model to the available GPUs:
Before training, allocate the model:

model.train()
model.to(device)

During training, allocate the tensors:

images = images.to(device)
labels = labels.to(device)
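
Putting these together, a training iteration in my script looks roughly like this (criterion, optimizer and train_loader are placeholders for my loss, optimizer and data loader, just to show where the .to(device) calls sit):

for images, labels in train_loader:
    # move the batch to the same device as the model
    images = images.to(device)
    labels = labels.to(device)

    optimizer.zero_grad()
    outputs = model(images)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()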

My question now is: when do I need nn.DataParallel, and what functionality does it add beyond what I have already applied with device?

Thanks in advance,
Pablo

Since you have multiple GPUs, you could use nn.DataParallel to utilize all or some of them.
Have a look at this tutorial to see how to apply it.
Basically, your batch will be split into chunks along the batch dimension and pushed to all specified devices.
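
A minimal sketch of the usual pattern, reusing your device and images from above (MyModel is just a stand-in for your model class):

import torch.nn as nn

model = MyModel()
if torch.cuda.device_count() > 1:
    # wrap the model; by default all visible GPUs are used
    model = nn.DataParallel(model)
model.to(device)

# a batch of e.g. 64 on 8 GPUs is scattered into chunks of 8 per device,
# and the outputs are gathered back on the default device
outputs = model(images)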

Also, to speed up the data loading, you should use multiprocessing in your DataLoader by setting num_workers > 0.
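
E.g. something like this (train_dataset and batch_size are placeholders for your dataset and batch size):

from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=batch_size,
                          shuffle=True, num_workers=4)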

Thanks a lot!

From the tutorial, I understand that I need to parallelize my model before moving it to the device,
since if I just call model.to(device), it will by default copy the model to a single GPU regardless of how many GPUs torch.cuda.device_count() reports. Is this right?
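
For example, if I only call model.to(device), I can check that the parameters end up on a single device:

model.to(device)
print(next(model.parameters()).device)   # expect cuda:0, i.e. only the default GPU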

I am using the DataLoaders as follows. Is the implementation correct?

from torch.utils.data import DataLoader, SubsetRandomSampler

train_loader = DataLoader(dataset=train_set.dataset,
                          sampler=SubsetRandomSampler(train_set.indices),
                          batch_size=batch_size, num_workers=n_workers)

valid_loader = DataLoader(dataset=valid_set.dataset,
                          sampler=SubsetRandomSampler(valid_set.indices),
                          batch_size=batch_size, num_workers=n_workers)

test_loader = DataLoader(dataset=test_set, batch_size=1,
                         shuffle=False, num_workers=n_workers)

Thanks in advance.
Regards,
Pablo

The gradients will be reduced to the GPU you are specifying, so you might see slightly increased memory usage on this device.
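
If you want to control which devices are used and where the outputs and gradients are gathered, you can pass device_ids and output_device explicitly (the ids below are just an example):

# parameters live on device_ids[0], where the gradients are also reduced;
# output_device (default: device_ids[0]) is where the forward outputs are gathered
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3], output_device=0)
model.to('cuda:0')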

The DataLoaders look good. Since you are using GPUs, you should also set pin_memory=True to use pinned (page-locked) host memory, as the GPU cannot access data directly from pageable host memory.
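
E.g. for your train loader, reusing your variables from above (non_blocking=True on the .to() calls can then overlap the host-to-device copies with compute):

train_loader = DataLoader(dataset=train_set.dataset,
                          sampler=SubsetRandomSampler(train_set.indices),
                          batch_size=batch_size, num_workers=n_workers,
                          pin_memory=True)

for images, labels in train_loader:
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)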
