Hello,
I used PyTorch's Faster R-CNN implementation to train on a dataset. It works well with one GPU. However, I have access to a system with 4 GPUs and want to use all of them, but when I check GPU usage, only one GPU is active.
I select the device like this:

if not torch.cuda.is_available() and device_name == 'gpu':
    raise ValueError('GPU is not available!')
elif device_name == 'cpu':
    device = torch.device('cpu')
elif device_name == 'gpu':
    if batch_size % torch.cuda.device_count() != 0:
        raise ValueError('Batch size is not divisible by number of GPUs')
    device = torch.device('cuda')
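Equivalently, the checks above could be folded into one small helper (just a sketch for clarity; `pick_device` is an illustrative name, not part of my actual code):

```python
import torch

def pick_device(device_name: str, batch_size: int) -> torch.device:
    """Sketch of the device-selection checks above (illustrative name)."""
    if device_name == 'gpu':
        if not torch.cuda.is_available():
            raise ValueError('GPU is not available!')
        if batch_size % torch.cuda.device_count() != 0:
            raise ValueError('Batch size is not divisible by number of GPUs')
        return torch.device('cuda')
    return torch.device('cpu')

print(pick_device('cpu', 8))  # -> cpu
```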
After that I do this:
# multi GPUs
if torch.cuda.device_count() > 1 and device_name == 'gpu':
    print('=' * 50)
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    # model = nn.DataParallel(model, device_ids=[i for i in range(torch.cuda.device_count())])
    model = nn.DataParallel(model)
    print('=' * 50)

# transfer model to selected device
model.to(device)
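My understanding is that `nn.DataParallel` scatters the input along dim 0, one chunk per GPU, as the comment above says. Here is a CPU-side illustration of that split using `torch.chunk` (not the actual scatter code, just to show what I expect to happen):

```python
import torch

# DataParallel scatters the batch along dim 0; for a batch of 30 on
# 3 GPUs each replica should see 10 samples. torch.chunk shows the split:
batch = torch.randn(30, 3, 64, 64)
chunks = torch.chunk(batch, 3, dim=0)
print([c.shape[0] for c in chunks])  # -> [10, 10, 10]
```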
I move data to the device in this way:
# iterate over all batches
counter_batches = 0
for images, targets in metric_logger.log_every(data_loader, print_freq, header):
    # transfer tensors to device (GPU, or CPU if not available)
    images = list(image.to(device) for image in images)
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    # in train mode, Faster R-CNN returns losses
    loss_dict = model(images, targets)
    # sum of losses
    losses = sum(loss for loss in loss_dict.values())
I do not know what I did wrong.
Also, I get this warning:
/site-packages/torch/nn/parallel/_functions.py:68: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
warnings.warn('Was asked to gather along dimension 0, but all '
              'input tensors were scalars; will instead unsqueeze '
              'and return a vector.')
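From what I read, this warning seems to come from gathering the per-GPU losses: each replica returns scalar losses, and `DataParallel` stacks them into a 1-D tensor. If that is right, I assume I should reduce them back to a scalar before calling `backward()` — a sketch of what I mean (the tensor values are made up):

```python
import torch

# Simulated per-replica losses as DataParallel would gather them
# (a 1-D tensor instead of a scalar -- hence the warning):
gathered = torch.tensor([0.9, 1.1])

# Reduce back to a scalar before calling backward():
losses = gathered.mean()
print(losses.dim())  # -> 0 (a scalar tensor again)
```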