I have an object detection task. I am using torch.nn.DataParallel
over the model and using torch.device('cuda')
as the device at the time of training. I see that only one of the GPUs does most of the processing, and the other two are given only 1-1.5 GB of data. Here is the code for setting up the training:
model = torch.nn.DataParallel(model)
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=1e-3, momentum=0.9, weight_decay=0.005)
num_epochs = 20
model, best_model = train_model(model=model,
optimizer=optimizer,
data_loader=data_loader_train,
device=torch.device('cuda'),
num_epochs=num_epochs)
I am using a batch size of 32. If I increase the batch size, then then the GPU runs out of memory. If I use a batch size of 32, then the workload is very unevenly distributed. Is there any way to evenly distribute the workload so that I can increase the batch size?