Hi, I am training a classification network (ResNet-152) on 4 RTX 2080 Ti GPUs. On a single GPU the largest batch size that fits is 32. I used:
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])
to run the model on multiple GPUs. According to nvidia-smi, all four GPUs are being used, and I can pass a batch size of 128 [32 * 4], which makes sense. I have code that calculates training accuracy and validation accuracy after each training epoch.
However, the accuracy increases much faster per epoch on a single GPU than on multiple GPUs, and I would expect the rate of change to be similar in both cases. The time per epoch drops by roughly a factor of 4, which makes sense since the batch size of 128 is split across the 4 GPUs into mini-batches of 32. But the per-epoch increase in accuracy is noticeably slower when using 4 GPUs.
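For reference, the validation accuracy after each epoch is computed roughly like this (a simplified sketch; val_loader is the validation DataLoader and is set up elsewhere):

# sketch of the per-epoch validation accuracy check
model.eval()
correct, total = 0, 0
with torch.no_grad():
    for inputs, labels in val_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        preds = torch.argmax(outputs, dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
val_accuracy = correct / total
model.train()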
Here is the relevant code for training:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models

device = torch.device("cuda:0")
# pretrained ResNet-152, final layer replaced for 100 classes
model = models.resnet152(pretrained=True)
input_features = model.fc.in_features
model.fc = nn.Linear(input_features, 100)
#model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)
Inside the training method I have:
for inputs, labels in train_loader:
    inputs, labels = inputs.to(device), labels.to(device)
    optimizer.zero_grad()
    outputs = model(inputs)
    preds = torch.argmax(outputs, dim=1)  # predicted classes, used for the accuracy count
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
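The preds from this loop feed the training-accuracy count; the accumulation looks roughly like this (a sketch, with running_correct and running_total initialized to 0 before the loop):

    # inside the loop body, after optimizer.step() (illustrative)
    running_correct += (preds == labels).sum().item()
    running_total += labels.size(0)

# once the loop has finished
train_accuracy = running_correct / running_total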
For training on multiple GPUs, all I did was uncomment the
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])
line and set batch_size=128 in the train_loader. Do I have to do something more?
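Concretely, the two changes look like this (a sketch; train_dataset and the DataLoader arguments other than batch_size are placeholders):

from torch.utils.data import DataLoader

# batch size raised from 32 to 128 so each of the 4 GPUs gets a mini-batch of 32
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=4)

# wrap the model so DataParallel scatters each batch across the 4 GPUs
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])
model = model.to(device)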