Parallel loss using multiple GPUs

Hi, I’m trying to distribute the loss computation across the 4 available GPUs, but I’m getting the error below:

Traceback (most recent call last):
  File "", line 177, in
    logps = torch.mean(torch.stack(logps))
RuntimeError: All input tensors must be on the same device. Received cuda:0 and cuda:3

I’m using the file from thomaswolf. Are there any suggestions or methods for computing the loss in parallel?

Code snippet:

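import torch
import torch.nn as nn
# thomaswolf's file; the module name "parallel" is an assumption
from parallel import DataParallelModel, DataParallelCriterion

# Per-sample losses; reduction='none' replaces the deprecated reduce=False
criterion = nn.NLLLoss(reduction='none')
criterion = DataParallelCriterion(criterion, device_ids=[0, 1, 2, 3])

model = DataParallelModel(model, device_ids=[0, 1, 2, 3])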

with torch.no_grad():
    for inputs, labels in testloader:
        inputs, labels =,
        logps = model.forward(inputs)
        batch_loss = criterion(logps, labels)
        test_loss += batch_loss.item()
        logps = torch.mean(torch.stack(logps))

        # Calculate accuracy
        ps = torch.exp(logps)
        top_p, top_class = ps.topk(1, dim=1)
        equals = top_class == labels.view(*top_class.shape)
        accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
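The error message shows that logps holds tensors living on different GPUs (cuda:0 and cuda:3), which fits a DataParallelModel that returns one output chunk per GPU instead of gathering them, and torch.stack refuses tensors on different devices. Even on a single device, torch.mean(torch.stack(...)) would collapse everything to a scalar, so ps.topk(1, dim=1) could not work afterwards. A minimal sketch of one alternative, assuming logps is such a list of (chunk_size, num_classes) tensors: move the chunks to one device and concatenate them along the batch dimension before computing accuracy. The helper names gather_outputs and batch_accuracy are hypothetical, and the demo uses CPU tensors as stand-ins for per-GPU chunks.

```python
import torch


def gather_outputs(outputs, device):
    # Move every per-device chunk to one device and join them along the
    # batch dimension (dim=0), giving one (batch_size, num_classes) tensor.
    return torch.cat([out.to(device) for out in outputs], dim=0)


def batch_accuracy(logps, labels):
    # Same accuracy computation as in the question, on a single device.
    ps = torch.exp(logps)
    top_p, top_class = ps.topk(1, dim=1)
    equals = top_class == labels.view(*top_class.shape)
    return torch.mean(equals.float()).item()


# CPU stand-ins for two per-GPU chunks of log-probabilities over 3 classes.
chunks = [
    torch.log_softmax(torch.tensor([[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]), dim=1),
    torch.log_softmax(torch.tensor([[0.0, 0.0, 1.0]]), dim=1),
]
labels = torch.tensor([0, 1, 0])

logps = gather_outputs(chunks, labels.device)  # shape (3, 3)
acc = batch_accuracy(logps, labels)  # 2 of 3 top-1 predictions are correct
```

In the loop above, the torch.stack line would become something like logps = gather_outputs(logps, labels.device), placed after the criterion call so that the parallel criterion still receives the per-GPU list, and so that the log-probabilities end up on the same device as labels.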

Could you try calling the model directly, i.e. model(inputs), instead of model.forward(inputs)?
I’m not sure how DataParallelModel is implemented, but if you call the model directly, its internal __call__ method will be used, which properly runs all registered hooks etc.
Let me know if that helps.
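The difference between model(x) and model.forward(x) can be checked on any plain nn.Module on the CPU. This sketch uses a small nn.Linear layer (an arbitrary choice, not part of the original code) with a forward hook that records each time it fires; only the call that goes through __call__ triggers it. It illustrates the general nn.Module behaviour and does not by itself show whether DataParallelModel relies on hooks.

```python
import torch
import torch.nn as nn

calls = []

layer = nn.Linear(4, 2)
# The hook appends a marker each time it fires; returning None keeps the output unchanged.
layer.register_forward_hook(lambda module, inputs, output: calls.append("hook"))

x = torch.randn(3, 4)
layer(x)          # goes through nn.Module.__call__, so the hook fires
layer.forward(x)  # calls forward() directly, so the hook is skipped

print(calls)  # ['hook']
```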

Thank you for the reply. I’ll try that and let you know the results.

I have used for custom parallelization of model and loss.
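Whichever wrapper is used, per-GPU losses eventually have to be reduced to one number, since .item() only works on a single-element tensor. Assuming the criterion hands back either per-sample losses (reduce=False in the question, equivalent to reduction='none') or one mean loss per GPU, the sketch below shows both reductions. The helper names are hypothetical, and the 3 + 2 split on the CPU only imitates two devices. The size weighting in the second helper matters when chunks differ in size, as often happens with the last batch.

```python
import torch
import torch.nn as nn


def combine_per_sample_losses(losses, device):
    # losses: one 1-D tensor of per-sample losses per device.
    # Concatenate on a single device, then average over all samples.
    return torch.cat([loss.to(device) for loss in losses]).mean()


def combine_per_device_means(mean_losses, chunk_sizes, device):
    # mean_losses: one scalar (mean) loss per device.
    # Weight each mean by its chunk size so uneven chunks are handled correctly.
    means = torch.stack([m.to(device) for m in mean_losses])
    weights = torch.tensor(chunk_sizes, dtype=means.dtype, device=device)
    return (means * weights).sum() / weights.sum()


torch.manual_seed(0)
logps = torch.log_softmax(torch.randn(5, 3), dim=1)
labels = torch.tensor([0, 2, 1, 1, 0])

# Split the batch 3 + 2 to imitate two devices with uneven chunks (CPU only).
chunks = [(logps[:3], labels[:3]), (logps[3:], labels[3:])]
criterion = nn.NLLLoss(reduction="none")
per_sample_losses = [criterion(lp, lb) for lp, lb in chunks]
mean_losses = [criterion(lp, lb).mean() for lp, lb in chunks]

full = nn.NLLLoss()(logps, labels)  # reference: mean loss over the whole batch
a = combine_per_sample_losses(per_sample_losses, "cpu")
b = combine_per_device_means(mean_losses, [3, 2], "cpu")
print(full.item(), a.item(), b.item())  # all three agree up to rounding
```

A plain average of the two chunk means would weight the 2-sample chunk as heavily as the 3-sample one and drift away from the full-batch loss.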