Hello,

I read Thomas Wolf’s article on balancing load across multiple GPUs, and I would like to adapt it for my training. He mentions that with `DataParallelModel`, as opposed to `torch.nn.DataParallel`, the predictions from the forward pass (`predictions = parallel_model(inputs)`) are a tuple of *n* tensors, with each tensor located on a different GPU (where *n* is the number of GPUs used for training).

Here is the code for his implementation. To recap, I simply wrap the model like this:

```
parallel_model = DataParallelModel(model)
predictions = parallel_model(inputs)
```

This will affect how I currently compute the accuracy, because I use `torch.max` to get the predictions from the tensors like this:

```
_, pred = torch.max(predictions.data, dim=1)
correct += (pred == label).sum().item()
total += label.size(0)
std_acc = (correct / total) * 100
```

I am thinking this can be solved by iterating through the tuple `predictions` like this:

```
for i in range(len(gpu_list)):
    _, pred = torch.max(predictions[i].data, dim=1)
    correct += (pred == label).sum().item()
    total += label.size(0)
std_acc = (correct / total) * 100
```
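One variant I considered is a sketch like the following. It assumes (I am not sure this is correct) that `DataParallelModel` scatters the batch in order, so each `predictions[i]` holds the outputs for the i-th chunk of the batch and the labels can be split into matching chunks with `torch.chunk`. The tensors here are plain CPU tensors just to illustrate the shapes:

```python
import torch

# Simulated tuple of per-GPU prediction shards (CPU tensors for illustration;
# in my real code each element would live on a different GPU)
predictions = (torch.tensor([[0.1, 0.9], [0.8, 0.2]]),
               torch.tensor([[0.3, 0.7], [0.6, 0.4]]))
label = torch.tensor([1, 0, 1, 1])  # labels for the full batch

correct, total = 0, 0
# Assumption: split the labels into as many chunks as there are shards,
# in the same order the inputs were scattered
label_chunks = torch.chunk(label, len(predictions))
for shard, lbl in zip(predictions, label_chunks):
    # Move each shard off its GPU before comparing with CPU labels
    _, pred = torch.max(shard.detach().cpu(), dim=1)
    correct += (pred == lbl).sum().item()
    total += lbl.size(0)
std_acc = (correct / total) * 100  # → 75.0 for this toy data
```

I am not certain the chunk ordering assumption holds, which is part of what I am asking.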

However, each tensor is located on a different GPU, so iterating through the tuple this way doesn’t seem right. How can I access the `pred` for each tensor in `predictions`, given that they are on different GPUs, and how can I compare them to the original labels?
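Alternatively, would it make sense to gather all the shards onto one device first and score the whole batch in one go? A minimal sketch of what I mean, again with CPU tensors standing in for the per-GPU shards:

```python
import torch

# Simulated per-GPU output shards; in my real code these would be CUDA tensors
predictions = (torch.tensor([[0.2, 0.8]]), torch.tensor([[0.9, 0.1]]))
label = torch.tensor([1, 1])  # full-batch labels

# Gather every shard onto a single device (CPU here) in batch order,
# then compute accuracy once over the whole batch
all_preds = torch.cat([p.detach().cpu() for p in predictions], dim=0)
pred = all_preds.argmax(dim=1)
std_acc = (pred == label).sum().item() / label.size(0) * 100
```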

Or is there a better way to calculate the accuracy? Thank you.