Parallelize PyTorch for-loop evaluation?

I was wondering if it’s possible to parallelize this model evaluation inside a for loop using a GPU.
Should I be doing this in PyTorch, or should I use the multiprocessing module in Python? Here’s an example of what I want to do:

import torch
from tqdm import tqdm

model.eval()
with torch.no_grad():
    for param in model.parameters():
        flat = param.view(-1)            # flat view, so assignments below change the model in place
        for idx in range(flat.numel()):
            original = flat[idx].item()  # remember the original value so it can be restored
            for i in range(3):
                flat[idx] = i            # set this single weight/bias to 0, 1 or 2
                correct = 0
                for batch, label in tqdm(evalloader):
                    batch = batch.to(device)
                    label = label.to(device)
                    pred = model(batch)
                    correct += (torch.argmax(pred, dim=1) == label).sum().item()
                accuracy = correct / len(evalloader.dataset)
            flat[idx] = original         # restore before moving on to the next weight

In the above code I am basically trying to map out what the accuracy landscape looks like by doing a grid search: I manually set each individual weight/bias of the neural network to 0, 1, and 2, and evaluate the model after each change.

How can I use the GPU to evaluate the model many times in parallel?

PS: In fact, why isn’t model evaluation parallelized even in simple cases? A test dataset can be quite large.

The approach depends on where the bottleneck is. If your GPU(s) are already fully utilized, parallelizing the evaluation loop won’t yield a speedup. If they aren’t fully utilized, I would look at increasing the batch size first, as that is a simpler way to add parallelism and increase utilization without jumping through multiprocessing hoops.
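For example, a minimal sketch (assuming evalset is the dataset behind your evalloader; the exact batch size depends on your GPU memory):

from torch.utils.data import DataLoader

# Larger batches let each forward pass process more samples at once,
# which is usually the cheapest way to raise GPU utilization during evaluation.
evalloader = DataLoader(evalset, batch_size=512, shuffle=False,
                        num_workers=4, pin_memory=True)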

Do you mean that the batch size is large enough such that the GPU is already fully utilized?

Yes, I would try increasing the batch size and see whether that improves performance. I wouldn’t expect any benefit from parallelizing the for loop over the current implementation, as it would just increase contention on the GPU(s).
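If you’re unsure whether the GPU is already saturated, here is a rough sanity check from Python (this assumes pynvml is installed; watching nvidia-smi in a terminal gives the same information):

import torch

# Percentage of recent time the GPU spent executing kernels;
# values well below 100 suggest there is headroom for a larger batch size.
print(torch.cuda.utilization())
print(torch.cuda.memory_allocated() / 1e9, "GB currently allocated by tensors")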
