Parallelizing training code across multiple P100/V100 GPUs

Is there a way to parallelize the code in this repo, which is written in PyTorch? Training takes approximately 1 month on a single 1080Ti GPU, so I am wondering whether I could run it much faster on, say, 8 P100 or V100 GPUs.
I am not sure which part of the code I would need to change to do so.

I am running it with this command:
$ python --cnn_weight [YOUR HOME DIRECTORY]/.torch/resnet152-b121ed2d.pth

I am using PyTorch 1.6.0.

Their code is here:

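You could use a data parallel approach to train on multiple GPUs: each GPU holds a replica of the model and processes a slice of every batch. In PyTorch this is available through `nn.DataParallel` (single process, usually the smallest code change) and `DistributedDataParallel` (one process per GPU, generally the faster option).
This tutorial gives you an overview of the different approaches.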
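
For reference, here is a minimal sketch of what the `nn.DataParallel` route could look like. The model, tensor shapes, and hyperparameters below are made-up placeholders and are not taken from the linked repo; the idea is only to show where the wrapper would go in a typical training step. It falls back to a single GPU or the CPU if fewer than two GPUs are visible.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the repo's model (assumption, not the real network).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Wrap the model only when more than one GPU is visible. DataParallel
# replicates the model on each GPU and splits the batch along dim 0.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Placeholder batch; in practice this comes from the repo's DataLoader.
inputs = torch.randn(64, 16, device=device)
targets = torch.randint(0, 4, (64,), device=device)

# One training step; the rest of the loop stays the same as on one GPU.
optimizer.zero_grad()
outputs = model(inputs)  # outputs are gathered back on the first device
loss = criterion(outputs, targets)
loss.backward()
optimizer.step()
```

Two things to watch for if you try this: the batch size is divided across the GPUs, so you would probably want to increase it (and possibly retune the learning rate), and a wrapped model stores its parameters under `model.module`, so checkpoints are usually saved with `model.module.state_dict()`. The speedup is typically less than linear in the number of GPUs.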
