How to collect all the gradients from multiple GPUs


#1

Hi!

I am trying to run this code:


to visualize saliency maps from a ResNet. The code is written for CPU. I modified it a little to put the model and all tensors on CUDA so it runs on GPUs. The main result I care about is on line 65 (the self.gradients).

The code works fine on a single GPU. However, when I run it on multiple GPUs with an input of size 64x3x32x32 (CIFAR-10 images), the result I get is 16x3x32x32 (it should be 64x3x32x32).

To me, the problem seems to be on line 35: the register_backward_hook function fails to collect the gradients from all the GPUs; it only keeps the ones from the last GPU.
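
Roughly, the relevant part of my modified version looks like this (paraphrased from the linked script, so the class and layer names here are only placeholders):

import torch
import torch.nn as nn
from torchvision import models

class SaliencyExtractor:
    def __init__(self, model):
        self.model = model
        self.gradients = None
        # The register_backward_hook call I mention (line 35 in the original
        # script): hook the first layer so grad_in[0], the gradient w.r.t.
        # the input image, gets saved.
        first_layer = list(self.model.children())[0]
        first_layer.register_backward_hook(self.hook_function)

    def hook_function(self, module, grad_in, grad_out):
        # Line 65 in the original script: store the gradient.
        self.gradients = grad_in[0]

model = models.resnet18().cuda()
extractor = SaliencyExtractor(model)  # hooks are registered on the base model
model = nn.DataParallel(model)        # then the model is wrapped for multi-GPU

images = torch.randn(64, 3, 32, 32, device='cuda', requires_grad=True)
output = model(images)
output[:, 0].sum().backward()
print(extractor.gradients.shape)  # this comes out as 16x3x32x32 for me, not 64x3x32x32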

Am I doing something wrong, or is this a known bug in PyTorch? If so, is there a workaround for this issue?

Thank you very much!


#2

Here in your code you’re setting

def hook_function(module, grad_in, grad_out):
    self.gradients = grad_in[0]

I think this hook runs on each GPU replica, and each call overwrites self.gradients, so in the end you only keep one-fourth of what you should have gotten (assuming 4 GPUs).

You can try defining self.gradients as a Python list and appending to it instead:

def hook_function(module, grad_in, grad_out):
    self.gradients.append(grad_in[0])
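
Something like this rough sketch, reusing the placeholder SaliencyExtractor names from above (I haven't run this, and the order in which the replicas fire isn't guaranteed, so treat it as a starting point):

import torch

class SaliencyExtractor:
    def __init__(self, model):
        self.model = model
        self.gradients = []  # one entry per DataParallel replica
        first_layer = list(self.model.children())[0]
        first_layer.register_backward_hook(self.hook_function)

    def hook_function(self, module, grad_in, grad_out):
        # Each replica appends its own chunk instead of overwriting it.
        self.gradients.append(grad_in[0])

    def collect(self):
        # nn.DataParallel scatters the batch across GPUs in device order, so
        # sorting the chunks by device index should restore the original
        # sample order before concatenating.
        chunks = sorted(self.gradients, key=lambda g: g.device.index)
        grads = torch.cat([g.to('cuda:0') for g in chunks], dim=0)
        self.gradients = []  # reset for the next batch
        return grads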

#3

Hi @richard, thanks a lot for your help!

Yes, I tested your method and it is working perfectly! Thanks a lot!

I have another question. When I run the code on images with batch size 128, the GPU memory blows up and I get an out-of-memory error. Do you have any suggestions for that?

Thanks again!


#4

Other than shrinking the batch size, not really, sorry. Maybe someone else can weigh in here about how to better work with OOMs.
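
If you do need all 128 saliency maps, one rough way to shrink the footprint is to process the batch in smaller slices and move each result off the GPU right away; compute_saliency below is just a placeholder for whatever routine produces your gradients:

import torch

def saliency_in_chunks(compute_saliency, images, targets, chunk_size=32):
    # Process a large batch in smaller slices so that only `chunk_size`
    # images (and their gradients) live on the GPU at any one time.
    results = []
    for start in range(0, images.size(0), chunk_size):
        img_chunk = images[start:start + chunk_size].cuda()
        tgt_chunk = targets[start:start + chunk_size].cuda()
        grads = compute_saliency(img_chunk, tgt_chunk)
        # Detach and move to CPU immediately so the GPU memory is freed.
        results.append(grads.detach().cpu())
    return torch.cat(results, dim=0)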