I am trying to run this code:
to visualize saliency maps from a ResNet. The code was written for CPU; I modified it slightly to move the model and all tensors to CUDA so that it runs on GPUs. The main result I care about is on line 65 (self.gradients).
The code works fine on a single GPU. However, when I run it on multiple GPUs with an input of size 64x3x32x32 (CIFAR-10 images), the gradients I get back have size 16x3x32x32 (they should be 64x3x32x32).
To me, the problem seems to be on line 35: the hook registered with register_backward_hook appears to capture the gradients from only one GPU (the last one) instead of collecting them from all of the GPUs.
Am I doing something wrong, or is this a known bug in PyTorch? If it is, is there a workaround for this issue?
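For reference, one workaround I am considering is to skip the module hook and read the gradient from the input tensor itself. This is only a minimal sketch under assumptions: the small nn.Sequential model is a hypothetical stand-in for the ResNet, I am assuming the model is wrapped in nn.DataParallel, and the names images and saliency are mine, not from the original code.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the ResNet (assumption, not the original model).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)

# Assumption: multi-GPU execution goes through nn.DataParallel.
# Without any GPU, DataParallel simply calls the wrapped module.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(model).to(device)
model.eval()

# Full batch, created as a leaf tensor that tracks gradients.
images = torch.randn(64, 3, 32, 32, device=device, requires_grad=True)

logits = model(images)
target = logits.argmax(dim=1)  # predicted class for each image

# Sum of the predicted-class scores; backprop gives d(score)/d(images).
score = logits.gather(1, target.unsqueeze(1)).sum()
model.zero_grad()
score.backward()

# The gradient is stored on the original, unsplit input tensor,
# so it covers the whole batch rather than one replica's chunk.
saliency = images.grad.detach()
print(saliency.shape)
```

My understanding is that the batch split done by DataParallel is part of the autograd graph, so the gradient accumulated on images should have the full 64x3x32x32 shape, whereas a hook on a layer inside the model runs once per replica and only sees that replica's slice of the batch.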
Thank you very much!