Can't save in backward hook on multi GPU - 3rd post, please help

I’m trying to save a value inside a backward hook function with self.X = value1. This works perfectly with one GPU, but with multiple GPUs it does not. It fails even if I store a constant, self.testval = 5. With CUDA_VISIBLE_DEVICES=0 set, checking before the call to backward() says the module has no attribute named testval, but right after backward() it shows up as 5. With CUDA_VISIBLE_DEVICES=0,1 it says no attribute both times. I am completely stuck here.
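To show what I mean, here is a minimal sketch of the pattern (the module and names are just illustrative, not my actual network). On a single device the attribute is there after backward():

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    """Toy module that saves a value on itself from a backward hook."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
        self.saved = None
        # hook fires during backward and stores onto self
        self.fc.register_backward_hook(self._hook)

    def _hook(self, module, grad_input, grad_output):
        self.saved = grad_output[0].detach().clone()
        self.testval = 5  # even a plain constant, as described above

    def forward(self, x):
        return self.fc(x)

net = Net()
net(torch.randn(3, 4)).sum().backward()
print(net.testval)  # 5 after backward() on one device
```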

Do you apply the hook before or after wrapping the model in DataParallel?

I call register_backward_hook when I initialize the nn.Module, then I wrap the module in DataParallel, and then run the training with the calls to backward().

When you call register_backward_hook, it traverses the submodules of the module. When you initialize a DataParallel module, it creates a copy of each module for each GPU, so the hook runs on the copies and any attribute it sets lands on a replica, not on your original module. I would try doing the register on the DataParallel module.
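A minimal sketch of that suggestion (this runs on CPU too, where DataParallel just calls the wrapped module directly; the hook and names here are assumptions, not your code):

```python
import torch
import torch.nn as nn

captured = []

def hook(module, grad_input, grad_output):
    # closure state lives outside the module, so replication can't hide it
    captured.append(grad_output[0].shape)

net = nn.DataParallel(nn.Linear(4, 2))
# register on the DataParallel wrapper itself, not the inner module
net.register_backward_hook(hook)

out = net(torch.randn(8, 4))
out.sum().backward()
print(len(captured))
```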


Hmmm, that definitely seems like it could be the problem! Right now I am calling DataParallel on the whole network, but I need the backward hook on individual layers. I don’t seem to be able to call net.module.layerID.register_backward_hook, and I imagine the net.module part is the problem. Do I need to call DataParallel on each layer individually, or is there a better way to tell a DataParallel module that I just want the backward hook on one of its submodules?
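For what it's worth, net.module should give you back the original wrapped network, so something like the following sketch (a toy Sequential, not your actual layer names) works on CPU as written; writing into a dict outside the module also sidesteps the replica-attribute problem. I haven't verified the multi-GPU path:

```python
import torch
import torch.nn as nn

grads = {}

def make_hook(name):
    def hook(module, grad_input, grad_output):
        # store into an external dict instead of setting self.X
        grads[name] = grad_output[0].detach().clone()
    return hook

base = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
net = nn.DataParallel(base)
# the original network is still reachable as net.module
net.module[0].register_backward_hook(make_hook("layer0"))

net(torch.randn(8, 4)).sum().backward()
print(sorted(grads))  # ['layer0']
```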

Update: 90% sure the above solved the problem. Calling DataParallel on each layer individually worked.


In my shallow view, a layer is an nn.Module just like the whole network, so if calling DataParallel on each layer individually solves the problem, you should also be able to call DataParallel on the whole network and then register_backward_hook on specific layers.

Do you mean that even if I do end up calling DataParallel on each layer to solve the problem, I should still additionally call DataParallel on the whole network?