I was hoping to print and manually verify the gradient of intermediate layer parameters when using DataParallel. An example is below:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.func = nn.Linear(3, 3, bias=False)
        self.func2 = nn.Linear(3, 3, bias=False)

    def forward(self, x):
        z = self.func(x)
        # I want the gradient of self.func2.weight here. I can
        # get it when using a single GPU, but not in a multi-GPU setting...
        z = self.func2(z)
        return z

net = Model()
para_net = nn.DataParallel(net)
xx = torch.randn(2, 3).requires_grad_()
yy = para_net(xx)
loss = yy.mean()  # just to produce a scalar
loss.backward()
Everything works fine when I'm using a single GPU: for example, I can intercept with a torch.autograd.Function and manually modify the contents of self.func2.weight.grad. However, once I use multiple GPUs by setting CUDA_VISIBLE_DEVICES=0,1, I can no longer access or modify it — if I intercept at the same point, the printed self.func2.weight.grad is None.
It'd be great if someone could help me resolve this issue or point me to a solution!