I was hoping to print and manually verify the gradient of an intermediate layer's parameters when using DataParallel. A minimal example is below:
```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.func = nn.Linear(3, 3, bias=False)
        self.func2 = nn.Linear(3, 3, bias=False)

    def forward(self, x):
        z = self.func(x)
        # I want the gradient of self.func2.weight here. I can
        # get it when using a single GPU, but not in a multi-GPU setting...
        z = self.func2(z)
        return z

net = Model()
para_net = nn.DataParallel(net)
xx = torch.randn(2, 3).requires_grad_()
yy = para_net(xx)
loss = yy.mean()  # Just to produce a scalar
loss.backward()
```
Everything works fine when I'm using a single GPU (e.g., I can intercept with a torch.autograd.Function and manually modify the contents of self.func2.weight.grad). However, once I use multiple GPUs by setting CUDA_VISIBLE_DEVICES=0,1, I can no longer access or modify it (e.g., if I intercept it inside forward, the printed self.func2.weight.grad is None).
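For reference, here is a self-contained sketch of the one workaround I've been experimenting with: registering a torch.Tensor.register_hook on the original parameter before wrapping the model. My (unverified) understanding is that DataParallel broadcasts parameters to the replicas, and the backward pass of that broadcast sums the per-replica gradients back into the original tensor, so the hook should fire with the combined gradient. The `captured` dict below is just my own scratch variable for inspecting it:

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.func = nn.Linear(3, 3, bias=False)
        self.func2 = nn.Linear(3, 3, bias=False)

    def forward(self, x):
        z = self.func(x)
        z = self.func2(z)
        return z

net = Model()
captured = {}

# Hook on the *original* parameter, before wrapping in DataParallel.
# If my understanding is right, the gradients of the replica copies are
# summed back into this tensor, so the hook sees the combined gradient.
net.func2.weight.register_hook(lambda g: captured.setdefault("grad", g.clone()))

para_net = nn.DataParallel(net)
xx = torch.randn(2, 3).requires_grad_()
loss = para_net(xx).mean()  # Just to produce a scalar
loss.backward()

print(captured["grad"])  # combined gradient across replicas
print(torch.allclose(captured["grad"], net.func2.weight.grad))
```

This at least lets me read the gradient, but I'd still like a way to modify it from inside forward, so other suggestions are very welcome.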
It'd be great if someone could help me resolve this issue, or point me to a solution!