Hi. I have a model whose forward method performs a shallow copy of the tensors into a dictionary before returning, like so:
def forward(self, input):
    block0 = self.block0(input)
    block1 = self.block1(block0)
    self.end_points = {}
    self.end_points['block0'] = (block0, 0)
    self.end_points['block1'] = (block1, 1)
    return block0
where self.block0 and self.block1 are nn.Conv2d layers, each followed by batch norm and a leaky ReLU.
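For reference, each block looks roughly like this (a sketch: the names ConvBlock and MyModel, the channel counts, kernel size, and LeakyReLU slope are all placeholders, but the conv attribute is the one I query below):

import torch.nn as nn

class ConvBlock(nn.Module):
    # Conv2d -> BatchNorm2d -> LeakyReLU; all hyperparameters here are placeholders
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class MyModel(nn.Module):  # hypothetical name for my model
    def __init__(self):
        super().__init__()
        self.block0 = ConvBlock(3, 64)
        self.block1 = ConvBlock(64, 128)
    # forward is the method shown above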
If I now do:
output = model(input)
loss = output.mean()
loss.backward()
print(model.block0.conv.bias.grad)  # block0 is an nn.Module that contains an attribute conv, which is an nn.Conv2d
the grad value is None. I get a similar outcome if I instead just return the self.end_points dict.
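That is, roughly:

def forward(self, input):
    block0 = self.block0(input)
    block1 = self.block1(block0)
    self.end_points = {'block0': (block0, 0), 'block1': (block1, 1)}
    return self.end_points

with the loss then taken from the dict, e.g. loss = output['block0'][0].mean() — the bias grad is still None.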
On the other hand, with the following forward function:
def forward(self, input):
    block0 = self.block0(input)
    block1 = self.block1(block0)
    self.end_points = {}
    self.end_points['block0'] = (block0, 0)
    return block0
    self.end_points['block1'] = (block1, 1)  # note: this line sits after the return, so block1 is never stored
the grad attribute of model.block0.conv.bias gets accumulated with the correct gradient.
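For completeness, the driver code is essentially the following (using the placeholder MyModel from above; the batch shape is made up, and this assumes more than one visible GPU so the batch actually gets split):

import torch
import torch.nn as nn

model = nn.DataParallel(MyModel()).cuda()
input = torch.randn(8, 3, 32, 32).cuda()  # placeholder batch shape
output = model(input)
loss = output.mean()
loss.backward()
# with plain nn.DataParallel the wrapped model lives under model.module
print(model.module.block0.conv.bias.grad)  # None with the first forward, a tensor with the second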
I have this problem only when I wrap the module in nn.DataParallel. Since I need to access some custom functions and attributes on the wrapped module, I'm using the following workaround:
class MyDataParallel(torch.nn.DataParallel):
    """
    Allow nn.DataParallel to access the wrapped model's attributes.
    """
    def __getattr__(self, name):
        try:
            return super().__getattr__(name)
        except AttributeError:
            return getattr(self.module, name)
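This lets me write, for example:

model = MyDataParallel(MyModel()).cuda()  # MyModel as sketched above
print(model.block0.conv.bias.grad)  # falls through to model.module.block0.conv.bias.grad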
I am not able to understand why this is the case. Please help! Thank you in advance.