Hi,
I’m using the following network architecture:
class Net(nn.Module):
    def forward(self, x):
        x = self.base(x)
        fmap = x.view(x.size(0), -1)  # flatten the feature map
        masked_fmap = fmap.masked_fill(self.indices_active, 0)  # zero out masked activations
        masked_fmap.retain_grad()  # non-leaf tensor: retain_grad() makes .grad available after backward
        return self.fc(masked_fmap), masked_fmap
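For context, here is a tiny standalone illustration of how masked_fill behaves (toy values, not your actual mask): it replaces entries where the mask is True, and it only changes the forward values, not the gradients that later flow into the masked tensor.

```python
import torch

# masked_fill replaces entries where the boolean mask is True.
t = torch.tensor([1.0, 2.0, 3.0, 4.0])
mask = torch.tensor([True, False, True, False])
print(t.masked_fill(mask, 0))  # tensor([0., 2., 0., 4.])
```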
Using masked_fill, I mask out 50% of the activations before the final fc layer. I then compute the gradient of the masked representation like this:
model = Net()
yhat, fmap = model(x)
loss = criterion(yhat, y)
loss.backward()
gradients = fmap.grad
Is this the right way to compute the gradients of intermediate activations? I observed that even though 50% of fmap is zero, the gradients are all non-zero. Shouldn't the gradients also be zero at the masked indices?
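To make the question concrete, here is a minimal self-contained sketch of the setup, with toy dimensions and a random mask standing in for indices_active (none of these names come from my real model). Running it shows the observation: masked_fmap.grad is generally non-zero even at the masked positions, since the mask zeroed the forward *values*, while the gradient arriving at masked_fmap comes from the fc layer and is not affected by the mask.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
base = nn.Linear(8, 8)   # stand-in for self.base
fc = nn.Linear(8, 2)     # stand-in for self.fc

x = torch.randn(4, 8)
mask = torch.rand(8) > 0.5  # hypothetical stand-in for indices_active

fmap = base(x)
masked_fmap = fmap.masked_fill(mask, 0)
masked_fmap.retain_grad()   # non-leaf tensor, so retain_grad() rather than requires_grad
loss = fc(masked_fmap).sum()
loss.backward()

# d(loss)/d(masked_fmap) is what fc passes back (here, column sums of
# fc.weight), which is generally non-zero even at masked positions.
print(masked_fmap.grad)
```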