Hi,
I’m using the following network architecture:
class Net(nn.Module):
    def forward(self, x):
        x = self.base(x)
        fmap = x.view(x.size(0), -1)  # flatten the feature map
        masked_fmap = fmap.masked_fill(self.indices_active, 0)  # zero out masked activations
        masked_fmap.retain_grad()  # non-leaf tensor: retain_grad() makes .grad available after backward
        return self.fc(masked_fmap), masked_fmap
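For context, here is a tiny standalone illustration of how masked_fill behaves (toy values, not your actual mask): it replaces entries where the mask is True, and it only changes the forward values, not the gradients that later flow into the masked tensor.

```python
import torch

# masked_fill replaces entries where the boolean mask is True.
t = torch.tensor([1.0, 2.0, 3.0, 4.0])
mask = torch.tensor([True, False, True, False])
print(t.masked_fill(mask, 0))  # tensor([0., 2., 0., 4.])
```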
Using masked_fill, I mask out 50% of the activations before the final fc layer. I then compute the gradient of the masked representation like this:
model = Net()
yhat, fmap = model(x)
loss = criterion(yhat, y)
loss.backward()
gradients = fmap.grad
Is this the right way to compute the gradients of intermediate activations? I observed that even though 50% of fmap is zero, the gradients are all non-zero. Shouldn't the gradients also be zero at the masked indices?
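To make the question concrete, here is a minimal self-contained sketch of the setup, with toy dimensions and a random mask standing in for indices_active (none of these names come from my real model). Running it shows the observation: masked_fmap.grad is generally non-zero even at the masked positions, since the mask zeroed the forward *values*, while the gradient arriving at masked_fmap comes from the fc layer and is not affected by the mask.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
base = nn.Linear(8, 8)   # stand-in for self.base
fc = nn.Linear(8, 2)     # stand-in for self.fc

x = torch.randn(4, 8)
mask = torch.rand(8) > 0.5  # hypothetical stand-in for indices_active

fmap = base(x)
masked_fmap = fmap.masked_fill(mask, 0)
masked_fmap.retain_grad()   # non-leaf tensor, so retain_grad() rather than requires_grad
loss = fc(masked_fmap).sum()
loss.backward()

# d(loss)/d(masked_fmap) is what fc passes back (here, column sums of
# fc.weight), which is generally non-zero even at masked positions.
print(masked_fmap.grad)
```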