Given a feature map x of size n*c*w*h, for each sample in the batch I want to set some specific channels to zero.
I multiply x by a mask, but the accuracy is not satisfactory.
Code:
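import torch
import torch.nn.functional as F
from torch.autograd import Variable

def forward(self, x):
    # All-ones mask with the same shape as x (for now it zeroes nothing)
    mask = Variable(torch.ones(x.size()))
    mask = mask.cuda()
    x = x * mask
    # Global average pooling over the spatial dimensions
    x = F.avg_pool2d(x, x.size()[2:])
    x = x.view(x.size(0), -1)
    x = self.classifier(x)
    return x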
Since the mask is all ones, x = x * mask does not change the value of x, so why do I get lower accuracy than when I skip this multiplication?
Is there anything wrong in the backward pass during training?
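
One way to check the question above is to compare outputs and input gradients with and without the all-ones mask on a small CPU tensor. The sketch below is not the original model: `pooled` and `make_channel_mask` are hypothetical helper names, the classifier is left out, and the tensor sizes and channel indices are made up for illustration. It also shows one possible way to build a per-sample channel mask of shape (n, c, 1, 1) that broadcasts over w and h.

```python
import torch
import torch.nn.functional as F


def make_channel_mask(x, zero_channels):
    """Build an (N, C, 1, 1) mask that is 0 for the listed channels of each sample.

    `zero_channels` is a hypothetical argument: one list of channel indices per sample.
    """
    n, c = x.size(0), x.size(1)
    # new_ones creates the mask with the same device and dtype as x.
    mask = x.new_ones(n, c, 1, 1)
    for i, channels in enumerate(zero_channels):
        mask[i, channels] = 0
    return mask


def pooled(x, mask=None):
    """Optionally mask x, then global-average-pool and flatten to (N, C)."""
    if mask is not None:
        x = x * mask
    x = F.avg_pool2d(x, x.size()[2:])
    return x.view(x.size(0), -1)


torch.manual_seed(0)
x_plain = torch.randn(2, 4, 3, 3, requires_grad=True)
x_masked = x_plain.detach().clone().requires_grad_(True)

# 1) All-ones mask: compare forward outputs and input gradients.
out_plain = pooled(x_plain)
out_ones = pooled(x_masked, torch.ones_like(x_masked))
out_plain.sum().backward()
out_ones.sum().backward()
same_forward = torch.equal(out_plain, out_ones)
same_backward = torch.equal(x_plain.grad, x_masked.grad)
print("same forward:", same_forward, "same backward:", same_backward)

# 2) Per-sample channel mask: zero channels 0 and 2 of sample 0, channel 1 of sample 1.
mask = make_channel_mask(x_plain, [[0, 2], [1]])
out_zeroed = pooled(x_plain.detach(), mask)
print(out_zeroed)
```

If the same comparison also matches inside the real model, the multiplication itself is unlikely to be the cause, and the accuracy difference may come from something else that varies between the two runs.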