How does applying a mask to the output affect the gradients?

I want to apply a mask to my model's output and then use the masked output to calculate a loss and update my model.
I don't want autograd to consider the masking operation when calculating the gradients, i.e. I want autograd to treat my model as if it had produced the masked output itself. In other words, I don't want any loss to be calculated for the covered regions of the output.
My code is something like:

batch = get_batch()
x = Variable(batch['data'], requires_grad=False)
mask = Variable(batch['mask'], requires_grad=False)
y = model(x)
z = y * mask
loss = loss_function(z)
loss.backward()
optimizer.step()

I assume that since requires_grad is set to False for both x and mask, this operation should not affect the gradients.
Currently my model behaves as if there were no masking, so I want to know whether I'm doing something wrong when applying the mask or whether I should look for the problem somewhere else.

I'm also not sure whether I should wrap the masking in with torch.no_grad(), but I'm guessing I shouldn't.

Hi Bahman!

Your code should do what you want. Autograd tracks computations
with PyTorch tensors and doesn't care whether those computations are
"free standing" (such as your mask operation) or part of your model or
part of your loss – they're all treated the same.

If mask has 0s in it, then when you multiply an element of y by 0,
no gradients will flow back through that element of y (or, more
precisely, the gradients that flow back through it will be zero).
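For what it's worth, here is a minimal, self-contained sketch of that behaviour. The tiny Linear "model", the shapes, and the sum() loss are just placeholders for this example:

import torch

model = torch.nn.Linear(4, 4)               # stand-in for your model

x = torch.randn(2, 4)                       # dummy input
mask = torch.tensor([[1., 0., 1., 0.],
                     [0., 0., 1., 1.]])     # 0 = covered region

y = model(x)
y.retain_grad()                             # keep y's gradient so we can inspect it

z = y * mask
loss = z.sum()                              # dummy loss
loss.backward()

print(y.grad)             # zero exactly where the mask is zero
print(model.weight.grad)  # still populated, driven only by the unmasked elements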

As an aside, Variable has been deprecated for some time now, so
you probably don't want to be using it – just use regular tensors.
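With plain tensors your snippet would look something like the sketch below. The Linear model, SGD optimizer, random batch, and the squared-error stand-in for your loss_function are only there so it runs on its own:

import torch

model = torch.nn.Linear(8, 8)                              # stand-in for your model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)    # stand-in optimizer

batch = {'data': torch.randn(4, 8),
         'mask': (torch.rand(4, 8) > 0.5).float()}         # stand-in for get_batch()

x = batch['data']            # plain tensors; requires_grad is False by default
mask = batch['mask']

optimizer.zero_grad()        # clear gradients from the previous step
y = model(x)
z = y * mask                 # autograd records this like any other op
loss = z.pow(2).mean()       # placeholder for your loss_function(z)
loss.backward()
optimizer.step()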

Just to check, if your mask is entirely zero, do your gradients all
become zero? Does changing z = y * mask in your code to z = y
truly have no effect on your gradients? “No effect” would indicate a
bug somewhere other than in the code you posted.
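If it helps, those two checks could look something like this (again with a stand-in Linear model and dummy data):

import torch

model = torch.nn.Linear(8, 8)          # stand-in for your model
x = torch.randn(4, 8)

# Check 1: an all-zero mask should give all-zero gradients.
y = model(x)
(y * torch.zeros_like(y)).sum().backward()
print(all(p.grad.abs().max() == 0 for p in model.parameters()))   # True

# Check 2: masking should change the gradients compared with z = y.
model.zero_grad()
y = model(x)
mask = (torch.rand_like(y) > 0.5).float()
(y * mask).sum().backward()
grad_masked = model.weight.grad.clone()

model.zero_grad()
y = model(x)
y.sum().backward()                     # same computation with the mask removed
print(torch.equal(grad_masked, model.weight.grad))                # False (almost surely)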

Best.

K. Frank


I checked and the gradients do become zero when my mask consists entirely of zeros.
Thank you.

Hi,

I've found another situation that does affect the gradients:

for i in range(n_class):
    pred = output[:, label == i]
    label_s = label[label == i]
    if min(pred.shape) != 0 and min(label_s.shape) != 0:
        class_loss = self.loss(pred, label_s)
        loss_list[i] = class_loss

After calculating the class-specific losses and calling backward() on one of them (e.g., the class-0 loss), or on the averaged loss, the .grad of the model's parameters is always None. Any idea how to fix this problem?
Thank you!
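
Indexing with a boolean mask is itself differentiable, so on its own it shouldn't break the graph. Here is a minimal, self-contained sketch of a per-class loss of this shape (made-up sizes, a Linear stand-in model, nn.CrossEntropyLoss standing in for self.loss, and the mask selecting rows of output) in which the gradients do reach the parameters:

import torch

n_class = 3
model = torch.nn.Linear(5, n_class)          # stand-in model
criterion = torch.nn.CrossEntropyLoss()      # stand-in for self.loss

x = torch.randn(10, 5)
label = torch.tensor([0, 0, 1, 1, 1, 2, 2, 0, 1, 2])
output = model(x)

loss_list = {}
for i in range(n_class):
    pred = output[label == i]                # rows belonging to class i
    label_s = label[label == i]
    if pred.shape[0] != 0:
        loss_list[i] = criterion(pred, label_s)

loss_list[0].backward()                      # backprop just the class-0 loss
print(model.weight.grad is None)             # False: gradients did arrive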