Get error message "MaskedFill can't differentiate the mask"

My program ran fine on an earlier version of PyTorch, 0.2.0+0eec332 (dating back to Oct. 6, 2017).
Yesterday and today I upgraded PyTorch to the latest version from source, and now the program
no longer runs. The error message I get is as follows:

  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/", line 85, in __setitem__
    return MaskedFill.apply(self, key, value, True)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/_functions/", line 483, in forward
    assert not ctx.needs_input_grad[1], "MaskedFill can't differentiate the mask"
AssertionError: MaskedFill can't differentiate the mask

I don’t know what is going wrong. Can anyone help with this? Thanks in advance.

Can you try variable[key.detach()] = value instead?

Interestingly, I don’t see any relevant code changes that happened within the last month from a quick look.

I don’t know how to use variable[key.detach()] yet; I will look into it later.
If necessary, I can paste my code here.

I searched Google for the following line:

assert not ctx.needs_input_grad[1], "MaskedFill can't differentiate the mask"

which led me to this page:

I don’t know whether this code has changed recently.

I usually install PyTorch from source on my machine for both Python 2.7 and Python 3.5.
My last working build for Python 3.5 was installed in early September; I uninstalled it
when I upgraded to the latest version. My current working build for Python 2.7 was
installed on Oct. 6, and I have to keep it for now.

By variable[key.detach()] = value I meant that you should detach your index variable before using it to index into another variable. I suppose you didn’t explicitly call variable.__setitem__(key, value). If you post the code, I can take a look.
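As a minimal sketch of that suggestion (written against a recent PyTorch release, where Variable and Tensor have been merged; the names inp, grad_output, and mask are only illustrative and do not come from the poster's program), detaching the mask before the masked assignment looks roughly like this:

```python
import torch

# Hypothetical example data; these names are illustrative only.
inp = torch.tensor([-1.0, 2.0, -3.0], requires_grad=True)
grad_output = torch.ones(3)

# Build the boolean mask, then detach it so autograd treats it as a constant.
mask = (inp < 0).detach()

# Masked assignment: zero the entries where the input was negative.
grad_output[mask] = 0
print(grad_output)
```

The point is only that the index used in variable[key] = value should not itself be something autograd tries to differentiate.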

My code is as follows:

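import torch

class MyReLU(torch.autograd.Function):

    @staticmethod
    def forward(ctx, input):
        # Save the input so backward can tell where it was negative.
        ctx.save_for_backward(input)
        output = input.clamp(min=0)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_variables
        # Zero the gradient wherever the input was negative.
        grad_output[input < 0] = 0
        return grad_output

myRelu = MyReLU.apply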

This runs fine on the earlier version of PyTorch but fails on the latest version
(as of yesterday) with the error message shown in my first post. However, after I
happened to read a related post

I modified the code to

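class MyReLU(torch.autograd.Function):

    def forward(self, input):
        # Old-style Function: save the input tensor for the backward pass.
        self.save_for_backward(input)
        output = input.clamp(min=0)
        return output

    def backward(self, grad_output):
        # In an old-style Function, saved_tensors holds plain tensors, not Variables.
        input, = self.saved_tensors
        grad_output[input < 0] = 0
        return grad_output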

and it now works in both the old and the newer versions of PyTorch! Interesting, but I don’t know why.
If you or someone else can explain this, I would appreciate it!

This is effectively equivalent to what I suggested. In your second code snippet, input is a tensor, which does not require gradients, so the mask input < 0 does not require gradients either. Hence no error.
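For readers on a recent PyTorch release (0.4 or later, where Variable and Tensor are merged), here is a minimal sketch of the same function in the static-method style. The class name MyReLUSketch and the test values are made up for illustration, and the gradient is cloned before masking so the incoming grad_output is not modified in place; treat this as one possible way to write it, not as the fix that was applied upstream.

```python
import torch

class MyReLUSketch(torch.autograd.Function):
    """Illustrative ReLU with a hand-written backward pass."""

    @staticmethod
    def forward(ctx, input):
        # Keep the input around so backward knows where it was negative.
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (input,) = ctx.saved_tensors
        # Work on a copy instead of mutating grad_output in place.
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

x = torch.tensor([-1.0, 2.0, -3.0, 4.0], requires_grad=True)
MyReLUSketch.apply(x).sum().backward()
print(x.grad)  # zero where x was negative, one elsewhere
```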

In fact, all your code can just be simplified to either use default ReLU or just directly call .clamp. What’s the purpose of writing this function?


Thanks for the explanation.

The above code snippet originally comes from

with the minor difference that @staticmethod is added in my code. Without @staticmethod
my program also raises an error. The only other difference is that I use myRelu = MyReLU.apply
and call this MyReLU class from another model class, while the code at the source link uses
myRelu = MyReLU() and runs it directly in the main program.

Yes, this piece of code simply replicates the default ReLU function.
I want to work through it because it reveals the inner workings of the backpropagation
algorithm, and I may implement a more complicated activation function later!
I want to write both the forward and backward passes, so I can’t just call .clamp.
In addition, a line like “grad_output[input < 0] = 0” is elegant and seems hard
to find in other frameworks or languages. I highly appreciate the work on PyTorch!

OK, I just tried calling .clamp directly as the ReLU function without implementing a backward pass,
and it works without problems! This is awesome: it means that .clamp, like regular +, /, etc.,
is a differentiable Torch operation, so we don’t have to implement its backward pass
ourselves!
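A quick way to check this, sketched here against a recent PyTorch release with made-up input values, is to call .clamp on a tensor that requires gradients and let autograd compute the backward pass:

```python
import torch

# Illustrative input; the values are arbitrary and avoid the kink at zero.
x = torch.tensor([-1.0, 2.0, 3.0], requires_grad=True)

# ReLU via clamp; no custom backward is defined anywhere.
y = x.clamp(min=0)
y.sum().backward()

# Gradient is 0 where x was clamped and 1 where it passed through.
print(x.grad)
```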

@Chun_Li, this is a bug in PyTorch master. I wrote up an issue here:


So I’m not the only one who has run into this problem with recent versions of PyTorch.
I’ll wait for the bug fix. Thanks a lot to the PyTorch team for their work!