Question about the inplace operation

In my forward function, I want to set all masked values to -inf after a sigmoid operation. Here is my code:

    def forward(self, x, mask):
        x =
        x = self.linear2(x).squeeze(-1)
        score = torch.sigmoid(x)
        score[mask] = -math.inf
        return score

When I run backpropagation, I get:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

But if I swap the order of the in-place operation and the sigmoid function, the error does not occur:

    def forward(self, x, prev_w, mask):
        x =
        x = self.linear2(x).squeeze(-1)
        x[mask] = -math.inf
        score = torch.sigmoid(x)
        return score

In this case I also perform an in-place operation (x[mask] = -math.inf), but backpropagation works fine (although the result is not what I want, since the masked values become 0 after the sigmoid rather than -inf). So here are my questions:

  1. What is the difference between these two versions of the code when backpropagating in PyTorch? I know in-place operations are not allowed when backpropagating, but as I understand it,

    x[mask] = -math.inf

is also an in-place operation, right?

  2. How can I correctly get what I want, that is, set all masked values to -inf after the sigmoid operation while still being able to backpropagate?

Thanks for your help!

I found this issue on GitHub, which may be related to my problem:
backward pass different behaviors with inplace operation

As I understand it, when computing the derivative of score, PyTorch's autograd uses the formula S'(x) = S(x)(1 - S(x)), so S(x) must not be changed because it is needed for the backward pass. In my second case, the in-place operation happens before the sigmoid, so no error is raised.
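The two orderings can be reproduced in a small standalone sketch. The `linear` layer, `inp`, `mask`, and the helper names below are hypothetical stand-ins (the original forward function is only partly shown), so treat this as an illustration of the behaviour rather than the original model:

```python
import math
import torch

torch.manual_seed(0)
linear = torch.nn.Linear(4, 1)   # hypothetical stand-in for self.linear2
inp = torch.randn(3, 4)          # hypothetical input batch
mask = torch.tensor([True, False, False])

def mask_after_sigmoid():
    x = linear(inp).squeeze(-1)
    score = torch.sigmoid(x)
    score[mask] = -math.inf      # overwrites the output that sigmoid saved for backward
    return score

def mask_before_sigmoid():
    x = linear(inp).squeeze(-1)
    x[mask] = -math.inf          # x itself is not saved by any backward formula here
    return torch.sigmoid(x)

def backward_error(forward):
    """Run forward + backward and return the error message, or None on success."""
    linear.zero_grad()
    score = forward()
    try:
        score.sum().backward()
    except RuntimeError as err:
        return str(err)
    return None

err_after = backward_error(mask_after_sigmoid)    # expected: an error message
err_before = backward_error(mask_before_sigmoid)  # expected: None
```

Under these assumptions, only the first ordering fails, which is consistent with the sigmoid needing its own output S(x) during the backward pass.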

But here comes another question (:cry:): why doesn't the in-place operation on x raise an error? More precisely, why doesn't the following code raise an error?

    import torch

    a = torch.randn((5), requires_grad=True)
    b = 2*a + 1
    c = b*b + 1
    b[1] = 0.
    d = torch.mean(b)

In this case, b is needed to correctly compute the gradient of c, but b is modified after c is computed. Why doesn't this raise an error?

Why doesn't the in-place operation on x raise an error?

Because in your second example, you change x before the sigmoid. So that is fine.

If you were modifying x after doing the sigmoid, it wouldn't be a problem either, as the sigmoid only requires its output to compute the gradient, not its input.
If you want to see the details, you can see here that result is used to compute the gradients but not self. So you can modify self as much as you want.
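As a rough illustration of this point, and one possible way to get -inf at the masked positions after the sigmoid, here is a minimal sketch. It assumes an out-of-place `masked_fill` is acceptable in place of the indexed assignment; the tensors `a`, `x`, `sig`, and `mask` are made up for the example:

```python
import math
import torch

a = torch.randn(5, requires_grad=True)
mask = torch.tensor([False, True, False, False, True])

x = 2 * a + 1
sig = torch.sigmoid(x)

# Modifying the sigmoid's *input* after the call is fine:
# its backward only uses the saved output.
x[mask] = -math.inf

# Out-of-place masking: returns a new tensor and leaves `sig`
# (the value the sigmoid saved for backward) untouched.
score = sig.masked_fill(mask, -math.inf)

# Only the unmasked entries contribute to this toy loss.
loss = score[~mask].sum()
loss.backward()
```

Here `score` holds -inf at the masked positions, the backward pass runs without the in-place error, and the masked positions receive zero gradient. `torch.where(mask, ..., sig)` would be another out-of-place option.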

Your example code

I think you have a typo as you never use c :slight_smile:
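To make the difference concrete, here is a small sketch that runs the same code twice, once backpropagating through d and once through c. The `build` helper and the `c.sum()` reduction are additions for the sake of the demonstration, not part of the original snippet:

```python
import torch

def build():
    a = torch.randn(5, requires_grad=True)
    b = 2 * a + 1
    c = b * b + 1      # the multiplication saves b for its backward pass
    b[1] = 0.0         # in-place write changes b after it was saved
    d = torch.mean(b)
    return a, c, d

# Backward through d never reaches the b * b node, so nothing is checked.
a, c, d = build()
d.backward()
grad_via_d = a.grad.clone()

# Backward through c needs the saved b, which has since been modified.
a, c, d = build()
try:
    c.sum().backward()
    error_via_c = None
except RuntimeError as err:
    error_via_c = str(err)
```

In this sketch the error only appears when the backward pass actually visits an operation whose saved tensor was modified, which is why the original snippet (where c is never used) runs cleanly.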