GRU model: one of the variables needed for gradient computation has been modified by an inplace operation

The thing is that a single inplace operation is never disallowed on its own; it is a combination of operations that can be.

c = a + b
a.div_(2)
(c + a).sum().backward()

This will work, because the value of a is not needed to compute the gradients: the backward of an addition does not use its inputs.
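For reference, here is a self-contained version of that snippet (the tensor shape and the clone() setup are my own additions; the clone is needed because inplace ops are forbidden on leaf tensors that require grad):

```python
import torch

# clone() gives a non-leaf tensor, so the inplace op below is allowed
a = torch.rand(3, requires_grad=True).clone()
b = torch.rand(3, requires_grad=True)

c = a + b                 # addition: backward does not save its inputs
a.div_(2)                 # inplace change to a
(c + a).sum().backward()  # runs fine
```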

c = a * b
a.div_(2)
(c + a).sum().backward()

This won’t work because the value of a is needed to compute the gradient wrt b in the multiplication: dc/db = a, so autograd saved a during the forward pass, and the inplace div_ invalidates that saved value.
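A runnable version of the failing snippet, catching the error so the message is visible (again, the shape and clone() setup are assumptions for illustration):

```python
import torch

a = torch.rand(3, requires_grad=True).clone()
b = torch.rand(3, requires_grad=True)

c = a * b   # multiplication: backward saves a to compute dc/db
a.div_(2)   # inplace change bumps a's version counter, invalidating the saved value
try:
    (c + a).sum().backward()
    failed = False
except RuntimeError as e:
    failed = True
    print(e)  # "one of the variables needed for gradient computation has been modified by an inplace operation"
```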

So a given inplace operation can be allowed or not, depending on the surrounding code.
