RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation when using ReLU(inplace=False)

I’m running models.py from this repository. Running the code as is gives me the error RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [24, 32, 16, 65]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

What’s confusing is that the ReLU instance is using the default inplace=False. If I switch to the LeakyReLU activation with negative_slope=0. (giving me plain ol’ ReLU), the code runs without an error.
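For context, the workaround amounts to something like this (the layer shown is illustrative, not the actual code from models.py):

import torch.nn as nn

# original activation, default inplace=False -- this version raises the RuntimeError for me
act = nn.ReLU()

# workaround: a LeakyReLU with zero negative slope is numerically the same as ReLU,
# and with it the backward pass runs without the error
act = nn.LeakyReLU(negative_slope=0.0)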

Any ideas what may be causing this? I’m happy to have found a workaround, but I’d quite like to know what is causing it to break in the first place.

The issue is most likely not caused by the relu operation itself, but by another op that manipulates the relu output inplace:

import torch
import torch.nn as nn

relu = nn.ReLU()

x = torch.randn(10, 10, requires_grad=True)

# works
out = relu(x)
out = out * 2
out.mean().backward()
x.grad = None

# fails
out = relu(x)
out *= 2 # inplace multiplication: modifies the tensor relu saved for backward
out.mean().backward()
# RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10, 10]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead.

since relu needs its (unmodified) output to compute the gradient of its input in the backward pass.
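If you want to localize the stray inplace op in your actual model, anomaly detection will point the backtrace at the forward op whose saved tensor was later modified. A minimal sketch using the same toy setup as above (not your actual model):

import torch
import torch.nn as nn

torch.autograd.set_detect_anomaly(True)  # prints the forward traceback of the failing op

relu = nn.ReLU()
x = torch.randn(10, 10, requires_grad=True)

out = relu(x)
out *= 2               # the inplace op that invalidates relu's saved output
out.mean().backward()  # still fails, but now reports where the offending forward op ran

# the fix is to make that op out-of-place:
# out = out * 2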

In this instance, shouldn’t changing from nn.ReLU(x) to nn.LeakyReLU(x) still result in the same error being thrown?

No, since leaky_relu needs its input for the gradient calculation (as seen here), while relu needs its output (as seen here).
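You can verify that difference with the same toy example (a minimal sketch; with negative_slope=0.0 the forward result is identical to relu):

import torch
import torch.nn as nn

leaky_relu = nn.LeakyReLU(negative_slope=0.0)
x = torch.randn(10, 10, requires_grad=True)

out = leaky_relu(x)
out *= 2               # same inplace multiplication as before
out.mean().backward()  # works: leaky_relu saved its input x, which was not modified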

That’s interesting. Might be outside the scope of this thread, but what is the reason for that difference?

With respect to the output though, would you recommend hunting down and fixing the stray inplace operation, or will using leaky_relu with negative_slope=0. achieve the same outcome?