Gradient computation sees a variable modified, but it doesn't seem obvious where

I get the following error:
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 1, 6]], which is output 0 of TanhBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

My forward function is:

def forward(self, x: torch.Tensor):
        # x: [Batch_size, seq_len, input_size]
        batch_size = x.size(0)
        
        # Flatten the input across the sequence and input size
        x = x.reshape(batch_size, -1) # x: [Batch_size, seq_len * input_size]
        
        # Apply the linear layer
        x = self.Linear(x) # x: [Batch_size, pred_len * predict_size]
        
        # Reshape the output to [Batch_size, pred_len, predict_size]
        x = x.reshape(batch_size, self.pred_len, self.predict_size)

        # Apply activation function if specified
        x = self.activation(x)
        
        return x # [Batch_size, pred_len, predict_size]

self.Linear is defined as

self.Linear = nn.Linear(self.seq_len * self.input_size, self.pred_len * self.predict_size)

Using view() instead of reshape() doesn’t help.
self.activation can be any activation function: relu, tanh, … If I remove x = self.activation(x), the code works fine.
Could you please advise what’s wrong here?

Hi iftg!

Most likely, the output of forward() is being modified inplace after
forward() has run.

self.activation(x) isn’t causing the inplace-modification error, per se.
Rather, the presence of self.activation(x) in the computation graph is
what causes an inplace modification (one that exists with or without the
call to self.activation(x)) to matter.
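
The underlying reason is that tanh (and relu) save their own output for
the backward pass, while the backward of a plain Linear layer only needs
its input and weights. Here is a minimal sketch (a toy reproduction, not
your actual code) that triggers the same error by modifying the returned
tensor inplace after the forward pass:

import torch

lin = torch.nn.Linear(3, 3)
x = torch.randn(4, 3)

# Without an activation: AddmmBackward0 does not need the layer's output,
# so an inplace edit of the returned tensor is harmless.
out = lin(x)
out.mul_(2.0)
out.sum().backward()       # runs fine

# With tanh: TanhBackward0 saves its output to compute the gradient, so
# the same inplace edit now raises the "modified by an inplace operation"
# RuntimeError.
out = torch.tanh(lin(x))
out.mul_(2.0)
out.sum().backward()       # raises the same RuntimeError as in your post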

There is a reasonable chance that adding a .clone(), specifically
x = self.activation(x).clone(), will fix your problem.
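
The clone works because any later inplace edit then lands on the clone,
while the tensor that tanh saved for backward stays at its original
version. Continuing the toy sketch above:

out = torch.tanh(lin(x)).clone()   # tanh's saved output is left untouched
out.mul_(2.0)                      # the inplace edit only modifies the clone
out.sum().backward()               # backward now succeeds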

Or ask pytorch to sweep this inplace modification error under the rug
for you.
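
One mechanism for that (assuming a reasonably recent PyTorch, roughly
1.13 or later, and that this is indeed the intended "rug") is the
allow_mutation_on_saved_tensors() context manager, under which tensors
saved for backward are cloned on demand when they are about to be
mutated:

import torch

lin = torch.nn.Linear(3, 3)
x = torch.randn(4, 3)

# Inside this context, saved tensors are cloned on mutation, so the
# inplace edit no longer invalidates what TanhBackward0 needs.
with torch.autograd.graph.allow_mutation_on_saved_tensors():
    out = torch.tanh(lin(x))
    out.mul_(2.0)
    out.sum().backward()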

Sometimes these errors are a symptom of something incorrect or
sub-optimal in what you are doing. If you want to track down the root
cause, take a look at the debugging techniques in the following post:
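
A good first step, independent of that, is the hint printed with the
error itself: enabling anomaly detection makes the backward failure also
report the traceback of the forward call (here, the tanh) whose saved
output was modified. A sketch, reusing the toy example from above:

import torch

# Global switch; torch.autograd.detect_anomaly() also exists as a
# context manager if you only want it around a single training step.
torch.autograd.set_detect_anomaly(True)

lin = torch.nn.Linear(3, 3)
out = torch.tanh(lin(torch.randn(4, 3)))
out.mul_(2.0)              # the offending inplace edit
out.sum().backward()       # the RuntimeError now also includes the
                           # forward-pass traceback of the tanh call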

Best.

K. Frank

Your suggestion solves the problem. Perhaps a better way, which also does the trick, is to swap these two lines:

        # Apply activation function if specified
        x = self.activation(x)

        # Reshape the output to [Batch_size, pred_len, predict_size]
        x = x.reshape(batch_size, self.pred_len, self.predict_size)