In-place operation error in backward call

Hello,

I am trying to run the code below, but I am getting an in-place operation error in the loss computation. Is there a way to resolve this without cloning the “target” tensor? Maybe a PyTorch function that directly subtracts the specific elements at the x, y, z indices?

import torch

batch_size = 64
data = (torch.rand(batch_size, 50, 100) < 0.01).float().to_sparse()
target = torch.rand(batch_size, 50, 100) - 0.5
target.requires_grad = True

x, y, z = data.indices()
losses = torch.exp(target)
losses[x, y, z] = losses[x,y,z] -  data.values() * target[x, y, z]
loss = losses.sum().backward()

Error:
RuntimeError Traceback (most recent call last)
Cell In[18], line 4
2 losses = torch.exp(target)
3 losses[x, y, z] = losses[x,y,z] - data.values() * target[x, y, z]
----> 4 loss = losses.sum().backward()

File ~/miniconda3/envs/sbtt-demo/lib/python3.9/site-packages/torch/_tensor.py:488, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
478 if has_torch_function_unary(self):
479 return handle_torch_function(
480 Tensor.backward,
481 (self,),
(…)
486 inputs=inputs,
487 )
→ 488 torch.autograd.backward(
489 self, gradient, retain_graph, create_graph, inputs=inputs
490 )

File ~/miniconda3/envs/sbtt-demo/lib/python3.9/site-packages/torch/autograd/__init__.py:197, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
192 retain_graph = create_graph
194 # The reason we repeat same the comment below is that
195 # some Python versions print out the first line of a multi-line function
196 # calls in the traceback and some print out the last line
→ 197 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
198 tensors, grad_tensors_, retain_graph, create_graph, inputs,
199 allow_unreachable=True, accumulate_grad=True)

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 50, 100]], which is output 0 of ExpBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Hi Aa!

When you assign into losses using indices you modify losses inplace,
hence the error. There is no way to avoid this without somehow creating
a new tensor (not necessarily using .clone()).

The simplest fix might be losses = torch.exp (target).clone().

In a little more detail, the gradient (derivative) of exp (x) is exp (x) itself.
Therefore autograd keeps a reference to exp (x) during the forward pass
so that it doesn’t have to recompute it during the backward pass. If you
modify exp (x) inplace, autograd can’t complete the backward pass (it
isn’t willing to recompute exp (x)), so it complains.
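
To make the "version 1; expected version 0" part of the error message concrete, here is a minimal sketch on a tiny tensor. It assumes the tensor's `_version` attribute (a private counter, so treat it as a debugging aid rather than a stable API) reflects the in-place write; the names `t` and `out` are made up for this illustration.

```python
import torch

t = torch.zeros(3, requires_grad=True)
out = torch.exp(t)              # autograd saves `out` for the backward pass
version_before = out._version   # counter value at the time `out` was saved

out[0] = 5.0                    # indexed assignment writes into `out` in place
version_after = out._version    # the counter has been bumped by the write

try:
    out.sum().backward()        # ExpBackward0 finds its saved tensor changed
    failed = False
except RuntimeError as err:
    failed = "modified by an inplace operation" in str(err)

print(version_before, version_after, failed)
```

The backward pass compares the counter it recorded when saving exp (x) against the current one, and the mismatch is what triggers the RuntimeError.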

If you want to modify losses, you actually have to modify a copy of it. In
your use case .clone() is probably the most straightforward way to make
that copy.
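
As a minimal sketch of the .clone() fix applied to the code from the question: the seed and the hand-computed reference gradient (`expected`, `grad_ok`) are additions for checking the result and are not part of the original post.

```python
import torch

torch.manual_seed(0)
batch_size = 64
data = (torch.rand(batch_size, 50, 100) < 0.01).float().to_sparse()
target = (torch.rand(batch_size, 50, 100) - 0.5).requires_grad_()

x, y, z = data.indices()

# clone() returns a fresh tensor, so the output that exp() saved for its
# backward pass is left untouched by the indexed assignment below.
losses = torch.exp(target).clone()
losses[x, y, z] = losses[x, y, z] - data.values() * target[x, y, z]
losses.sum().backward()

# Reference gradient worked out by hand: exp(target) everywhere, minus the
# data values at the nonzero positions.
expected = torch.exp(target.detach()) - data.to_dense()
grad_ok = torch.allclose(target.grad, expected, atol=1e-6)
print(grad_ok)
```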

Best.

K. Frank

Hey Frank,
Thanks for the response and explanation. It’s much clearer now.
Is there any way I can do this without creating a copy of the target tensor? I was only trying this approach to reduce memory usage, and creating a copy of the tensor would increase it.

Hi Aa!

First note that you are not “creating a copy of the target tensor.” You are
creating a copy of the losses tensor so that you can modify that copy
inplace.

Having said that, in the context of the specific code you posted, you could
trade computation for memory, as follows:

When you backpropagate through exp (target), you need access
to exp (target) because it is, itself, the gradient of exp (target).
torch.exp() does this in concert with autograd by saving a reference
to exp (target) for use in the backward pass (and then complains
when you modify it inplace).

At the cost of recomputing exp (target), your forward pass could save
a reference to target itself (and then recompute exp (target) during
the backward pass). In your specific case you are not freeing target, so
holding on to (a reference to) target costs no additional memory, and
you don’t modify target, so you don’t need a second copy of it.

torch.exp() won’t do this for you, but you can write your own custom
autograd function that works this way.
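
A rough sketch of what such a custom function could look like, assuming the standard torch.autograd.Function interface. The class name `ExpRecompute` is hypothetical, the tensor sizes are shrunk for brevity, and the reference gradient at the end is only there as a check. It does not support double backward as written.

```python
import torch

class ExpRecompute(torch.autograd.Function):
    """exp() that saves its input and recomputes exp(input) in backward."""

    @staticmethod
    def forward(ctx, inp):
        ctx.save_for_backward(inp)   # keep a reference to the input, not the output
        return torch.exp(inp)

    @staticmethod
    def backward(ctx, grad_output):
        (inp,) = ctx.saved_tensors
        # Trade computation for memory: recompute exp(inp) here.
        return grad_output * torch.exp(inp)


torch.manual_seed(0)
data = (torch.rand(8, 5, 10) < 0.1).float().to_sparse()
target = (torch.rand(8, 5, 10) - 0.5).requires_grad_()

x, y, z = data.indices()
losses = ExpRecompute.apply(target)
# The in-place write is now allowed, because backward no longer needs `losses`.
losses[x, y, z] = losses[x, y, z] - data.values() * target[x, y, z]
losses.sum().backward()

expected = torch.exp(target.detach()) - data.to_dense()
grad_ok = torch.allclose(target.grad, expected, atol=1e-6)
print(grad_ok)
```

Note that the backward pass still allocates a temporary tensor for the recomputed exp (target), so the saving is in what is held between the forward and backward passes, not in peak usage during backward.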

However, I would recommend instead that you make your peace with the
fact that neural networks are typically memory intensive, especially when
using autograd to carry out the forward / backward pass process. If you are,
in fact, running out of memory, you could buy more memory, run a simpler,
smaller model, or use a smaller batch size, depending on what you can make
work for your use case.

Good luck.

K. Frank