Traceback (most recent call last):
  File "/home/banikr/.config/JetBrains/PyCharm2022.1/scratches/scratch_8.py", line 124, in <module>
    loss_dec.backward(retain_graph=True)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
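(From what I understand, the "version" in the message is the tensor's internal version counter, which every in-place op increments; autograd compares it against the version recorded when the tensor was saved for backward. A quick sketch, using the private ._version attribute just for illustration:)

    import torch

    a = torch.ones(1, requires_grad=True)
    b = a.exp()            # autograd saves b, since d(exp(a))/da = exp(a) = b
    print(b._version)      # 0
    b.add_(1)              # in-place op bumps the version counter
    print(b._version)      # 1
    b.sum().backward()     # RuntimeError: ... is at version 1; expected version 0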
The error comes from:
for epoch in range(max_epochs):
    for batch_idx, data in tqdm(enumerate(train_loader), total=len(train_loader),
                                desc='Train epoch=%d' % epoch, ncols=100, leave=False):  # epoch will be updated by train_epoch()
        x = Variable(data, requires_grad=False).float().to(device)
        x_avg = x_avg + torch.mean(x, axis=0)
        # +------------------------------+
        # |        generator loss        |
        # +------------------------------+
        x_hat, elbo_loss = net_g(x)
        x_hat_avg = x_hat_avg + torch.mean(x_hat, axis=0)
        # z = net_g.decoder()
        _, z_p, _, _ = net_g.encoder(x)
        x_p = net_g.decoder(z_p)
        # +----------------------------------+
        # |        discriminator loss        |
        # +----------------------------------+
        d = net_D(x)
        d_hat = net_D(x_hat)
        d_p = net_D(x_p)
        real_label = Variable(Tensor(x.size(0), 1).fill_(1.0), requires_grad=False).to(device)
        fake_label = Variable(Tensor(x.size(0), 1).fill_(0.0), requires_grad=False).to(device)
        loss_D_real = adversarial_loss(d, real_label)
        loss_D_fake = adversarial_loss(d_hat, fake_label)
        loss_D_prior = adversarial_loss(d_p, fake_label)
        loss_gan = loss_D_real + loss_D_fake + loss_D_prior
        # print(loss_gan)
        optimizer_D.zero_grad()
        loss_gan.backward(retain_graph=True)
        optimizer_D.step()
        # +----------------------------+
        # |        decoder loss        |
        # +----------------------------+
        rec_loss = ((net_D(x_hat) - net_D(x)) ** 2).mean()
        print(rec_loss)
        loss_dec = gamma * rec_loss - loss_gan  # <<<< error here
        optimizer_d.zero_grad()
        loss_dec.backward(retain_graph=True)
        optimizer_d.step()
Some of the solutions to similar errors involve making dropout layers non-in-place (inplace=False), which I am not using here. adversarial_loss is nn.BCELoss, as the tracebacks below show.
Hi @ptrblck
Removing retain_graph=True generates the following error from the very same line:

    loss_dec.backward()#retain_graph=True)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
Do you think using loss_gan again after backpropagating it could be the reason?
torch.autograd.set_detect_anomaly(True)
for epoch in range(max_epochs):
    .
    .

I also added inplace=False to the ReLU layers.
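i.e. something along these lines (the actual module definition isn't shown here, so the attribute name is hypothetical):

    self.relu = nn.ReLU(inplace=False)  # hypothetical layer; explicitly non-in-place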
The following error is generated:
    loss_D_prior = adversarial_loss(d_p, fake_label)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 613, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/nn/functional.py", line 3083, in binary_cross_entropy
    return torch._C._nn.binary_cross_entropy(input, target, weight, reduction_enum)
 (Triggered internally at /opt/conda/conda-bld/pytorch_1659484809535/work/torch/csrc/autograd/python_anomaly_mode.cpp:102.)
  allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "/home/banikr/.config/JetBrains/PyCharm2022.1/scratches/scratch_8.py", line 124, in <module>
    loss_dec.backward()#retain_graph=True)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
Try putting retain_graph=True on the decoder's backward and re-run it (with torch.autograd.set_detect_anomaly(True) enabled). It should find where the in-place error is occurring.
The in-place modification may be happening in the optimizer step; you should move it to after you perform the second backward (which doesn't need retain_graph=True). Alternatively, if you are able to modify the model itself, you can clone the parameters before using them in any operation. It is likely that you are directly using your parameters in an operation that saves those inputs for backward. If you perform an in-place optimizer update and then backward again through that same graph, it would produce incorrect gradients (which the error protects against).
(For more context, retain_graph=True is useful when you plan to backward through the same graph another time. This is true for the first backward you execute, but not the second.)
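As a minimal sketch of the failure pattern and of the fix (a hypothetical two-parameter setup, not your actual model):

    import torch

    # hypothetical minimal setup: two parameters that share one loss term
    w1 = torch.nn.Parameter(torch.randn(3))
    w2 = torch.nn.Parameter(torch.randn(3))
    opt1 = torch.optim.SGD([w1], lr=0.1)
    opt2 = torch.optim.SGD([w2], lr=0.1)

    shared = (w1 * w2).sum()         # mul saves w1 and w2 for its backward
    opt1.zero_grad()
    shared.backward(retain_graph=True)
    opt1.step()                      # in-place update: w1's version counter is bumped

    loss2 = -shared                  # mirrors loss_dec = gamma * rec_loss - loss_gan
    opt2.zero_grad()
    try:
        loss2.backward()             # needs the saved (now stale) w1
    except RuntimeError as e:
        print(e)                     # "... modified by an inplace operation ..."

    # the fix: run every backward first, step the optimizers only afterwards
    shared = (w1 * w2).sum()
    loss2 = -shared
    opt1.zero_grad()
    opt2.zero_grad()
    shared.backward(retain_graph=True)
    loss2.backward()                 # parameters still untouched, so this succeeds
    opt1.step()
    opt2.step()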
Hello @AlphaBetaGamma96,
This is the error I am getting after following your advice:
torch.autograd.set_detect_anomaly(True)
for epoch in range(max_epochs):
    epoch_loss = 0
    ...
    loss_D_prior = adversarial_loss(d_p, fake_label)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 613, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/nn/functional.py", line 3083, in binary_cross_entropy
    return torch._C._nn.binary_cross_entropy(input, target, weight, reduction_enum)
 (Triggered internally at /opt/conda/conda-bld/pytorch_1659484809535/work/torch/csrc/autograd/python_anomaly_mode.cpp:102.)
  allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "/home/banikr/.config/JetBrains/PyCharm2022.1/scratches/scratch_8.py", line 125, in <module>
    loss_dec.backward(retain_graph=True)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
@soulitzer,
I thought optimizer_D and optimizer_d were optimizing different networks, though they share loss_gan.
I took the first optimizer step (optimizer_D) after the second backward but still got the errors.
Could you give me more hints with some demo code on how to clone parameters?
I thought optimizer_D and optimizer_d were optimizing different networks, though they share loss_gan.
Yes. Because the loss_gan part is shared, parameters (which are saved for backward) modified in-place by the first optimizer step will be used in the second backward. I'm not sure your code includes the part where the optimizers are defined (I doubt that is relevant though).
Could you give me more hints with some demo code on how to clone parameters?
I took the first optimizer step (optimizer_D) after the second backward but still got the errors.
Even though the cloning should fix this issue, I'd suggest looking into this error more to see if we can avoid the extra overhead of cloning. Was this the same error? Could you post the stack trace? (Or, if possible, post a short runnable snippet to demonstrate the issue.)
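For reference, cloning a parameter before it is used would look roughly like this (a minimal sketch with a hypothetical linear decoder layer, not your model):

    import torch
    import torch.nn.functional as F

    class DecoderLayerSketch(torch.nn.Module):
        # hypothetical layer: clone the parameters before using them, so autograd
        # saves the clones rather than the parameters that the optimizer later
        # updates in-place; gradients still flow back to the original parameters
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.weight = torch.nn.Parameter(torch.randn(out_dim, in_dim))
            self.bias = torch.nn.Parameter(torch.zeros(out_dim))

        def forward(self, z):
            return F.linear(z, self.weight.clone(), self.bias.clone())

The clones are what get saved for backward, so a later in-place optimizer step on the original parameters no longer invalidates the graph.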
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-cdfe813ffd22> in <module>
     98     loss_dec = gamma * rec_loss - loss_gan
     99     optimizer_d.zero_grad()
--> 100     loss_dec.backward(retain_graph=True)
    101     optimizer_d.step()
    102     # +----------------------------+

1 frames
/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    173     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    174         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 175         allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
    176
    177 def grad(

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
Ahhh, I also just realized that I wrote the comment above without seeing that you had three losses to backward in total. You'll need to make sure all the optimizer updates are done at the very end.
So, modifying the code in that way is probably the preferred solution here:
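Something along these lines (a sketch reusing the variable names from the loop above; the inputs= restriction on the second backward is my addition, to keep the discriminator step based only on loss_gan, and it assumes optimizer_d holds net_g.decoder's parameters):

    # inside the inner training loop, after the forward passes:
    loss_gan = loss_D_real + loss_D_fake + loss_D_prior
    rec_loss = ((net_D(x_hat) - net_D(x)) ** 2).mean()
    loss_dec = gamma * rec_loss - loss_gan

    optimizer_D.zero_grad()
    loss_gan.backward(retain_graph=True)   # the graph is reused by loss_dec below

    optimizer_d.zero_grad()                # clears decoder grads left by loss_gan
    # restrict this backward to the decoder parameters so it does not also
    # write into net_D's .grad before the discriminator step below
    loss_dec.backward(inputs=list(net_g.decoder.parameters()))

    optimizer_D.step()                     # every in-place parameter update
    optimizer_d.step()                     # happens only after all backwards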