Traceback (most recent call last):
  File "/home/banikr/.config/JetBrains/PyCharm2022.1/scratches/scratch_8.py", line 124, in <module>
    loss_dec.backward(retain_graph=True)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
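(From what I understand, the "version" in the message is the tensor's internal version counter, which every in-place op increments; autograd compares it against the version recorded when the tensor was saved for backward. A quick sketch, using the private ._version attribute just for illustration:)

    import torch

    a = torch.ones(1, requires_grad=True)
    b = a.exp()            # autograd saves b, since d(exp(a))/da = exp(a) = b
    print(b._version)      # 0
    b.add_(1)              # in-place op bumps the version counter
    print(b._version)      # 1
    b.sum().backward()     # RuntimeError: ... is at version 1; expected version 0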
The error comes from:
for epoch in range(max_epochs):
    for batch_idx, data in tqdm(enumerate(train_loader), total=len(train_loader),
                                desc='Train epoch=%d' % epoch, ncols=100, leave=False):  # epoch will be updated by train_epoch()
        x = Variable(data, requires_grad=False).float().to(device)
        x_avg = x_avg + torch.mean(x, axis=0)
        # +------------------------------+
        # |        generator loss        |
        # +------------------------------+
        x_hat, elbo_loss = net_g(x)
        x_hat_avg = x_hat_avg + torch.mean(x_hat, axis=0)
        # z = net_g.decoder()
        _, z_p, _, _ = net_g.encoder(x)
        x_p = net_g.decoder(z_p)
        # +----------------------------------+
        # |        discriminator loss        |
        # +----------------------------------+
        d = net_D(x)
        d_hat = net_D(x_hat)
        d_p = net_D(x_p)
        real_label = Variable(Tensor(x.size(0), 1).fill_(1.0), requires_grad=False).to(device)
        fake_label = Variable(Tensor(x.size(0), 1).fill_(0.0), requires_grad=False).to(device)
        loss_D_real = adversarial_loss(d, real_label)
        loss_D_fake = adversarial_loss(d_hat, fake_label)
        loss_D_prior = adversarial_loss(d_p, fake_label)
        loss_gan = loss_D_real + loss_D_fake + loss_D_prior
        # print(loss_gan)
        optimizer_D.zero_grad()
        loss_gan.backward(retain_graph=True)
        optimizer_D.step()
        # +----------------------------+
        # |        decoder loss        |
        # +----------------------------+
        rec_loss = ((net_D(x_hat) - net_D(x)) ** 2).mean()
        print(rec_loss)
        loss_dec = gamma * rec_loss - loss_gan  # <<<< error here
        optimizer_d.zero_grad()
        loss_dec.backward(retain_graph=True)
        optimizer_d.step()
Some of the solutions to similar errors involve making dropout layers non-in-place (inplace=False), which I am not using here. adversarial_loss is nn.BCELoss, as the tracebacks below show.
Hi @ptrblck
Removing retain_graph=True generates the following error from the very same line:

    loss_dec.backward()#retain_graph=True)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
Do you think using loss_gan again after backpropagating it could be the reason?
torch.autograd.set_detect_anomaly(True)
for epoch in range(max_epochs):
    .
    .

I also added inplace=False to the ReLU layers.
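i.e. something along these lines (the actual module definition isn't shown here, so the attribute name is hypothetical):

    self.relu = nn.ReLU(inplace=False)  # hypothetical layer; explicitly non-in-place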
The following error is generated:
    loss_D_prior = adversarial_loss(d_p, fake_label)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 613, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/nn/functional.py", line 3083, in binary_cross_entropy
    return torch._C._nn.binary_cross_entropy(input, target, weight, reduction_enum)
 (Triggered internally at /opt/conda/conda-bld/pytorch_1659484809535/work/torch/csrc/autograd/python_anomaly_mode.cpp:102.)
  allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "/home/banikr/.config/JetBrains/PyCharm2022.1/scratches/scratch_8.py", line 124, in <module>
    loss_dec.backward()#retain_graph=True)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
Try putting retain_graph=True on the decoder's backward and re-run it (with torch.autograd.set_detect_anomaly(True) enabled). It should find where the in-place error is occurring.
The in-place modification may be happening in the optimizer step; you should move it to after you perform the second backward (which doesn't need retain_graph=True). Alternatively, if you are able to modify the model itself, you can clone the parameters before using them in any operation. It is likely that you are directly using your parameters in an operation that saves those inputs for backward. If you perform an in-place optimizer update and then backward again through that same graph, it would produce incorrect gradients (which the error protects against).
(For more context, retain_graph=True is useful when you plan to backward through the same graph another time. This is true for the first backward you execute, but not the second.)
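As a minimal sketch of the failure pattern and of the fix (a hypothetical two-parameter setup, not your actual model):

    import torch

    # hypothetical minimal setup: two parameters that share one loss term
    w1 = torch.nn.Parameter(torch.randn(3))
    w2 = torch.nn.Parameter(torch.randn(3))
    opt1 = torch.optim.SGD([w1], lr=0.1)
    opt2 = torch.optim.SGD([w2], lr=0.1)

    shared = (w1 * w2).sum()         # mul saves w1 and w2 for its backward
    opt1.zero_grad()
    shared.backward(retain_graph=True)
    opt1.step()                      # in-place update: w1's version counter is bumped

    loss2 = -shared                  # mirrors loss_dec = gamma * rec_loss - loss_gan
    opt2.zero_grad()
    try:
        loss2.backward()             # needs the saved (now stale) w1
    except RuntimeError as e:
        print(e)                     # "... modified by an inplace operation ..."

    # the fix: run every backward first, step the optimizers only afterwards
    shared = (w1 * w2).sum()
    loss2 = -shared
    opt1.zero_grad()
    opt2.zero_grad()
    shared.backward(retain_graph=True)
    loss2.backward()                 # parameters still untouched, so this succeeds
    opt1.step()
    opt2.step()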
Hello @AlphaBetaGamma96,
This is the error I am getting after following your advice:
torch.autograd.set_detect_anomaly(True)
for epoch in range(max_epochs):
    epoch_loss = 0
    ...
    loss_D_prior = adversarial_loss(d_p, fake_label)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 613, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/nn/functional.py", line 3083, in binary_cross_entropy
    return torch._C._nn.binary_cross_entropy(input, target, weight, reduction_enum)
 (Triggered internally at /opt/conda/conda-bld/pytorch_1659484809535/work/torch/csrc/autograd/python_anomaly_mode.cpp:102.)
  allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "/home/banikr/.config/JetBrains/PyCharm2022.1/scratches/scratch_8.py", line 125, in <module>
    loss_dec.backward(retain_graph=True)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/banikr/miniconda3/envs/ims37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
@soulitzer,
I thought optimizer_D and optimizer_d were optimizing different networks, though they share loss_gan.
I took the first optimizer step (optimizer_D) after the second backward but still got the errors.
Could you give me more hints with some demo code on how to clone parameters?
I thought optimizer_D and optimizer_d were optimizing different networks, though they share loss_gan.
Yes. Because the loss_gan part is shared, parameters (which are saved for backward) modified in-place by the first optimizer step will be used in the second backward. I'm not sure your code includes the part where the optimizers are defined (I doubt that is relevant though).
Could you give me more hints with some demo code on how to clone parameters?
I took the first optimizer step (optimizer_D) after the second backward but still got the errors.
Even though the cloning should fix this issue, I'd suggest looking into this error more to see if we can avoid the extra overhead of cloning. Was this the same error? Could you post the stack trace? (Or, if possible, post a short runnable snippet to demonstrate the issue.)
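For reference, cloning a parameter before it is used would look roughly like this (a minimal sketch with a hypothetical linear decoder layer, not your model):

    import torch
    import torch.nn.functional as F

    class DecoderLayerSketch(torch.nn.Module):
        # hypothetical layer: clone the parameters before using them, so autograd
        # saves the clones rather than the parameters that the optimizer later
        # updates in-place; gradients still flow back to the original parameters
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.weight = torch.nn.Parameter(torch.randn(out_dim, in_dim))
            self.bias = torch.nn.Parameter(torch.zeros(out_dim))

        def forward(self, z):
            return F.linear(z, self.weight.clone(), self.bias.clone())

The clones are what get saved for backward, so a later in-place optimizer step on the original parameters no longer invalidates the graph.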
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-cdfe813ffd22> in <module>
     98     loss_dec = gamma * rec_loss - loss_gan
     99     optimizer_d.zero_grad()
--> 100     loss_dec.backward(retain_graph=True)
    101     optimizer_d.step()
    102     # +----------------------------+

1 frames
/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    173     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    174         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 175         allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
    176
    177 def grad(

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
Ahhh, I also just realized that I wrote the comment above without seeing that you had three losses to backward in total. You'll need to make sure all the optimizer updates are done at the very end.
So, modifying the code in that way is probably the preferred solution here:
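Something along these lines (a sketch reusing the variable names from the loop above; the inputs= restriction on the second backward is my addition, to keep the discriminator step based only on loss_gan, and it assumes optimizer_d holds net_g.decoder's parameters):

    # inside the inner training loop, after the forward passes:
    loss_gan = loss_D_real + loss_D_fake + loss_D_prior
    rec_loss = ((net_D(x_hat) - net_D(x)) ** 2).mean()
    loss_dec = gamma * rec_loss - loss_gan

    optimizer_D.zero_grad()
    loss_gan.backward(retain_graph=True)   # the graph is reused by loss_dec below

    optimizer_d.zero_grad()                # clears decoder grads left by loss_gan
    # restrict this backward to the decoder parameters so it does not also
    # write into net_D's .grad before the discriminator step below
    loss_dec.backward(inputs=list(net_g.decoder.parameters()))

    optimizer_D.step()                     # every in-place parameter update
    optimizer_d.step()                     # happens only after all backwards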