Recently I’ve worked on implementing a WGAN-GP (https://arxiv.org/pdf/1704.00028.pdf) by myself.

For those of you who are not familiar with WGAN-GP, it uses a gradient penalty instead of weight clipping to enforce the 1-Lipschitz constraint on the WGAN’s discriminator (critic) function.

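This gradient penalty term is a function of the gradient of the discriminator’s output with respect to its input, evaluated on a special kind of input (random interpolations between real and generated samples). Because the penalty is part of the loss, backpropagating through it essentially takes a second derivative, and we compute the inner gradient with torch.autograd.grad.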
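To make the description above concrete, here is a minimal sketch of how such a penalty term could be computed. The function name `gradient_penalty`, the stand-in linear `critic`, and the tensor shapes are assumptions chosen for illustration, not my actual model; the interpolation between real and generated samples follows the WGAN-GP paper.

```python
import torch
from torch import nn


def gradient_penalty(critic, real, fake):
    """Hypothetical helper: penalize deviation of the critic's input-gradient norm from 1."""
    # One random mixing coefficient per sample, broadcast over the remaining dims
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    # Interpolated inputs; detach so they become fresh leaves that require grad
    x_hat = (eps * real + (1 - eps) * fake).detach().requires_grad_(True)
    scores = critic(x_hat)
    # create_graph=True keeps the gradient differentiable, so the penalty
    # can itself be backpropagated to the critic's parameters
    grads, = torch.autograd.grad(
        outputs=scores,
        inputs=x_hat,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
    )
    grad_norm = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()


# Stand-in critic and data (assumed shapes, for illustration only)
critic = nn.Sequential(nn.Flatten(), nn.Linear(50 * 4, 1))
real = torch.randn(8, 50, 4)
fake = torch.randn(8, 50, 4)

gp = gradient_penalty(critic, real, fake)
gp.backward()  # second-order step: differentiates through the gradient
```

In a full training loop this term would be scaled by a penalty coefficient (the paper uses 10) and added to the critic loss.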

My original problem was that, even though I pass retain_graph=True to grad, calling backward on the loss raises the error ‘Trying to backward through the graph a second time…’.
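For context, this is the situation in which that error is normally expected, shown next to a plain double-backprop call that works. It is a small sketch on a scalar function (the variables `x`, `x2`, and the cubic are made up for illustration) and it does not reproduce my actual problem.

```python
import torch

# Case 1: calling backward twice on the same graph without retaining it.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3
y.backward()  # saved tensors of the graph are freed here
try:
    y.backward()  # second pass through the already-freed graph
    raised = False
except RuntimeError:
    raised = True

# Case 2: double backprop. create_graph=True builds a graph for the gradient
# itself (retain_graph defaults to the value of create_graph).
x2 = torch.tensor(2.0, requires_grad=True)
y2 = x2 ** 3
g, = torch.autograd.grad(y2, x2, create_graph=True)  # g = 3 * x2**2
g.backward()  # x2.grad now holds d(3 * x2**2)/dx2 = 6 * x2
```

What confuses me is that my code looks like the second case, yet fails with the message from the first.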

From debugging, I learned that the error was caused by a residual layer in my model that performed an in-place ReLU. After I removed the in-place operation my code works, but I still don’t understand what exactly goes wrong, and it really annoys me.
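The workaround looks roughly like the sketch below. `ResidualBlock` is a hypothetical stand-in for my real layer (its structure and sizes are assumptions); the only point is that the ReLU is constructed with `inplace=False`, after which double backprop goes through.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Hypothetical residual block using an out-of-place ReLU."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, 1)
        # inplace=False: the activation writes to a new tensor instead of
        # overwriting the conv output
        self.relu = nn.ReLU(inplace=False)

    def forward(self, x):
        return x + self.relu(self.conv(x))


block = ResidualBlock(4)
x = torch.randn(2, 4, 10, requires_grad=True)

y = block(x).mean()
grad, = torch.autograd.grad(y, x, create_graph=True)
loss = grad.pow(2).mean() + y
loss.backward()  # backpropagates through the gradient as well as through y
```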

I have constructed a toy example that, to my understanding, suffers from the same issue. Can anyone please help me understand what exactly the error is?

```
import torch
from torch import nn
from torchviz import make_dot  # only needed for the commented-out make_dot call below


def double_backprop(inputs, net):
    y = net(inputs).mean()
    # create_graph=True makes the gradient itself differentiable (double backprop)
    grad, = torch.autograd.grad(y, inputs, create_graph=True, retain_graph=True)
    return grad.pow(2).mean() + y


class TestNet(nn.Module):
    """
    A network for testing double backprop
    """

    def __init__(self):
        super(TestNet, self).__init__()

    def forward(self, input):
        # Layers are created inline to keep the toy example short
        output = input.transpose(1, 2)
        output = nn.Conv1d(4, 100, 1)(output)
        # If I remove either the second ReLU layer, or the inplace argument, this works.
        output = nn.ReLU(True)(output)
        output = nn.ReLU()(output)
        output = output.view(-1, 500)
        output = nn.Linear(500, 1)(output)
        return output


model = TestNet()
x = torch.randn((64, 50, 4), requires_grad=True)
out = double_backprop(x, model)
out.backward()
# make_dot(out)
```