Hi, I am trying to implement WGAN-GP (in a conditional setting). When I use inplace=True in the ReLU activation layers, I get the following error during my critic training: “RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.”

I came across a similar post, “Freeing buffer strange behavior”, where setting inplace=False in the ReLU activation solved the problem. It “solved” my problem too, but my model became much slower and its output values dropped drastically (I do not think they are correct).

When that post was written, the poster was using PyTorch 0.4.1, whereas I am using PyTorch 1.1.0. The strange thing is that this exact code worked fine for a few days before the error started appearing. Could I please have some suggestions about what I can do, or any leads on what might trigger this behaviour? Please let me know if I need to provide more information.
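
For reference, the error message itself can be reproduced in isolation. The snippet below is a minimal sketch on a toy tensor (not the model above; the names x, y and second_backward_failed are only for illustration). It calls backward twice on the same graph without retain_graph=True, which is the situation the message describes.

```python
import torch

# Toy graph: the multiplication saves its inputs for the backward pass.
x = torch.ones(3, requires_grad=True)
y = (x * x).sum()

# The first backward succeeds and frees the saved buffers.
y.backward()

# A second backward through the same graph raises a RuntimeError.
try:
    y.backward()
    second_backward_failed = False
except RuntimeError as err:
    second_backward_failed = True
    print(err)
```

Passing retain_graph=True to the first call keeps the buffers alive, at the cost of extra memory.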

This is my critic training code:

```
import torch

def train_disc_net(bx, by, gen_net, disc_net, disc_net_optmzr):
    # Zeroing out Gradients
    disc_net_optmzr.zero_grad()
    gen_net.zero_grad()
    disc_net.zero_grad()
    # Reset requires_grad
    for p in disc_net.parameters():
        p.requires_grad = True
    # Training sign convention
    one = torch.ones(bx.shape[0], 1).cuda()
    neg_one = -1 * torch.ones(bx.shape[0], 1).cuda()
    # True data
    dval_true = disc_net(by, bx)
    dval_true.backward(neg_one)
    # Generated data
    by_gen = gen_net(bx)  # Generated data
    dval_gen = disc_net(by_gen.detach(), bx)
    dval_gen.backward(one)
    # Wasserstein distance
    was_dist = dval_true.mean() - dval_gen.mean()
    # Get drift regularization
    d_drift_reg = dval_true**2 + dval_gen**2
    d_drift_reg = 10e-9 * d_drift_reg.mean()
    d_drift_reg.backward(one)
    # Train with gradient regularization to ensure lipschitz 1 constraint
    grad_reg = 10 * get_grad_reg(by, by_gen.detach(), bx, disc_net)
    grad_reg.backward()  # <----- Giving the error
    # Objective function
    d_cost = -was_dist + grad_reg  # + d_drift_reg
    # Update the networks
    disc_net_optmzr.step()
    return d_cost, was_dist, grad_reg
```

This is my Lipschitz-constraint code:

```
def get_grad_reg(by, by_gen, bx, disc_net):
    # Mixing real and fake inputs in a random fashion
    epsilon = torch.FloatTensor(by.shape[0], 1, 1, 1, 1).uniform_(0.0, 1.0).cuda()
    by_hat = epsilon * by + (1 - epsilon) * by_gen
    by_hat = torch.autograd.Variable(by_hat, requires_grad=True)
    # Get output
    d_hat = disc_net(by_hat, bx.detach())
    # I concatenate the inputs of disc_net inside the disc_net function and
    # I want to compute the gradients with respect to this concatenated input
    d_in = disc_net.x_xyt
    # Getting gradient regularization
    grad = torch.autograd.grad(
        outputs=d_hat,
        inputs=d_in,
        grad_outputs=torch.ones(d_hat.size()).cuda(),
        retain_graph=True,
        create_graph=True,
    )[0]
    grad_norm = torch.sqrt(1e-8 + torch.sum(grad**2, dim=(1, 2, 3, 4)))
    one = torch.ones(grad_norm.shape).cuda()
    grad_reg = (grad_norm - one)**2
    grad_reg = grad_reg.mean()
    return grad_reg
```
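
For comparison, here is a minimal, self-contained sketch of the same kind of gradient penalty on a toy critic, with the Wasserstein term and the penalty summed into one loss and a single backward call. Everything in it is an assumption made for illustration: toy_critic, gradient_penalty, the 4-dimensional inputs and the CPU tensors stand in for disc_net and the real data, which are not shown here. It is not claimed to fix the error above, only to give a small reference point to compare against.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy critic; stands in for disc_net, which is not shown in the post.
toy_critic = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

def gradient_penalty(critic, real, fake):
    """WGAN-GP style penalty: mean((||d critic(x_hat) / d x_hat|| - 1)^2)."""
    # Random per-sample mixing of real and generated inputs.
    eps = torch.rand(real.shape[0], 1)
    x_hat = (eps * real + (1 - eps) * fake.detach()).requires_grad_(True)
    d_hat = critic(x_hat)
    # create_graph=True keeps the penalty differentiable w.r.t. the critic weights.
    grad = torch.autograd.grad(
        outputs=d_hat,
        inputs=x_hat,
        grad_outputs=torch.ones_like(d_hat),
        create_graph=True,
    )[0]
    grad_norm = torch.sqrt(1e-8 + grad.pow(2).sum(dim=1))
    return ((grad_norm - 1) ** 2).mean()

# Stand-in batches for real and generated data (assumed shapes).
real = torch.randn(16, 4)
fake = torch.randn(16, 4)

toy_critic.zero_grad()
was_dist = toy_critic(real).mean() - toy_critic(fake).mean()
penalty = 10 * gradient_penalty(toy_critic, real, fake)
d_cost = -was_dist + penalty
d_cost.backward()  # one backward pass through the combined loss
```

Because every term is part of one scalar loss, each forward graph is traversed by backward only once, so retain_graph=True is not needed in this sketch.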