Hi,

I am working on adding batchnorm in the discriminator in WGAN-GP. However, I encountered a bug where **gpu memory continues to increase** when using **batchnorm double backprop**. This bug only occurs when using batchnorm. If i remove batchnorm from the model, the bug doesn’t occur.

Here’s the code you can experiment with.

```
import torch
import torch.nn as nn
from torch.autograd import Variable
# Model (partial discriminator)
D = nn.Sequential(
nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
nn.BatchNorm2d(64), # if you remove this, the bug does not occur.
nn.LeakyReLU(0.2))
D.cuda()
for i in range(1000):
# Input
x = Variable(torch.randn(10, 3, 128, 128).cuda(), requires_grad=True)
out = D(x)
grad = torch.autograd.grad(outputs=out,
inputs=x,
grad_outputs=torch.ones(out.size()).cuda(),
retain_graph=True,
create_graph=True,
only_inputs=True)[0]
grad_norm = grad.pow(2).sum().sqrt()
loss = torch.mean((grad_norm - 1)**2)
# Reset grad and backprop
D.zero_grad()
loss.backward()
if (i+1) % 10 == 0:
print (i+1)
```

How can i solve this problem?

Thanks,