A bug in batchnorm double backprop

Hi,

I am working on adding batchnorm to the discriminator in WGAN-GP. However, I have run into a bug where GPU memory keeps increasing when double backprop goes through batchnorm. The bug only occurs with batchnorm: if I remove the BatchNorm2d layer from the model, the memory growth goes away.

Here’s the code you can experiment with.

import torch
import torch.nn as nn
from torch.autograd import Variable

# Model (partial discriminator)
D = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(64),   # if you remove this, the bug does not occur.
    nn.LeakyReLU(0.2))

D.cuda()


for i in range(1000):
    # Input 
    x = Variable(torch.randn(10, 3, 128, 128).cuda(), requires_grad=True)
    out = D(x)
    
    # Gradient of D(x) w.r.t. the input. create_graph=True keeps the graph of
    # this gradient so we can backprop through it again (double backprop).
    grad = torch.autograd.grad(outputs=out,
                               inputs=x,
                               grad_outputs=torch.ones(out.size()).cuda(),
                               retain_graph=True,
                               create_graph=True,
                               only_inputs=True)[0]

    # Gradient penalty (here the norm is taken over the whole batch).
    grad_norm = grad.pow(2).sum().sqrt()
    loss = torch.mean((grad_norm - 1)**2)

    # Reset grad and backprop
    D.zero_grad()
    loss.backward()
    
    if (i+1) % 10 == 0:
        print(i+1)
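
To make the memory growth easier to see, here is a small sketch that prints the device memory currently held by tensors each time it is called; the helper name report_cuda_memory is just for illustration, and it assumes a PyTorch version that provides torch.cuda.memory_allocated():

import torch

def report_cuda_memory(step):
    # Report the CUDA memory currently allocated by tensors, in MiB,
    # so per-iteration growth is easy to spot.
    mib = torch.cuda.memory_allocated() / (1024 ** 2)
    print('step %d: %.1f MiB allocated' % (step, mib))

Calling report_cuda_memory(i+1) inside the loop above (for example, right after loss.backward()) should show the allocated memory climbing every iteration while the BatchNorm2d layer is present, and staying flat once it is removed.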

How can I solve this problem?
Thanks,

Gregory Chanan is investigating this. He’ll follow up with more detail on your GitHub issue tomorrow.

@smth OK, thanks.

I left a link to the GitHub issue.

I’m experiencing the same problem! Thanks for investigating.