Error using autograd.grad with two consecutive Conv2d layers

While implementing an Improved Wasserstein GAN architecture, we ran into a problem when placing two consecutive Conv2d layers before the final classification layer, which results in the following error:

/home/hartmank/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.pyc in backward(self, gradient, retain_graph, create_graph, retain_variables)
150                 Defaults to False, unless ``gradient`` is a volatile Variable.
151         """
--> 152         torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
153 
154     def register_hook(self, hook):

/home/hartmank/anaconda2/lib/python2.7/site-packages/torch/autograd/__init__.pyc in backward(variables, grad_variables, retain_graph, create_graph, retain_variables)
 96 
 97     Variable._execution_engine.run_backward(
---> 98         variables, grad_variables, retain_graph)
 99 
100 

RuntimeError: ConvNdBackward: expected Variable at argument 0 (got None)

The error vanishes when the final classification output is transformed by 1/x.
We tested the following architectures:
Conv2d - LeakyReLU - Linear -> No Error
Conv2d - Conv2d - LeakyReLU - Linear -> Error
Conv2d - Conv2d - LeakyReLU - Conv2d - LeakyReLU - Linear -> No Error
Conv2d - Conv2d - LeakyReLU - Conv2d - Conv2d - LeakyReLU - Linear -> Error
Conv2d - Conv2d - LeakyReLU - Conv2d - Conv2d - LeakyReLU - Linear - 1/x -> No Error

Notebook with Code: https://gist.github.com/kahartma/ccf7279fb20a216905079f6b10b31b78
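
For reference, here is a sketch of the smallest failing layer order. The channel counts and input shape are arbitrary placeholders; only the Conv2d - Conv2d - LeakyReLU - Linear structure matters:

import torch
import torch.nn as nn
from torch.autograd import Variable, grad

class Critic(nn.Module):
    # Smallest failing case: Conv2d - Conv2d - LeakyReLU - Linear
    def __init__(self):
        super(Critic, self).__init__()
        self.conv1 = nn.Conv2d(1, 8, 3, padding=1)
        self.conv2 = nn.Conv2d(8, 8, 3, padding=1)
        self.lrelu = nn.LeakyReLU(0.2)
        self.fc = nn.Linear(8 * 16 * 16, 1)

    def forward(self, x):
        h = self.lrelu(self.conv2(self.conv1(x)))
        return self.fc(h.view(h.size(0), -1))

netD = Critic()
x = Variable(torch.randn(4, 1, 16, 16), requires_grad=True)
out = netD(x)

# Gradient penalty as in WGAN-GP: gradients of the critic output w.r.t. the input
grads = grad(outputs=out, inputs=x,
             grad_outputs=torch.ones(out.size()),
             create_graph=True, retain_graph=True, only_inputs=True)
penalty = ((grads[0].view(grads[0].size(0), -1).norm(2, dim=1) - 1) ** 2).mean()

# The double backward through the two Conv2d layers is where it fails for us
penalty.backward()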

Does anybody have any idea what’s going on or what we do wrong?

Cheers
Kay

Hi,

The problem is in how we handle elements that should return 0 gradients in the C++ code.
As a workaround, you can modify your calc_gradient_penalty like this:

def calc_gradient_penalty(self, real_data, fake_data, lambd):
    # Sample one interpolation coefficient per example and broadcast it
    alpha = torch.rand(real_data.size(0), 1, 1, 1)
    alpha = alpha.expand(real_data.size())

    # Interpolate between real and fake samples
    interpolates = alpha * real_data.data + ((1 - alpha) * fake_data.data)
    interpolates = Variable(interpolates, requires_grad=True)

    disc_interpolates = self(interpolates)

    # Pass the parameters as additional inputs so every gradient is defined
    inputs = [interpolates] + list(self.parameters())

    gradients = autograd.grad(outputs=disc_interpolates, inputs=inputs,
                              grad_outputs=torch.ones(disc_interpolates.size()),
                              create_graph=True, retain_graph=True, only_inputs=True)

    # Penalty on the gradient norm of the interpolates, as in WGAN-GP
    gradient_penalty = ((gradients[0].norm(2, dim=1) - 1) ** 2).mean() * lambd

    # Tie the (zero) parameter gradients into the graph so none of them is missing
    for grad in gradients[1:]:
        gradient_penalty += 0 * grad.view(-1)[0]

    return gradient_penalty

This basically makes sure that all gradients exist (even though they will be 0).
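
For context, this is roughly how the penalty would be used inside a critic update. netD, d_optimizer, real_data, fake_data and lambd=10 are placeholders from your own training loop, not part of the workaround itself:

# Hypothetical critic step; real_data and fake_data are Variables
d_optimizer.zero_grad()

d_real = netD(real_data).mean()
d_fake = netD(fake_data).mean()

# The workaround method defined above on the critic module
penalty = netD.calc_gradient_penalty(real_data, fake_data, lambd=10)

d_loss = d_fake - d_real + penalty
d_loss.backward()  # the double backward for the penalty happens here
d_optimizer.step()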

The issue tracking this problem is #2003.
