Improved WGAN: "Scatter is not differentiable twice"

I get the error "Scatter is not differentiable twice" when calling backward on the gradient penalty proposed in the Improved WGAN paper. I’m running the latest version of PyTorch, and the models are wrapped in DataParallel!

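    # discriminator (critic) step
    d_real_out = disc(mixed, clean)
    d_fake = disc(mixed, fake)  # assumed: disc takes two inputs, as in the real branch
    # WGAN critic loss: mean(D(fake)) - mean(D(real))
    loss_real, loss_fake = -torch.mean(d_real_out), torch.mean(d_fake)
    grad_penalty = compute_gradient_penalty(disc, mixed, clean, fake)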

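    import torch
    from torch.autograd import Variable

    # NOTE: the signature and the interpolation line were truncated in the post;
    # `use_cuda` and the `real`/`fake` operands are reconstructed from the body.
    def compute_gradient_penalty(discriminator, mixed, real, fake, LAMBDA=10,
                                 use_cuda=True):
        # One random coefficient per sample, expanded over the remaining dims.
        alpha = torch.rand(real.size(0), *([1] * (real.dim() - 1)))
        alpha = alpha.expand(real.size()).contiguous()
        alpha = alpha.cuda() if use_cuda else alpha
        # Interpolate on raw data so no graph from the generator is attached.
        interpolates = alpha * real.data + ((1 - alpha) * fake.data)
        interpolates = Variable(interpolates, requires_grad=True)
        disc_interpolates = discriminator(mixed, interpolates)

        ones = torch.ones(disc_interpolates.size())
        gradients = torch.autograd.grad(
            outputs=disc_interpolates, inputs=interpolates,
            grad_outputs=ones.cuda() if use_cuda else ones,
            create_graph=True, retain_graph=True, only_inputs=True)[0]
        # Flatten so the norm is taken over all non-batch dimensions.
        gradients = gradients.view(gradients.size(0), -1)
        penalty = ((gradients.norm(2, dim=1) - 1) ** 2).mean() * LAMBDA
        return penalty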
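
For readers on a recent PyTorch, here is a minimal sketch of the same penalty without `Variable`. It is not the poster's model: `ToyCritic` is a hypothetical stand-in for a two-input discriminator that concatenates its inputs along the channel dimension, and the input shape `(8, 1, 16)` is an arbitrary assumption. It runs on CPU without DataParallel, so it only illustrates where the second differentiation happens.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyCritic(nn.Module):
    """Hypothetical stand-in for the two-input discriminator in the thread."""

    def __init__(self, channels=1, length=16):
        super().__init__()
        # The two inputs are concatenated along the channel dimension.
        self.conv = nn.Conv1d(2 * channels, 4, kernel_size=3, padding=1)
        self.fc = nn.Linear(4 * length, 1)

    def forward(self, mixed, candidate):
        h = torch.cat([mixed, candidate], dim=1)
        h = F.leaky_relu(self.conv(h), 0.2)
        return self.fc(h.flatten(1))


def gradient_penalty(critic, mixed, real, fake, lam=10.0):
    # One interpolation coefficient per sample, broadcast over the other dims.
    shape = [real.size(0)] + [1] * (real.dim() - 1)
    alpha = torch.rand(shape, device=real.device)
    interpolates = alpha * real.detach() + (1 - alpha) * fake.detach()
    interpolates.requires_grad_(True)
    out = critic(mixed, interpolates)
    (grads,) = torch.autograd.grad(
        outputs=out,
        inputs=interpolates,
        grad_outputs=torch.ones_like(out),
        create_graph=True,  # keep the graph so the penalty itself is differentiable
    )
    grads = grads.flatten(1)  # norm over all non-batch dimensions
    return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()


torch.manual_seed(0)
critic = ToyCritic()
mixed, real, fake = (torch.randn(8, 1, 16) for _ in range(3))
penalty = gradient_penalty(critic, mixed, real, fake)
penalty.backward()  # second differentiation: every op in the critic needs a double backward
```

The call to `penalty.backward()` is the step that fails with "... is not differentiable twice" when some op in the forward pass (here it would be the Scatter used by DataParallel) has no double backward implemented.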

Could you post your D architecture as well? Thanks. I recently tried WGAN-GP and it worked, although not on the most recent PyTorch.

Certainly! The discriminator takes two inputs and concatenates them along the channel dimension. It is a sequence of convolutions with a Linear layer at the end.
There has been a similar question here:

I’m also running my models in DataParallel and can post the full code if need be.

@gchanan mentioned in that thread that this is now implemented in master. Could you try updating? Building from source is actually quite easy!

I’m running version 0.2.0. Is this the same version as the one on master?

Master is the branch under active development, so it is newer than 0.2.0. The patch landed after 0.2.0, so it is only in master right now. Sorry that you have to build from source to solve this.
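
A quick way to see which build is installed is to print the version string. This is only a sketch: a release reports a plain version, while builds from source often append a suffix such as a commit hash, but the exact format varies between versions and is an assumption here.

```python
import torch

# Version of the installed PyTorch build (plain for releases; source builds
# often carry an extra suffix, though the format is not guaranteed).
version = str(torch.__version__)
print("PyTorch version:", version)

# CUDA toolkit version PyTorch was built against; None on CPU-only builds.
print("Built with CUDA:", torch.version.cuda)
```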

Yes, the patch works for me, and the same code runs on PyTorch built from the master branch!
Thank you!


Note for future visitors: my setup was bleeding-edge PyTorch (master) with CUDA 9.

I have a very similar problem. In my case I use an RNN, and I get
"CudnnRNNLegacyBackward is not differentiable twice"

I’m willing to modify things myself to make it differentiable, if that is what it takes. How can I overcome this?
I use a GPU with CUDA 8 and did not compile from source. (I do not use nn.parallel.)
I appreciate any help!

Are you using a legacy module? Legacy modules likely do not have double backwards defined. Could you just switch to normal rnns?

Great idea!
My question now is a newbie one:
Do you mean I should compile PyTorch from the current master branch (the bleeding-edge version)?
Otherwise, how can I do it?

Never mind. On second thought, I realized the non-legacy RNN modules don’t have double backward either. By the way, I wasn’t talking about compiling from source. The non-legacy RNN modules should be in your release, assuming it is not too old.

After reading a little and trying to run it without CUDA, it does seem to backprop twice on a CPU. Since the behavior differs between CPU and GPU, I suspect there is a bug, so I reported it on the PyTorch GitHub issues. I hope they will be able to address it.
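
The CPU observation above can be reproduced with a small sketch like the following, written for a recent PyTorch rather than the release discussed in this thread. The helper name `rnn_double_backward` and the tensor sizes are made up for illustration. On a GPU it disables cuDNN through `torch.backends.cudnn.flags`, on the assumption that the non-cuDNN RNN path is built from ops that do support double backward; that assumption may not hold on every version.

```python
import contextlib

import torch
import torch.nn as nn


def rnn_double_backward(device="cpu"):
    """Differentiate an RNN output twice and return a parameter gradient."""
    torch.manual_seed(0)
    rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True).to(device)
    x = torch.randn(2, 5, 4, device=device, requires_grad=True)

    # Assumption: the cuDNN RNN kernels have no double backward, so on GPU
    # fall back to the native implementation by disabling cuDNN.
    if torch.device(device).type == "cuda":
        ctx = torch.backends.cudnn.flags(enabled=False)
    else:
        ctx = contextlib.nullcontext()

    with ctx:
        out, _ = rnn(x)
        # First differentiation, keeping the graph for a second pass.
        (grad_x,) = torch.autograd.grad(out.sum(), x, create_graph=True)
        # Second differentiation: backprop through the gradient itself.
        grad_x.pow(2).sum().backward()
    return rnn.weight_ih_l0.grad


grad = rnn_double_backward("cpu")
print(grad.shape)
```

If the same call with `device="cuda"` and cuDNN left enabled raises a "not differentiable twice" style error while the CPU run succeeds, that matches the behavior described in this post.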

I replied to you on GitHub.