Higher-order gradient - RuntimeError: One of the differentiated Variables appears to not have been used in the graph

Update 2: Comparing with another person’s WGAN implementation, I see that they set requires_grad=True on the interpolates. When I do this as well, my code runs without error (though I don’t yet know for sure that it’s correct). My question, then, is whether this makes sense to do. In one way it may: the grad function is designed to calculate the gradient with respect to its inputs, but the gradient penalty is actually applied to the parameters of the discriminator. Do the discriminator’s parameters also receive gradients when running the grad call below? And if so, why wouldn’t they receive those gradients just as well when requires_grad=False on the interpolates? If anyone can clarify, it would be very helpful. Thank you!
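For concreteness, here’s a minimal sketch of the version that now runs, with requires_grad=True set on the interpolates (the sizes, the stand-in linear discriminator, and the WGAN-GP penalty term at the end are placeholders I’ve added for illustration, not my real code):

import torch
import torch.nn as nn
from torch.autograd import Variable

batch_size, example_size = 16, 10  # hypothetical sizes
D = nn.Linear(example_size, 1)     # stand-in for the real discriminator

examples = torch.randn(batch_size, example_size)        # stand-in for the real data
interpolates = Variable(examples, requires_grad=True)   # the change in question
interpolates_predictions = D(interpolates)
gradients = torch.autograd.grad(outputs=interpolates_predictions, inputs=interpolates,
                                grad_outputs=torch.ones(interpolates_predictions.size()),
                                create_graph=True, retain_graph=True, only_inputs=True)[0]

# Because create_graph=True, the penalty below is itself differentiable,
# so calling backward() on it propagates gradients into D's parameters.
gradient_penalty = ((gradients.norm(2, dim=1) - 1) ** 2).mean()
gradient_penalty.backward()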

Update: In PyTorch 0.3, the segmentation fault no longer occurs. Now I just get the first error in both cases.

I’m trying to build an improved WGAN using higher-order gradients, but I’m running into a few errors.

Minimizing my code as far as I can while still reproducing the error, I use:

import torch
from torch.autograd import Variable

examples = torch.from_numpy(...)
interpolates = Variable(examples)
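# note: requires_grad is left at its default of False here (see Update 2 above)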
interpolates_predictions = D(interpolates)
gradients = torch.autograd.grad(outputs=interpolates_predictions, inputs=interpolates,
                                grad_outputs=torch.ones(interpolates_predictions.size()),
                                create_graph=True, retain_graph=True, only_inputs=True)[0]

Which results in the error:

File ".../python3.6/site-packages/torch/autograd/__init__.py", line 153, in grad
    inputs, only_inputs)
RuntimeError: One of the differentiated Variables appears to not have been used in the graph

Interestingly enough, if I use the full version of my code, which includes labeled, unlabeled, and fake examples (a semi-supervised WGAN), I end up with a segmentation fault:

z = torch.randn(batch_size, noise_size)
fake_examples = G(Variable(z))
alpha = Variable(torch.rand(3, batch_size, 1))
alpha = alpha / alpha.sum(0)
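# each triple of coefficients now sums to 1 across the three example sources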
interpolates = alpha[0] * Variable(labeled_examples) + alpha[1] * Variable(unlabeled_examples) + alpha[2] * fake_examples.detach()
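# as in the minimal example, interpolates is never given requires_grad=True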
interpolates_predictions = D(interpolates)
gradients = torch.autograd.grad(outputs=interpolates_predictions, inputs=interpolates,
                                grad_outputs=torch.ones(interpolates_predictions.size()),
                                create_graph=True, retain_graph=True, only_inputs=True)[0]

leading to:

Segmentation fault: 11

Can anyone point out what I’m doing wrong? Or (as the segmentation fault possibly hints) is this a bug in my version of PyTorch (0.2post3)?

The first error can be suppressed by setting allow_unused=True, though in that case grad simply returns None for the unused input rather than computing a gradient for it.
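For what it’s worth, here’s a minimal sketch (using the same Variable-era API, with toy inputs x and y) of what allow_unused=True actually does: it suppresses the error, but the gradient returned for an input that never entered the graph is simply None:

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)  # never used below
y = Variable(torch.ones(2, 2), requires_grad=True)
out = (y * 3).sum()  # the graph only involves y

grads = torch.autograd.grad(outputs=out, inputs=[x, y], allow_unused=True)
print(grads[0])  # None -- x was not used in the graph
print(grads[1])  # y's actual gradient (all 3s)

So for the gradient penalty this alone doesn’t help: the interpolates still need requires_grad=True so that they actually enter the graph.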