Update 2: Comparing with another person’s WGAN implementation, I see that they set requires_grad=True on the interpolates. When I do this as well, my code runs without error (though I don’t yet know for sure that it’s correct). My question, then, is whether this makes sense to do. In one way it may make sense, since the grad function is designed to calculate the gradient with respect to its inputs, but the gradient penalty is actually going to be applied to the parameters of the discriminator. Do those discriminator parameters also receive gradient when running the grad function below? And if so, why wouldn’t they receive this gradient when requires_grad=False on the interpolates? If anyone can clarify, it would be very helpful. Thank you!
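For reference, here’s a minimal sketch of the behavior I’m describing, written in current PyTorch style (no Variable wrapper) with a toy linear layer standing in for the real discriminator. With requires_grad=True on the interpolates the grad call succeeds, and because create_graph=True the penalty on those gradients can itself be backpropagated into D’s parameters:

```python
import torch

# Toy stand-in for the discriminator (the real D is a full network)
D = torch.nn.Linear(4, 1)

examples = torch.randn(8, 4)
# Marking the interpolates as requiring grad makes them a differentiable
# leaf that autograd.grad can compute gradients with respect to.
interpolates = examples.clone().requires_grad_(True)
predictions = D(interpolates)

gradients = torch.autograd.grad(
    outputs=predictions,
    inputs=interpolates,
    grad_outputs=torch.ones_like(predictions),
    create_graph=True,   # keep the grad computation itself in the graph...
    retain_graph=True,
    only_inputs=True,
)[0]

# ...so that the gradient-norm penalty can be backpropagated into D:
penalty = ((gradients.norm(2, dim=1) - 1) ** 2).mean()
penalty.backward()

print(gradients.shape)            # torch.Size([8, 4])
print(D.weight.grad is not None)  # True
```

So the parameters do not receive gradient from the grad call itself (only_inputs=True restricts that call to the interpolates); they receive it later, when the penalty built from those gradients is backpropagated.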
Update: In PyTorch 0.3, the segmentation fault no longer occurs. Now I just get the first error in both cases.
I’m trying to build an improved WGAN using higher order gradients, but I’m running into a few errors.
Minimizing my code down as far as I can while still producing an error, I use:
examples = torch.from_numpy(...)
interpolates = Variable(examples)
interpolates_predictions = D(interpolates)
gradients = torch.autograd.grad(outputs=interpolates_predictions,
                                inputs=interpolates,
                                grad_outputs=torch.ones(interpolates_predictions.size()),
                                create_graph=True, retain_graph=True,
                                only_inputs=True)
This results in the error:
File ".../python3.6/site-packages/torch/autograd/__init__.py", line 153, in grad
    inputs, only_inputs)
RuntimeError: One of the differentiated Variables appears to not have been used in the graph
Interestingly enough, if I use the full version of my code, which includes labeled, unlabeled, and fake examples (semi-supervised WGAN), I end up with a segmentation fault:
z = torch.randn(batch_size, noise_size)
fake_examples = G(Variable(z))
alpha = Variable(torch.rand(3, batch_size, 1))
alpha = alpha / alpha.sum(0)
interpolates = alpha * Variable(labeled_examples) + alpha * Variable(unlabeled_examples) + alpha * fake_examples.detach()
interpolates_predictions = D(interpolates)
gradients = torch.autograd.grad(outputs=interpolates_predictions,
                                inputs=interpolates,
                                grad_outputs=torch.ones(interpolates_predictions.size()),
                                create_graph=True, retain_graph=True,
                                only_inputs=True)
Segmentation fault: 11
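As an aside, I suspect the alpha multiplication above isn’t doing what I intended: alpha has shape (3, batch_size, 1), so multiplying the whole tensor by each (batch_size, feature) term broadcasts to a 3-D result. A minimal sketch of the per-source convex combination I presumably want, indexing alpha[0], alpha[1], alpha[2], with random stand-ins for the three example sources:

```python
import torch

batch_size, feature_size = 8, 4

# Hypothetical stand-ins for the three example sources.
labeled_examples = torch.randn(batch_size, feature_size)
unlabeled_examples = torch.randn(batch_size, feature_size)
fake_examples = torch.randn(batch_size, feature_size)

# Per-example coefficients that sum to 1 across the three sources.
alpha = torch.rand(3, batch_size, 1)
alpha = alpha / alpha.sum(0)

# Use each coefficient slice separately so the result stays 2-D,
# rather than broadcasting the full (3, batch_size, 1) tensor.
interpolates = (alpha[0] * labeled_examples
                + alpha[1] * unlabeled_examples
                + alpha[2] * fake_examples)

print(interpolates.shape)  # torch.Size([8, 4])
```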
Can anyone point out what I’m doing wrong? Or (possibly hinted at by the segmentation fault) is this a bug in my version of PyTorch (0.2post3)?