How to implement the gradient penalty of WGAN?

I’m trying to implement the WGAN using PyTorch.

I’ve found there is a way to do that:

        prob = self.D(input_image)
        # calculate ∂D(input_image) / ∂input_image
        grad = torch.autograd.grad(outputs=prob, inputs=input_image,
                                   grad_outputs=torch.ones(prob.size()).cuda(),
                                   create_graph=True, retain_graph=True)[0]

But this version of the code is very old, and I’m not sure it’s the correct way to do it. Does anyone know how to implement the GP elegantly with the latest version? Thanks in advance.

I don’t think that torch.autograd.grad was changed recently, so the call still looks alright.
Are you seeing any issues with it?

Not yet. But I’ve noticed that the doc says about the parameter create_graph: “Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way”, so I’m wondering if there’s a better way to do it.

Do you see an error if you don’t create it?
This is usually needed to work with higher derivatives, so it might not be necessary for your use case.
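
For what it’s worth, here is a minimal sketch of how the whole penalty term could look, assuming the usual WGAN-GP formulation where the penalty is mean((‖∇D(x̂)‖₂ − 1)²) on random interpolates x̂ between real and fake samples. The names `gradient_penalty`, `critic`, `real` and `fake` are made up for illustration, and the toy critic is only there so the snippet runs. Since the penalty is itself backpropagated into the discriminator’s parameters, the gradient is differentiated a second time, which is the higher-derivative case, so `create_graph=True` looks necessary here. As far as I can tell, the “not needed in nearly all cases” note in the `torch.autograd.grad` docs belongs to `retain_graph`, which defaults to the value of `create_graph` and is therefore left out below.

```python
import torch
import torch.nn as nn


def gradient_penalty(critic, real, fake):
    """Sketch of the WGAN-GP term: mean((||grad of critic(x_hat)||_2 - 1)^2)."""
    batch_size = real.size(0)
    # One interpolation coefficient per sample, broadcast over the other dims.
    eps = torch.rand(batch_size, *([1] * (real.dim() - 1)),
                     device=real.device, dtype=real.dtype)
    # The interpolate must require grad so we can differentiate w.r.t. it.
    x_hat = (eps * real + (1 - eps) * fake.detach()).requires_grad_(True)
    scores = critic(x_hat)
    grad = torch.autograd.grad(
        outputs=scores,
        inputs=x_hat,
        grad_outputs=torch.ones_like(scores),  # same device/dtype as scores
        create_graph=True,  # keep the graph so the penalty can be backpropagated
    )[0]
    # Per-sample L2 norm of the gradient.
    grad_norm = grad.reshape(batch_size, -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()


# Hypothetical toy critic and random data, only to exercise the function.
critic = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16),
                       nn.Tanh(), nn.Linear(16, 1))
real = torch.randn(4, 3, 8, 8)
fake = torch.randn(4, 3, 8, 8)

gp = gradient_penalty(critic, real, fake)
gp.backward()  # second backward pass, through the gradient computation itself
```

In a training loop you would add this term, scaled by a weight of your choice, to the discriminator loss before calling `backward()`. If you set `create_graph=False`, the returned gradient is detached from the graph, so the penalty no longer depends on the discriminator’s parameters and the `backward()` call on it fails.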