Suppose I have a model F(x)=y, which means x is the input and y is the output, what’s the best way to get the gradient ∂y / ∂x ? Note that the gradient still needs to be part of its computational graph (so that any operations on the gradient can affect other variables in the computational graph, which is exactly what the “gradient penalty” (GP) shown in paper “Improved Training of Wasserstein GANs” does).

I’ve found there is a way to do that:

```
prob =self.D(input_image)
# calculate ∂D(input_image) / ∂input_image
grad = torch.autograd.grad(outputs=prob , inputs=input_image,
grad_outputs=torch.ones(prob .size()).cuda(),
create_graph=True, retain_graph=True)[0]
```

But this code is too old and not elegant enough, and I doubt its correctness in current PyTorch. Is there a better way to do this? Thanks in advance.