Derivative with respect to the input

Hello! I want to calculate the derivatives (actually the Jacobian) of a NN with respect to its input. Usually I do something like this:

torch.autograd.grad(y, x, create_graph=True)[0]

But in this case x doesn’t have requires_grad set (it is the input to the network, so it should be fixed). How can I calculate the derivative in this case? Thank you!

you could set the requires_grad property of x to get the gradient w.r.t x?

x.requires_grad = True
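For instance, a minimal sketch (the toy linear net and shapes here are made up) of computing the input gradient once requires_grad is set:

```python
import torch

net = torch.nn.Linear(3, 1)   # toy stand-in for the NN
x = torch.randn(3)            # input, initially requires_grad=False
x.requires_grad = True        # enable gradient tracking on the input
y = net(x)
dy_dx = torch.autograd.grad(y.sum(), x, create_graph=True)[0]
print(dy_dx.shape)            # torch.Size([3]), same shape as x
```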

But if I do this, won’t x itself be changed when I backpropagate through the NN?

torch.autograd.grad() just returns the gradient. It doesn’t modify the variables or store anything in .grad, as far as I know. You need to set x.requires_grad so that autograd records the operations on x and can compute the gradient with respect to it.
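To illustrate, a small sketch (toy values of my own) showing that autograd.grad() just returns the gradient without touching x.grad:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
g = torch.autograd.grad(y, x)[0]
print(g)       # tensor([2., 4., 6.]), i.e. dy/dx = 2x
print(x.grad)  # None: autograd.grad() did not populate x.grad
```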


Yes, but I want to add this derivative I am calculating (using autograd.grad()) to the final loss function. Won’t x be modified when I call loss.backward(), since x has requires_grad set?

I am not sure if I follow correctly. I think adding this grad to the loss function will not have any effect. Don’t you think so?

So what I need is to calculate the derivative of the NN with respect to the input, something like dx = torch.autograd.grad(NN(x), x), and then add this derivative to the loss and backpropagate. For example loss = (NN(x)-x)**2 + abs(dx), and then loss.backward(). Won’t the value of x itself be changed if x.requires_grad is set? For example, if x represents the pixels of an image, won’t the values of the pixels themselves be affected, i.e. the image will change?

Gradient is calculated when there is a computation graph. For example,

x --> linear(w, x) --> softmax(). Here, x and w could potentially be leaf nodes that require gradients.

In this same paradigm, when you add dx to the loss function, it is just like adding a constant to the loss function. The weights of the NN don’t depend on the gradient, and this dx doesn’t have any computation graph associated with it (unless you pass create_graph=True). So I am not sure if x will be affected by this loss function at all.

Hi, I have exactly the same question. Did you figure out whether setting requires_grad=True on the input x will influence the backward propagation? Thank you very much.

If your input requires gradient, its .grad attribute will be populated after the backward pass, but besides that it won’t change the gradient calculation.
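A small sketch of this (toy linear model and a made-up gradient-penalty loss): after loss.backward(), the input’s .grad is populated, but the input’s values themselves are unchanged — only an optimizer step would change them:

```python
import torch

net = torch.nn.Linear(3, 3)
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x_before = x.detach().clone()

y = net(x)
# create_graph=True so the derivative itself stays differentiable
dx = torch.autograd.grad(y.sum(), x, create_graph=True)[0]
loss = ((y - x) ** 2).sum() + dx.abs().sum()
loss.backward()

print(x.grad)                              # populated by backward()
print(torch.equal(x.detach(), x_before))   # True: x's values are unchanged
```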

Thank you very much. It is very helpful.

Can this be used for the pretrained models in torchvision.models?

It seems that you can’t backpropagate through the derivative as of PyTorch 1.5:

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # input must require grad
c = x ** 2
print(c.grad_fn)  # c has a gradient function
dc_dx = torch.autograd.grad(c.mean(), x)[0]
print(dc_dx.grad_fn)  # None: the derivative of c over x doesn't have a gradient function
dc_dx.backward()  # raises RuntimeError
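For what it’s worth, passing create_graph=True to autograd.grad() gives the derivative its own graph, so it can be backpropagated through:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
c = x ** 2
# create_graph=True makes the returned derivative differentiable
dc_dx = torch.autograd.grad(c.mean(), x, create_graph=True)[0]
print(dc_dx.grad_fn)    # now has a gradient function
dc_dx.sum().backward()  # second-order backward works
print(x.grad)           # tensor([0.6667, 0.6667, 0.6667]), i.e. 2/3 each
```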