# Optimising the derivative of a function

I have an element-wise function, for example (let’s call it f):

```
y = torch.sqrt(x ** 2 + 1)
```

Can I backprop through the derivative of f, so that I can use f(x) and f’(x) for training?

You can; it should be something like:

```
y = torch.sqrt(x**2 + 1)
```
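
A minimal self-contained sketch of that idea with `torch.autograd.grad` (the tensor shape, variable names and loss below are mine, purely illustrative); `create_graph=True` keeps the derivative in the graph so it can itself be backpropagated through:

```
import torch

x = torch.randn(8, requires_grad=True)  # leaf tensor we differentiate w.r.t.
y = torch.sqrt(x ** 2 + 1)              # f(x), element-wise

# f'(x): create_graph=True makes the derivative itself differentiable,
# so it can appear inside a training loss.
f_prime = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y),
                              create_graph=True)[0]

loss = y.sum() + f_prime.pow(2).sum()   # toy loss using both f(x) and f'(x)
loss.backward()                         # gradients flow through both terms
print(x.grad.shape)                     # torch.Size([8])
```

Because `create_graph=True` is set, `f_prime` has its own `grad_fn`, so the final `backward()` differentiates through both the f(x) term and the f’(x) term.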
So I think `torch.autograd.grad` requires me to supply `grad_outputs`, but I don’t really understand the documentation.
`grad_outputs` should be a sequence of length matching `output` containing the pre-computed gradients w.r.t. each of the outputs.
But I don’t have the gradient yet! That’s what I want to use `torch.autograd.grad` for.
EDIT: The input can be quite big (for example a full 256x256x3 image)! The function is used as a normal activation function, so if `torch.autograd.grad` computes the full Jacobian, then this is not what I want. I need an element-wise derivative for the element-wise-acting function.
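
For what it’s worth: because f acts element-wise, its Jacobian is diagonal, so the vector-Jacobian product that `torch.autograd.grad` computes when `grad_outputs=torch.ones_like(y)` is exactly the element-wise derivative, and the full Jacobian is never materialised. A quick check of that claim on an image-sized input (the shapes and variable names are mine):

```
import torch

x = torch.randn(256, 256, 3, requires_grad=True)   # e.g. a full image
y = torch.sqrt(x ** 2 + 1)

# ones_like(y) as grad_outputs turns the vector-Jacobian product into the
# per-element derivative; no (256*256*3)^2 Jacobian is ever built.
f_prime = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y),
                              create_graph=True)[0]

analytic = x / torch.sqrt(x ** 2 + 1)               # hand-derived f'(x)
print(f_prime.shape)                                 # torch.Size([256, 256, 3])
print(torch.allclose(f_prime, analytic))             # True
```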