# Optimising the derivative of a function

I have an element-wise function, for example (let’s call it f):

```
y = torch.sqrt(x ** 2 + 1)
```

Can I backprop through the derivative of f, so that I can use f(x) and f’(x) for training?

You can; it should be something like:

```
y = torch.sqrt(x**2 + 1)
```
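
A minimal self-contained sketch of that idea with `torch.autograd.grad` (the tensor shape, variable names and loss below are mine, purely illustrative); `create_graph=True` keeps the derivative in the graph so it can itself be backpropagated through:

```
import torch

x = torch.randn(8, requires_grad=True)  # leaf tensor we differentiate w.r.t.
y = torch.sqrt(x ** 2 + 1)              # f(x), element-wise

# f'(x): create_graph=True makes the derivative itself differentiable,
# so it can appear inside a training loss.
f_prime = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y),
                              create_graph=True)[0]

loss = y.sum() + f_prime.pow(2).sum()   # toy loss using both f(x) and f'(x)
loss.backward()                         # gradients flow through both terms
print(x.grad.shape)                     # torch.Size([8])
```

Because `create_graph=True` is set, `f_prime` has its own `grad_fn`, so the final `backward()` differentiates through both the f(x) term and the f’(x) term.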
So I think `torch.autograd.grad` requires me to supply `grad_outputs`, but I don’t really understand the documentation.
`grad_outputs` should be a sequence of length matching `output` containing the pre-computed gradients w.r.t. each of the outputs.
But I don’t have the gradient yet! That’s what I want to use `torch.autograd.grad` for.
EDIT: The input can be quite big (for example a full 256x256x3 image)! The function is used as a normal activation function, so if `torch.autograd.grad` computes the full Jacobian, then this is not what I want. I need an element-wise derivative for the element-wise-acting function.
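
For what it’s worth: because f acts element-wise, its Jacobian is diagonal, so the vector-Jacobian product that `torch.autograd.grad` computes when `grad_outputs=torch.ones_like(y)` is exactly the element-wise derivative, and the full Jacobian is never materialised. A quick check of that claim on an image-sized input (the shapes and variable names are mine):

```
import torch

x = torch.randn(256, 256, 3, requires_grad=True)   # e.g. a full image
y = torch.sqrt(x ** 2 + 1)

# ones_like(y) as grad_outputs turns the vector-Jacobian product into the
# per-element derivative; no (256*256*3)^2 Jacobian is ever built.
f_prime = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y),
                              create_graph=True)[0]

analytic = x / torch.sqrt(x ** 2 + 1)               # hand-derived f'(x)
print(f_prime.shape)                                 # torch.Size([256, 256, 3])
print(torch.allclose(f_prime, analytic))             # True
```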