I have an element-wise function, for example (let’s call it f):

y = torch.sqrt(x ** 2 + 1)

can I backprop through the derivative of f, so that I can use f(x) and f’(x) for training?

You can, it should be something like:

```
y = torch.sqrt(x ** 2 + 1)
# create_graph=True makes grad_y itself differentiable,
# so it can be used as part of a training loss
grad_y = torch.autograd.grad(y, x, create_graph=True)[0]
loss = y + grad_y
loss.backward()  # note: backward() without arguments assumes loss is a scalar
```

Thanks, but while my function is element-wise, my output is not a scalar. So I think `torch.autograd.grad` requires me to supply `grad_outputs`, but I don’t really understand the documentation. `grad_outputs` should be a sequence of length matching `output`, containing the pre-computed gradients w.r.t. each of the outputs. But I don’t have the gradient yet! That’s what I want to use `torch.autograd.grad` for.

EDIT: The input can be quite big (for example a full 256x256x3 image)! The function is used as a normal activation function, so if `torch.autograd.grad` computes the full Jacobian, that is not what I want. I need the element-wise derivative of the element-wise function.
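For what it’s worth, a sketch of how this can work without ever materializing the full Jacobian: because f is element-wise, the Jacobian of `y = f(x)` is diagonal, so a single vector-Jacobian product seeded with a tensor of ones (via `grad_outputs=torch.ones_like(y)`) returns exactly the element-wise derivative f’(x) in one backward pass. The shapes and the analytic check below are illustrative, not from the original thread.

```python
import torch

# Element-wise function; its Jacobian w.r.t. x is diagonal.
x = torch.randn(4, 4, requires_grad=True)
y = torch.sqrt(x ** 2 + 1)

# grad_outputs seeds the vector-Jacobian product. With ones, each output
# element contributes its own derivative, so fprime is element-wise f'(x);
# the full Jacobian is never built. create_graph=True keeps fprime
# differentiable so it can appear in a training loss.
fprime = torch.autograd.grad(
    y, x, grad_outputs=torch.ones_like(y), create_graph=True
)[0]

# Analytic check: d/dx sqrt(x^2 + 1) = x / sqrt(x^2 + 1)
print(torch.allclose(fprime, x / torch.sqrt(x ** 2 + 1)))  # True

# Both f(x) and f'(x) can now enter the loss, and backprop flows
# through the derivative as well.
loss = (y + fprime).sum()
loss.backward()  # populates x.grad
```

This stays cheap even for a 256x256x3 input, since the cost is one extra backward pass rather than a Jacobian of size N×N.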