# Loss function contains gradient w.r.t. input variables

I am trying to implement the model in this article, where the loss function contains a gradient w.r.t. the input data `x`. Basically, we want the gradient of the NN to approximate a certain function.

In HIPS/autograd, it would be something like:

1. Define the forward pass, with input data `x` and weights `w`:
`f(x; w) = ...`
2. Take the gradient w.r.t. `x`:
`dfdx = grad(f, x)`
3. Use this gradient to construct the loss function. We want `dfdx` to be close to our desired `dfdx_true`, so:
`loss = sum_of_error(dfdx, dfdx_true)`
4. Take the gradient of the loss w.r.t. `w`, just like in any neural network:
`lossgradient = grad(loss, w)`
5. Use any normal training method (see the sketch after this list), for example
`w - lr*lossgradient(w, x_data)`
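
For reference, here is a minimal sketch of those five steps in HIPS/autograd. The one-hidden-layer tanh network, the dict of weights (`W1`, `b1`, `W2`, `b2`), and the `sgd_step` helper are illustrative assumptions, not the model from the article:

```python
import autograd.numpy as np
from autograd import grad

# 1. Forward pass f(x; w): a hypothetical one-hidden-layer network,
#    taking a single sample x (vector) and returning a scalar.
def f(w, x):
    h = np.tanh(np.dot(x, w['W1']) + w['b1'])
    return np.dot(h, w['W2']) + w['b2']

# 2. Gradient of f w.r.t. the input x (argnum=1 selects x).
dfdx = grad(f, 1)

# 3. Loss: squared error between df/dx and the desired derivative dfdx_true.
def loss(w, x, dfdx_true):
    return np.sum((dfdx(w, x) - dfdx_true) ** 2)

# 4. Gradient of the loss w.r.t. the weights w (argnum=0 is the default).
lossgradient = grad(loss)

# 5. One plain gradient-descent step on the dict of weights.
def sgd_step(w, x, dfdx_true, lr=1e-2):
    g = lossgradient(w, x, dfdx_true)
    return {k: w[k] - lr * g[k] for k in w}
```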

In PyTorch, before I call `loss.backward()`, how can I make `loss` contain the gradient w.r.t. `x`? Note that the final result is a mixed derivative w.r.t. `x` and `w`, not a second-order derivative w.r.t. `w`.

In PyTorch, the corresponding functions are:

1. `f(x; w) = ...`
2. `dfdx = autograd.grad(f, x, df, create_graph=True)`
3. `loss = sum_of_error(dfdx, dfdx_true)`
4. `optimizer.zero_grad()` then `loss.backward()`
5. `optimizer.step()` (a full sketch follows this list)
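
Putting those steps together, a minimal self-contained sketch might look like the following. The model, the target derivative (`cos`), and the hyperparameters are arbitrary placeholders; the important details are `create_graph=True`, so that `loss.backward()` can differentiate through `dfdx` w.r.t. the weights, and the `grad_outputs` argument (the third positional argument of `torch.autograd.grad`), which is all ones here because `f` is not a scalar:

```python
import torch

# Hypothetical small model; any nn.Module whose output we can differentiate works.
model = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x = torch.linspace(-1, 1, 100).unsqueeze(1).requires_grad_(True)  # x must track gradients
dfdx_true = torch.cos(x.detach())  # placeholder target derivative

for step in range(1000):
    f = model(x)  # forward pass
    # df/dx; create_graph=True keeps the graph so we can later backprop into the weights.
    dfdx, = torch.autograd.grad(f, x, grad_outputs=torch.ones_like(f),
                                create_graph=True)
    loss = ((dfdx - dfdx_true) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()   # mixed derivative: d(loss)/d(w) through df/dx
    optimizer.step()
```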

What is `df` in this context? Thanks


Is there a way of saving `dfdx` as a function and just evaluating it for given inputs `x`?

I think if `f(x)` is a linear function of `x`, it would be possible. However, if there are multiple hidden layers with non-linearities involved, backpropagation needs the intermediate feature values to perform the backward pass, so a forward pass is needed. Don’t you think?
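
That said, you can at least package it as a callable, with the caveat that each call reruns the forward pass internally. A hypothetical sketch (`make_dfdx` and `model` are placeholder names):

```python
import torch

def make_dfdx(model):
    # Returns a function x -> df/dx for the given model.
    # Each call performs a fresh forward pass, since the backward pass
    # needs the intermediate activations for that particular x.
    def dfdx(x):
        x = x.detach().requires_grad_(True)
        f = model(x)  # forward pass for this input
        grad_x, = torch.autograd.grad(f, x, grad_outputs=torch.ones_like(f))
        return grad_x
    return dfdx

# usage (with the hypothetical model from the earlier sketch):
# dfdx = make_dfdx(model)
# dfdx(torch.tensor([[0.5]]))
```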