Is it possible to use gradient w.r.t input in loss?

Thanks Alexis for the reply.

If I understand correctly, the following is not generally true,

if you want grad_input = target, then it is the same as: output = target*input + c, for any c.
that is the same as: (output - c)/input = target

provided that grad_input is the gradient of the output w.r.t. the input, i.e.

grad_input = d{output}/d{input}.
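
Just to make the notation concrete, here is a minimal sketch of what I mean (assuming a PyTorch-style setup; `model`, `x` and `target` are placeholder names, and `create_graph=True` is what would keep grad_input differentiable):

```python
import torch

# placeholder setup: a tiny model and an input that requires grad
model = torch.nn.Linear(10, 1)
x = torch.randn(4, 10, requires_grad=True)
target = torch.randn(4, 10)  # the value I want grad_input to match

output = model(x)

# grad_input = d{output}/d{input}
grad_input, = torch.autograd.grad(
    outputs=output.sum(),  # reduce to a scalar so autograd.grad can be called
    inputs=x,
    create_graph=True,     # keep the graph so grad_input itself stays differentiable
)
```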

Sorry for not making it clear.

Actually, if we call backward on (output - c)/input, we get

d{(output-c)/input}/d{weight} = [d{output}/d{weight}*input - (output-c)*d{input}/d{weight}] / (input*input).

Clearly, this is not the

d{d{output}/d{input}}/d{weight}

needed by the optimizer.
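
For the record, continuing the sketch above, the loss I would like to optimize looks roughly like this; differentiating it w.r.t. the weights is exactly the second derivative d{d{output}/d{input}}/d{weight} that is the sticking point:

```python
# build a loss on grad_input and differentiate it w.r.t. the weights;
# this backward pass needs d{grad_input}/d{weight}, i.e. the second
# derivative d{d{output}/d{input}}/d{weight}
loss = ((grad_input - target) ** 2).mean()
loss.backward()  # would only fill model.weight.grad if double backward is supported
```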

I don’t think there is a workaround for computing the second derivative, so I’ll keep an eye on the topic you mentioned.