feribg
#1
I currently have a model that outputs a single regression target with MSE loss. I can get the derivatives with respect to the inputs like so:

```
x = x.cuda().requires_grad_(True)  # move to device first so x stays a leaf and x.grad gets populated
output = model.eval()(None, x)
output[0].backward()  # backward on a single scalar element
x.grad[0]
```

However, this only works one element at a time. If `x` is a batch, then

```
x = x.cuda().requires_grad_(True)
output = model.eval()(None, x)
output.backward()  # output is now a vector, not a scalar
x.grad[0]
```

fails with a runtime error saying gradients can be implicitly created only for scalar outputs. I have two questions:

- How do I get a batch of targets plus their derivatives with respect to the inputs?
- How do I get higher-order derivatives with respect to the inputs?

Important to note: if I aggregate the loss (e.g. take the mean) then backward works, but I need the per-sample input <> output derivatives, not the derivative of the averaged loss.
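For concreteness, here is a sketch of the kind of thing I am after, with a toy network standing in for my model (the model, shapes, and names are illustrative assumptions, not my actual code). Since each output element depends only on its own input row, passing a vector of ones as `grad_outputs` should give all per-sample gradients in a single backward pass, and `create_graph=True` should allow differentiating again:

```python
import torch

# Toy stand-in for the real model: one regression output per input row.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1))

x = torch.randn(4, 3, requires_grad=True)  # batch of 4 inputs
output = model(x).squeeze(-1)              # shape (4,)

# Each output[i] depends only on x[i], so grad_outputs=ones yields the
# per-sample input gradients in one backward pass (no cross-sample mixing).
(grads,) = torch.autograd.grad(
    output, x, grad_outputs=torch.ones_like(output), create_graph=True)

# create_graph=True keeps the graph alive, so we can differentiate again
# to get a higher-order derivative.
(grads2,) = torch.autograd.grad(grads.sum(), x)

print(grads.shape, grads2.shape)  # both torch.Size([4, 3])
```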

feribg
#2
Here is what seems to be an example of this in TensorFlow; what would the PyTorch equivalent be?

```
def fwd_gradient(func, x, input_gradients=None, use_gradient_tape=False):
    ...  # body elided in the original post
```
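My current guess at a PyTorch counterpart is `torch.autograd.functional.jvp`, which computes a forward-mode Jacobian-vector product without materializing the full Jacobian. A minimal sketch, where `func`, the shapes, and the direction vector `v` (playing the role of `input_gradients`) are illustrative assumptions:

```python
import torch
from torch.autograd.functional import jvp

def func(x):
    # Toy per-sample scalar function standing in for the model.
    return torch.tanh(x).sum(dim=-1)

x = torch.randn(4, 3)
v = torch.ones_like(x)  # direction vector, analogous to input_gradients

# jvp returns (func(x), J @ v): the output and the directional derivative.
out, directional = jvp(func, (x,), (v,))
print(out.shape, directional.shape)  # torch.Size([4]) torch.Size([4])
```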