Derivative of model outputs w.r.t input features

Hello Ömer!

Just as you use pytorch’s autograd to calculate the derivatives (gradient)
of your loss function with respect to your model’s parameters (and then
use those to update your model with gradient descent), you can use
autograd to calculate the derivatives of a prediction for a single class
with respect to the input to your model. This will be a single column of
your Jacobian matrix.

You can then loop over predictions / columns to build the full Jacobian.
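For instance, here is a minimal sketch of computing just one such column with
torch.autograd.grad() (the nn.Linear here is only a made-up stand-in for your
ten-class model):

import torch

model = torch.nn.Linear(784, 10)    # made-up stand-in for your ten-class model
input = torch.randn(1, 784, requires_grad=True)

preds = model(input)                # shape [1, 10]
# derivative of the class-3 prediction with respect to the input,
# i.e. one column of the Jacobian in the [input-feature, class] layout used below
(col,) = torch.autograd.grad(preds[0, 3], input)
print(col.shape)                    # torch.Size([1, 784])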

Models work on batches, even if you only want to process a single image.
So, for a single image, you need a batch with a batch size of one.
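For example, if you have a single image tensor (call it img, with shape [784]
or [28, 28]), you can add the batch dimension with unsqueeze():

input = img.unsqueeze(0)    # [784] -> [1, 784]  (or [28, 28] -> [1, 28, 28])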

Let’s say you have an input tensor with shape [1, 784]. (It could be
[1, 28, 28], if that is what your model expects.) You say your model
has ten classes, so you will have:

preds = model(input)

where preds has shape [nBatch, nClass] = [1, 10].
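To make the shapes concrete (again with a made-up nn.Linear stand-in for your
model):

import torch

model = torch.nn.Linear(784, 10)    # made-up stand-in for your model
input = torch.randn(1, 784)         # batch of one flattened 28x28 "image"
preds = model(input)
print(preds.shape)                  # torch.Size([1, 10]) = [nBatch, nClass]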

Tell pytorch to track gradients with respect to your input, apply your
model to your input, and call .backward() on preds with a one-hot gradient
argument that selects class i (equivalent to calling preds[0, i].backward()),
looping over i. The .grad property of your input tensor will then be the
ith column of the Jacobian.

import torch

J = torch.zeros((1, 784, 10))    # loop below fills in the Jacobian, one column at a time
input.requires_grad = True       # track gradients with respect to the input
preds = model(input)
for i in range(10):
    grd = torch.zeros((1, 10))   # same shape as preds
    grd[0, i] = 1.0              # one-hot vector selecting the column of the Jacobian to compute
    preds.backward(gradient=grd, retain_graph=True)
    J[:, :, i] = input.grad      # fill in one column of the Jacobian
    input.grad.zero_()           # .backward() accumulates gradients, so reset to zero
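As a quick sanity check (this assumes you ran the loop with the made-up
nn.Linear stand-in, not your real model): the Jacobian of a plain linear
layer is just its weight matrix, so J[0] should match the transposed weight:

print(torch.allclose(J[0], model.weight.T))    # True for the nn.Linear(784, 10) stand-in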

You could also try pytorch’s experimental jacobian() function
(torch.autograd.functional.jacobian()), which I believe basically wraps
the loop I outlined above, with more bells and whistles (though I’ve
never used it myself).
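If you want to try it, here is a hedged sketch (again with a made-up linear
stand-in; note that jacobian() returns its dimensions as the output shape
followed by the input shape, so the layout is transposed relative to the
loop above):

import torch
from torch.autograd.functional import jacobian

model = torch.nn.Linear(784, 10)    # made-up stand-in for your model
input = torch.randn(1, 784)

Jfull = jacobian(model, input)
print(Jfull.shape)                  # torch.Size([1, 10, 1, 784]) = output shape + input shape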

Good luck.

K. Frank
