Need help computing gradient of the output with respect to the input

Hi Malfonsoarquimea!

Here is what is probably going on:

If your model:

outputs = model(preprocessing(inputs))

maps a batch of inputs, say of shape [nBatch, N], to a batch of scalar
outputs of shape [nBatch] (or similar), and each batch element of
outputs depends only on the corresponding element of inputs (that is,
outputs[i] depends only on inputs[i], and not on inputs[j != i]),
then the gradient of the scalar value outputs.sum() with respect to
inputs[i] will, in fact, be the gradient of outputs[i] with respect to
inputs[i].
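
As a minimal, self-contained sketch of this (the per_sample_model function
below is just an assumed stand-in for model(preprocessing(inputs)), not your
actual code), a single gradient of outputs.sum() recovers all of the
per-sample gradients in one pass:

import torch

# toy stand-in for model(preprocessing(inputs)): each outputs[i]
# depends only on inputs[i]
def per_sample_model(inputs):
    return (inputs**2).sum(dim=1)                 # shape [nBatch]

inputs = torch.randn(4, 3, requires_grad=True)    # [nBatch, N]
outputs = per_sample_model(inputs)                # [nBatch]

# one backward pass through outputs.sum() yields all per-sample gradients
grad, = torch.autograd.grad(outputs.sum(), inputs)

# analytically, d outputs[i] / d inputs[i] = 2 * inputs[i]
print(torch.allclose(grad, 2 * inputs))           # True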

All of the zeros you get when you compute the gradient of outputs[i]
with respect to all of the elements of inputs are to be expected.
inputs.grad[i] is non-zero, as outputs[i] depends on inputs[i],
while all of the other inputs.grad[j != i] are zero because (by my
assumption) outputs[i] does not depend on inputs[j != i] – no
dependence, so zero gradient.
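
Continuing the same toy setup (again, per_sample_model is only an assumed
stand-in for your model), backpropagating from a single outputs[i] makes
those zero rows explicit:

import torch

def per_sample_model(inputs):
    return (inputs**2).sum(dim=1)

inputs = torch.randn(4, 3, requires_grad=True)
outputs = per_sample_model(inputs)

# gradient of the single scalar outputs[1] with respect to the whole batch
grad, = torch.autograd.grad(outputs[1], inputs)

print(grad[1])                    # non-zero: outputs[1] depends on inputs[1]
print(grad[0], grad[2], grad[3])  # all zeros: no dependence on the other rows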

The last step, reiterating what I said in my previous post, is that:

torch.autograd.grad(outputs=outputs, inputs=inputs, grad_outputs=torch.ones_like(outputs))

is equivalent to:

torch.autograd.grad(outputs=outputs.sum(), inputs=inputs, grad_outputs=None)
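
You can check that equivalence directly (same toy per_sample_model as above;
retain_graph=True is needed here only because we backpropagate through the
same graph twice for the comparison):

import torch

def per_sample_model(inputs):
    return (inputs**2).sum(dim=1)

inputs = torch.randn(4, 3, requires_grad=True)
outputs = per_sample_model(inputs)

grad_a, = torch.autograd.grad(
    outputs=outputs, inputs=inputs,
    grad_outputs=torch.ones_like(outputs),
    retain_graph=True,
)
grad_b, = torch.autograd.grad(outputs=outputs.sum(), inputs=inputs)

print(torch.allclose(grad_a, grad_b))   # True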

In short, if your use case only requires that you compute the gradients
of a batch of scalar outputs with respect to a batch of inputs where each
output batch element only depends on the corresponding input batch
element, then you only need one call to torch.autograd.grad() (or,
if you prefer, outputs.sum().backward()), and you only backpropagate
through the computation graph once.
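
Here is the .backward() form of the same thing, with the gradients landing
in inputs.grad (toy model again, just for illustration):

import torch

def per_sample_model(inputs):
    return (inputs**2).sum(dim=1)

inputs = torch.randn(4, 3, requires_grad=True)
outputs = per_sample_model(inputs)

outputs.sum().backward()    # a single backward pass through the graph

print(inputs.grad.shape)    # torch.Size([4, 3]): one gradient row per sample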

Best.

K. Frank