How does torch calculate gradients for scalar and non-scalar tensors?

I was reading the Optional Reading: Tensor Gradients and Jacobian Products section of this blog, and it stated:

In many cases, we have a scalar loss function, and we need to compute the gradient with respect to some parameters. However, there are cases when the output function is an arbitrary tensor. In this case, PyTorch allows you to compute the so-called Jacobian product and not the actual gradient.

So, does that mean the Jacobian product is calculated only for arbitrary (i.e. non-scalar) tensors, and gradients for scalar tensors are calculated in a different way?

Hi,

The idea is that it always does a vector-Jacobian product. It just happens that when the output is a scalar, the Jacobian has a single row, so the vector is of size 1 and can be replaced with just the value 1. That will give you the full Jacobian (and thus the gradient).
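To make this concrete, here is a minimal sketch of both cases (the names `x`, `y`, `loss`, and `v` are chosen just for this example). For a scalar output, `backward()` fills in the size-1 vector for you; for a non-scalar output, you must pass the vector `v` yourself, and what you get back is vᵀJ rather than the full Jacobian:

```python
import torch

# Case 1: scalar output -- backward() implicitly uses v = 1
x = torch.ones(3, requires_grad=True)
loss = (x ** 2).sum()        # scalar output
loss.backward()              # equivalent to loss.backward(torch.tensor(1.0))
print(x.grad)                # tensor([2., 2., 2.])

# Case 2: non-scalar output -- a vector v must be passed explicitly
x = torch.ones(3, requires_grad=True)
y = x ** 2                   # vector output; the full Jacobian would be 3x3
v = torch.ones(3)            # the "vector" in the vector-Jacobian product
y.backward(v)                # computes v^T J, not the full Jacobian
print(x.grad)                # tensor([2., 2., 2.]) -- same result, since v is all ones
```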


@albanD thanks, it makes sense now.