How does torch calculate the grads for scalar and non-scalar tensors?

I was reading the Optional Reading: Tensor Gradients and Jacobian Products section of this blog and it stated:

In many cases, we have a scalar loss function, and we need to compute the gradient with respect to some parameters. However, there are cases when the output function is an arbitrary tensor. In this case, PyTorch allows you to compute the so-called Jacobian product and not the actual gradient.

So, does it mean that the Jacobian product is calculated only for an arbitrary tensor, i.e. a non-scalar tensor, and gradients for scalar tensors are calculated in a different way?


The idea is that it always does a vector-Jacobian product. It just happens that when the output is a scalar, it is 0-dimensional, so the vector is of size 1 and can be replaced with just the value 1. That will give you the full Jacobian (and thus the gradients).
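As a small illustration of this (a sketch, not from the original post): for a scalar output, `backward()` implicitly uses the vector `1`, while for a non-scalar output you must supply the vector `v` for the vector-Jacobian product yourself.

```python
import torch

# Scalar output: backward() implicitly uses v = 1
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()               # scalar output
y.backward()                     # equivalent to y.backward(torch.tensor(1.0))
print(x.grad)                    # dy/dx = 2*x -> tensor([2., 4., 6.])

# Non-scalar output: pass v explicitly to compute v^T @ J
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2                       # shape (3,), non-scalar
v = torch.ones_like(y)           # choosing v = ones sums the rows of the Jacobian
y.backward(v)                    # vector-Jacobian product v^T @ J
print(x.grad)                    # tensor([2., 4., 6.])
```

With `v = torch.ones_like(y)`, both calls produce the same gradient, since summing the output first and then differentiating is the same as the vector-Jacobian product with a vector of ones.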


@albanD thanks, it makes sense now.