An easy question about autograd

Hi, I am new to PyTorch and have a question about autograd. I am following a tutorial that calculates gradients like this:

import torch

x = torch.randn(2, 2, requires_grad=True)
y = x ** 2
z = y.mean()

But when I run:

y.backward()

an error appears! Why is it not allowed to call .backward() on the tensor y?


The error message states that you should provide it with the grad output because it is not a scalar, right?

The thing is that autograd implements reverse-mode AD, which computes a vector-Jacobian product: given a vector v, it returns the product of v with the Jacobian of your function.
This vector v is what backward expects as input.
In the particular case where your function outputs a scalar value, v is of size 1, and if it contains the value 1, what you get is the gradient of your function (the derivatives of the output with respect to each input). That is why z.backward() works without an argument.
In the case where there are more outputs (in your case, y is of size > 1), you get a weighted sum of the rows of the Jacobian, with the weights given by v. Unfortunately, there is no natural default to use in this case, so we require the user to provide v.
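
To make this concrete, here is a small sketch using the same x and y as in the question. The variable names (raised, grad_from_mean, grad_from_v, grad_from_ones) are only illustrative, and the choices of v shown are two common ones rather than a recommended default. It first shows the failing call, then the scalar case, then two explicit choices of v:

```python
import torch

x = torch.randn(2, 2, requires_grad=True)

# Non-scalar output, no argument: autograd will not pick v for us.
try:
    (x ** 2).backward()
    raised = False
except RuntimeError:
    raised = True

# Scalar output: v is implicitly 1, so no argument is needed.
z = (x ** 2).mean()
z.backward()
grad_from_mean = x.grad.clone()

# Non-scalar output with an explicit v.
# Weighting every element of y by 1/4 reproduces what mean() did above.
x.grad = None                      # reset the accumulated gradient
y = x ** 2
y.backward(torch.full_like(y, 0.25))
grad_from_v = x.grad.clone()

# Weighting every element by 1 is equivalent to y.sum().backward().
x.grad = None
y = x ** 2
y.backward(torch.ones_like(y))
grad_from_ones = x.grad.clone()
```

Since y = x**2 is elementwise, grad_from_ones should equal 2 * x, while grad_from_mean and grad_from_v should both equal x / 2. Which v is right depends on what scalar quantity you actually want to differentiate.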


Thanks for your helpful detailed explanation! :grinning: