Why do we need to provide a vector as input for backward() when calculating the gradient of a vector-valued output?

oat · December 23, 2020, 2:08pm

In the PyTorch beginner tutorial, why do we need to specify a vector as input for the backward() function to calculate the gradient/derivative of a vector-valued output y with respect to a vector input variable x, as shown below?

It seems that in the example below the vector v will “scale” the output of x.grad. So, why is this the case? Can we just specify v as a “dummy vector” composed of all ones [1.0, 1.0, 1.0] to avoid the “scaling”?

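import torch

x = torch.randn(3, requires_grad=True)
y = x * 2

# Keep doubling y until its L2 norm reaches 1000
while y.data.norm() < 1000:
    y = y * 2

# y is a vector, so backward() is given a vector v of the same shape
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)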

oat · December 24, 2020, 4:53am

Let me try to answer my own question. I found the following post, which explains why:
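
My understanding, as a rough sketch rather than an authoritative answer: autograd does not build the full Jacobian J = dy/dx. Instead, y.backward(v) computes the vector-Jacobian product J^T v, which is what the chain rule needs when y feeds into some scalar loss and v holds the gradient of that loss with respect to y. So v is not an arbitrary scaling. Passing all ones is valid too, and should give the same result as y.sum().backward(). The simplified y = x * 2 (without the doubling loop), the fixed seed, and the names grad_v, grad_ones and grad_sum below are my own choices for illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed, only so the run is reproducible

x = torch.randn(3, requires_grad=True)
y = x * 2  # elementwise, so the Jacobian dy/dx is 2 * I

v = torch.tensor([0.1, 1.0, 0.0001])

# backward(v) accumulates the vector-Jacobian product J^T v into x.grad
y.backward(v, retain_graph=True)
grad_v = x.grad.clone()

# Build the full Jacobian explicitly and check that grad_v == J^T v
J = torch.autograd.functional.jacobian(lambda t: t * 2, x.detach())
assert torch.allclose(grad_v, J.T @ v)

# With v = ones, the result matches the gradient of the scalar y.sum()
x.grad.zero_()
y.backward(torch.ones_like(y), retain_graph=True)
grad_ones = x.grad.clone()

x.grad.zero_()
y.sum().backward()
grad_sum = x.grad.clone()
assert torch.allclose(grad_ones, grad_sum)

print(grad_v)     # J^T v, which is 2 * v for this particular y
print(grad_ones)  # J^T 1, which is 2 in every component here
```

So using [1.0, 1.0, 1.0] is fine when the quantity you care about is the sum of the outputs. A different v simply weights each output component differently before the gradients are summed.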

andreshmo · May 24, 2023, 9:49am

I had the same doubt. So you do not need to scale, then? Why does the tutorial do it?