Why do we need to provide a vector as input for backward() when calculating the gradient of a vector-valued output?

In the PyTorch beginner tutorial, why do we need to specify a vector as input for the backward() function to calculate the gradient/derivative of a vector-valued output y with respect to a vector input variable x, as shown below?

It seems that in the example below the vector v will “scale” the output of x.grad. So, why is this the case? Can we just specify v as a “dummy vector” composed of all ones [1.0, 1.0, 1.0] to avoid the “scaling”?

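import torch

x = torch.randn(3, requires_grad=True)
y = x * 2

# Keep doubling y until its norm reaches 1000
while y.data.norm() < 1000:
    y = y * 2

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)  # y is not a scalar, so backward() needs a vector argument
print(x.grad)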

Let me try to answer my own question. I found the following post, which explains why:
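
In short, as I understand it (this is my own sketch, not quoted from that post): for a non-scalar y, backward(v) does not build the full Jacobian J of y with respect to x. It computes the vector-Jacobian product v^T J and stores that in x.grad, which is why v appears to "scale" the result. Passing a vector of ones is allowed, and it gives the same x.grad as calling y.sum().backward(). The function f below is a hypothetical stand-in for the tutorial's loop with a fixed number of doublings, so that the Jacobian is known to be 1024 times the identity.

```python
import torch

def f(x):
    # Hypothetical stand-in for the tutorial's while loop:
    # a fixed 10 doublings, so y = 1024 * x and J = 1024 * I.
    return x * 2 ** 10

x = torch.randn(3, requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

# Full Jacobian J[i, j] = dy_i / dx_j, computed explicitly for comparison.
J = torch.autograd.functional.jacobian(f, x)

# backward(v) stores the vector-Jacobian product v^T J in x.grad.
f(x).backward(v)
grad_v = x.grad.clone()

# With a vector of ones there is no extra scaling ...
x.grad.zero_()
y = f(x)
y.backward(torch.ones_like(y))
grad_ones = x.grad.clone()

# ... and the result matches the gradient of the scalar y.sum().
x.grad.zero_()
f(x).sum().backward()
grad_sum = x.grad.clone()

print(grad_v)     # equals v @ J
print(grad_ones)  # equals the column sums of J
print(grad_sum)   # same as grad_ones
```

So v = [1.0, 1.0, 1.0] works as a "dummy vector" here because each y_i depends only on x_i. In general, x.grad then holds the column sums of J (the gradient of y.sum()), not the Jacobian itself.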
