In the PyTorch beginner tutorial, why do we need to specify a **vector** as input for the **backward()** function to calculate the **gradient/derivative** of a **vector-valued output y** with respect to a vector input variable x, as shown below?

It seems that in the example below the vector v "**scales**" the resulting **x.grad** component-wise. Why is this the case? Could we just pass v as a "**dummy vector**" of all ones, [1.0, 1.0, 1.0], to avoid the scaling?

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2  # keep doubling until the norm exceeds 1000
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)  # computes the vector-Jacobian product vᵀ·J
print(x.grad)
```
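To check my own understanding, here is a minimal sketch (with a hand-picked x instead of `randn`, so the numbers are predictable): since `backward(v)` computes the vector-Jacobian product vᵀ·J, an all-ones v just sums the rows of the Jacobian, which is the same gradient you get from `y.sum().backward()`:

```python
import torch

# y = x * 2, so the Jacobian is diag(2, 2, 2).
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2
y.backward(torch.ones_like(y))  # v = [1, 1, 1]: no per-component scaling
print(x.grad)  # tensor([2., 2., 2.])

# Equivalent formulation: reduce y to a scalar first, then no vector
# argument is needed at all.
x2 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y2 = x2 * 2
y2.sum().backward()
print(x2.grad)  # tensor([2., 2., 2.])
```

So passing ones does "avoid" the scaling, in the sense that it weights every output component equally.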