In the PyTorch beginner tutorial, why do we need to pass a vector to `backward()` when computing the gradient of a vector-valued output `y` with respect to a vector input `x`, as in the example below?
It looks as if the vector `v` scales `x.grad` element-wise. Why is that the case? Could we simply pass a "dummy" all-ones vector `[1.0, 1.0, 1.0]` to avoid the scaling?
```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2

v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)
```
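For context, here is a small check I wrote myself (not from the tutorial, and with a fixed input instead of `randn` so the numbers are predictable): `y.backward(v)` computes the vector-Jacobian product `vᵀJ`, so each entry of `v` weights the gradient of the corresponding output component, and an all-ones `v` is equivalent to calling `backward()` on `y.sum()`.

```python
import torch

# y = x * 2 has Jacobian J = 2 * I, so v^T J = 2 * v.
x = torch.ones(3, requires_grad=True)
y = x * 2

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)
print(x.grad)  # each entry is 2 * v_i, i.e. the "scaling" in the question

# Passing all ones is the same as reducing y to a scalar with sum()
# and calling backward() with no argument:
x.grad = None
(x * 2).sum().backward()
print(x.grad)  # tensor([2., 2., 2.])
```

So the all-ones "dummy vector" is legitimate: it gives the unscaled row sums of the Jacobian, which for this element-wise `y` is just the derivative of each output with respect to its input.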