Confusion on backward gradient

Here is the situation I encountered:

import torch
from torch.autograd import Variable

alpha = Variable(torch.Tensor([4]), requires_grad=True)
f = lambda x: x * alpha ** 2
y = f(torch.Tensor([1, 2, 3]))
y

This gives y = tensor([ 16., 32., 48.]). Now I take the derivative:

shape = torch.ones(y.size())  # y.size() = 3 because we have three outputs here
y.backward(shape)

Since we take the gradient w.r.t. alpha, we have df/dalpha = 2 * x * alpha.

Now when I call

alpha.grad

I’m expecting the result tensor([ 8., 16., 32.])

But instead I get

tensor([ 48.])

I couldn’t figure out why the output is like this; is there any way to get what I expect? Thanks

Did you mean you were expecting the result to be tensor([ 8., 16., 24.])?

The gradient of alpha will be the same size as alpha. During the computation alpha was broadcast (expanded) to tensor([4., 4., 4.]), so the per-element contributions 8, 16, and 24 were summed into the single value 48 to form the complete gradient.
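To see that summation explicitly, here is a minimal self-contained sketch (it recomputes the per-element contributions 2 * x * alpha by hand; the variable names are just for illustration):

import torch
from torch.autograd import Variable

alpha = Variable(torch.Tensor([4]), requires_grad=True)
x = torch.Tensor([1, 2, 3])
y = x * alpha ** 2                 # alpha (size 1) is broadcast across x
y.backward(torch.ones(y.size()))

per_element = 2 * x * alpha.data   # per-element gradients: 8., 16., 24.
per_element.sum()                  # 48., the sum over the broadcast dimension
alpha.grad                         # tensor([ 48.]), same size as alpha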

To get what you expected, you can do the following:

alpha = Variable(torch.Tensor([4, 4, 4]), requires_grad=True)
f = lambda x: x * alpha ** 2
y = f(torch.Tensor([1, 2, 3]))
shape = torch.ones(y.size())  # y.size() = 3 because we have three outputs here
y.backward(shape)
alpha.grad
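With alpha expanded to match x, alpha.grad now comes back as tensor([ 8., 16., 24.]), i.e. 2 * x * alpha element-wise, which is the per-element gradient you were after. On PyTorch 0.4 or later the Variable wrapper is no longer needed; a minimal equivalent sketch:

import torch

alpha = torch.tensor([4., 4., 4.], requires_grad=True)
x = torch.tensor([1., 2., 3.])
y = x * alpha ** 2
y.backward(torch.ones_like(y))     # vector of ones as the grad_outputs argument
alpha.grad                         # tensor([ 8., 16., 24.])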