Here is the situation I encountered:

```python
import torch
from torch.autograd import Variable

alpha = Variable(torch.Tensor([4]), requires_grad=True)  # alpha = 4, inferred from the output below
f = lambda x: x * alpha**2
y = f(torch.Tensor([1, 2, 3]))
y
```

This gives y = tensor([ 16., 32., 48.]). Now I take the derivative:

```python
shape = torch.ones(y.size())  # y.size() = torch.Size([3]); there are three outputs
y.backward(shape)
```

Since we take the gradient w.r.t. alpha, we have dy/dalpha = 2 * x * alpha.

Now when I call

```python
alpha.grad
```

I'm expecting the result tensor([ 8., 16., 32.]), but instead I get

```
tensor([ 48.])
```

I couldn't figure out why the output looks like this; is there any way to get what I expect? Thanks!

Did you mean you were expecting the result to be `tensor([ 8., 16., 24.])`?

The gradient of `alpha` will have the same size as `alpha`. Here `alpha` was expanded to `tensor([4., 4., 4.])` for the computation, so the per-element gradients `8, 16, 24` were summed into the single value `48.` to form the complete gradient.
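A minimal sketch of this accumulation, using current tensor syntax rather than the legacy `Variable` API: the scalar `alpha` is broadcast across the three inputs, and its single gradient entry collects the sum of the per-element contributions.

```python
import torch

# Scalar alpha is broadcast across the three inputs during the multiply.
alpha = torch.tensor([4.0], requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])
y = x * alpha**2

# Backpropagate with a vector of ones (one per output element).
y.backward(torch.ones(y.size()))

# Per-element gradients dy_i/dalpha = 2 * x_i * alpha are 8, 16, 24;
# they accumulate into the single entry of alpha.grad.
print(alpha.grad)  # tensor([48.])
```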

To get what you expected, you can do the following:

```python
from torch.autograd import Variable

alpha = Variable(torch.Tensor([4, 4, 4]), requires_grad=True)
f = lambda x: x * alpha**2
y = f(torch.Tensor([1, 2, 3]))
shape = torch.ones(y.size())  # y.size() = torch.Size([3]); three outputs
y.backward(shape)
alpha.grad  # tensor([ 8., 16., 24.])
```