# How are copied parameters updated?

W is a parameter tensor, and A = W[[1,1]] via the indexing operator, so the two elements of A are copied from the same source. Are the gradients of the two elements of A computed independently? If so, the computation is doubled.

Yes, the gradients for the two elements of A will be computed independently, and the gradient for W will be the sum of the two. This is the gradient of the function you implemented.
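A minimal sketch of this in PyTorch (the tensor values and loss are made up for illustration): both elements of `A` come from `W[1]`, each gets its own gradient, and the two are summed back into `W.grad`.

```python
import torch

W = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
A = W[[1, 1]]  # both elements of A are copies of W[1]

# Give each copy a different weight so their gradients differ.
loss = (A * torch.tensor([10.0, 100.0])).sum()
loss.backward()

# The per-copy gradients are 10 and 100; they are summed into W.grad[1].
print(W.grad)  # tensor([  0., 110.,   0.])
```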

For A = W[[1,1]], could I make the two elements of A and W share the same gradient and gradient function, or even share the same storage?

I’m not sure what you mean by that. Could you write a code sample and describe what you expect to find in the `.grad` fields?

Here is pseudocode.

For each input `(X, Y)`:

```python
W = nn.Parameter(torch.randn(3))
index = [0, 0, 0, 1, 2]  # W[0] is indexed three times
output = X * W[index]
loss = loss_fn(output, Y)
loss.backward()
```

In the above case, parameter W has three copies whose gradients will be computed independently, as you say. I hope the gradient of W is computed just once in loss.backward(). W's copies shouldn't need to compute their gradients separately; they could share the same gradient as W. Otherwise the backward pass gets slow.


W's gradient is computed once. But it will contain the sum of the gradients for each of the places where it is used.
If you have W that contains 3 values,
O contains [W[0], W[0], W[0], W[1], W[2]],
and your loss is `sum(O)`,
then when you call backward on this loss, the gradient of O wrt this loss is [1, 1, 1, 1, 1], and the gradient of W wrt this loss is [3, 1, 1].
So yes, the gradient of W is computed once.
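The example above can be checked directly. A sketch in PyTorch, with arbitrary values for `W`; `retain_grad()` is needed because `O` is not a leaf tensor:

```python
import torch

W = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
O = W[[0, 0, 0, 1, 2]]  # O = [W[0], W[0], W[0], W[1], W[2]]
O.retain_grad()         # keep O's gradient for inspection
loss = O.sum()
loss.backward()

print(O.grad)  # tensor([1., 1., 1., 1., 1.])
print(W.grad)  # tensor([3., 1., 1.])  -- the three W[0] contributions are summed
```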

Thanks for the clear explanation. I understand now that W's gradient is computed once.

For O, which contains [W[0], W[0], W[0], W[1], W[2]], the gradients of O[0], O[1], and O[2] are computed independently, although they have the same value. I think that computation is wasted. Could the gradients of O[0], O[1], and O[2] wrt the loss be the same? I hope that just one of them is computed and the others share it.


They are the same in my example but could be different.
With the same O, if my loss is `loss = 2*O[0] + 3*O[1] + 4*O[2] + 5*O[3] + 6*O[4]`,
then the gradient of O wrt the loss would be `[2, 3, 4, 5, 6]` and the gradient of W `[9, 5, 6]`.