By default, if a tensor x is used multiple times in a computation graph and we back-propagate through that graph, then x.grad stores the sum of the gradients arriving from each outgoing edge of x. For example:
```python
import torch

x = torch.ones((1), requires_grad=True)
a = torch.ones((1), requires_grad=True)
b = a + x
c = b + x
d = c + x
d.backward()
print(x.grad)  # tensor([3.]), i.e. 1 + 1 + 1 summed
```
However, my use case requires that x.grad hold only the gradient coming from the branch d = c + x (that is, only the immediate edge from d to x). By default, gradient also flows into x through the c = b + x and b = a + x edges.
Note also that I cannot detach x while computing b and c. My use case involves an LSTM where I compute a loss and back-propagate at every time step.
Hence, is there a way to collect the per-edge gradients for x separately, something like x.grad = [1, 1, 1], instead?
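One direction I have been experimenting with (just a sketch, and I am not sure it is the intended approach): route each use of x through x.clone() and call retain_grad() on each clone. Gradients still flow back into x (so b and c are not detached from it), but each clone's .grad then records the contribution of its own edge:

```python
import torch

x = torch.ones((1), requires_grad=True)
a = torch.ones((1), requires_grad=True)

# One differentiable clone per use of x; gradients still reach x through them.
x1, x2, x3 = x.clone(), x.clone(), x.clone()
for xi in (x1, x2, x3):
    xi.retain_grad()  # keep .grad on these non-leaf tensors

b = a + x1
c = b + x2
d = c + x3
d.backward()

per_edge = torch.cat([x1.grad, x2.grad, x3.grad])
print(per_edge)  # tensor([1., 1., 1.]) — one entry per edge into x
print(x.grad)    # tensor([3.]) — the usual accumulated sum is still there
```

Here x3.grad alone would give me the gradient of the d = c + x edge, which is what I am after, but I do not know whether this is idiomatic or whether it breaks down in the LSTM setting.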