# Gradient of a function of non-element-wise operation

In the official PyTorch intro to autograd, Q is a vector output produced by applying the same element-wise function to two vectors a and b.

How do I calculate the gradients of a and b if the function is not element-wise, i.e. each element of the output is obtained by a different calculation on different elements of a and b, e.g. Q1 = 3a**3 - b**2 and Q2 = a**2 - 3b**3?
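For reference, here is a sketch of one way those gradients can be computed, treating Q1 and Q2 as two outputs of the same inputs. The values of a and b are assumed for illustration (the question doesn't give any):

```python
import torch

# Hypothetical values for a and b, for illustration only.
a = torch.tensor([2.0, 3.0], requires_grad=True)
b = torch.tensor([6.0, 4.0], requires_grad=True)

Q1 = 3 * a**3 - b**2
Q2 = a**2 - 3 * b**3

# Backprop through both outputs at once; gradients from Q1 and Q2
# accumulate into a.grad and b.grad:
# d(Q1+Q2)/da = 9a**2 + 2a,  d(Q1+Q2)/db = -2b - 9b**2
(Q1 + Q2).sum().backward()

print(a.grad)  # tensor([40., 87.])
print(b.grad)  # tensor([-336., -152.])
```

Summing the outputs before calling `backward()` is equivalent to passing `torch.ones_like(Q)` as the `gradient` argument, i.e. a vector-Jacobian product with a vector of ones.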

I did the following test with one input x and one vector output y. However, the gradient of x cannot be calculated.

```python
import torch

# Assumed input; the original post does not show how x was created.
x = torch.tensor(1.0, requires_grad=True)

y1 = 2*x**2 + x
y2 = 3*x + 4*x**3
y = torch.tensor([y1, y2])  # breaks the autograd graph (see error below)

y.backward(torch.ones_like(y))
```

Error:
`RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn`

Hey,

I have a feeling this issue arises from `y = torch.tensor([y1, y2])`, as it takes two values and copies them into a new Tensor. I quickly ran your code with `torch.stack` instead, and the gradients are computed!

```python
import torch

# Assumed input; the original post does not show how x was created.
x = torch.tensor(1.0, requires_grad=True)

y1 = 2*x**2 + x
y2 = 3*x + 4*x**3
y = torch.stack([y1, y2])  # keeps y connected to the autograd graph

y.backward(torch.ones_like(y))
print(x.grad)
```
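To see that the vector case from the question works the same way, here is a sketch with an assumed vector input x, checking the result against the analytic derivative:

```python
import torch

# Assumed vector input, as described in the question.
x = torch.tensor([1.0, 2.0], requires_grad=True)

y1 = 2 * x**2 + x
y2 = 3 * x + 4 * x**3
y = torch.stack([y1, y2])
y.backward(torch.ones_like(y))

# Summed Jacobian rows: dy1/dx + dy2/dx = (4x + 1) + (3 + 12x**2)
expected = 4 * x.detach() + 4 + 12 * x.detach()**2
print(torch.allclose(x.grad, expected))  # True
```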

Thank you very much, @AlphaBetaGamma96

May I ask why `y_a = torch.tensor([y1, y2])` didn't create a tensor with a `grad_fn` attached, although it has the same shape and type as the one created by stacking y1 and y2 as you suggested, i.e. `y_b = torch.stack([y1, y2])`?

I'm not 100% sure (it'd be best to get a dev's opinion), but here is an educated guess: `torch.tensor` takes in a list of floats and converts it into a PyTorch array (a Tensor). So I would assume no `grad_fn` is implemented for it, as what would be the gradient of putting elements into an array?

Yes, this is exactly the right answer! `torch.tensor` is a factory function that creates a Tensor from numbers, so no gradient can flow back through it. You can use `torch.cat`/`torch.stack` to build a bigger Tensor from smaller ones in a differentiable way.


Looks like I’m learning some PyTorch after all! Thanks @albanD!
