Gradients of a non-element-wise function

In the official PyTorch introduction to autograd, Q is a vector produced by applying a function element-wise to two vectors a and b.

How do I calculate the gradients of a and b if the function is not an element-wise operation, i.e. each element of Q is obtained by a different calculation on different elements of a and b, e.g. Q[0] = 3*a[0]**3 - b[0]**2 and Q[1] = a[1]**2 - 3*b[1]**3?

I ran the following test with one vector input x and one vector output y. However, the gradient of x cannot be calculated.

import torch 

x = torch.tensor([1., 2.], requires_grad=True)

y1 = 2*x[0]**2 + x[1]
y2 = 3*x[0] + 4*x[1]**3
y = torch.tensor([y1, y2])

external_grad = torch.tensor([1., 1.])
y.backward(gradient=external_grad)

x.grad

Error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Hey,

I have a feeling this issue arises from y = torch.tensor([y1, y2]), as it takes the two values as plain floats and puts them into a new Tensor. I quickly ran your code with torch.stack instead and the gradients are computed!

import torch 

x = torch.tensor([1., 2.], requires_grad=True)

y1 = 2*x[0]**2 + x[1]
y2 = 3*x[0] + 4*x[1]**3
y = torch.stack([y1, y2])

y.backward(torch.ones_like(y))

print(x.grad) #returns tensor([ 7., 49.])
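
The same pattern also answers the original question about Q. A minimal sketch (the values of a and b below are just example inputs, not from the original post):

import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

# each element of Q comes from a different expression, as in the question
Q0 = 3*a[0]**3 - b[0]**2
Q1 = a[1]**2 - 3*b[1]**3
Q = torch.stack([Q0, Q1])

Q.backward(torch.ones_like(Q))

print(a.grad)  # tensor([36., 6.])    = [9*a[0]**2, 2*a[1]]
print(b.grad)  # tensor([-12., -144.]) = [-2*b[0], -9*b[1]**2]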

Thank you very much, @AlphaBetaGamma96

May I ask why y_a = torch.tensor([y1, y2]) didn't create a tensor with a grad_fn attached, even though it has the same shape and dtype as the one created by stacking y1 and y2 as you suggested, i.e. y_b = torch.stack([y1, y2]), as shown below?

(screenshot comparing y_a and y_b)
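
In code form, the comparison looks roughly like this (a minimal sketch; the exact grad_fn name printed may differ between PyTorch versions):

import torch

x = torch.tensor([1., 2.], requires_grad=True)
y1 = 2*x[0]**2 + x[1]
y2 = 3*x[0] + 4*x[1]**3

y_a = torch.tensor([y1, y2])  # copies the values into a brand-new tensor
y_b = torch.stack([y1, y2])   # builds the result as part of the autograd graph

print(y_a)  # tensor([ 4., 35.]) -- no grad_fn, detached from x
print(y_b)  # tensor([ 4., 35.], grad_fn=<StackBackward0>)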

I’m not 100% sure (it’d be best to get a dev’s opinion), but here’s an educated guess: torch.tensor takes in a list of floats and converts it into a PyTorch array (a Tensor). So I would assume no grad_fn is implemented for it, as what would be the gradient of putting elements into an array?

Yes, this is exactly the right answer! torch.tensor is a factory function that creates Tensors from numbers, so no gradient can flow back through it. You can use cat/stack to build a bigger Tensor from smaller ones in a differentiable way.
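
For example, torch.cat works the same way once the scalar outputs are given a dimension; a small sketch reusing the earlier example:

import torch

x = torch.tensor([1., 2.], requires_grad=True)
y1 = 2*x[0]**2 + x[1]
y2 = 3*x[0] + 4*x[1]**3

# cat also keeps the graph; the 0-dim outputs just need a dimension first
y = torch.cat([y1.reshape(1), y2.reshape(1)])
print(y.grad_fn)  # a CatBackward node, so gradients can flow back to x

y.backward(torch.ones_like(y))
print(x.grad)  # tensor([ 7., 49.]), same result as with torch.stack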


Looks like I’m learning some PyTorch after all! :smiley: Thanks @albanD!
