import torch

data = torch.tensor([[1., 2., 3., 4.], [5., 6., 7., 8.]], requires_grad=True)
batch = data.shape[0]          # data.shape alone is a torch.Size, not an int
t_data = data.reshape(batch, 2, 2)
tf_data = torch.zeros((batch, 3, 2, 2))
for i in range(batch):
    tf_data[i] = t_data[i].expand_as(tf_data[i])
tf_data.retain_grad()          # tf_data is non-leaf after the in-place writes
loss = torch.sum(tf_data)
loss.backward()
print(data.grad)               # all 3s
print(tf_data.grad)            # all 1s
I want to understand how gradients flow through tensor operations like reshape and expand_as, so I wrote the code above as a test. I got the output shown in the comments: the grad of data is all 3s and the grad of tf_data is all 1s, no matter how I change data. Can anyone explain the gradient flow in this case? How does PyTorch calculate these gradients?
The tf_data grads are straightforward: since
loss = sum(tf_data), no matter what the code before it looks like, we have
loss = tf_data[0,0,0,0] + tf_data[0,0,0,1] + ... + tf_data[1,2,1,1], so the gradient is just the derivative of this sum with respect to each element, which is 1.
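A minimal check of this claim (with an assumed random input, independent of the question's setup): the gradient of a plain sum with respect to every element of its input is 1.

```python
import torch

x = torch.randn(3, 2, 2, requires_grad=True)
loss = x.sum()           # loss = x[0,0,0] + x[0,0,1] + ... + x[2,1,1]
loss.backward()
print(torch.all(x.grad == 1.0))  # every partial derivative is 1
```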
For the data grads, this line
tf_data[i] = t_data[i].expand_as(tf_data[i])
essentially repeats (stacks) t_data[i] (shape (2, 2) here) along a new leading dimension until it matches the shape of tf_data[i] ((3, 2, 2) here). That means each element is repeated 3 times.
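To see the repetition concretely, here is a small sketch (values chosen for illustration) showing that expanding a (2, 2) tensor to (3, 2, 2) produces the same values as stacking three copies along a new leading dimension:

```python
import torch

t = torch.tensor([[1., 2.], [3., 4.]])          # shape (2, 2)
expanded = t.expand_as(torch.zeros(3, 2, 2))    # broadcast to (3, 2, 2)
stacked = torch.stack([t, t, t])                # three explicit copies
print(torch.equal(expanded, stacked))           # True: same values
```

(Note that expand does not copy memory — it broadcasts a view — but for the forward values and the resulting gradients the effect is the same as stacking.)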
So loss = sum(tf_data) = 3 * sum(t_data) = 3 * sum(data), which means that whatever the code before it looks like (as long as it only changes dimensions, not values), you will get
grads = 3.
Here 3 * sum(t_data) = 3 * sum(data) because summing two tensors with the same elements but different shapes gives the same result.
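A quick way to verify the conclusion is to rerun the pipeline with different input values (a sketch using a broadcast `expand` in place of the question's per-sample loop, which produces the same result): the gradient on data is 3 everywhere, independent of the actual numbers.

```python
import torch

for scale in (1.0, -7.5, 100.0):
    data = scale * torch.arange(8, dtype=torch.float32).reshape(2, 4)
    data.requires_grad_(True)
    t_data = data.reshape(2, 2, 2)
    # (2, 1, 2, 2) broadcast to (2, 3, 2, 2): each element appears 3 times
    tf_data = t_data.unsqueeze(1).expand(2, 3, 2, 2)
    loss = tf_data.sum()
    loss.backward()
    print(torch.all(data.grad == 3.0))  # True for every scale
```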