Hello,

I am training a neural network that outputs a vector: **output** → with size (Batch size x N).

Then, I want to reduce it to size (Batch size x M), where M < N, by splitting the N entries into groups of 10 and summing each group.

My question is: if I do

```
new_output = torch.tensor([tensor.sum(dim=1) for tensor in torch.split(output, 10, dim=1)])
```

I then have to compare new_output with another variable of size (Batch size x M), which also requires grad.

Will creating a new tensor break Autograd?

Also, how can one be sure that Autograd works correctly? For simple architectures it is straightforward; however, I am not sure how to check it for more complex ones.

Thanks!

Hi,

I think using `torch.tensor` would break it because `torch.tensor` has no `grad_fn`. What you could do is replace the `torch.tensor` call with `torch.stack`:

```
# sum within each group but keep the batch dimension -> (Batch size x M)
new_output = torch.stack([tensor.sum(dim=1) for tensor in torch.split(output, 10, dim=1)], dim=1)
```

That should keep the gradients flowing, as `torch.stack` has a `grad_fn` whereas `torch.tensor` doesn’t!


Thanks! So, in order to check that autograd works, I should make sure that the operations have a `grad_fn`?

In terms of the exact functionality of Autograd it’d be best to get a dev’s opinion (as they know exactly what’s going on), but what I would say from a purely *mathematical* perspective is that the gradient of a given operation is defined by the chain rule.

So it’s essentially a product of all intermediate gradients from a given layer (e.g. your `torch.tensor` call) up to the final layer (where you compute your loss value). So, if one of those intermediate gradients between these two layers is 0 (from not having a `grad_fn`), then all preceding layers (i.e. all layers *before* your `torch.tensor` call) will have a zero-valued gradient by definition.
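On the “how can one be sure” part: one practical check is `torch.autograd.gradcheck`, which compares the analytical gradients against finite differences. A minimal sketch (the `group_sum` function and the sizes are just assumptions for illustration; gradcheck wants double-precision inputs):

```python
import torch
from torch.autograd import gradcheck

def group_sum(x):
    # sum groups of 10 along dim=1: (B, N) -> (B, N // 10)
    return torch.stack([t.sum(dim=1) for t in torch.split(x, 10, dim=1)], dim=1)

# gradcheck needs double precision and inputs that require grad
x = torch.randn(4, 30, dtype=torch.double, requires_grad=True)
print(gradcheck(group_sum, (x,)))  # True if gradients match
```

If the analytical and numerical gradients disagree, `gradcheck` raises an error describing the mismatch, so it’s a handy sanity check for more complex architectures.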
