Hello,

I am training a neural network that outputs a vector: **output** → with size (Batch size x N).

Then, I want to reduce it to size (Batch size x M), where M < N, by splitting the N entries into groups of 10 and summing each group.

My question is: if I do

```
new_output = torch.tensor([tensor.sum(dim=1) for tensor in torch.split(output, 10, dim=1)])
```

I then have to compare new_output with another variable of size (Batch size x M), which also requires grad.

Will creating a new tensor break Autograd?

Also, how can one be sure that Autograd works correctly? For simple architectures it is straightforward; however, I am not sure how to check it for more complex ones.

Thanks!

Hi,

I think using `torch.tensor` would break it because `torch.tensor` has no `grad_fn`. What you could do is replace the `torch.tensor` call with `torch.stack`:

```
# sum within each group but keep the batch dimension -> (Batch size x M)
new_output = torch.stack([tensor.sum(dim=1) for tensor in torch.split(output, 10, dim=1)], dim=1)
```

That should keep the gradients flowing, as `torch.stack` has a `grad_fn` whereas `torch.tensor` doesn’t!


Thanks! So, in order to check that autograd works, I should make sure that the operations have a `grad_fn`?

In terms of the exact functionality of Autograd it’d be best to get a dev’s opinion (as they know exactly what’s going on), but what I would say from a purely *mathematical* perspective is that the gradient of a given operation is defined by the chain rule.

So it’s essentially a product of all intermediate gradients from a given layer (e.g. your `torch.tensor` call) up to the final layer (where you compute your loss value). So, if one of those intermediate gradients between these two layers is 0 (from not having a `grad_fn`), then all preceding layers (i.e. all layers *before* your `torch.tensor` call) will have a zero-valued gradient by definition.
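On the “how can one be sure” part: one practical check is `torch.autograd.gradcheck`, which compares the analytical gradients against finite differences. A minimal sketch (the `group_sum` function and the sizes are just assumptions for illustration; gradcheck wants double-precision inputs):

```python
import torch
from torch.autograd import gradcheck

def group_sum(x):
    # sum groups of 10 along dim=1: (B, N) -> (B, N // 10)
    return torch.stack([t.sum(dim=1) for t in torch.split(x, 10, dim=1)], dim=1)

# gradcheck needs double precision and inputs that require grad
x = torch.randn(4, 30, dtype=torch.double, requires_grad=True)
print(gradcheck(group_sum, (x,)))  # True if gradients match
```

If the analytical and numerical gradients disagree, `gradcheck` raises an error describing the mismatch, so it’s a handy sanity check for more complex architectures.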
