Hello,

I am training a neural network that outputs a vector: **output** → with size (Batch size x N).

Then, I want to reduce it to size (Batch size x M), where M < N, by splitting the N entries into groups of 10 and summing each group.

My question is: if I do

```
new_output = torch.tensor([tensor.sum(dim=1) for tensor in torch.split(output, 10, dim=1)])
```

I then have to compare new_output with another variable of size (Batch size x M), which also requires grad.

Will creating a new tensor break Autograd?

Also, how can one be sure that Autograd works correctly? For simple architectures it is straightforward; however, I am not sure how to check it for more complex ones.

Thanks!

Hi,

I think using `torch.tensor` would break it because `torch.tensor` has no `grad_fn`. What you could do is replace the `torch.tensor` call with `torch.stack`:

```
# sum within each group but keep the batch dimension -> (Batch size x M)
new_output = torch.stack([tensor.sum(dim=1) for tensor in torch.split(output, 10, dim=1)], dim=1)
```

That should keep the gradients flowing, as `torch.stack` has a `grad_fn` whereas `torch.tensor` doesn’t!


Thanks! So, in order to check that autograd works, I should make sure that the operations have a `grad_fn`?

In terms of the exact functionality of Autograd it’d be best to get a dev’s opinion (as they know exactly what’s going on), but what I would say from a purely *mathematical* perspective is that the gradient of a given operation is defined by the chain rule.

So it’s essentially a product of all intermediate gradients from a given layer (e.g. your `torch.tensor` call) up to the final layer (where you compute your loss value). So, if one of those intermediate gradients between these two layers is 0 (from not having a `grad_fn`), then all preceding layers (i.e. all layers *before* your `torch.tensor` call) will have a zero-valued gradient by definition.
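On the “how can one be sure” part: one practical check is `torch.autograd.gradcheck`, which compares the analytical gradients against finite differences. A minimal sketch (the `group_sum` function and the sizes are just assumptions for illustration; gradcheck wants double-precision inputs):

```python
import torch
from torch.autograd import gradcheck

def group_sum(x):
    # sum groups of 10 along dim=1: (B, N) -> (B, N // 10)
    return torch.stack([t.sum(dim=1) for t in torch.split(x, 10, dim=1)], dim=1)

# gradcheck needs double precision and inputs that require grad
x = torch.randn(4, 30, dtype=torch.double, requires_grad=True)
print(gradcheck(group_sum, (x,)))  # True if gradients match
```

If the analytical and numerical gradients disagree, `gradcheck` raises an error describing the mismatch, so it’s a handy sanity check for more complex architectures.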
