Hey all! Suppose I have a non-jagged nested list of scalar tensors, each carrying gradient information (e.g. the losses of a few different models). Is there an intended way to aggregate them into a single tensor without losing that gradient information?

Here’s a minimal example to demonstrate:

```
import torch
x1, x2, x3, x4 = torch.tensor([1., 2., 3., 4.])
w = torch.nn.Parameter(torch.tensor(5.), requires_grad=True)
l1, l2, l3, l4 = (x * w for x in [x1, x2, x3, x4])
l = [
[l1, l2],
[l3, l4]
]
# torch.tensor() copies the values into a fresh tensor, so the graph back to w is lost
assert torch.tensor(l).grad_fn is None
```

Currently, to get around this, I’m doing a double `torch.stack`, but I wonder if there’s a cleaner solution? It could get especially nasty if you’re working with an arbitrarily shaped `l`.
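For concreteness, here is a minimal sketch of that workaround, plus one way it might generalize to deeper nesting. The helper name `stack_nested` is made up for illustration and is not a PyTorch function; it assumes the nested list is rectangular.

```python
import torch


def stack_nested(items):
    """Recursively stack a rectangular nested list of tensors (hypothetical helper)."""
    if isinstance(items, torch.Tensor):
        return items  # leaf: already a tensor, keep its graph intact
    return torch.stack([stack_nested(item) for item in items])


x = torch.tensor([1., 2., 3., 4.])
w = torch.nn.Parameter(torch.tensor(5.))
l1, l2, l3, l4 = (xi * w for xi in x)
l = [[l1, l2], [l3, l4]]

# The double torch.stack workaround: stack each row, then stack the rows.
stacked = torch.stack([torch.stack(row) for row in l])

# The recursive version works for any nesting depth.
nested = stack_nested(l)

assert stacked.grad_fn is not None
assert torch.equal(stacked, nested)

# Gradients still flow back to w through the stacked tensor.
nested.sum().backward()
```

Since `torch.stack` is an ordinary differentiable op, the result stays attached to the graph, unlike `torch.tensor(l)`.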

Thanks!