Correct way of aggregating scalar tensors into a single tensor without losing gradients?

Hey all! Let’s suppose I have a non-jagged array of scalar tensors each with gradient information (e.g. the losses of a couple of different models). Is there an intended way of aggregating these together into a single tensor without losing gradient information?

Here’s a minimal example to demonstrate:

import torch
x1, x2, x3, x4 = torch.tensor([1., 2., 3., 4.])
w = torch.nn.Parameter(torch.tensor(5.), requires_grad = True)

l1, l2, l3, l4 = (x * w for x in [x1, x2, x3, x4])
l = [
    [l1, l2],
    [l3, l4],
]
# torch.tensor copies the data and detaches it from the graph
assert torch.tensor(l).grad_fn is None

Currently I’m getting around this with a double torch.stack, but I wonder if there’s a cleaner solution? It could get especially nasty if you’re working with an arbitrarily nested l.
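For reference, the double-stack workaround looks like this (a minimal sketch reusing the variables from the example above; torch.stack is differentiable, so the result keeps its grad_fn):

```python
import torch

x1, x2, x3, x4 = torch.tensor([1., 2., 3., 4.])
w = torch.nn.Parameter(torch.tensor(5.))

l1, l2, l3, l4 = (x * w for x in [x1, x2, x3, x4])

# Stack each inner list, then stack the rows: gradients flow through stack.
l = torch.stack([torch.stack([l1, l2]), torch.stack([l3, l4])])
assert l.grad_fn is not None

l.sum().backward()
print(w.grad)  # d/dw sum(x_i * w) = 1 + 2 + 3 + 4 = 10
```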


Using torch.stack to create a single tensor in a differentiable way is the right approach.
