Correct way of retaining intermediate tensors with gradients?

What’s the correct way of doing the following loop?

# assume gradient is enabled for all tensors
b1_list, b2_list = [], []
for i in range(n):
    a1, a2 = some_function()
    b1, b2 = some_neural_net(a1, a2)
    b1_list.append(b1) # or b1_list.append(b1.clone()) ? or something else?
    b2_list.append(b2) # or b2_list.append(b2.clone()) ? or something else?
b1_tensor = torch.stack(b1_list)
b2_tensor = torch.stack(b2_list)

It doesn’t look like b1 or b2 are reused or mutated in place here, so simply holding on to the references without clone() seems fine, unless there is another part of the code that is being omitted. Each appended reference keeps its autograd graph alive until backward() is called, so clone() isn’t needed.
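A minimal sketch of the pattern above (some_function and some_neural_net from the question are stood in for by simple tensor ops): gradients flow back through torch.stack to the parameter without any clone() calls, even though the loop variable is rebound each iteration.

```python
import torch

w = torch.randn(3, requires_grad=True)

outs = []
for i in range(4):
    out = w * (i + 1)   # 'out' is rebound next iteration; the list keeps the old reference
    outs.append(out)    # no clone() needed

out_tensor = torch.stack(outs)
out_tensor.sum().backward()

# d/dw of sum_i w*(i+1) = 1 + 2 + 3 + 4 = 10 for each element
print(w.grad)  # tensor([10., 10., 10.])
```

Each stacked entry contributes its own path in the graph, so the gradients from all iterations accumulate into w.grad as expected.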


But b1 and b2 get rebound in the next loop iteration. Is this an issue for the gradients?
Also, b1_tensor and b2_tensor are used later to compute losses.

No, rebinding a name in Python should not affect the gradients; the list still holds the original tensor with its graph intact, e.g.:

>>> import torch
>>> a = torch.randn(10, requires_grad=True)
>>> l = list()
>>> a.prod().backward()
>>> a.grad
tensor([ 0.0006, -0.0028,  0.0090,  0.0011, -0.0022,  0.0013, -0.0103,  0.0014,
        -0.0030, -0.0010])
>>> a.data_ptr()
34399296
>>> l.append(a)
>>> a = torch.randn(10, requires_grad=True)
>>> a.sum().backward()
>>> a.grad
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
>>> a.data_ptr()
76175680
>>> l[0].data_ptr()
34399296
>>> l[0].grad
tensor([ 0.0006, -0.0028,  0.0090,  0.0011, -0.0022,  0.0013, -0.0103,  0.0014,
        -0.0030, -0.0010])