Correct way of retaining intermediate tensors with gradients?

What’s the correct way of doing the following loop?

# assume gradient is enabled for all tensors
b1_list, b2_list = [], []
for i in range(n):
    a1, a2 = some_function()
    b1, b2 = some_neural_net(a1, a2)
    b1_list.append(b1) # or b1_list.append(b1.clone()) ? or something else?
    b2_list.append(b2) # or b2_list.append(b2.clone()) ? or something else?
b1_tensor = torch.stack(b1_list)
b2_tensor = torch.stack(b2_list)

It doesn’t look like b1 or b2 are reused or mutated in place here, so simply holding on to the references without clone() seems fine, unless there is another part of the code that is being omitted. Each appended reference keeps its autograd graph alive until backward() is called, so clone() isn’t needed.
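A minimal sketch of the pattern above (some_function and some_neural_net from the question are stood in for by simple tensor ops): gradients flow back through torch.stack to the parameter without any clone() calls, even though the loop variable is rebound each iteration.

```python
import torch

w = torch.randn(3, requires_grad=True)

outs = []
for i in range(4):
    out = w * (i + 1)   # 'out' is rebound next iteration; the list keeps the old reference
    outs.append(out)    # no clone() needed

out_tensor = torch.stack(outs)
out_tensor.sum().backward()

# d/dw of sum_i w*(i+1) = 1 + 2 + 3 + 4 = 10 for each element
print(w.grad)  # tensor([10., 10., 10.])
```

Each stacked entry contributes its own path in the graph, so the gradients from all iterations accumulate into w.grad as expected.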


But b1 and b2 get rebound in the next loop iteration. Is this an issue for the gradients?
Also, b1_tensor and b2_tensor are used later to compute losses.

No, rebinding a name in Python should not affect the gradients; the list still holds the original tensor with its graph intact, e.g.:

>>> import torch
>>> a = torch.randn(10, requires_grad=True)
>>> l = list()
>>> a.prod().backward()
>>> a.grad
tensor([ 0.0006, -0.0028,  0.0090,  0.0011, -0.0022,  0.0013, -0.0103,  0.0014,
        -0.0030, -0.0010])
>>> a.data_ptr()
34399296
>>> l.append(a)
>>> a = torch.randn(10, requires_grad=True)
>>> a.sum().backward()
>>> a.grad
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
>>> a.data_ptr()
76175680
>>> l[0].data_ptr()
34399296
>>> l[0].grad
tensor([ 0.0006, -0.0028,  0.0090,  0.0011, -0.0022,  0.0013, -0.0103,  0.0014,
        -0.0030, -0.0010])