Can I allocate intermediate variables before the optimization loop?

If I have several large intermediate variables in the optimization loop, can I allocate them outside the loop to avoid regenerating them every iteration? I ran into a RuntimeError when trying to do so and haven't found an explanation. Is there a correct way to allocate intermediate variables?

import torch

var1 = torch.rand(128, 128, requires_grad=True)
m1 = var1.new_empty((4, *var1.shape))  # intermediate variable, allocated once outside the loop
a = torch.linspace(1, 10, 4).reshape(4, 1, 1)

for i in range(10):
    # m1 = var1.new_empty((4, *var1.shape))  # No error if this line is uncommented
    m1[:] = torch.exp(var1[None, :] ** 2)
    l = loss(m1, data)  # `loss` and `data` are defined elsewhere
    l.backward()
    with torch.no_grad():
        var1 -= var1.grad * 0.1
        var1.grad.zero_()

# RuntimeError: Trying to backward through the graph a second time, but the saved intermediate
# results have already been freed. Specify retain_graph=True when calling .backward() or
# autograd.grad() the first time.

Instead of recreating m1 every iteration (which would also work, since PyTorch would reuse the already allocated memory), you could detach m1 from the previous iteration using m1.detach_(). The in-place assignment m1[:] = ... records m1 in the autograd graph, so in the next iteration it still references the previous graph, whose intermediate buffers were already freed by the first backward() call; detaching (or recreating) m1 removes that stale reference.
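For completeness, here is a minimal sketch of the first option, recreating m1 inside the loop. The (m1 - data).mean() loss and the random data target are only stand-ins, since the original loss function is not shown:

import torch

var1 = torch.rand(128, 128, requires_grad=True)
data = torch.randn(128, 128)  # stand-in target

for i in range(10):
    # Recreate the intermediate tensor each iteration; the caching allocator
    # typically reuses the memory freed in the previous iteration.
    m1 = var1.new_empty((4, *var1.shape))
    m1[:] = torch.exp(var1[None, :] ** 2)
    l = (m1 - data).mean()  # stand-in loss
    l.backward()
    with torch.no_grad():
        var1 -= var1.grad * 0.1
        var1.grad.zero_()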

Got it. Thanks a lot.

Could you please show me how to detach m1? I tried the following code, but an error occurred.

var1 = torch.ones(128, 128, requires_grad=True)
m1 = var1.new_empty((4, *var1.shape), requires_grad=True)
a = torch.linspace(1, 10, 4).reshape(4, 1, 1)

for i in range(10):
    print(i)
    m1.detach()[:] = torch.exp(var1[None, :] * 2)
    l = loss(m1, data)
    l.backward()
    with torch.no_grad():
        var1 -= var1.grad * 0.1
        var1.grad.zero_()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-7971284f49be> in <module>
      9     l.backward()
     10     with torch.no_grad():
---> 11         var1 -= var1.grad * 0.1
     12         var1.grad.zero_()

TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'

Here is an example:

import torch

var1 = torch.rand(128, 128, requires_grad=True)
m1 = var1.new_empty((4, *var1.shape))  # intermediate variable, allocated once
a = torch.linspace(1, 10, 4).reshape(4, 1, 1)
data = torch.randn(128, 128)

for i in range(10):
    m1.detach_()  # drop the graph from the previous iteration
    m1[:] = torch.exp(var1[None, :] ** 2)
    l = (m1 - data).mean()
    l.backward()
    with torch.no_grad():
        var1 -= var1.grad * 0.1
        var1.grad.zero_()
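For reference, the reason the earlier attempt failed: m1.detach() returns a new tensor that is cut off from the autograd graph, so writing through it never connects var1 to the loss, and var1.grad stays None (hence the TypeError above). m1.detach_() instead drops m1's old history in place, and the subsequent in-place copy is recorded, so gradients flow back to var1. A small sketch with reduced shapes illustrating the difference:

import torch

var1 = torch.ones(3, requires_grad=True)

# Writing through detach() builds no link between var1 and m1:
m1 = torch.empty(3, requires_grad=True)
m1.detach()[:] = var1 * 2
print(m1.grad_fn)  # None -> m1 is still an independent leaf

# detach_() followed by an in-place copy is recorded by autograd:
m2 = torch.empty(3)
m2.detach_()
m2[:] = var1 * 2
print(m2.grad_fn)  # <CopySlices ...> -> connected to var1
m2.sum().backward()
print(var1.grad)   # tensor([2., 2., 2.])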