Ctx fields of functions not being deleted during backward

I noticed that the fields of ctx that we populate in the forward method of an autograd.Function (in order to use them in the backward pass) are not automatically released by the framework. As a workaround, I am currently deleting those fields manually in the backward method (roughly as in the sketch below). Is this intended behavior?
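A minimal sketch of the pattern I mean (MyExp and the stored ctx.out field are just an example I made up for illustration, not my real code):

import torch
from torch.autograd import Function

class MyExp(Function):
    @staticmethod
    def forward(ctx, inp):
        out = inp.exp()
        ctx.out = out            # stored as a plain attribute, not via save_for_backward
        return out

    @staticmethod
    def backward(ctx, grad_out):
        grad_in = grad_out * ctx.out   # d exp(x)/dx = exp(x)
        del ctx.out                    # the manual cleanup I am asking about
        return grad_in

x = torch.rand(3, requires_grad=True)
MyExp.apply(x).sum().backward()        # works, but ctx.out is now gone for good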

The fields that you populate in ctx are deleted when the computational graph itself is deleted.
You can see it in the following code snippet:

import torch
from torch.autograd import Variable, Function

import weakref

my_ref = None

class MyFunc(Function):
    @staticmethod
    def forward(ctx, inp):
        a = torch.rand(2)
        global my_ref
        my_ref = weakref.ref(a)  # weak reference so we can observe when `a` is freed
        ctx.a = a                # stored as a plain ctx attribute, as in the question
        print("Ref in forward: ", my_ref())
        return inp.clone()

    @staticmethod
    def backward(ctx, gO):
        print("Ref in backward: ", my_ref())
        return gO.clone()


def get_loss():
    inp = Variable(torch.rand(10), requires_grad=True)
    out = MyFunc.apply(inp)
    return out.sum()

loss = get_loss()
print("Ref before backward: ", my_ref()) # returns the Tensor
loss.backward()
print("Ref after backward: ", my_ref()) # returns the Tensor
del loss
print("Ref after del loss: ", my_ref()) # returns None

@albanD do I cause any harm if I delete the fields at the end of backward? E.g., I noticed that gradcheck no longer works if I do so, since it apparently calls backward more than once per forward call (see the snippet below). Does the framework rely on the fields I populate in ctx still being available after the first call to backward?
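For context, this is roughly what I tried, reusing the hypothetical MyExp op from my first post (double precision just keeps the numerical check stable):

import torch
from torch.autograd import gradcheck

x = torch.rand(3, dtype=torch.double, requires_grad=True)
# gradcheck backpropagates several grad-outputs through the same graph to build
# the analytic Jacobian, so the second pass no longer finds ctx.out and errors out
gradcheck(MyExp.apply, (x,))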

Anything stored on ctx is treated like any other computation buffer, so it is freed when the graph itself is no longer needed. In the context of gradcheck, we run multiple backward passes with retain_graph=True so that the buffers are kept and backward can be called several times on the same graph; the snippet below illustrates this.
I don't understand why you want to delete them yourself, though.
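A minimal illustration of the retain_graph behaviour (generic example, not tied to a custom Function):

import torch

x = torch.rand(5, requires_grad=True)
loss = (x * x).sum()

loss.backward(retain_graph=True)  # graph and its buffers are kept alive
loss.backward()                   # a second backward on the same graph still works
# Without retain_graph=True on the first call, the graph (and anything stored on
# ctx by custom Functions inside it) would already have been freed, and the
# second backward would raise a RuntimeError.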

It might be helpful when I am very close to the GPU memory limit and ctx holds a large buffer. In that case I might need to free that buffer inside backward, before allocating the backward buffers, to avoid running out of memory (roughly as sketched below).
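Something along these lines (again a made-up BigOp, just to show the ordering I have in mind):

import torch
from torch.autograd import Function

class BigOp(Function):
    @staticmethod
    def forward(ctx, inp):
        big = inp.exp()            # pretend this is a very large intermediate
        ctx.big = big
        return big.sum()

    @staticmethod
    def backward(ctx, grad_out):
        big = ctx.big
        del ctx.big                # drop ctx's reference as early as possible...
        grad_in = grad_out * big   # grad of sum(exp(x)) is exp(x)
        del big                    # ...so the big buffer can be released before
        return grad_in             # any further temporaries are allocated

x = torch.rand(4, requires_grad=True)
BigOp.apply(x).backward()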

If you were to do that, your backward function could no longer be called twice (even with the retain_graph=True flag), which is not correct.