Ctx fields of functions not being deleted during backward

I noticed that the fields of ctx that we populate in the forward method of an autograd.Function, in order to use them in the backward pass, are not automatically released by the framework. As a workaround, I am currently deleting those fields manually in the backward method. Is this the intended behavior?

The fields that you populate in ctx are deleted when the computational graph itself is deleted.
You can see it in the following code snippet:

import weakref

import torch
from torch.autograd import Function

# Weak reference to the tensor stored on ctx; it does not keep the tensor alive.
my_ref = None

class MyFunc(Function):
    @staticmethod
    def forward(ctx, inp):
        a = torch.rand(2)
        global my_ref
        my_ref = weakref.ref(a)
        ctx.a = a  # once forward returns, ctx holds the only strong reference to a
        print("Ref in forward: ", my_ref())
        return inp.clone()

    @staticmethod
    def backward(ctx, gO):
        print("Ref in backward: ", my_ref())
        return gO.clone()

def get_loss():
    inp = torch.rand(10, requires_grad=True)
    out = MyFunc.apply(inp)
    return out.sum()

loss = get_loss()
print("Ref before backward: ", my_ref()) # returns the Tensor
loss.backward()
print("Ref after backward: ", my_ref()) # returns the Tensor
del loss  # drops the last reference to the graph
print("Ref after del loss: ", my_ref()) # returns None

@albanD do I cause any harm if I delete the fields at the end of backward? For example, I noticed that gradcheck no longer works if I do so, since it apparently calls backward more than once per forward call. Does the framework rely on the fields that I populate in ctx still being available after the first call to backward?

Anything stored in ctx is treated like any other computation buffer, so it is freed when the graph is no longer needed. In the context of gradcheck, we run multiple backward passes with retain_graph=True so that the buffers are kept and we can call backward multiple times on the same graph.
I don’t understand why you want to delete them yourself?
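To illustrate the retain_graph=True case, here is a minimal sketch. The Scale function and its factor attribute are hypothetical names made up for this example, not part of PyTorch. It only shows that an attribute stored on ctx stays readable across two backward calls on the same graph, and that the gradients accumulate in inp.grad.

```python
import torch
from torch.autograd import Function


class Scale(Function):
    """Hypothetical example: multiplies the input by a factor kept on ctx."""

    @staticmethod
    def forward(ctx, inp, factor):
        ctx.factor = factor  # plain Python attribute stored on ctx
        return inp * factor

    @staticmethod
    def backward(ctx, grad_out):
        # ctx.factor is read on every backward call
        # one gradient per forward input; the float factor gets None
        return grad_out * ctx.factor, None


inp = torch.ones(3, requires_grad=True)
loss = Scale.apply(inp, 2.0).sum()

loss.backward(retain_graph=True)  # keep the graph for a second pass
first = inp.grad.clone()

loss.backward()                   # second pass over the same graph
second = inp.grad.clone()         # gradients accumulate across calls

print(first, second)
```

Since nothing removes ctx.factor, both calls see the same value, which is the property gradcheck depends on.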

It might be helpful when I am very close to the GPU memory limit and ctx holds a large buffer. In that case, I might need to free the large buffer in ctx before allocating the buffers for the backward pass, to prevent out-of-memory issues.

If you did that, your backward function could no longer be called twice (even with retain_graph=True), which is not correct.
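A small sketch of that failure mode, under the assumption that the cleanup is a plain del on the ctx attribute inside backward. ScaleAndFree and factor are hypothetical names used only for illustration.

```python
import torch
from torch.autograd import Function


class ScaleAndFree(Function):
    """Hypothetical example: deletes its ctx field at the end of backward."""

    @staticmethod
    def forward(ctx, inp, factor):
        ctx.factor = factor
        return inp * factor

    @staticmethod
    def backward(ctx, grad_out):
        grad_in = grad_out * ctx.factor
        del ctx.factor  # manual cleanup, as described in the question
        return grad_in, None


inp = torch.ones(3, requires_grad=True)
loss = ScaleAndFree.apply(inp, 2.0).sum()

loss.backward(retain_graph=True)  # first call succeeds

error = None
try:
    loss.backward()               # second call: ctx.factor no longer exists
except Exception as exc:
    error = exc

print("second backward failed with:", type(error).__name__)
```

The first call populates inp.grad as usual; the second one fails inside backward because the attribute is gone, even though the graph itself was retained.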