Is it possible to preserve the structure of a computational graph without the tensor data?


I am struggling with implementing a reversible residual network.

When I implement this, the computational graph consists of nodes, and these nodes are constructed from Tensors.

The strength of a reversible network is that it can reconstruct earlier activations from the current activation. However, the computational graph uses Tensors as nodes, which means the earlier activations are always stored.

I want to use GPU memory as efficiently as I can. So I want to retain the structure of the graph while the tensor data is freed. Can I do this? If so, how?

One possible solution is to delete the data. However, I worry that it may produce unexpected results.

Thanks in advance.

I found out that deleting the data is impossible.

My test code:

a = torch.zeros([32, 64], requires_grad=True).cuda()
b = torch.zeros([64, 128], requires_grad=True).cuda()
c = torch.zeros([128, 256], requires_grad=True).cuda()
print("Before matmul: ", torch.cuda.memory_allocated())
d = torch.matmul(a, b)
e = torch.matmul(d, c)
print("After matmul: ", torch.cuda.memory_allocated())


.data is not a thing anymore. It has been removed from the documentation and will eventually be deleted.

The graph is actually composed of Nodes (the functions that need to run in the backward pass), not Tensors. Only the Tensors required to compute the backward are saved by the Nodes that need them.
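You can see this yourself with a small CPU sketch (assuming a recent PyTorch; the exact grad_fn class names can vary between versions). The graph is a chain of grad_fn Nodes linked by next_functions; the Tensors you created are only reachable through a Node when that Node explicitly saved them:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.exp()     # the exp Node saves its *output* for backward, not x
z = y.sum()

# The graph is a chain of Nodes, not of Tensors:
print(type(z.grad_fn).__name__)   # e.g. SumBackward0
print(type(y.grad_fn).__name__)   # e.g. ExpBackward0
print(z.grad_fn.next_functions)   # edge to the exp Node, then AccumulateGrad
```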

Can I ask how to deal with this situation?

  • I construct a computational graph with tensors.
  • I don’t want to hold data, i.e. activation, in the intermediate nodes.
  • When I call backward, I want to calculate the gradient of intermediate nodes with my custom backward.

Clearly, the first and third requirements can be accomplished using normal tensor operations and a custom autograd.Function, respectively.
The main concern is how to remove only the data, not the computation node itself.

Is there any suggestion for implementing this? I think I might have to hack ctx in the autograd.Function, but there is a lack of documentation… :frowning:

The code template I want is something like this:

class CustomFunction(autograd.Function):
  @staticmethod
  def forward(ctx, x):
    output = f(x)  # f is some forward computation
    # TODO: remove the input data or deallocate its GPU memory. But how?
    return output

  @staticmethod
  def backward(ctx, grad_output):
    return torch.ones_like(grad_output)


If you don’t want any Tensor to be saved, just don’t save anything in the ctx and no Tensor will be saved for that Function.

Every Tensor is deleted as soon as it is not referenced by anything. So the input data will be deleted as soon as you no longer use it in your forward function.
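A quick way to check this on CPU (a sketch with a hypothetical do-nothing Function named Double) is to hold only a weak reference to the intermediate activation. Since nothing is passed to ctx.save_for_backward, the graph Nodes survive but do not keep the Tensor alive, so dropping the last Python reference frees it and backward still runs:

```python
import weakref
import torch
from torch import autograd

class Double(autograd.Function):     # hypothetical example Function
    @staticmethod
    def forward(ctx, x):
        # Nothing is passed to ctx.save_for_backward, so autograd
        # keeps no reference to x or to the output.
        return x * 2

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * 2          # gradient needs no saved activation

x = torch.randn(4, requires_grad=True)
h = Double.apply(x)                  # intermediate activation
out = Double.apply(h)                # second "layer"
ref = weakref.ref(h)
del h                                # drop the last strong reference
print(ref() is None)                 # True: the graph did not keep h alive
out.sum().backward()                 # backward still works through the Nodes
print(torch.allclose(x.grad, torch.full_like(x, 4.0)))  # True
```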


Thanks for the answer. :slight_smile:

I tested it with my custom code.

class CustomMM(autograd.Function):
  @staticmethod
  def forward(ctx, x, y):
    out = torch.matmul(x, y)
    return out

  @staticmethod
  def backward(ctx, grad_out):
    return magic_grad_for_x(grad_out), magic_grad_for_y(grad_out)

x = torch.rand(10000, 10, requires_grad=True).cuda()
y = torch.rand(10, 10000).cuda()
z = torch.rand(10000, 100).cuda()

output = CustomMM.apply(CustomMM.apply(x, y), z)

# output.backward()  # It doesn't work because there is no magic function :)
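For the matmul case specifically, the "magic" backward genuinely needs the operands (the gradient w.r.t. x is grad_out @ y.T), which is exactly why reversible nets reconstruct activations instead. A hedged sketch (hypothetical class name, assuming only x requires grad) saves just the weight y and never the activation x:

```python
import torch
from torch import autograd

class MatMulSaveY(autograd.Function):  # hypothetical sketch
    @staticmethod
    def forward(ctx, x, y):
        ctx.save_for_backward(y)       # y is a weight, not an activation
        return torch.matmul(x, y)

    @staticmethod
    def backward(ctx, grad_out):
        (y,) = ctx.saved_tensors
        # Gradient w.r.t. x; None for y since it does not require grad here.
        return torch.matmul(grad_out, y.t()), None

x = torch.randn(8, 4, requires_grad=True)
y = torch.randn(4, 6)
out = MatMulSaveY.apply(x, y)
out.sum().backward()
print(torch.allclose(x.grad, torch.ones(8, 6) @ y.t()))  # True
```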