# Is it possible to preserve the structure of computational graph without tensor data?

Hello,

I am struggling with implementing reversible residual network.

When I implement this, the computational graph consists of nodes, and these nodes are constructed from Tensors.

The strength of a reversible network is that it can reconstruct earlier activations from the current activation. However, the computational graph uses Tensors as nodes, which means it always stores the earlier activations.

I want to use GPU memory as efficiently as I can, so I want to retain the structure of the graph while the tensor data is freed. Can I do this? If so, how?

One possible solution is to delete `tensor.data`. However, I worry that this may cause unexpected results.

Thanks in advance.

I found out that deleting `tensor.data` is impossible.

My test code:

```python
a = torch.zeros([32, 64], requires_grad=True).cuda()
b = torch.zeros([64, 128], requires_grad=True).cuda()
c = torch.zeros([128, 256], requires_grad=True).cuda()
print("After allocating inputs: ", torch.cuda.memory_allocated())
d = torch.matmul(a, b)
e = torch.matmul(d, c)
print("After the two matmuls:   ", torch.cuda.memory_allocated())
del d.data  # raises an error: .data cannot be deleted
print("After del d.data:        ", torch.cuda.memory_allocated())
e.mean().backward()
print("After backward:          ", torch.cuda.memory_allocated())
print(a.grad)
print(b.grad)
```

Hi,

`.data` is not a thing anymore. It has been removed from the docs and is slated to be removed from the codebase.

The graph is actually composed of Nodes (the functions that need to run in the backward pass), not Tensors. Only the Tensors required to compute the backward pass are saved, by the Nodes that need them.
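You can see this for a built-in op on CPU: the backward Node of `torch.mm` keeps references to the inputs it needs for its gradient formula. A small sketch, assuming a recent PyTorch version where the saved-tensor introspection attributes (`_saved_self`, `_saved_mat2`) are exposed:

```python
import torch

a = torch.randn(3, 4, requires_grad=True)
b = torch.randn(4, 5, requires_grad=True)
c = torch.mm(a, b)

# The graph hangs off c.grad_fn (a Node), not off the tensors themselves.
print(c.grad_fn)  # an MmBackward node

# mm needs both inputs to compute its gradients, so the Node saved them
# (introspection attributes are available in recent PyTorch versions).
print(torch.equal(c.grad_fn._saved_self, a))
print(torch.equal(c.grad_fn._saved_mat2, b))
```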

Can I ask how to deal with this situation?

• I construct a computational graph with tensor.
• I don't want to hold the data, i.e. the activations, in the intermediate nodes.
• When I call `backward`, I want to calculate the gradient of intermediate nodes with my custom `backward`.

Clearly, the first and third requirements can be met by using normal tensor operations and a custom `autograd.Function`, respectively.
The main concern is how to remove only the data, not the computation node itself.

Is there any suggestion for implementing this? I think I might have to hack `ctx` in the `autograd.Function`, but there is a lack of documentation…

The code template I have in mind is something like this:

```python
class CustomFunction(autograd.Function):
    @staticmethod
    def forward(ctx, x):
        output = f(x)
        # TODO: remove the input data or deallocate its GPU memory. But how?
        return output

    @staticmethod
    def backward(ctx, grad_output):
        return torch.ones_like(grad_output)
```

Hi,

If you don't want any Tensor to be saved, just don't store anything in the `ctx`, and no Tensor will be saved for that Function.

Every Tensor is deleted as soon as it is no longer referenced by anything. So the input data will be freed as soon as you stop using it in your forward function.
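You can confirm this behavior on CPU with a weak reference. In the sketch below (the Function name `Double` is made up for illustration), nothing is stored in `ctx`, so once the Python name `mid` is dropped the intermediate tensor is actually freed, while the graph stays intact through `out.grad_fn` and backward still runs:

```python
import weakref

import torch
from torch import autograd


class Double(autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Nothing is stored in ctx, so autograd saves no tensor here.
        return x * 2

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * 2


x = torch.randn(4, requires_grad=True)
mid = Double.apply(x)   # intermediate activation
out = Double.apply(mid)
ref = weakref.ref(mid)

del mid                 # the graph (out.grad_fn) is still intact...
print(ref() is None)    # True: ...but the intermediate tensor was freed

out.sum().backward()    # backward still works
print(x.grad)           # d(4x)/dx = 4 for each element
```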


Thanks for the answer.

I tested it by my custom code.

```python
class CustomMM(autograd.Function):
    @staticmethod
    def forward(ctx, x, y):
        out = torch.mm(x, y)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        return magic_grad_for_x(grad_out), magic_grad_for_y(grad_out)


print(torch.cuda.memory_allocated())
x = torch.rand(10000, 10, requires_grad=True).cuda()
y = torch.rand(10, 10000).cuda()
z = torch.rand(10000, 100).cuda()
print(torch.cuda.memory_allocated())

output = CustomMM.apply(CustomMM.apply(x, y), z)
print(torch.cuda.memory_allocated())

# output.backward()  # It doesn't work because there is no magic function :)
```
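For completeness, here is a rough sketch of how these pieces combine into a reversible block with additive coupling. Everything here is illustrative, not the canonical RevNet implementation: `f` and `g` stand in for arbitrary subnetworks, only the block's outputs are saved, and the inputs are reconstructed from them and re-differentiated during backward:

```python
import torch
from torch import autograd


def f(t):
    # Stand-in for an arbitrary subnetwork.
    return torch.tanh(t)


def g(t):
    # Stand-in for another arbitrary subnetwork.
    return torch.tanh(t)


class ReversibleBlock(autograd.Function):
    @staticmethod
    def forward(ctx, x1, x2):
        with torch.no_grad():
            y1 = x1 + f(x2)
            y2 = x2 + g(y1)
        ctx.save_for_backward(y1, y2)  # only the outputs are kept
        return y1, y2

    @staticmethod
    def backward(ctx, dy1, dy2):
        y1, y2 = ctx.saved_tensors
        # Reconstruct the inputs from the outputs.
        with torch.no_grad():
            x2 = y2 - g(y1)
            x1 = y1 - f(x2)
        # Redo the forward with grad enabled to get the input gradients.
        with torch.enable_grad():
            x1 = x1.detach().requires_grad_(True)
            x2 = x2.detach().requires_grad_(True)
            y1r = x1 + f(x2)
            y2r = x2 + g(y1r)
            torch.autograd.backward([y1r, y2r], [dy1, dy2])
        return x1.grad, x2.grad


x1 = torch.randn(5, requires_grad=True)
x2 = torch.randn(5, requires_grad=True)
y1, y2 = ReversibleBlock.apply(x1, x2)
(y1.sum() + y2.sum()).backward()
print(x1.grad, x2.grad)
```

In a full reversible network the outputs of one block are the inputs of the next, so even the `save_for_backward` here can be avoided by passing the last layer's activations backwards through the chain; this sketch keeps each block self-contained for clarity.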