Searching for nested GPU tensors in a custom object

Hello Everyone,
I am trying to build a system that optimizes GPU memory utilization during the training process. To handle custom user models, I need an estimator of the GPU memory consumed by an arbitrary object. A simple example of such an object would be:

import torch
from argparse import Namespace

x = torch.zeros(1024).cuda()
y = torch.ones([2, 40]).cuda()

tmp_namespace = Namespace()
tmp_namespace.find_me = x

object_to_check = {'a': [1, 2, {'y': y}], 'b': (6, tmp_namespace, (y, y))}

I want to write a function that tells me the GPU memory usage of this object.

Some observations I already made:

  • torch.cuda.memory_allocated() might help here. I tried it this way:

import copy
import torch

def object_gpu_memory(obj):
    # Measure how much extra CUDA memory a deep copy of the object allocates.
    start_memory = torch.cuda.memory_allocated()
    with torch.no_grad():
        tmp = copy.deepcopy(obj)
    end_memory = torch.cuda.memory_allocated()
    del tmp

    return end_memory - start_memory

but, sadly, it gives me

RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

for my use case: I want to support intermediate (non-leaf) tensors as well, and those do not support the deepcopy protocol (a minimal repro is the first snippet after the list below).

  • The Python garbage collector may help to collect all live Tensor objects, but I have not figured out how to apply it here (see the second snippet below).
  • Recursive traversal of a typical object via the dir() method falls into infinite recursion (the third snippet below avoids this with a visited set).
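
For reference, here is a minimal repro of the deepcopy limitation: any tensor that carries a grad_fn (i.e. a non-leaf tensor) triggers the error quoted above.

import copy
import torch

a = torch.ones(8, device='cuda', requires_grad=True)
z = a * 2                 # non-leaf: z has a grad_fn

copy.deepcopy(a)          # fine: a is a graph leaf
copy.deepcopy(z)          # raises the RuntimeError quoted above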
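
Here is a sketch of the garbage-collector idea, assuming that listing every live CUDA tensor in the process is acceptable; the part I am missing is how to restrict the result to tensors actually reachable from object_to_check:

import gc
import torch

def live_cuda_tensors():
    # Ask the garbage collector for every tracked object and keep the
    # CUDA tensors; some objects raise on inspection, so guard the check.
    found = []
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.is_cuda:
                found.append(obj)
        except Exception:
            continue
    return found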
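
And here is a sketch of the recursive traversal I have in mind. Instead of dir(), it only walks dict values, sequence items, and __dict__ attributes (which covers Namespace); a set of visited id()s breaks reference cycles and also keeps the shared tensor y from being counted three times. Using element_size() * nelement() for the per-tensor footprint is an assumption on my part: it ignores allocator rounding and tensors that share storage.

import torch

def object_gpu_memory(obj, _seen=None):
    # Sum the byte size of every distinct CUDA tensor reachable from obj.
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:              # break cycles, count shared objects once
        return 0
    _seen.add(id(obj))

    if torch.is_tensor(obj):
        return obj.element_size() * obj.nelement() if obj.is_cuda else 0

    if isinstance(obj, dict):
        children = obj.values()
    elif isinstance(obj, (list, tuple, set)):
        children = obj
    elif hasattr(obj, '__dict__'):
        children = vars(obj).values()  # e.g. Namespace attributes
    else:
        children = ()
    return sum(object_gpu_memory(child, _seen) for child in children)

For the example object above this returns 1024 * 4 + 2 * 40 * 4 = 4416 bytes, counting y only once.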

Is there a way to write such a function? If so, could you give me some hints on how to do it?
Thanks in advance.