Not releasing GPU memory - how to debug?

I have a Flask (Python) server serving PyTorch models as an API.
With each request, allocated GPU memory grows, and eventually I get “Out of Memory”.
As far as I can tell, nothing in my code should be holding references to the models or tensors, and everything should be initialized from scratch for each request. I’m sure I’m wrong, but I don’t know how to debug it: what is holding references to these tensors?

I filtered out the tensors on the GPU:

    import gc
    import torch

    # Collect every live tensor that currently sits on a CUDA device.
    objs = gc.get_objects()
    gpu_tensors = [obj for obj in objs if isinstance(obj, torch.Tensor) and obj.is_cuda]
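Because the memory grows with every request, comparing snapshots taken before and after a single request narrows the search to what that one request left behind. This is a minimal sketch, assuming the leaked objects survive a `gc.collect()`. The helpers `live_objects`, `new_since`, and `summarize` are hypothetical names, and the code uses only the standard `gc` module so it works with any predicate:

```python
import gc
from collections import Counter


def live_objects(predicate):
    """Return {id(obj): obj} for every gc-tracked object matching predicate.

    The objects themselves are stored (not just their ids) so they stay alive
    and their ids cannot be reused while the snapshot exists.
    """
    found = {}
    for obj in gc.get_objects():
        try:
            if predicate(obj):
                found[id(obj)] = obj
        except Exception:
            # Some objects (e.g. lazy proxies) raise on isinstance/attribute access.
            continue
    return found


def new_since(before, predicate):
    """Objects matching predicate that are alive now but absent from `before`."""
    gc.collect()  # collect unreachable cycles so only retained objects remain
    now = live_objects(predicate)
    return [obj for key, obj in now.items() if key not in before]


def summarize(objs, key):
    """Count objects by a key function, e.g. (shape, dtype) for tensors."""
    return Counter(key(obj) for obj in objs)
```

With PyTorch, `predicate` would be the same test as in the snippet above (`isinstance(obj, torch.Tensor) and obj.is_cuda`) and `key` could be `lambda t: (tuple(t.shape), t.dtype)`. Take the snapshot, send one request, then call `new_since`; shapes that show up after every request point at what is being retained. Delete `before` and the returned list afterwards, because they keep those tensors alive themselves.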

Using gc.get_referrers(t), I can get the objects that refer to these tensors. Most of them have three referrers: “<class ‘list’>”, “<class ‘list’>”, and “<class ‘collections.OrderedDict’>” (and a few have 72, which is probably suspicious?).
But I don’t know how to figure out what this “list” is in my code. Which variable is holding it, so that I can delete/release it?
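Two of those three referrers are very likely the debugging code’s own lists: `objs` (returned by `gc.get_objects()`) and `gpu_tensors` both contain every CUDA tensor. A useful next step is to print, for each remaining referrer, where exactly the tensor sits in it: the key(s) for a dict, the index and length for a list. If an `OrderedDict` reports keys such as `'weight'` or `'bias'`, it is likely a module’s parameter storage or a `state_dict()`. `describe_referrers` and its `ignore` parameter are hypothetical helpers, a sketch rather than an existing API; pass `[objs, gpu_tensors]` as `ignore` to hide the debugging lists:

```python
import gc
import types


def describe_referrers(target, ignore=()):
    """Return one line per object that refers to `target`.

    Dicts report the key(s) under which target is stored; lists and tuples
    report the index(es) and container length. Frames and anything passed in
    `ignore` (e.g. your own debugging lists) are skipped.
    """
    ignore_ids = {id(obj) for obj in ignore}
    lines = []
    for ref in gc.get_referrers(target):
        if id(ref) in ignore_ids or isinstance(ref, types.FrameType):
            continue
        name = type(ref).__name__
        if isinstance(ref, dict):
            # Compare with `is`: `==` on tensors is elementwise and returns a tensor.
            keys = [k for k, v in ref.items() if v is target]
            lines.append(f"{name} (len={len(ref)}) holds it under key(s) {keys!r}")
        elif isinstance(ref, (list, tuple)):
            positions = [i for i, v in enumerate(ref) if v is target]
            lines.append(f"{name} (len={len(ref)}) holds it at index(es) {positions}")
        else:
            lines.append(f"{name} at {id(ref):#x}")
    return lines
```

A list whose length matches the number of requests served so far, or one that grows by a fixed amount per request, is a strong hint that something appends to it on every call.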

I also tried visualizing the references with objgraph and got nice graphs, for example for one of those tensors.

But the problem is the same: where in my code is this “list” or “list_iterator”?
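To answer “where in my code is this list”, it helps to keep walking referrers upward until reaching something recognizable, such as a module, a class, or a long-lived cache. objgraph provides this as `objgraph.find_backref_chain(obj, predicate)` (commonly with `objgraph.is_proper_module` as the predicate) together with `objgraph.show_chain` to draw the result. Below is a minimal stdlib-only sketch of the same breadth-first idea; this `find_backref_chain` is a hypothetical re-implementation for illustration, not objgraph’s code, and it relies on the CPython-specific `sys._getframe`:

```python
import gc
import sys
from collections import deque


def find_backref_chain(target, is_root, max_depth=10, extra_ignore=()):
    """Breadth-first search backwards through gc.get_referrers().

    Returns the shortest chain [root, ..., target] in which each object refers
    to the next one, where root is the first object with is_root(obj) true,
    or None if no root is found within max_depth hops.
    """
    this_frame = sys._getframe()  # CPython-specific
    queue = deque([(target, 0)])
    next_hop = {id(target): None}  # id(obj) -> object it points to on the path
    keep = {id(target): target}    # keep visited objects alive so ids stay unique
    # The search's own bookkeeping must never be reported as a referrer.
    ignore = {id(this_frame), id(queue), id(next_hop), id(keep)}
    ignore.update(id(obj) for obj in extra_ignore)
    try:
        while queue:
            obj, depth = queue.popleft()
            if is_root(obj):
                chain = [obj]
                while next_hop[id(chain[-1])] is not None:
                    chain.append(next_hop[id(chain[-1])])
                return chain
            if depth >= max_depth:
                continue
            referrers = gc.get_referrers(obj)
            for ref in referrers:
                if id(ref) not in ignore and id(ref) not in next_hop:
                    next_hop[id(ref)] = obj
                    keep[id(ref)] = ref
                    queue.append((ref, depth + 1))
            # Drop the list now, otherwise it would show up as a referrer
            # of its own elements on the next iteration.
            del referrers
        return None
    finally:
        del this_frame  # break the frame -> local -> frame reference cycle
```

For the tensors above, one might call `find_backref_chain(t, lambda o: isinstance(o, types.ModuleType), extra_ignore=[objs, gpu_tensors])` and print `type(o)` for each link. When running at module level in a script or REPL, also pass `globals()` in `extra_ignore`, because the loop variable itself is a global there. Each step scans all tracked objects, so expect it to be slow with many objects alive.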

I would be grateful for any tips on how to learn to debug this issue.

Try running torch.cuda.empty_cache() at the beginning.

This releases all unoccupied cached memory held by PyTorch’s caching allocator on the GPU.

Thank you. I already did that, but it didn’t change anything.

@motown_dad If possible, can you share your Flask code?

Clearing the cache won’t help, as described in other posts, since only the reusable (cached but unused) memory will be returned to the system. It will, however, slow down your code, as the cudaFree calls are synchronizing.
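A toy model makes the distinction concrete. `ToyCachingAllocator` below is hypothetical and far simpler than PyTorch’s real allocator; its two counters only loosely mirror `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()`. A block that a leaked tensor still references stays in `live`, so `empty_cache()` cannot release it no matter how often it is called:

```python
class ToyCachingAllocator:
    """Toy model of a caching allocator. NOT PyTorch's implementation:
    it only reuses cached blocks of exactly the requested size."""

    def __init__(self):
        self.live = {}     # handle -> size of blocks still used by tensors
        self.cached = []   # sizes of freed blocks kept for reuse
        self.reserved = 0  # bytes obtained from the device (cf. memory_reserved)
        self._next_handle = 0

    def allocated(self):
        """Bytes held by live blocks (cf. torch.cuda.memory_allocated)."""
        return sum(self.live.values())

    def malloc(self, size):
        if size in self.cached:
            self.cached.remove(size)  # reuse a cached block: no device call
        else:
            self.reserved += size     # would be a real device allocation
        self._next_handle += 1
        self.live[self._next_handle] = size
        return self._next_handle

    def free(self, handle):
        # A freed block goes back to the cache, not to the device.
        self.cached.append(self.live.pop(handle))

    def empty_cache(self):
        # Only cached (unused) blocks are returned; live blocks are untouched.
        self.reserved -= sum(self.cached)
        self.cached.clear()
```

In a real process, logging `torch.cuda.memory_allocated()` after each request is the quicker signal: if it keeps rising, tensors are being kept alive, which is a reference problem rather than a cache problem.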

Here is the code: ComfyUI workflow for debugging · GitHub

It’s generated from a ComfyUI workflow using ComfyUI-to-Python-Extension (GitHub - pydn/ComfyUI-to-Python-Extension: A powerful tool that translates ComfyUI workflows into executable Python code).

I don’t expect anyone to debug the layers of frameworks in my code (in this case, a bunch of ComfyUI classes with PyTorch underneath). I’m hoping for advice on how to debug this myself.

That makes sense. Any advice on how to debug this? How can I figure out what these referrers shown by objgraph or gc actually are?