Tensors seem to be held by execution frames

My PyTorch code has a GPU memory leak.

To debug this issue, I used this function:

import gc
import torch
from pynight.common_torch import torch_memory_tensor

def find_tensors_on_gpu():
    #: Walk every object the garbage collector tracks and report large
    #: CUDA tensors together with the names/frames that still reference them.
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.is_cuda:
                obj_size = torch_memory_tensor(obj, s=2)  #: MB
                if obj_size >= 10:
                    print(f"Tensor ID: {id(obj)} Type: {type(obj)} Size (MB): {obj_size}")
                    for ref in gc.get_referrers(obj):
                        try:
                            if isinstance(ref, dict):
                                for k, v in ref.items():
                                    if v is obj:
                                        print(f"Variable Name: {k}")
                                print(f"ref: {ref}")
                        except Exception:
                            pass
        except Exception:
            pass


The output of this function is:

Tensor ID: 140152393953856 Type: <class 'torch.Tensor'> Size (MB): 23.0859375
Variable Name: features_out
ref: <frame at 0x7f77c8dae640, file '/home/vit/code/pytorch-image-models/timm/models/decomposition.py', line 1873, code forward>
Tensor ID: 140152393947616 Type: <class 'torch.Tensor'> Size (MB): 1142.75390625
Variable Name: attributions_v
ref: <frame at 0x7f77c8dae640, file '/home/vit/code/pytorch-image-models/timm/models/decomposition.py', line 1873, code forward>
Variable Name: attributions_v
Tensor ID: 140159443545328 Type: <class 'torch.Tensor'> Size (MB): 1142.75390625
Variable Name: residual_attributions_v
ref: <frame at 0x55c5761c0270, file '/home/vit/code/pytorch-image-models/timm/models/vision_transformer.py', line 251, code forward>

There are two problems:

  1. My GPU memory usage is at 7299 MiB, while these tensors only sum to about 2.3 GB.
  2. These tensors seem to be held by a Python execution frame?! How do I free them?


  1. PyTorch uses a caching allocator as described here.
  2. The tensors seem to be referenced in the forward method, so they might be needed for the gradient computation in a backward call.
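
The allocator gap from point 1 can be observed directly with PyTorch's built-in memory stats: nvidia-smi reports the memory the caching allocator has *reserved* from the driver, which is usually larger than what live tensors actually occupy. A minimal sketch (standard torch APIs, guarded so it also runs without a GPU):

```python
import torch

def gpu_memory_report():
    """Compare the memory live tensors occupy vs. what the caching
    allocator has reserved from the driver (what nvidia-smi shows)."""
    if not torch.cuda.is_available():
        return None
    allocated = torch.cuda.memory_allocated() / 2**20  # MiB used by tensors
    reserved = torch.cuda.memory_reserved() / 2**20    # MiB held by the allocator
    print(f"allocated: {allocated:.1f} MiB, reserved: {reserved:.1f} MiB")
    return allocated, reserved

gpu_memory_report()
# torch.cuda.empty_cache() releases unused cached blocks back to the driver,
# shrinking the reserved number (and the nvidia-smi figure) toward allocated.
```

Note that empty_cache() does not free memory still referenced by tensors; it only returns the allocator's unused cache, so it narrows the 7299 MiB vs. 2.3 GB gap but cannot fix an actual leak.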

Is there a way to free the whole stored computational graph of a model? Something like model.zero_grad(), but one that frees the stored computational graph instead of the gradients.

PS: The Jupyter issue I linked was indeed my primary problem, and the workarounds there worked. In short, Jupyter stores references to local-scope tensors when an exception happens, and this seriously messes up the gc.
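
One such workaround can be sketched in plain Python: after an uncaught exception, the interpreter keeps the last traceback (and every frame's locals, including any large tensors) alive via sys.last_type / sys.last_value / sys.last_traceback, so dropping those lets gc actually collect the frames. (IPython additionally caches outputs in Out, which may also need clearing, e.g. with %reset.)

```python
import gc
import sys

def clear_exception_refs():
    """Drop the interpreter's references to the last exception so the
    frames (and their local tensors) in its traceback can be collected."""
    for name in ("last_type", "last_value", "last_traceback"):
        if hasattr(sys, name):
            delattr(sys, name)
    gc.collect()

clear_exception_refs()
```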

You could either call backward() on the output/loss, which would free the intermediates, or del the output of the model to drop all references to the intermediates and allow PyTorch to reuse the memory.
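
Both options can be sketched with a toy model (this runs on CPU too; the graph bookkeeping is the same):

```python
import torch

model = torch.nn.Linear(512, 512)
x = torch.randn(64, 512)

out = model(x)    # `out` anchors the autograd graph and its saved intermediates
loss = out.sum()

# Option 1: backward() consumes the graph and frees its intermediate buffers.
loss.backward()

# Option 2: drop every reference to the outputs; once `out`/`loss` are gone,
# the graph they anchor becomes unreachable and can be garbage-collected.
del out, loss

if torch.cuda.is_available():
    torch.cuda.empty_cache()  # return the now-unused cached blocks to the driver
```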