How to free all the GPU memory PyTorch has taken

PyTorch takes more GPU memory each time I move on to the next loop iteration, so it eventually fails with a CUDA out-of-memory error.

My code is fairly high level: model training and so on are wrapped by the pipeline_network class. My main goal is to train a new model for every fold.

for train_idx, valid_idx in cv.split(meta_train[DEPTH_COLUMN].values.reshape(-1)):
    meta_train_split, meta_valid_split = meta_train.iloc[train_idx], meta_train.iloc[valid_idx]
    pipeline_network = unet(config=CONFIG, suffix='fold' + str(fold), train_mode=True)

But when I move on to the 2nd fold, everything fails with a GPU out-of-memory error:

RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

At the end of the epoch I tried to delete that pipeline manually, with no luck:

import gc
import torch

def clean_object_from_memory(obj):  # definition
    del obj
    gc.collect()
    torch.cuda.empty_cache()

clean_object_from_memory(pipeline_network)  # calling
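One possible reason the helper has no effect is that `del obj` inside a function only removes the function's local name, while the caller's variable still refers to the object. This can be sketched with plain Python objects, no PyTorch needed. The `Pipeline` class below is a hypothetical stand-in for the real pipeline, not the actual class from the code above:

```python
import gc
import weakref


class Pipeline:
    """Hypothetical stand-in for the object that holds the model."""


def clean_object_from_memory(obj):
    del obj        # removes only the local name `obj` inside this function
    gc.collect()


pipeline = Pipeline()
probe = weakref.ref(pipeline)  # weak reference: observes the object without keeping it alive

clean_object_from_memory(pipeline)
alive_after_helper = probe() is not None  # the caller's `pipeline` still references it

del pipeline       # drop the caller's own reference
gc.collect()
alive_after_del = probe() is not None

print(alive_after_helper, alive_after_del)
```

If the same holds for the real pipeline, the `del` has to happen in the scope that owns the variable (here, the fold loop), and any other references to the model, optimizer, or stored outputs would need to be dropped as well.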

Calling this didn't help either:

def dump_tensors(gpu_only=True):
    torch.cuda.empty_cache()
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                if not gpu_only or obj.is_cuda:
                    del obj
                    gc.collect()
            elif hasattr(obj, "data") and torch.is_tensor(obj.data):
                if not gpu_only or obj.is_cuda:
                    del obj
                    gc.collect()
        except Exception:
            pass
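In the loop above, `del obj` only unbinds the loop variable, so a tensor that is referenced anywhere else stays alive. A read-only variant of the same scan can at least show which tensors are still around. This is a minimal sketch: the name `live_tensors` is made up for illustration, and it assumes that tensors are visible through `gc.get_objects()`, as the function above already does:

```python
import gc
import torch


def live_tensors(gpu_only=True):
    """Return (shape, dtype, device) for every tensor the garbage collector tracks."""
    found = []
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and (obj.is_cuda or not gpu_only):
                found.append((tuple(obj.shape), obj.dtype, str(obj.device)))
        except Exception:
            # some tracked objects raise when inspected; skip them
            continue
    return found


keep = torch.zeros(2, 3)  # a tensor that is still referenced by a variable
report = live_tensors(gpu_only=False)
print(len(report))
```

Running this at the end of each fold and comparing the reports may point to whatever is still holding tensors from the previous fold.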

How can I reset PyTorch when I move on to the next fold?
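One pattern that may help, offered as a sketch rather than a confirmed fix, is to build and train each fold's model inside a function, so that the model, optimizer, and intermediate tensors are locals that go out of scope when the function returns. Only plain Python numbers are returned. `train_one_fold` and the tiny linear model are hypothetical placeholders for the real pipeline:

```python
import gc
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def train_one_fold(fold):
    """Hypothetical stand-in for building and training one fold's pipeline."""
    model = torch.nn.Linear(8, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(16, 8, device=device)
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    return loss.item()  # a Python float, not a tensor attached to the graph


scores = []
for fold in range(3):
    scores.append(train_one_fold(fold))
    # model, optimizer, and tensors were locals of train_one_fold,
    # so this scope holds no references to them
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # release cached, currently unused blocks
        print(fold, torch.cuda.memory_allocated())
```

If the printed allocation still grows from fold to fold, something outside the function (for example a logger, a callback, or a list of outputs) is probably still referencing GPU tensors.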

Hi, did you find a solution?