RAM/CPU memory leak with transforms

Hello,
I have been trying to debug an issue where my RAM fills up quickly while I work through a dataset. It turns out this is caused by the torchvision transforms I apply to the images.
My code is very simple:

  import os
  from PIL import Image
  import torchvision.transforms as transforms

  # mean, std, img_size and img_folder are defined elsewhere
  for dir1 in os.listdir(img_folder):
      for file in os.listdir(os.path.join(img_folder, dir1)):
          image_path = os.path.join(img_folder, dir1, file)
          with Image.open(image_path) as img_pil:
              normalize = transforms.Normalize(mean=mean, std=std)
              preprocess = transforms.Compose([
                  transforms.Resize((img_size, img_size)),
                  transforms.ToTensor(),
                  normalize
              ])
              img_pil = preprocess(img_pil)

Without the preprocess step, memory is released correctly as each image is opened and closed.

I have tried defining normalize and preprocess outside the loop, but memory still accumulated.

Am I missing something? Is there a way to free up the memory that is being occupied by the transformation steps?

NB: the same issue arises when using a DataLoader. I didn’t know what was causing it, which is how I ended up here.

Thanks

You could apply the transforms in batches, outside the loading loop (that is probably much faster too).

I wonder whether reassigning img_pil inside the with block is causing this issue in PIL.

If I want to transform in batches, I’ll have to load all the images into memory, which would lead to much the same result. I am working with very limited RAM, so I want to open one image at a time, transform it, run the predictions, close it, and move on to the next.

It seems a tensor operation is what’s causing this issue.
I dug a bit deeper into the transform function; the issue is triggered by the following line:
tensor.sub_(mean).div_(std)
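One quick sanity check on that line: the in-place chain `sub_().div_()` writes into the tensor's existing storage rather than allocating a new one, so by itself it seems an unlikely source of a leak. (torchvision's `Normalize` clones its input first unless constructed with `inplace=True`, so each call does allocate one copy.) The tensor sizes below are arbitrary.

```python
import torch

mean = torch.tensor([0.485, 0.456, 0.406])[:, None, None]
std = torch.tensor([0.229, 0.224, 0.225])[:, None, None]

x = torch.rand(3, 4, 4)
expected = (x - mean) / std   # out-of-place: allocates new tensors
ptr_before = x.data_ptr()

x.sub_(mean).div_(std)        # in-place: reuses x's storage

print(x.data_ptr() == ptr_before)
print(torch.allclose(x, expected))
```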

I tried to imitate it manually as follows:

Outside the loop:

MEAN = 255 * torch.tensor([0.485, 0.456, 0.406])
STD = 255 * torch.tensor([0.229, 0.224, 0.225])
meanOP = MEAN[:, None, None]
stdOP = STD[:, None, None]

In the loop:
img_pil = (img_pil - meanOP) / stdOP

This reproduces the issue, so it seems to be related to tensor operations.

UPDATE:

It seems the issue is worse than I thought: it could be related to any tensor operation. Even a simple operation such as casting the tensor to float32 causes this memory problem.

For some reason, the memory is not being freed.
PS: I tried forcing garbage collection; it did not help.

          with Image.open(image_path) as img_pil:
            img_pil = torch.from_numpy(np.array(img_pil))
            img_pil = img_pil.type(torch.float32)
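For scale, the cast above allocates a brand-new tensor four times the size of the decoded image, while `torch.from_numpy` shares the NumPy buffer without copying. A sketch of the arithmetic for a 1000×1000 RGB image (the size is an assumption matching the later script). Once the name is rebound on the next iteration the old tensor is freed, though the C allocator may keep freed blocks cached inside the process, so system-wide figures can lag behind.

```python
import numpy as np
import torch

# A 1000x1000 RGB image as decoded via np.array(img_pil): uint8, 1 byte per value.
arr = np.zeros((1000, 1000, 3), dtype=np.uint8)
t_uint8 = torch.from_numpy(arr)            # shares memory with arr, no copy
t_f32 = t_uint8.type(torch.float32)        # new tensor, 4 bytes per value

def nbytes(t):
    """Size of a tensor's data in bytes."""
    return t.element_size() * t.nelement()

print(nbytes(t_uint8))                       # 3000000
print(nbytes(t_f32))                         # 12000000
print(t_uint8.data_ptr() == arr.ctypes.data) # True: from_numpy did not copy
```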

Here’s a script I tried on my local machine:

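import psutil
import torch
import torchvision.transforms as transforms
from PIL import Image

tf = transforms.Compose(
    [transforms.Resize((1000, 1000)),
     transforms.PILToTensor()])

# image_paths: the list of image files to test (not shown in the post)
for fname in image_paths:
    with Image.open(fname) as img_pil:
        img_pil = tf(img_pil)
        img_pil = img_pil.type(torch.float32)
    # available system memory, in bytes
    print(psutil.virtual_memory().available)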

And here’s the output, available memory in bytes (swap is disabled):

54919290880
54912282624
54912274432
54900854784
54903816192
54903750656
54904684544
54904807424
54904815616
54904860672
54904860672
54904729600
54904963072
54904954880
54904762368
54904885248
54904717312
54904791040
54904856576
54904930304
54904983552
54904717312
54905122816
54904283136
54904274944
54904217600
54904332288

As you can see, memory is released just fine: after a small initial drop of about 15 MB, available memory stays flat at roughly 54.90 GB.
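System-wide available memory also moves with every other process on the machine. A sketch of sampling only the current Python process's resident set size (RSS) with `psutil.Process().memory_info().rss` instead; the tensor allocation in the loop is a hypothetical stand-in for one preprocessing step.

```python
import gc
import os

import psutil
import torch

proc = psutil.Process(os.getpid())

def rss_mib():
    """Resident set size of this process only, in MiB."""
    return proc.memory_info().rss / 2**20

baseline = rss_mib()
for i in range(20):
    # Stand-in for one preprocessing step: allocate and cast a tensor.
    t = torch.randint(0, 256, (3, 1000, 1000), dtype=torch.uint8).float()
    del t
    gc.collect()
    if i % 5 == 0:
        print(f"iter {i}: {rss_mib() - baseline:+.1f} MiB vs. baseline")
```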

Could you post an executable code snippet that reproduces the increasing memory usage?
I see results similar to @my3bikaht’s and couldn’t reproduce it either.

I checked your suggestions and it turns out I get the same result. After isolating the problem, it seems the issue is caused by the profiler I use to measure the model's performance over the whole test set, as shown in the code below:

import torch
import torchvision.transforms as transforms
import os
from PIL import Image
import psutil
from torch.profiler import profile, record_function, ProfilerActivity

mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

img_folder = "/path/to/img"
img_size = 224

def test():
  preprocess = transforms.Compose([
      transforms.Resize((img_size,img_size)),
      transforms.ToTensor(),
      transforms.Normalize(mean=mean, std=std)
  ])

  for dir1 in os.listdir(img_folder):
    for file in os.listdir(os.path.join(img_folder, dir1)):
      image_path = os.path.join(img_folder, dir1,  file)
      with Image.open(image_path) as img_pil:
        img_pil = preprocess(img_pil)
      memory = psutil.virtual_memory()
      totmemory = memory.total >> 20
      usedmemory = memory.used >> 20
      print(usedmemory)

with profile(activities=[ProfilerActivity.CPU], profile_memory=True, record_shapes=True) as prof:
  test()
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10))
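A likely explanation is that the profiler keeps every recorded operator event (and, with `profile_memory=True`, every allocation event) in memory until it finishes, so usage grows with the number of images. One common way to bound this is to record only a few iterations via `torch.profiler.schedule` and call `prof.step()` once per iteration. A sketch under that assumption; `step_fn` is a hypothetical stand-in for one iteration of `test()`.

```python
import torch
from torch.profiler import profile, schedule, ProfilerActivity

def step_fn():
    """Hypothetical stand-in for preprocessing one image."""
    x = torch.rand(3, 224, 224)
    return (x - 0.5) / 0.25

tables = []

def trace_handler(p):
    # Called when an "active" recording window finishes.
    tables.append(p.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10))

# Skip 2 iterations, warm up 1, record 3, then stop recording for good.
prof_schedule = schedule(wait=2, warmup=1, active=3, repeat=1)

with profile(activities=[ProfilerActivity.CPU],
             profile_memory=True,
             record_shapes=True,
             schedule=prof_schedule,
             on_trace_ready=trace_handler) as prof:
    for _ in range(100):
        step_fn()
        prof.step()  # mark the end of one iteration

if tables:
    print(tables[0])
```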

Is there a way to profile the model over the whole test set without RAM building up?
I have created a Colab notebook here: