Steady increase in CPU RAM usage

Bug report

  1. Summary of problem: I’ve been encountering a steady increase in CPU RAM usage while using a PyTorch DataLoader. Merely loading the data increases CPU RAM until the notebook eventually crashes. Please note that I am actually using the GPU, not the CPU, so my device is cuda. This notebook’s torch version is:
  • torch @ file:///tmp/torch/torch-1.13.0-cp37-cp37m-linux_x86_64.whl.

A very similar notebook, which had no such problem, used torch:

  • torch @ file:///tmp/torch/torch-1.11.0-cp37-cp37m-linux_x86_64.whl
  2. Steps to reproduce:
    Simple Dataset:
import cv2
import torch
from albumentations import Normalize, Compose
from albumentations.pytorch import ToTensorV2
from torch.utils.data import Dataset, DataLoader


class CustomDataset(Dataset):

    def __init__(self, dataframe, train, valid):

        self.dataframe = dataframe
        self.train = train
        self.valid = valid
        
        if self.train:
            self.transform = Compose([
                                      Normalize(p=1),
                                      ToTensorV2(p=1)])
        if self.valid:
            self.transform = Compose([
                                      Normalize(p=1),
                                      ToTensorV2(p=1)])

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, index):
        idx = self.dataframe['image_id'][index] # Select image id
        image_path = self.dataframe['path'][index] # Get image path
        image = cv2.imread(image_path) # Read image
        image = self.transform(image=image) # Apply transforms
        image = image['image'] # Extract image from dictionary
        return (idx, image)

Loop:
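# Note: kf, train_df, config, and device are defined earlier in the notebook;
# kf is a cross-validation splitter (e.g. sklearn's KFold) and train_df a pandas
# DataFrame holding the image ids and paths.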

for fold, (train_index, valid_index) in enumerate(kf.split(train_df)):
    train_data = train_df.iloc[train_index].reset_index(drop=True)
    train = CustomDataset(train_data, train=True, valid=False)
    train_loader = DataLoader(train,
                              batch_size=config.BATCH_SIZE_TRAIN,
                              shuffle=True,
                              num_workers=config.NUM_WORKERS,
                              drop_last=True)
    epochs = config.EPOCHS
    for epoch in range(epochs):
        for step, (idx, image) in enumerate(train_loader):
            image = torch.tensor(image, device=device, dtype=torch.float32)

After a couple of hundred steps the notebook crashes with RAM at 100% usage; it does not even complete one epoch.
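
For reference, this is a minimal sketch of how the growth can be logged inside the loop above, using psutil (the same library I mention further below); the 50-step interval is arbitrary:

import psutil

process = psutil.Process()

for epoch in range(epochs):
    for step, (idx, image) in enumerate(train_loader):
        image = torch.tensor(image, device=device, dtype=torch.float32)
        if step % 50 == 0:
            # Report system-wide RAM percentage and this process's resident set size
            print(f"step {step}: "
                  f"system RAM {psutil.virtual_memory().percent:.1f}% | "
                  f"process RSS {process.memory_info().rss / 1e9:.2f} GB")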

I cannot reproduce the increase in host RAM using this code snippet:

import cv2
import torch
from albumentations import Normalize, Compose
from albumentations.pytorch import ToTensorV2
from torch.utils.data import Dataset, DataLoader


class CustomDataset(Dataset):
    def __init__(self):
        self.transform = Compose([Normalize(p=1), ToTensorV2(p=1)])

    def __len__(self):
        return 100

    def __getitem__(self, index):
        image_path = "./image.jpeg"
        image = cv2.imread(image_path) # Read image
        image = self.transform(image=image) # Apply transforms
        image = image['image'] # Extract image from dictionary
        return (index, image)
    
dataset = CustomDataset()
loader = DataLoader(dataset, batch_size=10, num_workers=2, shuffle=True)

for epoch in range(1000):
    for step, (idx, image) in enumerate(loader):
        image = torch.tensor(image, device="cuda", dtype=torch.float32)

so I’m unsure whether the issue is caused by the dataframe or by another object that is not defined in the snippet.
Could you check whether loading a single image, as in my code above, also increases RAM usage in your setup?
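
If the dataframe is a suspect, a variant of the snippet above that routes the id and path through a small pandas DataFrame (placeholder contents, same single image) could help narrow it down; this is only a sketch, not something I have profiled:

import cv2
import pandas as pd
import torch
from albumentations import Normalize, Compose
from albumentations.pytorch import ToTensorV2
from torch.utils.data import Dataset, DataLoader


class DataFrameDataset(Dataset):
    def __init__(self, dataframe):
        self.dataframe = dataframe
        self.transform = Compose([Normalize(p=1), ToTensorV2(p=1)])

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, index):
        idx = self.dataframe['image_id'][index]       # Select image id
        image_path = self.dataframe['path'][index]    # Get image path
        image = cv2.imread(image_path)                # Read image
        image = self.transform(image=image)['image']  # Apply transforms
        return (idx, image)


# Placeholder dataframe: 100 rows all pointing at the same test image
df = pd.DataFrame({'image_id': list(range(100)), 'path': ["./image.jpeg"] * 100})
loader = DataLoader(DataFrameDataset(df), batch_size=10, num_workers=2, shuffle=True)

for epoch in range(1000):
    for step, (idx, image) in enumerate(loader):
        image = torch.tensor(image, device="cuda", dtype=torch.float32)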

Here is a minimal example on Kaggle to reproduce. I confirm that your snippet does not seem to cause this problem. However, I am unable to tell whether the root cause is the dataframe object or the loading of some particular images (could some images be corrupted?). Finally, I’ve noticed a discrepancy between the RAM usage reported by the psutil library (via psutil.virtual_memory().percent) and by the Kaggle monitor: the former appears constant while the latter keeps increasing.
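
One thing I could try, to reconcile the two numbers, is summing the resident memory of the notebook process and the DataLoader worker processes it spawns; this is only a sketch with psutil, and whether the Kaggle monitor counts exactly this is an assumption on my part:

import psutil


def total_rss_gb():
    # Resident memory of this process plus all of its children (e.g. DataLoader workers)
    main = psutil.Process()
    processes = [main] + main.children(recursive=True)
    return sum(p.memory_info().rss for p in processes) / 1e9


print(f"system RAM used (psutil): {psutil.virtual_memory().percent:.1f}%")
print(f"notebook + workers RSS:   {total_rss_gb():.2f} GB")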