Hi, I have a problem that looks like a memory leak.
When I load my dataset, memory usage blows up: I see around 100 processes using 30 GB of RES memory. I think the problem is my Dataset
class. So my question is: how can I load the data part by part (and can that reduce the memory my computations need)? I attach my class below.
import torchvision
from torch.utils.data import Dataset

class LoadDataset(Dataset):
    def __init__(self):
        self.images = []
        self.targets = []
        img_path, ann_path = ("path", "ann")
        coco_ds = torchvision.datasets.CocoDetection(img_path, ann_path)
        # Every sample is decoded and kept in memory up front.
        for i in range(len(coco_ds)):
            img, ann = coco_ds[i]
            # collate(...) is my own helper (definition not shown here).
            images, targets = collate(
                [img.copy(), img.copy()], [ann, ann], coco_ds.coco
            )
            for t in targets:
                self.targets.append(t)
            for image in images:
                self.images.append(image)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = self.images[idx]
        target = self.targets[idx]
        return (img, target)
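To show what I mean by "part by part", here is a rough, untested sketch of what I imagine: keep only the CocoDetection handle in __init__ and decode each sample in __getitem__ on demand. LazyDataset is a made-up name, and I am assuming my collate helper also works on a single sample at a time.

import torchvision
from torch.utils.data import Dataset

class LazyDataset(Dataset):
    # Hypothetical lazy variant: nothing is cached; each sample is
    # decoded from disk only when the DataLoader requests it.
    def __init__(self, img_path, ann_path):
        self.coco_ds = torchvision.datasets.CocoDetection(img_path, ann_path)

    def __len__(self):
        return len(self.coco_ds)

    def __getitem__(self, idx):
        img, ann = self.coco_ds[idx]
        # Same collate step as above, but applied to one sample.
        # The original version kept both copies of each image; here
        # I just return the first pair to keep the sketch simple.
        images, targets = collate(
            [img.copy(), img.copy()], [ann, ann], self.coco_ds.coco
        )
        return images[0], targets[0]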
Later, before iterating over the epochs, I create the DataLoader like this:
train_loader = DataLoader(LoadDataset(), batch_size=24, shuffle=True)
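If something like LazyDataset above is the way to go, I assume the DataLoader call would stay almost the same; num_workers=4 is just an example value to overlap the per-sample decoding with training:

from torch.utils.data import DataLoader

train_loader = DataLoader(
    LazyDataset("path", "ann"),  # hypothetical lazy dataset from above
    batch_size=24,
    shuffle=True,
    num_workers=4,  # example value; each worker decodes samples on demand
)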