Data Loading Memory efficiency

danielhavir · November 18, 2017, 2:41pm

Hi,

I have a dataset of over 12 million images of size 180x180 (RGB), approx. 85GB in size. I’m training an image classifier on an AWS p2.8xlarge instance (8 GPUs, 32 CPUs, 488GB RAM).

I have a custom Dataset object, which is basically an ImageFolder, except that I load the images in advance. That is, I only change lines 41-42 here: https://github.com/pytorch/vision/blob/master/torchvision/datasets/folder.py like this:

path = os.path.join(root, fname)
img = pil_loader(path)
item = (img, class_to_idx[target])
images.append(item)

and then remove the loading (line 122) from the __getitem__ method.
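
A minimal sketch of the change described above, not the actual torchvision code. The class name `PreloadedFolder` and the `loader` argument are hypothetical; in the post the loader would be torchvision's `pil_loader`. Unlike the real `folder.py`, this sketch does not filter by image extension.

```python
import os


class PreloadedFolder:
    """Hypothetical eager-loading variant of an ImageFolder-style dataset."""

    def __init__(self, root, loader):
        # Sub-directory names act as class labels, as in ImageFolder.
        classes = sorted(
            d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
        )
        self.class_to_idx = {name: i for i, name in enumerate(classes)}
        self.images = []
        for target in classes:
            class_dir = os.path.join(root, target)
            for dirpath, _, fnames in sorted(os.walk(class_dir)):
                for fname in sorted(fnames):
                    path = os.path.join(dirpath, fname)
                    # Eager load: the loaded image, not its path, is stored,
                    # so every image stays resident in RAM.
                    img = loader(path)
                    self.images.append((img, self.class_to_idx[target]))

    def __getitem__(self, index):
        # No disk access here; loading already happened in __init__.
        return self.images[index]

    def __len__(self):
        return len(self.images)
```

With this layout the memory cost is set by the size of whatever `loader` returns, multiplied by the number of files, rather than by the size of the files on disk.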

Yet when I run the classifier, it’s very memory-inefficient. After loading roughly 10% of the dataset, it uses over 100GB of RAM. Any idea what the problem may be or what I should focus on?

I’m grateful for any ideas. Thank you in advance.

thnkim · November 18, 2017, 6:50pm

It looks like 10% of your dataset, i.e. 1.2M 180x180 RGB images, needs 116,640MB (1.2M x 180 x 180 x 3 bytes) even if the images are in uint8 format.
Does the 85GB refer to the total size of the files on disk or to the size in memory?
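
As a rough check of that figure, here is a small sketch of the arithmetic. The helper name `decoded_size_bytes` is made up for illustration, and the estimate counts raw pixel data only, ignoring any per-object overhead from PIL or Python.

```python
def decoded_size_bytes(num_images, height=180, width=180, channels=3,
                       bytes_per_value=1):
    """Raw size of decoded images in bytes (pixel data only, no overhead)."""
    return num_images * height * width * channels * bytes_per_value


MB = 10**6  # decimal megabytes, matching the 116,640MB figure
GB = 10**9

ten_percent = 1_200_000  # 10% of a 12M-image dataset

# uint8: one byte per channel value
print(decoded_size_bytes(ten_percent) / MB)                     # 116640.0

# float32: four bytes per channel value
print(decoded_size_bytes(ten_percent, bytes_per_value=4) / GB)  # 466.56

# full 12M-image dataset as uint8
print(decoded_size_bytes(12_000_000) / GB)                      # 1166.4
```

Decoded pixels take far more space than the compressed .jpg files, so the on-disk size is not a useful guide to the in-memory size.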

Thank you.

danielhavir · November 18, 2017, 7:37pm

Yes, 85GB is the total size of the files in storage (as .jpg). The training set is circa 9,800,000 images, float32. I did the math, now I see