More than 6x RAM consumed when loading images before training

Summary: Attempting to load 42 GB of images entirely into RAM before training. While loading, the process consumes more than 256 GB of RAM.

I have training, test and dev data totaling about 42 GB.

The folders are organized as follows.

/data
	/train
		/video_name
			-frame0001.png
			.
			.
			-frame0132.png
	/test
		/video_name
	/dev
		/video_name

The video_name and other metadata associated with it are stored in 3 CSV files (train, test, dev).
I have a cluster with 256 GB of physical RAM. I am trying to load all the frames into RAM before starting the training, because I am running a large number of epochs and I have (or so I thought) enough RAM, so why not :slight_smile:

Here’s the skeleton ImageDataset that I use to load all images from a given video_name folder:

import glob

import torch.utils.data
from skimage import io


class ImageDataset(torch.utils.data.Dataset):

    def __init__(
            self,
            root_dir: str,
            file_type: str,
            transform=None):
        """
        :param root_dir: Directory with images
        :param file_type: png
        :param transform: Optional transform to be applied on a sample (frame)
        """
        super(ImageDataset, self).__init__()
        self.root_dir = root_dir
        self.file_type = file_type
        # identify all frames with the provided file extension
        self.frames = sorted(glob.glob(self.root_dir + '/*' + self.file_type))

        self.transform = transform

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, idx):
        img_name = self.frames[idx]
        # read frame (uint8 array of shape H x W x 3)
        image = io.imread(img_name)
        # scale to 0-1; dividing a uint8 array by 255.0 promotes it to float64
        image = image / 255.0
        sample = {'image': image}
        if self.transform:
            sample = self.transform(sample)

        return sample

I call this class N times, with N being the number of video folders in the train, test, or dev folder.
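Roughly, the loading loop looks like this (the paths and the way I keep the loaded frames are simplified here; the CSV metadata is handled separately):

import glob

datasets = {}
for video_dir in sorted(glob.glob('/data/train/*')):
    ds = ImageDataset(video_dir, 'png')
    # materialize every frame of this video in RAM
    datasets[video_dir] = [ds[i] for i in range(len(ds))]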

My understanding is that this should take ~42 GB of RAM. However, as the data is loaded it takes more than 256 GB, and the python3 process is killed by Linux. I monitored the RAM usage with the top command and could see it slowly increasing until it eventually exceeded 256 GB. This happens before any training occurs.

Any ideas on how to debug this are appreciated.
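For reference, one check I could run is to compare the in-memory footprint of a single decoded video against its on-disk size, along these lines (the folder path is a placeholder):

ds = ImageDataset('/data/train/some_video', 'png')
frames = [ds[i]['image'] for i in range(len(ds))]
in_ram_mb = sum(f.nbytes for f in frames) / (1024 * 1024)
print(len(frames), 'frames,', round(in_ram_mb, 1), 'MB in RAM')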

My understanding is that this should take ~42 GB of RAM.

Images (or videos) that are saved on disk are very heavily compressed!
But when you turn them into Tensors, you have 3 floating point values per pixel per frame.

To see this, try saving one of your images in the TIFF format (which is uncompressed) and compare the file sizes!
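For example (the file names are placeholders; writing TIFF via skimage assumes an imageio/tifffile backend is available):

import os
from skimage import io

image = io.imread('frame0001.png')
io.imsave('frame0001.tiff', image)   # uncompressed (or only lightly compressed) copy
print(os.path.getsize('frame0001.png'), 'bytes as PNG')
print(os.path.getsize('frame0001.tiff'), 'bytes as TIFF')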

Hmm, OK. I calculated the approximate size when the images are loaded as tensors.

Each image is 227 pixels x 227 pixels x 3 channels

227 * 227 * 3 values * 32 bits / (8 * 1024 * 1024) = 0.589 MB

A ~45 kB PNG image becomes ~0.6 MB when loaded into memory, i.e., ~13.4 times larger :cold_sweat: OK, understood now.

Here are some things I can try to reduce the memory footprint:

  • Decrease the pixel resolution when loading as a tensor. Right now I’m stuck with 227x227 :frowning:
  • Use 1 channel instead of 3 (reduces the size by a factor of 3)
  • Load as 16-bit floats (reduces the size by a factor of 2); a sketch combining this with the single-channel option is below
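A rough sketch of what the single-channel, half-precision __getitem__ could look like (untested; rgb2gray already returns values in [0, 1], so the division by 255 goes away):

import numpy as np
from skimage import io, color

def __getitem__(self, idx):
    img_name = self.frames[idx]
    image = io.imread(img_name)          # uint8, H x W x 3
    image = color.rgb2gray(image)        # float64 in [0, 1], single channel
    image = image.astype(np.float16)     # half precision: half the per-pixel cost
    sample = {'image': image}
    if self.transform:
        sample = self.transform(sample)
    return sample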

Are there any other approaches I can try?

The one other thing you can consider is not loading the whole thing into memory at once, and instead using the provided Dataset/DataLoader constructs to parallelize the loading from disk, so that decoding/preprocessing does not slow down your training.
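A minimal sketch of what that could look like (the batch size, worker count, and paths are placeholders to tune for your setup):

import glob
from torch.utils.data import ConcatDataset, DataLoader

# one ImageDataset per video folder, chained into a single dataset
video_dirs = sorted(glob.glob('/data/train/*'))
train_set = ConcatDataset([ImageDataset(d, 'png') for d in video_dirs])

# worker processes decode/preprocess frames in the background during training
train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=8)

for batch in train_loader:
    images = batch['image']   # shape (batch_size, 227, 227, 3) unless a transform reorders it
    # training step goes here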

Thanks for the reply. Using a DataLoader is not feasible right now because the images are combined with torchtext objects before training.
I do, however, have a custom function that loads just the videos in a batch: if the batch size is 1, then only that one folder is loaded. In the long run this slows down training, because the same video ends up being loaded Z times, where Z is the number of epochs. In addition, all the validation-set videos are reloaded at each validation step.
I am looking into frame subsampling to try to fit all the data in RAM. That would be the fastest way to solve this issue.
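The subsampling itself would just be a stride over the sorted frame list, along these lines (the stride value of 4 is a placeholder I still need to check against the labels):

class SubsampledImageDataset(ImageDataset):

    def __init__(self, root_dir, file_type, stride=4, transform=None):
        super().__init__(root_dir, file_type, transform)
        # keep every stride-th frame; RAM use drops roughly by that factor
        self.frames = self.frames[::stride]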