Loading a Tensor from file in batches

Hi everyone,

I have a tensor stored in a file (it was originally a dataset stored in a MATLAB .mat file, which I read with scipy and then saved as a tensor using torch.save). The data represents RGB images stored in a tensor of shape (N, C, H, W), where N is the number of training examples/images, C is the number of channels, and H × W is the size of each image.
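For reference, the conversion was roughly along these lines (the file and variable names here are just placeholders):

```python
import scipy.io
import torch

# Read the MATLAB file and convert the image array to a tensor.
mat = scipy.io.loadmat("dataset.mat")
images = torch.from_numpy(mat["images"])  # shape (N, C, H, W)
torch.save(images, "images.pt")
```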

I load the tensor using torch.load, but the problem is that I want to load these images in batches, so that I don’t have to load the whole file into memory for training (in fact, I don’t even have enough memory to load it all).

I read the PyTorch tutorial about loading custom datasets, but in that tutorial they load a file containing image names/paths (the images are stored as .jpg files) and then load the images one at a time. I can’t do this because my images are all stored in a single tensor.

Thanks a lot!

TL;DR: how do I load a tensor from disk into memory in batches? (The tensor file was saved with torch.save.)

Based on this older post, it seems you could use a Storage to load the data in chunks.
However, I don’t see an offset argument, so I guess the proper way would be to use np.memmap and load chunks of a numpy array (assuming you can store the data via numpy).
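A minimal sketch of that idea, assuming you re-save the tensor as a raw numpy array first (the file name, shape, and dtype below are placeholders):

```python
import numpy as np

# Assumed dataset dimensions; replace with your own.
N, C, H, W = 10000, 3, 64, 64

# One-time conversion: write the tensor to disk as raw binary, e.g.
#   tensor.numpy().astype(np.float32).tofile("images.bin")

# Map the file into virtual memory without reading it all at once.
mm = np.memmap("images.bin", dtype=np.float32, mode="r", shape=(N, C, H, W))

# Only the requested slice is actually read from disk.
batch = np.array(mm[0:32])
```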

I think the memmap idea is a good solution for my problem; I’ll try to create a Dataset object that uses it under the hood, along the lines of the sketch below.
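One possible sketch of such a Dataset (the file name and shape are assumptions, not the actual data):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class MemmapImageDataset(Dataset):
    def __init__(self, path, shape, dtype=np.float32):
        # The file is only mapped here; pages are read lazily on access.
        self.data = np.memmap(path, dtype=dtype, mode="r", shape=shape)

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        # Copy one example into memory and convert it to a tensor.
        return torch.from_numpy(np.array(self.data[idx]))

dataset = MemmapImageDataset("images.bin", shape=(10000, 3, 64, 64))
loader = DataLoader(dataset, batch_size=32, shuffle=True)
```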

Thanks!

In addition to @ptrblck's answer, I also used another method for my second dataset: I wrote a Python script that reads the entire dataset and saves each training example to its own file, and then in my Dataset I implement the loading of a single training example. That way, the DataLoader takes care of loading just batch_size examples into memory at a time, instead of the whole dataset.
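A rough sketch of what that could look like (the directory layout and file names are just assumptions):

```python
import torch
from torch.utils.data import Dataset, DataLoader

# One-time split (run offline): save each example to its own .pt file.
#   images = torch.load("images.pt")
#   for i, example in enumerate(images):
#       torch.save(example.clone(), f"examples/{i}.pt")

class PerFileDataset(Dataset):
    def __init__(self, root, num_examples):
        self.root = root
        self.num_examples = num_examples

    def __len__(self):
        return self.num_examples

    def __getitem__(self, idx):
        # Only one training example is read from disk per call;
        # the DataLoader assembles them into batches.
        return torch.load(f"{self.root}/{idx}.pt")

dataset = PerFileDataset("examples", num_examples=10000)
loader = DataLoader(dataset, batch_size=32, shuffle=True)
```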