Loading Data from Numpy Array in batches using Pytorch

I have video data stored in a Numpy array, where each sample has shape (1, 16, 100, 100, 3). The array is huge, and every time I load it I get a memory error.
Is there a way, using Pytorch, to load these videos in batches without loading the whole Numpy array at once, so that I don't get the memory error?


What kind of error are you getting?
An array with [16, 100, 100, 3] float32 values would use approx. 1.83MB, which is quite small.


Thanks for replying. The array has 93000 samples. Therefore the shape of the array is (93000, 1, 16, 100, 100, 3). I want to load this dataset in batches so that I don’t get a memory error. Please let me know if there is a way out. I will be grateful for your help.

Assuming this numpy array is stored locally as an npy file, you could use np.load with the mmap_mode argument, which would allow you to load slices from disk without reading the whole array into memory:

mmap_mode {None, 'r+', 'r', 'w+', 'c'}, optional
If not None, then memory-map the file, using the given mode (see numpy.memmap for a detailed description of the modes). A memory-mapped array is kept on disk. However, it can be accessed and sliced like any ndarray. Memory mapping is especially useful for accessing small fragments of large files without reading the entire file into memory.
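As a minimal sketch of this approach: wrap the memory-mapped array in a Dataset so a DataLoader can draw batches, reading only the indexed samples from disk. The file name and batch size here are placeholders (a small dummy array stands in for the real 93000-sample file):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class MemmapVideoDataset(Dataset):
    """Loads one sample at a time from a memory-mapped .npy file."""
    def __init__(self, npy_path):
        # mmap_mode='r' keeps the array on disk; only the slices
        # you index are actually read into memory
        self.data = np.load(npy_path, mmap_mode="r")

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        # np.array() copies just this one sample into RAM
        sample = np.array(self.data[idx])  # shape (1, 16, 100, 100, 3)
        return torch.from_numpy(sample)

# Stand-in for the real file: a small dummy array saved to disk
np.save("videos_demo.npy",
        np.zeros((32, 1, 16, 100, 100, 3), dtype=np.float32))

dataset = MemmapVideoDataset("videos_demo.npy")
loader = DataLoader(dataset, batch_size=8, shuffle=True)

batch = next(iter(loader))
print(batch.shape)  # torch.Size([8, 1, 16, 100, 100, 3])
```

Each worker only ever materializes batch_size samples at a time, so peak memory stays small regardless of how many samples the file holds.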


Thank you very much. It worked.