How to work with a large training set when dealing with auto-encoders on Google Colaboratory?

Hello, I don't know if I can post questions here that aren't PyTorch-specific.

I am training an auto-encoder (Keras) on Google Colab with 25,000 input images and 25,000 output images. So far I have tried:

1. Copying the large file from Google Drive to Colab each time, which takes 5-6 hours.
2. Converting the dataset to a NumPy array, but when I normalize the images the size grows a lot (from 7 GB to about 24 GB, for example), and then it no longer fits into RAM.
3. Zipping and unzipping the data is not an option for me.

If anyone knows how to convert the images into a (normalized) NumPy array without ending up with such a large file (~24 GB), please let me know.
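For reference, the size jump is roughly what you would expect from casting uint8 pixels (1 byte each) to float32 (4 bytes each). A possible workaround, sketched below under the assumption that the images have already been saved as uint8 arrays with `np.save` (the file names and batch size are placeholders), is to memory-map the saved arrays and normalize only one batch at a time:

```python
import numpy as np

# Placeholder paths -- adjust to wherever the arrays live on Colab/Drive.
INPUT_PATH = "inputs_uint8.npy"   # saved with np.save(..., arr.astype(np.uint8))
TARGET_PATH = "targets_uint8.npy"

# Memory-map the uint8 arrays: nothing is read into RAM until a slice is accessed.
inputs = np.load(INPUT_PATH, mmap_mode="r")
targets = np.load(TARGET_PATH, mmap_mode="r")

def batch_generator(x, y, batch_size=32):
    """Yield normalized float32 batches; only one batch lives in RAM at a time."""
    n = x.shape[0]
    while True:
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            sel = idx[start:start + batch_size]
            # Fancy indexing on the memmap loads just these rows,
            # and the float cast / normalization happens per batch.
            xb = x[sel].astype(np.float32) / 255.0
            yb = y[sel].astype(np.float32) / 255.0
            yield xb, yb
```

A generator like this can then be passed to `model.fit` (or `fit_generator` on older Keras versions) together with `steps_per_epoch`, so the full float32 array is never materialized.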

The question seems to be rather Keras/TF-specific, so I think you would get the best support on e.g. StackOverflow. I'm unfortunately not deeply familiar with Keras and don't know how the data loading (lazy loading) can be implemented there.
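In case it is useful, here is a rough sketch of what lazy loading could look like in Keras via `tf.keras.utils.Sequence`. The class name, file lists, image size, and batch size are all placeholders; the idea is simply to read and normalize one batch of image files at a time instead of building the whole 25k-image array up front:

```python
import math
import numpy as np
import tensorflow as tf

class AutoencoderSequence(tf.keras.utils.Sequence):
    """Loads and normalizes one batch of images at a time instead of the full set."""

    def __init__(self, input_paths, target_paths, batch_size=32, image_size=(128, 128)):
        self.input_paths = input_paths    # lists of file paths, e.g. from glob.glob(...)
        self.target_paths = target_paths
        self.batch_size = batch_size
        self.image_size = image_size      # assumed resolution -- change to the real one

    def __len__(self):
        return math.ceil(len(self.input_paths) / self.batch_size)

    def _load(self, paths):
        imgs = [
            tf.keras.preprocessing.image.img_to_array(
                tf.keras.preprocessing.image.load_img(p, target_size=self.image_size)
            )
            for p in paths
        ]
        # Normalize on the fly so only this batch ever exists as float32 in RAM.
        return np.stack(imgs).astype(np.float32) / 255.0

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        return self._load(self.input_paths[lo:hi]), self._load(self.target_paths[lo:hi])

# Usage sketch:
# train_seq = AutoencoderSequence(input_files, target_files)
# autoencoder.fit(train_seq, epochs=10)
```

With this setup the images stay on disk (ideally on the Colab VM's local storage rather than mounted Drive, to avoid the slow copy), and only one normalized batch is held in memory at any time.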