I have ~40 GB of compressed images for training, organized into folders. Each folder holds about 150 images, and there are ~10,000 folders in total. I load N folders at a time, where N is the batch size. Loading the whole dataset into memory is not an option for me.
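To make the access pattern concrete, here is a simplified sketch of the kind of loading I mean (not my exact code; the path and helper names are illustrative):

```python
import os
from PIL import Image

DATA_ROOT = "/data/images"  # illustrative path; the folder shared by all containers

def load_folder(folder):
    """Read all ~150 images in one folder into memory."""
    images = []
    for name in sorted(os.listdir(folder)):
        with Image.open(os.path.join(folder, name)) as img:
            images.append(img.copy())  # decode now so the file handle is released
    return images

def batches(batch_size):
    """Yield one batch per N folders; only N folders are resident at a time."""
    folders = sorted(os.listdir(DATA_ROOT))
    for i in range(0, len(folders), batch_size):
        group = folders[i:i + batch_size]
        yield [load_folder(os.path.join(DATA_ROOT, f)) for f in group]
```

So every training step triggers fresh reads from disk rather than reuse of cached data.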
I have 5 training runs going simultaneously with different hyperparameters. They run in isolated Docker containers, but they all read from the same data folder. I have noticed that training gets slower as I add new runs, so it is likely that more than one run is hitting the same folder at the same time. When does this start becoming a problem? I need to run many more experiments.
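To check whether shared-disk reads are actually the bottleneck, I was thinking of timing raw throughput from inside each container with a micro-benchmark like this (a rough sketch; `DATA_ROOT` is the same illustrative path as above):

```python
import os
import time

def read_throughput(root, limit_bytes=1 << 30):
    """Sequentially read files under root and return MB/s (capped at ~1 GiB read)."""
    total = 0
    start = time.perf_counter()
    for dirpath, _, names in os.walk(root):
        for name in names:
            with open(os.path.join(dirpath, name), "rb") as f:
                total += len(f.read())
            if total >= limit_bytes:
                return total / (time.perf_counter() - start) / 1e6
    elapsed = time.perf_counter() - start
    return total / elapsed / 1e6 if elapsed > 0 else 0.0

if __name__ == "__main__":
    print(f"{read_throughput('/data/images'):.1f} MB/s")
```

Running it in a single container while the other trainings are paused, and again with all five active, should show whether per-reader throughput collapses as readers are added.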