Realistically, how many simultaneous training runs can share the same training data?

I have ~40 GB of compressed images for training, organized into folders. Each folder has about 150 images, and there are ~10,000 folders in total. I load N folders at a time, where N is the batch size. Loading the whole dataset into memory is not an option for me.
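The folder-at-a-time loading described above can be sketched roughly like this (a minimal sketch using only the standard library; the folder layout and the idea of yielding one batch of folders at a time match the description, but the function name and return shape are my own assumptions, not the poster's actual code):

```python
from pathlib import Path


def iter_folder_batches(root, batch_size):
    """Yield `batch_size` folders' worth of images at a time.

    Only one batch of folders is held in memory at once, which is
    the lazy-loading scheme described in the question (hypothetical
    implementation -- the real pipeline presumably decodes images
    rather than returning raw bytes).
    """
    folders = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for i in range(0, len(folders), batch_size):
        batch = folders[i:i + batch_size]
        # Read every file in this batch's folders into memory;
        # previous batches are released for garbage collection.
        yield {
            folder.name: [
                img.read_bytes()
                for img in sorted(folder.iterdir())
                if img.is_file()
            ]
            for folder in batch
        }
```

Note that with this scheme, every training run re-reads the same files from disk on every epoch, which is why adding runs multiplies the I/O load.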

I have 5 training runs going simultaneously with various hyperparameters. They run in isolated Docker containers, but they all read from the same data folder. I've noticed that training gets slower as I add new runs. It's possible that more than one run is trying to access the same data folder at the same time. When does this start becoming a problem? I need to run many more for experimentation.

I don't think the problem is that multiple processes are accessing the same data folder specifically, but that multiple processes are hitting the storage at the same time and therefore sharing the bandwidth of, e.g., your SSD.
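One way to check the bandwidth-sharing hypothesis is to time a sequential read of a large file while the training runs are active and again while they are idle; a big drop in throughput points at shared disk bandwidth rather than folder contention. A minimal sketch (the function name and chunk size are my own choices, not from the thread):

```python
import time


def read_throughput_mb_s(path, chunk_size=1 << 20):
    """Sequentially read `path` in 1 MiB chunks and return MB/s.

    Run once with the training containers active and once with them
    stopped; a large difference suggests the runs are saturating the
    drive's bandwidth. Note the OS page cache can inflate repeated
    reads of the same file, so use a file larger than RAM or drop
    caches between measurements for a fair comparison.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / max(elapsed, 1e-9)
```

Tools like `iostat` on Linux give the same picture system-wide (per-device utilization and throughput) without writing any code.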

Ok, makes sense. Thanks.