Hi, I want to run my model with different configurations. I don’t want to wait for training with one configuration to finish before starting another one; I want to run all of them at the same time. Computation power is not a problem. However, I don’t want to create multiple copies of the same dataset — I want all trainings to share a single copy. So my question is: will the PyTorch DataLoader in one training wait if the dataset is locked by another training?
Typically the DataLoader only needs read-only access, so multiple processes can read the same dataset concurrently without locking each other out. That said, if the bandwidth of the underlying storage becomes the bottleneck, you will necessarily observe a slowdown. Caching and the storage characteristics (e.g. random vs. sequential access) also affect how fast the dataset is read — though the most standard DataLoader setups, with shuffling enabled, already read random samples from the dataset anyway.
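To illustrate, here is a minimal sketch of the pattern: a map-style `Dataset` that opens one shared file strictly read-only (via `np.memmap` with `mode="r"`), and two independent DataLoaders — standing in for two concurrent trainings — iterating over that single copy. The `SharedFileDataset` class and the file layout are my own assumptions for the example, not anything specific to your setup:

```python
import os
import tempfile

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class SharedFileDataset(Dataset):
    """Map-style dataset backed by a single shared file on disk."""

    def __init__(self, path, num_samples, feature_dim):
        self.num_samples = num_samples
        # mode="r" maps the file read-only, so any number of loaders
        # (or separate training processes) can open it concurrently
        # without taking a write lock.
        self.data = np.memmap(
            path, dtype=np.float32, mode="r", shape=(num_samples, feature_dim)
        )

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        # Copy the row so the returned tensor does not alias the
        # read-only mmap buffer.
        return torch.from_numpy(np.array(self.data[idx]))


# One copy of the dataset on disk...
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".bin")
samples = np.arange(8 * 4, dtype=np.float32).reshape(8, 4)
samples.tofile(tmp.name)
tmp.close()

# ...shared by two independent "trainings", each with its own DataLoader.
loaders = [
    DataLoader(SharedFileDataset(tmp.name, 8, 4), batch_size=4, shuffle=True)
    for _ in range(2)
]

totals = []
for i, loader in enumerate(loaders):
    total = sum(batch.sum().item() for batch in loader)
    totals.append(total)
    print(f"training {i}: sum over dataset = {total}")

os.remove(tmp.name)
```

In a real run you would launch each training as its own process (each with its own configuration) and point every one at the same dataset path; since none of them write to the file, no process ever waits on another for access to the data itself.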