I’m running some training jobs on a 100-class subset of ImageNet. I wrote a custom dataset class that loads all of the data into RAM at the start of training to reduce training time. However, the time per epoch is about the same as with an ImageFolder dataset (which, as I understand it, lazily loads images from disk).
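For reference, here’s a simplified sketch of the kind of in-RAM dataset I mean (illustrative only; the real class subclasses `torch.utils.data.Dataset` and stores decoded image tensors, but the access pattern is the same):

```python
# Simplified sketch of an in-RAM dataset. In the real code this subclasses
# torch.utils.data.Dataset; here it's a plain class with the same protocol.
class InMemoryDataset:
    def __init__(self, samples):
        # samples: list of (data, label) pairs, all loaded up front
        self.samples = list(samples)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # No disk I/O here -- everything is already in memory.
        return self.samples[idx]

# Tiny synthetic example: 10 samples, labels 0..9
ds = InMemoryDataset([([0.0] * 4, i) for i in range(10)])
print(len(ds))    # 10
print(ds[3][1])   # 3
```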
Profiling with cProfile shows most of the time spent in the `acquire` method of `_thread.lock` objects. Since the training data is already loaded in memory, why does the dataloader still appear to be a bottleneck?
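My guess (and I may be wrong) is that this is just the main thread blocking on a queue while waiting for batches, which cProfile reports as lock acquisition. Here’s a minimal standalone script, not my training code, that reproduces that profiling pattern:

```python
# Minimal repro of the profiling pattern: time the main thread spends
# blocked on a queue shows up in cProfile as _thread.lock.acquire.
import cProfile
import io
import pstats
import queue
import threading
import time

q = queue.Queue()

def producer():
    for i in range(3):
        time.sleep(0.05)  # simulate slow batch preparation
        q.put(i)

t = threading.Thread(target=producer)
pr = cProfile.Profile()
pr.enable()
t.start()
items = [q.get() for _ in range(3)]  # main thread blocks here
t.join()
pr.disable()

s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats("cumulative").print_stats("acquire")
print(items)  # [0, 1, 2]
# The filtered stats output lists the 'acquire' method of '_thread.lock'
# objects as where the blocked time was spent.
```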
Is there anything I’m missing here?
Thanks in advance.