Slow data loading when training a classification model on ImageNet

I’m trying to train a classification model on ImageNet, but training is much slower than expected, and the reason seems to be that data loading is very slow. I’m training on a 4-GPU machine with 48 CPU threads.

I tried increasing num_workers and prefetch_factor, but neither helped significantly. The CPUs seem fine – only two threads hit 100% usage during data loading. All the data is stored in my home directory.
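One way to confirm that fetching batches (rather than the forward/backward pass) is what's slow is to time the two phases separately. Below is a minimal, hypothetical sketch – `profile_iteration` is not a PyTorch API, just a generic helper that works with any iterable, including a `DataLoader`:

```python
import time

def profile_iteration(iterable, step_fn, n=None):
    """Split wall time into 'fetch' (time spent in next()) and 'compute'
    (time spent in step_fn). If fetch dominates, the data loader, not the
    model, is the bottleneck."""
    it = iter(iterable)
    fetch_t = compute_t = 0.0
    count = 0
    while n is None or count < n:
        t0 = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            break
        fetch_t += time.perf_counter() - t0
        t0 = time.perf_counter()
        step_fn(batch)
        compute_t += time.perf_counter() - t0
        count += 1
    return fetch_t, compute_t, count
```

For a real training loop you would pass your `DataLoader` as `iterable` and wrap one optimizer step in `step_fn` (with `torch.cuda.synchronize()` inside it, so GPU time is actually counted).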

I am very confused by this, because I assumed that if data loading is the bottleneck, the CPU must be the limiting factor (e.g., not enough CPU to preprocess the current batch). But the CPUs are mostly idle. What could the problem be?

This post might give you some information about data loading bottlenecks.

Thanks for the reference. This post definitely helps to fill in some background regarding the data processing.

However, it’s still hard for me to identify the problem on my machine. What is the limiting factor? Disk throughput, IOPS, CPU, or memory?
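You can separate disk throughput from IOPS with a rough, stdlib-only measurement: stream one large file in big blocks (throughput), then open-and-read many small files (IOPS-bound, which is what ImageNet-style training with millions of small JPEGs actually stresses). A sketch, with file paths left as inputs you'd fill in from your own dataset:

```python
import time

def sequential_mbps(path, block=1 << 20):
    """Read one file in 1 MiB blocks; approximates streaming throughput in MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(block):
            total += len(chunk)
    return total / (time.perf_counter() - start) / 1e6

def small_file_reads_per_sec(paths):
    """Open and fully read many small files; approximates effective IOPS
    for an ImageNet-style access pattern (~1.3M small reads per epoch)."""
    start = time.perf_counter()
    for p in paths:
        with open(p, "rb") as f:
            f.read()
    return len(paths) / (time.perf_counter() - start)
```

Two caveats: on a warm page cache these numbers measure memory, not disk, so drop caches (or use more data than RAM) before measuring; and if your home directory lives on a network filesystem, as it often does on shared machines, small-file IOPS can be orders of magnitude below a local SSD, which would match your symptom of idle CPUs.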

I see that PyTorch has benchmarks for operations like bmm and neural network inference. It would be great to have a similar benchmark suite to figure out what is blocking data loading, and to see how the current system compares to others. GPU benchmarks let us anticipate performance gaps, but there is currently no way to compare my non-GPU infrastructure against other people’s systems.
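Until such a suite exists, a tiny worker-scaling micro-benchmark can at least answer the question num_workers is asking: does my pipeline speed up with more parallelism at all? This sketch uses threads and a sleep to mimic per-sample I/O latency – note that real JPEG decoding is CPU-bound and needs processes (which is exactly why DataLoader uses worker processes, not threads):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(_):
    """Stand-in for per-sample I/O (e.g., reading one image file);
    sleeps 5 ms to mimic storage latency."""
    time.sleep(0.005)
    return 0

def samples_per_second(n_workers, n_samples=40):
    """Throughput of the fake pipeline at a given parallelism level."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(fake_fetch, range(n_samples)))
    return n_samples / (time.perf_counter() - start)

for w in (1, 2, 4, 8):
    print(f"{w} workers: {samples_per_second(w):.0f} samples/s")
```

If throughput stops scaling well before you run out of CPU threads, the limit is per-request latency of the storage (IOPS), not compute, which again points at where the data lives rather than how many workers you spawn.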