Dataloader: more num_workers do not reduce runtime?

Hi,

the best approach depends on the data and what you are doing with it. If more workers do not reduce the runtime, that could mean that your CPU is already saturated, that the bottleneck isn’t the CPU (but e.g. storage), or something else.

Some general ideas:

  • do on-the-fly processing (e.g. augmentation) on the GPU as much as you can, i.e. not in the dataloader,
  • storage can be a huge bottleneck, e.g. don’t load huge images only to scale them down, but instead do a preprocessing step in advance where you scale down to a “reasonable size” (i.e. it doesn’t need to be the final size, but I’ve seen things being slow because XX Megapixel images were loaded only to then immediately rescale + crop to 227x227),
  • if you adjust your pipeline, the Thomas rule of thumb is it isn’t optimization unless you measure before and after. Things like pin_memory are often advocated, but people sometimes find it helps and sometimes not.
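
To make the "measure before and after" point concrete, here is a minimal timing sketch. It is only an illustration: the helper names (time_one_pass, compare) and the sleep-based stand-in loaders are my own assumptions so that the snippet runs without a dataset. It times a full pass over anything you can iterate, so worker startup cost is included in the number.

```python
import time


def time_one_pass(make_loader):
    """Time one full pass over a freshly created loader.

    make_loader is a zero-argument callable returning any iterable of
    batches (hypothetical helper, not part of PyTorch).
    Returns (seconds, number_of_batches).
    """
    start = time.perf_counter()
    n_batches = 0
    for _ in make_loader():
        n_batches += 1
    return time.perf_counter() - start, n_batches


def compare(configs, repeats=3):
    """Return the best-of-`repeats` pass time in seconds for each config."""
    results = {}
    for name, make_loader in configs.items():
        times = [time_one_pass(make_loader)[0] for _ in range(repeats)]
        results[name] = min(times)
    return results


# Stand-in "loaders" so the sketch runs anywhere: each batch costs `delay`
# seconds to produce. Replace these with your real loader factories.
def slow_batches(n_batches=5, delay=0.01):
    for i in range(n_batches):
        time.sleep(delay)
        yield i


def fast_batches(n_batches=5):
    return slow_batches(n_batches, delay=0.0)


results = compare({"before": slow_batches, "after": fast_batches})
for name, seconds in results.items():
    print(f"{name}: {seconds:.4f} s per pass")
```

With a real pipeline you would pass factories such as `lambda: DataLoader(dataset, batch_size=64, num_workers=n, pin_memory=True)` for the values of n (and pin_memory on/off) you want to compare, where dataset and the batch size are whatever you already use. Keep in mind that the first pass over data on disk can look slower than later ones because of OS file caching, so repeat the measurement before drawing conclusions.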

Best regards

Thomas