Best practices for Dataloader

Hey, are there any best practices for creating an efficient DataLoader for images?

Using multiple workers (num_workers > 0) usually helps, since batches are then loaded and preprocessed in background processes. If you set num_workers to a very high number, performance might drop again, so experiment a bit to find the sweet spot for your system.
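One rough way to look for that sweet spot is to time a pass over the loader for a few num_workers values. The sketch below assumes a hypothetical `RandomImageDataset` that returns random tensors in place of decoded images, and it measures loading time only, without a training step:

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class RandomImageDataset(Dataset):
    """Hypothetical stand-in for an image dataset: returns random 3xHxW tensors.

    A real __getitem__ would read and decode a file and apply transforms,
    which is the CPU work that extra workers can parallelize.
    """

    def __init__(self, length=512, size=224):
        self.length = length
        self.size = size

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        image = torch.rand(3, self.size, self.size)
        label = idx % 10  # dummy label
        return image, label


def time_one_epoch(dataset, num_workers, batch_size=32):
    """Return the wall-clock seconds needed to iterate over the dataset once."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        num_workers=num_workers)
    start = time.perf_counter()
    for images, labels in loader:
        pass  # only loading is measured; no training step here
    return time.perf_counter() - start


def sweep_num_workers(dataset, candidates=(0, 2, 4, 8)):
    """Time one epoch per candidate value and return {num_workers: seconds}."""
    return {n: time_one_epoch(dataset, n) for n in candidates}


if __name__ == "__main__":
    results = sweep_num_workers(RandomImageDataset())
    for n, seconds in results.items():
        print(f"num_workers={n}: {seconds:.2f}s")
```

Note that each timed pass also includes the worker start-up cost, so for short epochs you may want to time several epochs per setting before drawing conclusions.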
Pinned memory (pin_memory=True in the DataLoader) combined with .to('cuda', non_blocking=True) might also hide the latency of the host-to-device copy.
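A minimal sketch of how the two pieces fit together, assuming in-memory tensors wrapped in a `TensorDataset` and a toy model (both only for illustration). Pinning is enabled only when CUDA is available, so the same code also runs on a CPU-only machine:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(images, labels, batch_size=32, num_workers=2):
    """Build a DataLoader that returns pinned (page-locked) batches when CUDA is present."""
    use_cuda = torch.cuda.is_available()
    dataset = TensorDataset(images, labels)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=use_cuda,  # pinned host memory allows asynchronous H2D copies
    )


def run_epoch(loader, model, device):
    """Iterate once, copying each batch to `device` with non_blocking=True."""
    seen = 0
    for images, labels in loader:
        # From pinned memory, non_blocking=True lets the host continue
        # (e.g. queue the forward pass) while the copy is in flight.
        # On a CPU-only machine these calls are effectively no-ops.
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        outputs = model(images)
        seen += outputs.shape[0]
    return seen


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    images = torch.rand(256, 3, 32, 32)
    labels = torch.randint(0, 10, (256,))
    model = torch.nn.Sequential(
        torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)
    ).to(device)
    print(run_epoch(make_loader(images, labels), model, device))
```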

Did you encounter any issues/bottlenecks?