Hi everyone, I’d like to share a tutorial I wrote recently about using NVIDIA DALI to speed up the PyTorch dataloader. It contains a few tips I found for getting the most out of DALI, which allow for a completely CPU-based pipeline and ~50% larger max batch sizes than the reference examples.
DALI gives really impressive results: on small models it's ~4X faster than the PyTorch dataloader, whilst the completely CPU-based pipeline is still ~2X faster. This means nearly 4000 images/s on a Tesla V100 and single-GPU ImageNet training in only a few hours!
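To put the throughput figure in context, here is a rough back-of-the-envelope sketch of how images/s translates into training time. It assumes the standard ILSVRC-2012 training set (1,281,167 images) and that data loading is the only bottleneck. The epoch counts are hypothetical, picked for illustration, and are not taken from the tutorial.

```python
# Rough training-time estimate from the throughput quoted above.
# Assumption: the ILSVRC-2012 (ImageNet-1k) training set, 1,281,167 images.
IMAGENET_TRAIN_IMAGES = 1_281_167
# "nearly 4000 images/s" from the post, rounded up for simplicity.
THROUGHPUT_IMAGES_PER_S = 4000


def epoch_minutes(num_images: int, images_per_s: float) -> float:
    """Minutes needed to stream every training image once."""
    return num_images / images_per_s / 60


def training_hours(num_images: int, images_per_s: float, epochs: int) -> float:
    """Hours needed for `epochs` full passes at a constant throughput."""
    return num_images * epochs / images_per_s / 3600


minutes_per_epoch = epoch_minutes(IMAGENET_TRAIN_IMAGES, THROUGHPUT_IMAGES_PER_S)
print(f"{minutes_per_epoch:.1f} min per epoch")

# Hypothetical epoch budgets, for illustration only.
for epochs in (30, 90):
    hours = training_hours(IMAGENET_TRAIN_IMAGES, THROUGHPUT_IMAGES_PER_S, epochs)
    print(f"{epochs} epochs: {hours:.1f} h")
```

At this rate one pass over ImageNet takes a little over five minutes, so a short training schedule fits in a few hours as long as the model itself can keep up with the dataloader.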
Article is here and codebase is here. Hope you find it useful!