I’m curious if you’ve taken a look at DALI?
I’m spending some time going through the code now to get a sense of what’s going on. It’s computer vision specific, but another thread about speeding up the dataloader claimed an 8x speedup, though I’m not sure how much of that comes from doing the image preprocessing on the GPU.
I’m working with tabular data, which it doesn’t support, but I’m trying to figure out some of their caching strategies and techniques so I can work them into the dataloader I’m building to go along with RAPIDS.AI. (I work for NVIDIA on the RAPIDS team, focused on deep learning for tabular data.)