When to call DataLoaders for DistributedDataParallel

I found this thread, with a comment where @ptrblck suggests loading the data into CPU memory first and then, at batch time, moving each batch to the GPU. How can I do this within the DDP framework?
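
For context, here is a minimal sketch of how I currently understand the suggestion, so it may well be wrong. It assumes one process per GPU, and that each process builds its own `DataLoader` over a CPU-resident dataset with a `DistributedSampler`. The helper names `build_loader` and `batches_on_device`, the toy dataset shapes, and the hard-coded `rank` and `world_size` are my own placeholders; in a real run they would come from `torch.distributed`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def build_loader(dataset, rank, world_size, batch_size=4):
    # Hypothetical helper: each rank samples a disjoint shard of the indices.
    # The dataset itself stays in CPU memory.
    sampler = DistributedSampler(
        dataset, num_replicas=world_size, rank=rank, shuffle=True
    )
    loader = DataLoader(
        dataset,
        batch_size=batch_size,
        sampler=sampler,
        pin_memory=torch.cuda.is_available(),  # pinned memory speeds up host-to-GPU copies
    )
    return loader, sampler


def batches_on_device(loader, device):
    # Hypothetical helper: move each batch to the target device only when it is consumed.
    for inputs, targets in loader:
        yield (
            inputs.to(device, non_blocking=True),
            targets.to(device, non_blocking=True),
        )


# Toy CPU dataset; the shapes are made up for illustration.
dataset = TensorDataset(torch.randn(16, 3), torch.randint(0, 2, (16,)))

# Assumption: in real DDP these come from dist.get_rank() / dist.get_world_size().
rank, world_size = 0, 2
device = torch.device(f"cuda:{rank}" if torch.cuda.is_available() else "cpu")

loader, sampler = build_loader(dataset, rank, world_size)
for epoch in range(2):
    sampler.set_epoch(epoch)  # so the shuffle differs between epochs
    for inputs, targets in batches_on_device(loader, device):
        pass  # forward/backward through the DDP-wrapped model would go here
```

If this is right, the `DataLoader` is created once per process after the process group is set up, and only the per-batch `.to(device)` call touches the GPU.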

Thanks.

Kristian