Best practice to cache the entire dataset during first epoch

Not sure what you’re referring to, but __init__ is usually expected to be fast and shouldn’t load too much data, since that can cause bugs when multiprocessing is used. See the thread “How to share data among DataLoader processes to save memory”.
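For what it's worth, here is a minimal sketch of one way to keep the constructor cheap and fill a cache lazily during the first epoch. It is not an official recipe. The names `LazyCachedDataset`, `keys`, `load_fn` and `slow_load` are made up for illustration, and the class only follows the map-style protocol (`__len__` and `__getitem__`); in a real project it would typically subclass `torch.utils.data.Dataset`.

```python
class LazyCachedDataset:
    """Map-style dataset that caches each sample the first time it is read.

    Hypothetical example: `keys` and `load_fn` are illustrative names,
    not part of any PyTorch API.
    """

    def __init__(self, keys, load_fn):
        # Keep the constructor cheap: store lightweight references only.
        self.keys = list(keys)
        self.load_fn = load_fn
        self._cache = {}

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, index):
        # First access loads and stores the sample; later accesses reuse it.
        if index not in self._cache:
            self._cache[index] = self.load_fn(self.keys[index])
        return self._cache[index]


calls = []  # records every real load so cache hits are visible


def slow_load(key):
    # Stand-in for an expensive read (disk, network, decoding).
    calls.append(key)
    return key * 2


dataset = LazyCachedDataset(range(4), slow_load)
first_epoch = [dataset[i] for i in range(len(dataset))]
second_epoch = [dataset[i] for i in range(len(dataset))]
```

After both passes, `calls` holds each key only once, so the second epoch is served entirely from the cache. One caveat: with `num_workers > 0` each DataLoader worker holds its own copy of the dataset, so a plain dict like this is filled per worker and is not shared between processes, which is the problem the linked thread discusses.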