Can you use a PyTorch DataLoader? If you implement the `__getitem__` method of a map-style Dataset, samples are read into memory lazily, one at a time, as the DataLoader assembles each batch. Each DDP replica then gets its own DataLoader, and each DataLoader loads data on demand, so there shouldn't be as much memory pressure.
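As a minimal sketch of the idea (file layout and names here are made up for illustration): the Dataset holds only a list of file paths, and each `__getitem__` call loads one sample from disk, so memory use stays proportional to the batch size rather than the dataset size.

```python
import os
import tempfile

import torch
from torch.utils.data import Dataset, DataLoader


class LazyFileDataset(Dataset):
    """Map-style dataset: only file paths live in memory; samples load on demand."""

    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Read one sample from disk only when the DataLoader asks for it.
        return torch.load(self.paths[idx])


# Toy on-disk samples, standing in for a real (larger-than-memory) dataset.
tmp = tempfile.mkdtemp()
paths = []
for i in range(8):
    p = os.path.join(tmp, f"sample_{i}.pt")
    torch.save(torch.full((4,), float(i)), p)
    paths.append(p)

loader = DataLoader(LazyFileDataset(paths), batch_size=4, shuffle=False)
for batch in loader:
    print(batch.shape)  # torch.Size([4, 4])
```

Under DDP you would typically also pass `sampler=torch.utils.data.distributed.DistributedSampler(dataset)` to the DataLoader so each replica reads only its own shard of the data.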
Relevant Forums Post: How to use dataset larger than memory?