Multiple workers for single batch

Hi everyone!
I am working on a project with histopathological images (Whole-Slide Images, WSIs). Each of these images is ~1 GB, so they are really hard to handle.
In particular, I struggle when I use DataLoader(num_workers=N) with N > 1, because PyTorch starts loading and preprocessing (slight data augmentation in our case) multiple batches in RAM, and the RAM fills up fast.
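For context, here is a minimal sketch of the kind of setup I mean (illustrative only, not my actual pipeline; I'm assuming openslide-python for reading, and `WSIDataset`, the paths, the pyramid level and the region size are placeholder choices):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
import openslide  # assumption: whole-slide images are read with openslide-python


class WSIDataset(Dataset):
    """Illustrative only: each item is a large region of a slide plus slight
    augmentation, which is what makes every batch so heavy in RAM."""

    def __init__(self, slide_paths, level=2, size=(4096, 4096)):
        self.slide_paths = slide_paths
        self.level = level   # pyramid level to read from (placeholder choice)
        self.size = size     # region size in pixels at that level (placeholder)

    def __len__(self):
        return len(self.slide_paths)

    def __getitem__(self, i):
        # open the slide lazily inside the worker process, so file handles
        # are not pickled when workers are forked/spawned
        slide = openslide.OpenSlide(self.slide_paths[i])
        img = slide.read_region((0, 0), self.level, self.size).convert("RGB")
        slide.close()
        x = torch.from_numpy(np.array(img)).permute(2, 0, 1).float() / 255
        # ... slight data augmentation would go here ...
        return x


paths = ["slide_0001.svs", "slide_0002.svs"]  # placeholder paths

# With the defaults, every worker keeps prefetch_factor (default 2) batches
# queued, so roughly num_workers * prefetch_factor * batch_size samples can be
# resident in RAM at once. prefetch_factor=1 (PyTorch >= 1.7) is the only knob
# I know of to shrink that, but each worker still builds whole batches alone.
loader = DataLoader(WSIDataset(paths), batch_size=4,
                    num_workers=4, prefetch_factor=1)
```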
I wanted to know if there are other people working on an alternative DataLoader mechanism that would allow multiple workers to work on the same batch.
I would also like to know if you have any suggestions regarding this topic.
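For instance, something along these lines might approximate it with the current API (a rough, untested sketch; `ShardedBatchDataset`, `load_and_augment` and `samples` are placeholder names, and it relies on the DataLoader pulling from its workers in round-robin order, which I believe is the current behaviour for IterableDataset): each worker produces only its 1/num_workers slice of every logical batch, and the training loop concatenates the slices back into a full batch.

```python
import torch
from torch.utils.data import IterableDataset, DataLoader, get_worker_info


def load_and_augment(sample):
    # placeholder: the real version would read a region from the slide
    # (e.g. with openslide) and apply the slight augmentation
    slide_path, x, y = sample
    return torch.zeros(3, 512, 512)


class ShardedBatchDataset(IterableDataset):
    """Each worker emits only its 1/num_workers slice of every logical batch,
    so no single worker ever loads (or prefetches) a full batch on its own."""

    def __init__(self, samples, batch_size):
        self.samples = samples        # e.g. (slide_path, x, y) tuples
        self.batch_size = batch_size  # assumed >= num_workers

    def __iter__(self):
        info = get_worker_info()
        wid = info.id if info else 0
        nw = info.num_workers if info else 1
        # drop the final partial batch so every worker yields the same number
        # of shards and the shards stay aligned across workers
        n_full = (len(self.samples) // self.batch_size) * self.batch_size
        for start in range(0, n_full, self.batch_size):
            batch = self.samples[start:start + self.batch_size]
            shard = batch[wid::nw]    # this worker's slice of the batch
            yield torch.stack([load_and_augment(s) for s in shard])


samples = [("slide_0001.svs", 0, 0)] * 256  # placeholder patch coordinates

# batch_size=None turns off the DataLoader's own batching: each item it
# yields is one worker's shard of a logical batch.
loader = DataLoader(ShardedBatchDataset(samples, batch_size=32),
                    batch_size=None, num_workers=4, prefetch_factor=1)


def full_batches(loader, num_workers=4):
    # shards appear to come back from workers in round-robin order, so
    # num_workers consecutive shards should form one logical batch
    shards = []
    for shard in loader:
        shards.append(shard)
        if len(shards) == num_workers:
            yield torch.cat(shards)
            shards.clear()
```

The obvious downside is that batch composition depends on worker scheduling, so a proper DataLoader-level solution would still be much nicer.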
Since I have never opened a PyTorch PR, and since I noticed that worker shutdown/handling is a very delicate matter, do you think I could open a draft PR and have someone provide suggestions and support along the way?

This feature request is tracked here and I'm sure contributions are welcome, so please feel free to post your interest in the issue and code owners will follow up with you. :slight_smile:
