PyTorch’s DataLoader uses Python multiprocessing, and each worker process ends up with its own replica of the dataset. When the dataset is huge, this replication causes memory issues.
Normally, multiple processes need shared memory to share data (unlike threads). Is there an easy way to share common, read-only data across all of the DataLoader worker processes in PyTorch? Perhaps someone has already implemented this (I haven’t been able to find it yet).