I am loading an HDF5 file inside a Dataset (I have made sure that everything is picklable, so that is not the problem) and using a DataLoader with multiple worker processes to read several chunks at a time.
With the default fork context, going from 0 workers to 10 gives no performance improvement; in fact, loading a 32-item batch takes longer with multiprocessing than without. With the spawn (or forkserver) context, the workers take a while to start, but data loading is then significantly faster.
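To make the comparison concrete, here is a minimal sketch of how the startup cost and the steady-state loading speed could be measured separately. The helper name `time_loader` and its parameters are hypothetical (not taken from my actual code), and it works on any iterable, so it is only an illustration of the measurement, not a definitive benchmark.

```python
import time
from itertools import islice


def time_loader(make_loader, n_batches=20):
    """Return (startup_s, per_batch_s) for the iterable built by make_loader().

    startup_s:   time to build the iterator and receive the first batch
                 (for a DataLoader this includes starting the workers).
    per_batch_s: mean time per batch over the batches that follow.
    """
    t0 = time.perf_counter()
    it = iter(make_loader())
    next(it)  # blocks until the first batch is available
    t1 = time.perf_counter()

    # Consume up to n_batches - 1 further batches and count them.
    n = sum(1 for _ in islice(it, n_batches - 1))
    t2 = time.perf_counter()

    per_batch = (t2 - t1) / n if n else float("nan")
    return t1 - t0, per_batch
```

With PyTorch, this could be called as `time_loader(lambda: DataLoader(dataset, batch_size=32, num_workers=10, multiprocessing_context="spawn"))` and again with `"fork"` and with `num_workers=0`, where `dataset` stands for the HDF5-backed Dataset. If spawn shows a large startup time but a small per-batch time, while fork shows the opposite, that matches what I am seeing.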
I am wondering whether this tradeoff is expected or whether something is wrong with fork-based multiprocessing.
Thanks a lot!