Fork() is significantly slower than spawn() in PyTorch DataLoader

I am loading data from an HDF5 file inside a Dataset (I make sure everything is picklable, so that is not the problem) and using a DataLoader with multiple workers to read several chunks at a time.
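For reference, my setup looks roughly like the sketch below (the class name, the "data" dataset name, and the file path are placeholders; I open the file lazily inside __getitem__ so the Dataset object stays picklable):

```python
import h5py
import torch
from torch.utils.data import Dataset, DataLoader

class H5Dataset(Dataset):
    """Illustrative HDF5-backed dataset; each item is one contiguous chunk."""

    def __init__(self, path, chunk_size=32):
        self.path = path
        self.chunk_size = chunk_size
        self.h5_file = None  # opened lazily in __getitem__, so the object pickles cleanly

    def __len__(self):
        # Opened and closed immediately in the parent process, before workers exist
        with h5py.File(self.path, "r") as f:
            return len(f["data"]) // self.chunk_size

    def __getitem__(self, idx):
        if self.h5_file is None:
            self.h5_file = h5py.File(self.path, "r")
        a = idx * self.chunk_size
        b = a + self.chunk_size
        sample = self.h5_file["data"][a:b]  # the read discussed below
        return torch.from_numpy(sample)

# batch_size=None disables automatic batching, so each 32-item chunk
# comes back from a worker as one batch
loader = DataLoader(H5Dataset("data.h5"), batch_size=None, num_workers=10)
```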

With the default fork context there is no performance improvement when going from 0 workers to 10; in fact, loading a 32-item batch takes longer with multiprocessing than without. With the spawn (or forkserver) context, it takes a while to start the workers, but data loading afterwards is significantly faster.
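Switching contexts is just the multiprocessing_context argument of DataLoader, with the dataset sketched above:

```python
from torch.utils.data import DataLoader

dataset = H5Dataset("data.h5")  # from the sketch above

# Default context (fork on Linux): going from 0 to 10 workers gives me no speedup
loader_fork = DataLoader(dataset, batch_size=None, num_workers=10)

# Explicit "spawn" (or "forkserver"): workers take a while to start,
# but the actual loading is much faster afterwards
loader_spawn = DataLoader(dataset, batch_size=None, num_workers=10,
                          multiprocessing_context="spawn")
```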

I am wondering whether this tradeoff is expected or whether something is wrong with fork-based multiprocessing here.

Thanks a lot!

Does using fork take up all your memory? Maybe it's resorting to swap?

Doesn’t look like it; I still have tons of free RAM during execution.

It seems that the slowdown is related to the read operation inside __getitem__. When I get rid of the sample = h5_file[a:b] line (just as an example of what the read looks like), the fork context gives the expected speedup.
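Concretely, this is the kind of change I tested, reusing the H5Dataset sketch from my first post (the output shape is just a placeholder):

```python
import torch

class NoReadDataset(H5Dataset):  # H5Dataset as sketched above
    """Identical, except __getitem__ skips the HDF5 read."""

    def __getitem__(self, idx):
        # Without sample = self.h5_file["data"][a:b], fork scales with
        # num_workers as expected; with it, fork is slower than 0 workers.
        return torch.zeros(self.chunk_size, 128)  # placeholder shape
```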