Queue in DataLoader does not behave as expected when using num_workers?

So the problem isn’t PyTorch itself; it is how multiprocessing works in Python, which PyTorch relies on. In Python multiprocessing, data objects in the parent process are serialized (pickled) to be passed to the child processes. If the parent objects are small, this overhead is negligible and you see the benefit of using multiple processes in total time. However, if the data objects in the parent process are large (in my case ~100–200 MB), the serialization adds enough overhead to neutralize any benefit you might get from multiprocessing.
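A minimal sketch of the problem, assuming a dataset that holds a large in-memory array (the array shape and timing harness here are illustrative, not from my actual setup). With the "spawn" start method, every worker receives its own pickled copy of the dataset, so `num_workers > 0` can end up slower than `num_workers = 0`:

```python
import time

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class BigDataset(Dataset):
    """Dataset holding a large object in the parent process (~200 MB here)."""

    def __init__(self):
        self.data = np.random.rand(200, 128, 1024)  # float64, ~200 MB

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return torch.from_numpy(self.data[idx])


def time_loader(num_workers):
    loader = DataLoader(BigDataset(), batch_size=8, num_workers=num_workers)
    start = time.perf_counter()
    for _ in loader:
        pass
    return time.perf_counter() - start


if __name__ == "__main__":
    # With "spawn" (the default on Windows/macOS), the whole dataset is
    # serialized and shipped to each worker, so the multi-worker run can
    # lose to the single-process run for large in-memory datasets.
    print(f"num_workers=0: {time_loader(0):.2f}s")
    print(f"num_workers=4: {time_loader(4):.2f}s")
```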

So, I ended up with a somewhat complicated solution: building the data loading in C++ and feeding its output to PyTorch’s DataLoader. That worked pretty well with respect to latency.
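One way to bridge an external loader into PyTorch (a hypothetical sketch, not my exact code) is an `IterableDataset` that pulls pre-built batches from the external source; `external_loader` below stands in for whatever C++ bindings you expose, e.g. via pybind11:

```python
import torch
from torch.utils.data import IterableDataset, DataLoader


class ExternalLoaderDataset(IterableDataset):
    """Wraps an external batch producer (e.g., C++ bindings) as a dataset."""

    def __init__(self, external_loader):
        # `external_loader` is assumed to be an iterable yielding numpy batches
        self.external_loader = external_loader

    def __iter__(self):
        for batch in self.external_loader:
            yield torch.as_tensor(batch)


# batch_size=None passes the pre-built batches through unchanged, and
# num_workers=0 keeps everything in the parent process, avoiding the
# serialization overhead entirely:
# loader = DataLoader(ExternalLoaderDataset(cpp_loader), batch_size=None, num_workers=0)
```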
