Putting data loading into a queue for multiprocessing?

Hi everyone,

In my setting, I use multiprocessing a lot, and I would like the data loading to live in different processes from the ones that actually process the data.

To do this, I’ve been using DataLoaders, but passing them to new processes seems to lead to deadlocks and synchronization bugs that are difficult to debug (I’m on PyTorch 0.4.1).

My question is:
It seems that having the data come through a Queue object would solve the problem. Is there built-in functionality for passing data through Queues? In other words, it would be an alternative to DataLoaders.

If not, I can easily implement it myself, but I’m wondering whether this feature already exists in PyTorch.

best

I’m not quite sure, but the DataLoader does multi-threading on its own when num_workers > 0. Does that not suffice to queue the data for your process?

It’s not about multi-threading, but multiprocessing.

cheers

If you are using multiprocessing on your own, I suggest you disable the DataLoader’s own multiprocessing by setting num_workers to 0, so that batches are loaded in the process that iterates over the loader.
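
For the Queue idea from the original question, here is a minimal sketch of how it could look, using only Python’s standard multiprocessing module. The names `producer`, `consume` and `_SENTINEL` are hypothetical, the batches are plain lists, and `sum` is only a stand-in for the real processing step. In a real setup the loader process would presumably iterate over your Dataset or a single-process DataLoader instead, and torch.multiprocessing exposes the same Process/Queue interface, but I have not tested this on 0.4.1.

```python
import multiprocessing as mp

_SENTINEL = None  # hypothetical end-of-stream marker


def producer(batches, queue):
    """Runs in the loader process: push each batch, then the end marker."""
    for batch in batches:
        queue.put(batch)  # blocks while the queue is full (backpressure)
    queue.put(_SENTINEL)


def consume(batches, start_method=None, maxsize=4):
    """Load `batches` in a separate process and process them in this one."""
    ctx = mp.get_context(start_method)  # None selects the platform default
    queue = ctx.Queue(maxsize=maxsize)
    loader = ctx.Process(target=producer, args=(batches, queue))
    loader.start()
    results = []
    while True:
        batch = queue.get()
        if batch is _SENTINEL:
            break
        results.append(sum(batch))  # stand-in for the real processing
    loader.join()  # join only after the queue has been drained
    return results


if __name__ == "__main__":
    print(consume([[1, 2], [3, 4], [5, 6]]))
```

The bounded queue keeps the loader from running arbitrarily far ahead of the consumer, and draining the queue before calling `join()` avoids the classic deadlock where the producer is still blocked on `put`.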