Dataloading: putting data into a queue for multiprocessing?


(Antoine Liutkus) #1

Hi everyone,

In my setup, I use multiprocessing a lot, and I would like the dataloader to live in processes other than the ones actually processing the data.

To do this, I’ve been using dataloaders, but passing them around to new processes seems to lead to deadlocks and synchronization bugs that are difficult to debug (I’m on PyTorch 0.4.1).

My question is:
it seems that having the data come through a Queue object would solve the problem. Is there some built-in functionality for making data travel through Queues? In other words, it would be an alternative to dataloaders.

If not, I can easily implement that myself, but I’m wondering if this is already a feature in PyTorch.
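If it helps, here is roughly what I have in mind (just a sketch with a placeholder dataset, not something I have tested): one process owns the DataLoader and pushes batches into a torch.multiprocessing Queue, and the training process only reads from that queue.

```python
import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset


def loader_process(queue, dataset):
    # this process owns the DataLoader; any of its workers stay local to it
    loader = DataLoader(dataset, batch_size=32, num_workers=0)
    for batch in loader:
        queue.put(batch)
    queue.put(None)  # sentinel: no more data


if __name__ == "__main__":
    dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
    queue = mp.Queue(maxsize=8)  # bounded queue to limit memory use
    producer = mp.Process(target=loader_process, args=(queue, dataset))
    producer.start()

    while True:
        batch = queue.get()
        if batch is None:  # producer is done
            break
        inputs, targets = batch
        # ... training step on (inputs, targets) goes here ...

    producer.join()
```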

best


(Sreenivas V Rao) #2

I’m not quite sure, but the DataLoader does multi-threading on its own when num_workers > 0. Does that not suffice to queue the data for your process?
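For example, something like this (just a small sketch with a dummy dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":
    dataset = TensorDataset(torch.randn(128, 10))  # dummy dataset
    # with num_workers > 0 the DataLoader prefetches batches in the background
    loader = DataLoader(dataset, batch_size=32, num_workers=4)
    for (batch,) in loader:
        ...  # batches arrive already prepared by the workers
```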


(Antoine Liutkus) #3

It’s not about multi-threading, but multiprocessing.

cheers


(ChengLu She) #4

If you are using multiprocessing on your own, I suggest you disable the data loader’s own multiprocessing by setting num_workers to 0.
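Something like this (with a placeholder dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(100))  # placeholder for your dataset
# num_workers=0 keeps all data loading in the calling process, so the
# DataLoader spawns no worker processes of its own and won’t clash with
# your own multiprocessing setup
loader = DataLoader(dataset, batch_size=10, num_workers=0)
```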