Hi all, I’m facing a problem when setting the num_workers value of the DataLoader to anything greater than 0.
In particular, I’m trying to train a custom model on a custom dataset.
If I keep num_workers=0, everything is fine and the whole process completes successfully.
But with any other value of num_workers, the problem persists no matter which batch_size, number of training epochs, etc. I try.
It seems that when num_workers is set to some value x greater than 0, the script tries to run x times and then the error appears.
The strange part is that when I run the script through the bottleneck utility with num_workers greater than 0, it works correctly.
So my question is: does the bottleneck utility apply some kind of optimization that I’m missing, or do we need to do something in particular when setting num_workers greater than 0?
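For reference, here is a minimal sketch of the kind of setup I mean (the dataset and names here are just placeholders, not my actual model or data):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Placeholder standing in for the custom dataset; returns (input, target) pairs."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.tensor([float(idx)]), torch.tensor(idx % 2)

def iterate_once(num_workers):
    # Build a DataLoader and iterate over it once, returning the batch sizes seen.
    loader = DataLoader(ToyDataset(), batch_size=4, num_workers=num_workers)
    return [x.shape[0] for x, _ in loader]

if __name__ == "__main__":
    # The entry-point guard matters on platforms that spawn worker processes
    # (e.g. Windows): each worker re-imports the main module, so unguarded
    # top-level code runs once per worker.
    print(iterate_once(num_workers=2))
```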
P.S.
I want to use num_workers>0 in order to push my GPU utilization to the max.
Thanks.