Dataloader freezes when num_workers>0 on windows without GPU

janders111 · August 20, 2019, 8:06pm

It hangs right at the beginning. It seems to have something to do with multiprocessing/queues.py. I have already read some other posts and tried:

Not using the GPU
Both “from torch.utils.data.dataloader import DataLoader” and “from torch.utils.data import DataLoader”
Setting pin_memory=False
Adding the if __name__ == "__main__": guard
Running in Administrator mode
Reinstalling PyTorch

Environment: Python 3.7 on Windows 10, latest stable PyTorch build (1.2).

from torch.utils.data.dataset import Dataset
from torch.utils.data import DataLoader

class DriveData(Dataset):

    def __init__(self):
        self.data = [1, 2, 3, 4, 5, 6]

    # Override to give PyTorch access to any image on the dataset
    def __getitem__(self, index):
        return self.data[index]

    # Override to give PyTorch size of dataset
    def __len__(self):
        return len(self.data)


def main():
    dset_train = DriveData()
    train_loader = DataLoader(dset_train, batch_size=2, shuffle=True, num_workers=1)

    for i, data in enumerate(train_loader):
        print(i)
        print(data)

if __name__ == "__main__":
    main()

Output when num_workers is 0:

0
tensor([2, 6])
1
tensor([1, 4])
2
tensor([5, 3])

No output when num_workers > 0; it just hangs.

peterjc123 · August 24, 2019, 7:11am

I tried the 1.2.0 + CUDA 10.0 + Python 3.6 package and can’t reproduce this issue.

ilyes · August 27, 2019, 10:33am

Did you copy and paste his code exactly? I tried it myself and had the same issue!

ilyes · August 27, 2019, 10:34am

Have you fixed the problem?

peterjc123 · August 27, 2019, 12:43pm

Yes, I didn’t change anything.

peterjc123 · August 27, 2019, 12:57pm

FYI, I’m using Python 3.6 and CUDA 10.0.

ilyes · August 27, 2019, 1:06pm

Yeah! Python 3.7 is probably causing the problem.

ilyes · August 27, 2019, 4:20pm

I used Python 3.6.9, CUDA 10.0, and PyTorch 1.2.0, and it doesn’t work!

peterjc123 · August 28, 2019, 1:38am

Would you please send a bug report on https://github.com/pytorch/pytorch/issues? BTW, what is the traceback if you press ctrl+c?

ilyes · August 28, 2019, 10:28am

I reported the issue.
By traceback, do you mean the error text? I didn’t quite follow. I am using Jupyter Notebook, by the way.

peterjc123 · August 28, 2019, 10:44am

Yes, I mean the error text you get if you kill that process in the background. BTW, is it reproducible if you run it from the command prompt?
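
If killing the process by hand is awkward, one possible way to capture similar information is Python's standard faulthandler module, which can dump every thread's stack after a timeout. This is only a minimal sketch: the helper names arm_hang_dump and disarm_hang_dump and the 30-second default are illustrative, and the dump covers the parent process only, not the DataLoader workers.

```python
import faulthandler
import sys


def arm_hang_dump(timeout_s=30.0, stream=None):
    """Dump all thread stacks to `stream` if we are still running after timeout_s."""
    # `stream` must be backed by a real file descriptor. In a notebook,
    # sys.stderr may not be, so passing an open file is the safer choice there.
    if stream is None:
        stream = sys.stderr
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=stream)


def disarm_hang_dump():
    """Cancel the pending dump once the suspicious code has finished."""
    faulthandler.cancel_dump_traceback_later()
```

Calling arm_hang_dump() just before the loop over the DataLoader and disarm_hang_dump() right after it should leave a stack dump behind only when the loop actually hangs.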

ilyes · August 28, 2019, 12:11pm

Same error when run from the command prompt. Here’s the error message:

BrokenPipeError                           Traceback (most recent call last)
<ipython-input-10-344640e27da1> in <module>
----> 1 final_model, hist = train_model(model, dataloaders_dict, criterion, optimizer)

<ipython-input-9-fdf91f815fa7> in train_model(model, dataloaders, criterion, optimizer, num_epochs)
     23             # Iterate over data.
     24             end = time.time()
---> 25             for i, (inputs, labels) in enumerate(dataloaders[phase]):
     26                 inputs = inputs.to(device, non_blocking=True)
     27                 labels = labels.to(device , non_blocking=True)

~\Anaconda3\envs\py_gpu\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
    276             return _SingleProcessDataLoaderIter(self)
    277         else:
--> 278             return _MultiProcessingDataLoaderIter(self)
    279 
    280     @property

~\Anaconda3\envs\py_gpu\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
    680             #     before it starts, and __del__ tries to join but will get:
    681             #     AssertionError: can only join a started process.
--> 682             w.start()
    683             self.index_queues.append(index_queue)
    684             self.workers.append(w)

~\Anaconda3\envs\py_gpu\lib\multiprocessing\process.py in start(self)
    110                'daemonic processes are not allowed to have children'
    111         _cleanup()
--> 112         self._popen = self._Popen(self)
    113         self._sentinel = self._popen.sentinel
    114         # Avoid a refcycle if the target function holds an indirect

~\Anaconda3\envs\py_gpu\lib\multiprocessing\context.py in _Popen(process_obj)
    221     @staticmethod
    222     def _Popen(process_obj):
--> 223         return _default_context.get_context().Process._Popen(process_obj)
    224 
    225 class DefaultContext(BaseContext):

~\Anaconda3\envs\py_gpu\lib\multiprocessing\context.py in _Popen(process_obj)
    320         def _Popen(process_obj):
    321             from .popen_spawn_win32 import Popen
--> 322             return Popen(process_obj)
    323 
    324     class SpawnContext(BaseContext):

~\Anaconda3\envs\py_gpu\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     87             try:
     88                 reduction.dump(prep_data, to_child)
---> 89                 reduction.dump(process_obj, to_child)
     90             finally:
     91                 set_spawning_popen(None)

~\Anaconda3\envs\py_gpu\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

BrokenPipeError: [Errno 32] Broken pipe
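
The last frames of that traceback show the parent process pickling the worker's process object (ForkingPickler(file, protocol).dump(obj)) and writing it to a pipe for the spawned child. One thing that may be worth ruling out is whether the objects handed to the DataLoader pickle cleanly at all. Below is a rough sketch using only the standard library. is_picklable is a hypothetical helper, not part of PyTorch, and plain pickle is only an approximation of ForkingPickler. Passing this check does not rule out other causes of the broken pipe, such as the child process failing during startup.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, roughly what spawned workers need."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    # Plain data, like the list held by DriveData in the first post, pickles fine.
    print(is_picklable([1, 2, 3, 4, 5, 6]))
    # A lock (or similar OS-level handle) stored on a dataset would not.
    print(is_picklable(threading.Lock()))
```

Running the same check on the dataset instance itself, from the session that hangs, would show whether pickling is where things go wrong.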

ilyes · August 28, 2019, 1:23pm

This issue is weird! My code runs smoothly on Colab, so I created an environment locally with EXACTLY the same versions: Python 3.6.8, PyTorch 1.1.0, torchvision 0.3.0, and cudatoolkit 10.0.130. I’m still hitting the same bug!

peterjc123 · August 29, 2019, 1:42am

What about using python instead of ipython?
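
If the interactive session does turn out to be the trigger (nothing in this thread confirms that yet), a stopgap could be to fall back to num_workers=0 when running interactively on Windows. A small sketch under that assumption: choose_num_workers and running_interactively are hypothetical helpers, and the "ipykernel" / sys.ps1 check is only a heuristic for spotting a Jupyter kernel or an interactive interpreter.

```python
import sys


def running_interactively():
    """Heuristic: True inside a Jupyter kernel or an interactive interpreter."""
    return "ipykernel" in sys.modules or hasattr(sys, "ps1")


def choose_num_workers(requested, platform=None, interactive=None):
    """Return 0 workers for interactive Windows sessions, else the requested count."""
    platform = sys.platform if platform is None else platform
    interactive = running_interactively() if interactive is None else interactive
    if platform.startswith("win") and interactive:
        # Load data in the main process, as in the num_workers=0 run that works.
        return 0
    return requested
```

It would be used as DataLoader(dset_train, batch_size=2, shuffle=True, num_workers=choose_num_workers(1)), so the same code keeps its workers when run as a plain script.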