RuntimeError when increasing num_workers on Windows

XavierXiao · April 10, 2019, 6:28pm

Hi, I think this problem has been discussed before, but I still cannot solve it. I am running on Windows and trying to use a DataLoader with num_workers greater than 0. But when I enumerate the DataLoader for training, it fails with a traceback like this:

Traceback (most recent call last):

  File "<ipython-input-16-6453c2fe763d>", line 1, in <module>
    for batch_idx, data in enumerate(data_loader):

  File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 819, in __iter__
    return _DataLoaderIter(self)

  File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 560, in __init__
    w.start()

  File "D:\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)

  File "D:\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)

  File "D:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)

  File "D:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)

  File "D:\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)

  File "D:\Anaconda3\lib\site-packages\torch\multiprocessing\reductions.py", line 286, in reduce_storage
    metadata = storage._share_filename_()

RuntimeError: Couldn't open shared file mapping: <torch_968_1926347372>, error code: <0>

Changing to num_workers=0 avoids the error, but training will definitely be slower.

The one thing I have tried is wrapping all my code under if __name__ == '__main__':, but that does not help. I am using a machine with an Nvidia 1060 GPU and an i7-7700 CPU. What I want to know is: is this a hardware issue that cannot be solved? Is there anything that could fix it? Thanks!
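
The traceback above fails inside ForkingPickler while the worker process object is being serialized for the child process, so one rough way to narrow things down is to run the same pickler by hand on each attribute of the dataset object in the main process. The sketch below is only a diagnostic idea, not a fix: find_unpicklable_attributes and FakeDataset are hypothetical names, and it assumes the dataset keeps its state in ordinary instance attributes. It also assumes that importing torch registers its tensor reducers with ForkingPickler, in which case serializing a tensor this way would go through the same reduce_storage path (and may move that tensor to shared memory as a side effect).

```python
import threading
from multiprocessing.reduction import ForkingPickler


def find_unpicklable_attributes(obj):
    """Return {attribute_name: error_text} for instance attributes that fail to serialize.

    ForkingPickler is the pickler named in the traceback, so reducers that
    imported libraries have registered with it are exercised as well.
    """
    failures = {}
    for name, value in vars(obj).items():
        try:
            ForkingPickler.dumps(value)
        except Exception as exc:  # collect every failure instead of stopping at the first
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


class FakeDataset:
    """Hypothetical stand-in for a real dataset; a lock is a known unpicklable object."""

    def __init__(self):
        self.samples = [1, 2, 3]
        self.lock = threading.Lock()


if __name__ == "__main__":
    for attr, error in find_unpicklable_attributes(FakeDataset()).items():
        print(f"{attr}: {error}")
```

With a real dataset, the call would be find_unpicklable_attributes(my_dataset) after import torch, where my_dataset is whatever object is passed to the DataLoader. An empty result would suggest the failure depends on something other than a single attribute.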

peterjc123 · April 14, 2019, 7:31am

What about using Python instead of IPython? Does it also occur when you run a simple example like MNIST? https://raw.githubusercontent.com/pytorch/examples/master/mnist/main.py
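
For a test even smaller than the MNIST example, something like the following could be saved as a standalone script and run with plain python from a terminal. It is a minimal sketch under assumptions: the file name train_min.py, the helper names build_loader and count_batches, and the synthetic 8-sample dataset are all made up for illustration and are not taken from the original code.

```python
# train_min.py (hypothetical file name)
import torch
from torch.utils.data import DataLoader, TensorDataset


def build_loader(num_workers):
    # Tiny synthetic dataset: 8 samples with 4 features each, plus integer labels.
    features = torch.arange(32, dtype=torch.float32).reshape(8, 4)
    labels = torch.arange(8)
    dataset = TensorDataset(features, labels)
    return DataLoader(dataset, batch_size=2, shuffle=False, num_workers=num_workers)


def count_batches(num_workers):
    # Iterating the loader is the step that starts the worker processes.
    loader = build_loader(num_workers)
    return sum(1 for _ in loader)


if __name__ == "__main__":
    # On Windows, workers are started by spawning new interpreters that re-import
    # this module, so everything that launches workers stays under this guard.
    for workers in (0, 2):
        print(f"num_workers={workers}: {count_batches(workers)} batches")
```

If this script runs cleanly with 2 workers outside IPython, the problem is more likely tied to the original dataset or to the interactive session than to the hardware. If it fails with the same shared file mapping error, that points toward the environment instead.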