Runtime error with multiple workers: Couldn't open shared event


(David Berscheid) #1

Hi PyTorch community,

When I try to load data with multiple workers, PyTorch raises a runtime error with the following message:

File "analysis.py", line 287, in main
  val_pk, threshold = validate(model, args, j, dev_dl, logger)
File "analysis.py", line 140, in validate
  for i, (data, target, paths) in enumerate(dataset):
File "C:\Users…\Python36\site-packages\torch\utils\data\dataloader.py", line 330, in __next__
  idx, batch = self._get_batch()
File "C:\Users…\Python36\site-packages\torch\utils\data\dataloader.py", line 309, in _get_batch
  return self.data_queue.get()
File "C:\Users…\Local\Continuum\anaconda3\envs\Mastearbeit2.0\lib\multiprocessing\queues.py", line 337, in get
  return _ForkingPickler.loads(res)
File "C:\Users…\AppData\Roaming\Python\Python36\site-packages\torch\multiprocessing\reductions.py", line 167, in rebuild_storage_filename
  storage = cls._new_shared_filename(manager, handle, size)
RuntimeError: Couldn't open shared event: <torch_9400_1872001324_event>, error code: <2>

Does anyone have experience with this error?
Am I missing something in my code, or is the problem rather on the hardware side?

Thanks in advance!


#2

Here is a similar issue.
Could you have a look at it and see if the suggestions help you out?


#3

For future reference, this error can pop up if an exception is raised inside your dataset implementation. On Windows, instead of printing the stack trace of the actual exception, the worker will print this unrelated error instead.

Minimal example:

import torch
import torch.utils.data


class Dataset(torch.utils.data.Dataset):
    def __len__(self):
        return 92408

    def __getitem__(self, item):
        # Raise on one specific index to simulate a bug in the dataset.
        if item == 27003:
            raise Exception(':-)')
        return torch.rand((3, 16, 16))


def crash_test():
    d = Dataset()
    test_loader = torch.utils.data.DataLoader(
        d, batch_size=16, shuffle=False, num_workers=4, drop_last=False)
    batches = []
    for i, batch in enumerate(test_loader):
        # run batch through net..
        batches.append(batch)


if __name__ == "__main__":
    crash_test()
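One way to surface the real traceback on Windows is to run the loader with `num_workers=0` (so the dataset code executes in the main process), or to wrap the dataset so that exceptions are printed before they cross the worker boundary. The sketch below illustrates the idea; the `DebugDataset` wrapper class is a hypothetical name, not part of the PyTorch API:

```python
import traceback

import torch
import torch.utils.data


class Dataset(torch.utils.data.Dataset):
    """Toy dataset that raises on one index, as in the example above."""

    def __len__(self):
        return 100

    def __getitem__(self, item):
        if item == 42:
            raise Exception(':-)')
        return torch.rand((3, 16, 16))


class DebugDataset(torch.utils.data.Dataset):
    """Hypothetical wrapper: print the real traceback before re-raising."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __len__(self):
        return len(self.wrapped)

    def __getitem__(self, item):
        try:
            return self.wrapped[item]
        except Exception:
            # Print the original error here, inside the worker, so it is
            # visible even if the inter-process error reporting garbles it.
            traceback.print_exc()
            raise


if __name__ == "__main__":
    # num_workers=0 keeps everything in the main process while debugging,
    # so the original traceback propagates normally.
    loader = torch.utils.data.DataLoader(
        DebugDataset(Dataset()), batch_size=16, num_workers=0)
    try:
        for batch in loader:
            pass
    except Exception:
        pass  # the real traceback was already printed by DebugDataset
```

Once the underlying bug in `__getitem__` is fixed, `num_workers` can be raised again for the actual training runs.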