Dataloader seems to freeze

Hello. Sometimes, just after loading a network, my program seems to hang. I run my jobs overnight, so I only notice the next day, and even after 5 or 6 hours the program is still waiting for something. When I finally press CTRL+C, I get this error trace:

CException ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/", line 961, in __del__
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/", line 941, in _shutdown_workers
  File "/opt/conda/lib/python3.7/multiprocessing/", line 140, in join
    res = self._popen.wait(timeout)
  File "/opt/conda/lib/python3.7/multiprocessing/", line 48, in wai
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/opt/conda/lib/python3.7/multiprocessing/", line 28, in pol
    pid, sts = os.waitpid(, flag)

and, weirdly enough, the computation then continues normally from there.
Does anyone have any idea what is going on?
I am currently running a lot of small simulations, so it’s pretty frustrating to realize in the morning that I spent 6 hours running nothing. This doesn’t happen every time, but around once every 5 runs, and it’s pretty unpredictable.


Does anyone have any idea about this problem?

Not sure, but downgrading to PyTorch <= 1.3.0 might help with the problem.
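
If it helps to see where the process is actually stuck without waiting for a CTRL+C in the morning, here is a minimal sketch using Python's standard faulthandler module. It is only a diagnostic idea, not a fix: the helper names arm_hang_watchdog and disarm_hang_watchdog and the timeout value are made up for illustration, and nothing here is specific to PyTorch or the DataLoader.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s, log_file=None):
    """Dump the traceback of every thread each time `timeout_s` seconds pass.

    `log_file` must be a real open file (it needs a file descriptor);
    it defaults to stderr. Hypothetical helper, for illustration only.
    """
    target = log_file if log_file is not None else sys.stderr
    # repeat=True keeps dumping every timeout_s seconds until cancelled;
    # exit=False leaves the process running so a slow job can still finish.
    faulthandler.dump_traceback_later(
        timeout_s, repeat=True, file=target, exit=False
    )


def disarm_hang_watchdog():
    """Cancel the pending dump once the risky section has finished."""
    faulthandler.cancel_dump_traceback_later()
```

One way to use it would be to call arm_hang_watchdog(600) right before the part that sometimes hangs and disarm_hang_watchdog() right after. If the program stalls, the log then shows which call each thread of the main process is blocked in. Worker processes are separate processes, so they would each need their own watchdog.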

The cause was not PyTorch but Neptune, when logging images. It seemed to interfere with multiprocessing. I reported the problem, and it was fixed soon after.

Would you mind sharing your solution? I am probably facing the same issue.