Dataloader seems to freeze

Hello. Sometimes, just after loading a network, my program seems to hang. I run my jobs overnight, so I only notice the next morning, and even after 5 or 6 hours the program is still waiting for something. When I finally press Ctrl+C, I get this error trace:

Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f3bb3d67dd0>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 961, in __del__
    self._shutdown_workers()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 941, in _shutdown_workers
    w.join()
  File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 140, in join
    res = self._popen.wait(timeout)
  File "/opt/conda/lib/python3.7/multiprocessing/popen_fork.py", line 48, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/opt/conda/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)

Weirdly enough, the computation then continues normally from that point.
Does anyone have any idea?
I am currently running a lot of small simulations, so it’s pretty frustrating to realize in the morning that I spent 6 hours running nothing. This doesn’t happen every time, only around once every 5 runs, and it’s pretty unpredictable.
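One way to avoid losing a whole night to a hang like this is a watchdog that dumps the stack of every thread and kills the process when no progress has been made for a while. Below is a minimal sketch using only the standard library's `faulthandler` module; `run_with_watchdog` is a hypothetical helper, not something from the original post. PyTorch's `DataLoader` also has a `timeout` argument that raises an error if a batch is not received from the workers within that many seconds, but it only covers batch fetching, not the worker shutdown (`w.join()`) seen in the trace above.

```python
import faulthandler
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def run_with_watchdog(step: Callable[[T], None],
                      items: Iterable[T],
                      timeout_s: float) -> int:
    """Call step(item) for every item and return how many steps ran.

    If fetching the next item plus running step() ever takes longer than
    timeout_s seconds, faulthandler dumps the stack of every thread to
    stderr and then terminates the process with exit status 1.
    """
    done = 0
    # Arm the watchdog before the first item is fetched.
    faulthandler.dump_traceback_later(timeout_s, exit=True)
    try:
        for item in items:
            step(item)
            done += 1
            # Progress was made: restart the countdown for the next item.
            faulthandler.cancel_dump_traceback_later()
            faulthandler.dump_traceback_later(timeout_s, exit=True)
    finally:
        # Disarm on normal completion or when step() raises.
        faulthandler.cancel_dump_traceback_later()
    return done
```

With a PyTorch loader this might be called as `run_with_watchdog(train_step, loader, timeout_s=600)`, where `train_step` and `loader` are placeholders for your own training function and DataLoader, and 600 seconds is an arbitrary bound that must exceed your slowest batch. Exiting with a non-zero status instead of waiting lets a script that chains many runs move on, and the dumped stacks show where the process was stuck.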

Thanks

Does anyone have any idea about this problem?

Not sure, but downgrading PyTorch to version 1.3.0 or earlier may help with this problem.

The cause was not PyTorch but Neptune, when logging images: it seemed to interfere with multiprocessing. I reported the problem, and it was fixed soon after.
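The thread does not say how the Neptune issue was fixed. As a general sketch, assuming the interference came from a logger's background threads being alive when the DataLoader forks its workers (a known hazard of fork-based multiprocessing, not something confirmed here), one can check for such threads and fall back to the `spawn` start method. `background_threads` and `suggested_start_method` are hypothetical helpers; note that `threading.enumerate()` only sees threads created through Python's `threading` module, not threads started inside C extensions.

```python
import threading
from typing import List, Optional


def background_threads() -> List[str]:
    """Names of live Python threads other than the main thread."""
    main = threading.main_thread()
    return [t.name for t in threading.enumerate()
            if t is not main and t.is_alive()]


def suggested_start_method() -> Optional[str]:
    """Return "spawn" if other threads are running, else None (keep the default).

    A forked child gets a copy of any lock a background thread held at
    the moment of the fork, but not the thread that would release it, so
    the child can block forever. "spawn" starts workers from a fresh
    interpreter and avoids inheriting that state.
    """
    return "spawn" if background_threads() else None
```

The result could be passed as `DataLoader(dataset, num_workers=4, multiprocessing_context=suggested_start_method())`, since `multiprocessing_context` accepts a start-method name or `None` in recent PyTorch versions. With `"spawn"` the dataset must be picklable and the training script needs an `if __name__ == "__main__":` guard. Setting `num_workers=0` is a simpler way to check whether worker processes are involved at all.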

Would you mind sharing your solution? I am probably facing the same issue.