Hello. Sometimes, just after loading a network, my program seems to hang. I run my jobs overnight, so I only notice the next day, and even after 5 or 6 hours the program is still waiting for something. However, when I finally press CTRL+C, I get this error trace:
^CException ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f3bb3d67dd0>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 961, in __del__
    self._shutdown_workers()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 941, in _shutdown_workers
    w.join()
  File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 140, in join
    res = self._popen.wait(timeout)
  File "/opt/conda/lib/python3.7/multiprocessing/popen_fork.py", line 48, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/opt/conda/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
and, weirdly enough, the computation continues from here normally.
Does anyone have any idea?
I am currently running a lot of small simulations, so it’s pretty frustrating to realize in the morning that I spent 6 hours running nothing. This doesn’t happen every time, but around once every 5 runs, and it’s pretty unpredictable.
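Since the hang is intermittent, one thing that might help narrow it down is a watchdog that dumps the main process's stack automatically once a run takes too long, instead of waiting for a manual CTRL+C the next morning. Below is a minimal sketch using only the standard-library `faulthandler` module. The names `hang_watchdog` and `run_simulation` are hypothetical placeholders, not part of PyTorch or of my actual script, and the dump only covers the threads of the main process, not the DataLoader worker processes.

```python
import faulthandler
import sys
import time
from contextlib import contextmanager


@contextmanager
def hang_watchdog(timeout_s, file=None):
    """Dump all thread stacks of this process if the block exceeds timeout_s.

    hang_watchdog is a hypothetical helper; only the faulthandler calls
    are standard library. `file` must be a real file (it needs fileno()).
    """
    target = sys.stderr if file is None else file
    # repeat=True writes a new dump every timeout_s seconds while stuck.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)
    try:
        yield
    finally:
        # Always disarm the timer, whether the block finished or raised.
        faulthandler.cancel_dump_traceback_later()


def run_simulation():
    # Placeholder for the real work: loading the network, iterating
    # over the DataLoader, and so on.
    time.sleep(0.1)
```

Wrapping one run as `with hang_watchdog(timeout_s=600): run_simulation()` would write a stack dump to stderr every 10 minutes for as long as the run is stuck, which should show whether it is always blocked in the same `w.join()` call. The 600 second value is an arbitrary assumption and would need to be longer than a normal run.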
Thanks