Hello. Sometimes, just after loading a network, my program seems to hang. I run my jobs overnight, so I only notice the next day, and even after 5 or 6 hours the program is still waiting for something. However, when I finally press CTRL+C, I get this error trace:
^CException ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f3bb3d67dd0>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 961, in __del__
    self._shutdown_workers()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 941, in _shutdown_workers
    w.join()
  File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 140, in join
    res = self._popen.wait(timeout)
  File "/opt/conda/lib/python3.7/multiprocessing/popen_fork.py", line 48, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/opt/conda/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
and, weirdly enough, the computation continues from here normally.
Does anyone have any idea?
I am currently running a lot of small simulations, so it’s pretty frustrating to realize in the morning that I spent 6 hours running nothing. This doesn’t happen all the time, only around once every 5 runs, and it’s pretty unpredictable.
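For what it's worth, the last frames of the trace seem to show a `join()` with no timeout on a worker process, which ends up in a blocking `os.waitpid` until that worker exits. Here is a minimal sketch of that behavior using only the standard library. It is not PyTorch's actual code: `stuck_worker`, `join_with_timeout` and `demo` are hypothetical names, and the "fork" start method is an assumption chosen only because the trace goes through `popen_fork.py` (so this is POSIX-only).

```python
import multiprocessing as mp
import time


def stuck_worker():
    # Hypothetical stand-in for a worker process that never exits on its own.
    while True:
        time.sleep(0.1)


def join_with_timeout(proc, timeout):
    """Wait up to `timeout` seconds; return True if the process has exited."""
    # join() with no argument would block until the child exits,
    # which is where the trace above was interrupted.
    proc.join(timeout)
    return not proc.is_alive()


def demo():
    # Assumption: "fork" start method, matching popen_fork.py in the trace.
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=stuck_worker, daemon=True)
    proc.start()

    exited = join_with_timeout(proc, timeout=0.5)
    if not exited:
        # The worker is still running: stop it explicitly, then reap it.
        proc.terminate()
        proc.join()
    return exited, proc.is_alive()


if __name__ == "__main__":
    print(demo())
```

With a bounded `join`, the parent gets control back and can decide what to do with a worker that will not exit, instead of waiting indefinitely. I don't know whether this is the right fix for the DataLoader case; it only illustrates where the wait seems to come from.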