Training Sometimes Gets Stuck Oddly

JisongXie · December 4, 2020, 6:11am

My training sometimes gets stuck for no apparent reason: it stops printing, and GPU utilization drops to 0%.

When I press Ctrl + C to terminate it, it prints a log like this.

It seems that the problem occurs while loading data with multiprocessing.
Does anyone know why?

Versions of the relevant modules:
Python version: 3.7.9
torch version: 1.4.0
CUDA version: 10.0
I run the code interactively inside a Docker container and use screen to keep it running in the background.
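A minimal, stdlib-only sketch for capturing where such a stall happens, assuming the training loop can call a hook once per iteration. `StallWatchdog` is a hypothetical helper, not part of PyTorch. It relies on Python's `faulthandler.dump_traceback_later`, which prints the stack of every thread in the main process if no iteration finishes within the timeout, so the stack is recorded at the moment of the hang rather than after Ctrl + C. It cannot see inside DataLoader worker processes. Separately, `torch.utils.data.DataLoader` accepts a `timeout` argument (seconds to wait for a batch from the workers) that turns a silent wait into a `RuntimeError`, and setting `num_workers=0` is a quick way to check whether multiprocessing is involved at all.

```python
import faulthandler
import sys


class StallWatchdog:
    """Hypothetical helper: dump all main-process thread stacks if step()
    is not called again within timeout_s seconds."""

    def __init__(self, timeout_s=600.0, file=sys.stderr):
        self.timeout_s = timeout_s
        # faulthandler writes straight to the file descriptor, so `file`
        # must be a real file object with a working fileno().
        self.file = file

    def __enter__(self):
        self.step()  # arm the timer before the first iteration
        return self

    def step(self):
        # Calling dump_traceback_later again replaces the pending timer,
        # so the dump only fires after a genuine stall of timeout_s seconds.
        faulthandler.dump_traceback_later(
            self.timeout_s, repeat=False, file=self.file, exit=False
        )

    def __exit__(self, exc_type, exc, tb):
        # Disarm the timer so nothing is dumped after the loop ends.
        faulthandler.cancel_dump_traceback_later()
        return False


# Usage (hypothetical `loader` and `train_step`):
# with StallWatchdog(timeout_s=600) as wd:
#     for batch in loader:
#         train_step(batch)
#         wd.step()
```

Re-arming the timer once per iteration keeps the overhead negligible, and because `exit=False` the process keeps running after the dump, so it can still be inspected (for example with `py-spy` or by attaching a debugger) while it is stuck.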