PyTorch program using GPU seems to freeze after running for 1 hour

torch = 0.4.0
After running for about 1 hour, the process seems to freeze and produces no further output. What are the possible reasons?

+-------------------------------+----------------------+----------------------+
|   6  TITAN Xp            On   | 00000000:0E:00.0 Off |                  N/A |
| 23%   30C    P8    15W / 250W |  12011MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Are you able to reproduce this issue, or did it happen only once?
The issue might be specific to your machine, for example because RAM or CPU usage is hitting its limit.
Nevertheless, if you are able to reproduce the issue, please share a reproducible code snippet with us.

It is a bit hard to reproduce.
So far, I suspect it is related to one of the following:

  1. logging in Python, or
  2. CUDA 9.2, or
  3. PyTorch with multiple GPUs (an NCCL-related problem)

I will investigate further.
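
One way to tell these candidates apart, when the hang is hard to reproduce, is to have the process periodically dump the Python stack of every thread, so that the last dump before the freeze shows where it is stuck (for example inside a logging call, or in the call that launched a multi-GPU operation). Below is a minimal sketch using the standard library's faulthandler module. The helper names start_hang_watchdog and stop_hang_watchdog and the 10-minute interval are illustrative assumptions, not part of the original program.

```python
import faulthandler
import sys


def start_hang_watchdog(interval_s=600, stream=sys.stderr):
    """Dump the tracebacks of all threads every `interval_s` seconds.

    Hypothetical helper name. `stream` must be a real file object
    (it needs a file descriptor) and must stay open while the
    watchdog is active.
    """
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=stream)


def stop_hang_watchdog():
    """Cancel the periodic dump started by start_hang_watchdog()."""
    faulthandler.cancel_dump_traceback_later()
```

Call start_hang_watchdog() once at the top of the training script and stop_hang_watchdog() when training finishes. The dumps only show Python-level frames, so a hang inside a CUDA or NCCL call will appear as the Python line that made that call rather than the native location.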