I’m training a simple network on a single GPU overnight. However, I’ve noticed that training stops after an hour or so and only resumes when I move the mouse or press a key (see the attached screenshot: training pauses at epoch 76 and resumes after I moved the mouse).
Last night I ran a separate timing script (below) alongside the training to check whether this is a PyTorch problem or something else. This time, the training ran without any problem.
# timing script
from datetime import datetime
import time

while True:
    # Print the wall-clock time every 5 minutes as a heartbeat
    now = datetime.now()
    current_time = now.strftime("%H:%M:%S")
    print("Current Time =", current_time)
    time.sleep(300)
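
To check the heartbeat output (or per-epoch timestamps) for pauses afterwards, something like the following might help. It is only a rough sketch: `find_stalls`, `expected_interval`, and `tolerance` are names I made up for illustration, and it assumes the timestamps were recorded as wall-clock seconds, e.g. from `time.time()`.

```python
def find_stalls(timestamps, expected_interval, tolerance=2.0):
    """Return (start, end, gap) for each gap longer than tolerance * expected_interval.

    timestamps: wall-clock times in seconds, in increasing order (assumed).
    expected_interval: normal spacing between consecutive entries, in seconds.
    """
    stalls = []
    # Compare each timestamp with the one that follows it
    for prev, curr in zip(timestamps, timestamps[1:]):
        gap = curr - prev
        if gap > tolerance * expected_interval:
            stalls.append((prev, curr, gap))
    return stalls


if __name__ == "__main__":
    # Made-up example: a heartbeat every 300 s with one hour-long pause
    beats = [0, 300, 600, 4200, 4500]
    for start, end, gap in find_stalls(beats, expected_interval=300):
        print(f"Stall of {gap} s between t={start} and t={end}")
```

If the heartbeat log shows gaps at the same times the training stalls, the whole machine is probably pausing rather than just the PyTorch process.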