Training a deep learning algorithm restarts the PC

Hi, I don't know whether this is relevant here, but I am posting it in case someone can guide me. I just got an old PC with an NVIDIA GeForce GTX 1080 Ti GPU. I installed the latest drivers, CUDA, and cuDNN, and then a supported version of PyTorch from the PyTorch site. All went well, and torch.cuda.is_available() returns True. The issue is that when I start the training script for u2net, the PC suddenly restarts after the first iteration, with no apparent errors or warnings. Please help!
PS: I was using Windows 10; the issue happened a couple of times and the OS crashed.


Hi.
Did you check your task manager? Any abnormal activity in programs?
Did you check the hardware temperature?
Are the computer fans working correctly?
I suggest you run your script on a service like Google Colab and see what happens.
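
If you want numbers rather than a glance at the task manager, here is a minimal sketch that polls nvidia-smi and appends temperature and power readings to a file, forcing each sample to disk so the last readings survive a hard reset. The file name gpu_log.csv, the polling interval, and the helper names are assumptions made for illustration. Polling this coarsely can easily miss very short power spikes, so treat the log as a hint, not proof.

```python
import csv
import os
import subprocess
import time

# Fields understood by `nvidia-smi --query-gpu`; the order defines the CSV columns.
FIELDS = ["timestamp", "temperature.gpu", "power.draw", "power.limit", "utilization.gpu"]


def build_query_command(fields=FIELDS):
    """Return the nvidia-smi command that prints one CSV row per GPU."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(fields),
        "--format=csv,noheader,nounits",
    ]


def parse_row(line, fields=FIELDS):
    """Turn one CSV row printed by nvidia-smi into a dict keyed by field name."""
    values = [v.strip() for v in line.split(",")]
    return dict(zip(fields, values))


def log_forever(path="gpu_log.csv", interval_s=0.5):
    """Append one sample per interval and force it to disk each time.

    `path` and `interval_s` are illustrative defaults, not recommendations.
    """
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            out = subprocess.run(
                build_query_command(), capture_output=True, text=True, check=True
            ).stdout
            for line in out.strip().splitlines():
                row = parse_row(line)
                writer.writerow([row[name] for name in FIELDS])
            f.flush()
            os.fsync(f.fileno())  # so the last samples survive a sudden reboot
            time.sleep(interval_s)
```

Start log_forever() in a second terminal before launching the training script, then open gpu_log.csv after the reboot to see the last recorded temperature and power draw.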

hi, Thanks for the reply.
yes, I have noticed those things. No abnormal activities, the hardware was cool and the fans were working alright.
The script runs fine on colab. but since it is a limited resource that is why I got this PC.

I guess your PSU might be too weak for a sudden power spike. We’ve had similar discussions here in the past and usually restarts boil down to the PSU. If you have a spare one, replace the current PSU and check if the issue is still visible.
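
If no spare PSU is at hand, one cheap experiment is to temporarily cap the GPU's board power and see whether the restarts go away. The sketch below only builds and runs the nvidia-smi power-limit command; the 180 W figure in the usage note is an arbitrary example, the helper names are made up, and whether the driver allows changing the limit on a given card and OS is an assumption you would have to verify. The command needs an administrator (or root) shell.

```python
import subprocess


def power_limit_command(watts, gpu_index=0):
    """Build the nvidia-smi call that caps board power for one GPU."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(int(watts))]


def apply_power_limit(watts, gpu_index=0):
    """Run the command and return the CompletedProcess for inspection.

    check=False on purpose: the driver may refuse the request, and the
    caller should look at returncode/stderr instead of getting an exception.
    """
    return subprocess.run(
        power_limit_command(watts, gpu_index),
        capture_output=True,
        text=True,
        check=False,
    )
```

For example, apply_power_limit(180) followed by a training run: if training survives with the cap but reboots without it, that would point toward power delivery rather than software. A capped run that still reboots does not clear the PSU, since short transients can exceed the configured limit.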

Hi, let me check that.
PS: I am a fan, by the way.

Hi, I checked it: my PC has a 750 W PSU, while 250 W is recommended for the 1080 Ti. The issue remains. Whenever I start training, the PC reboots.

Another thing I checked is that simpler CNN models with fewer epochs run fine. Only the u2net model triggers the restart.
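
Since light models run and the heavy one does not, a load ramp might help narrow down where the machine gives up. This is a rough sketch, assuming PyTorch with CUDA for the actual load: it writes a "starting" line to disk before each step, so after a reboot the last line in the log shows which size was running. The function names, sizes, and the ramp_log.txt file name are all hypothetical choices for illustration.

```python
import os
import time


def ramp_sizes(start=512, stop=8192, factor=2):
    """Yield geometrically growing problem sizes from start up to stop."""
    n = start
    while n <= stop:
        yield n
        n *= factor


def run_ramp(workload, sizes, log_path="ramp_log.txt"):
    """Run workload(n) for each size, recording every step on disk first."""
    completed = []
    with open(log_path, "a") as f:
        for n in sizes:
            f.write(f"{time.time():.3f} starting size={n}\n")
            f.flush()
            os.fsync(f.fileno())  # written before the load, so it survives a reset
            workload(n)
            f.write(f"{time.time():.3f} finished size={n}\n")
            f.flush()
            os.fsync(f.fileno())
            completed.append(n)
    return completed


def torch_matmul_workload(n, repeats=50):
    """Hypothetical GPU load: repeated n x n matrix multiplies (assumes PyTorch + CUDA)."""
    import torch  # imported lazily so the helpers above work without PyTorch

    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    for _ in range(repeats):
        a @ b
    torch.cuda.synchronize()  # wait until the queued kernels have actually run
```

Calling run_ramp(torch_matmul_workload, ramp_sizes()) steps the load up gradually. If the PC reboots at the larger sizes even though no real model is involved, that would suggest the problem is tied to GPU load rather than to the u2net script itself.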

You can check this topic, where users saw similar issues with PSUs that should have been powerful enough but that, in the end, seemed to suffer from large peaks in power draw and shut down.

Thank you so much for your valuable insight and time. I guess I need to get a new PSU.

Before buying a new one, I would check with your friends whether someone has a spare PSU or could lend you theirs for some quick tests. (I also think someone else with a similar issue was able to borrow a PSU from a shop for some tests and bought it afterwards.)

Run memory tests, and buy new RAM if needed.