I’m training a segmentation model using PyTorch, but the system randomly restarts during training. It is not an overheating issue: I control the fan speed manually, and the GPU temperature stays around 70-78 °C. What’s also strange is that the machine restarts every time I train with PyTorch, but this has never happened when I train with fastai.
So far I have made the following changes, but the problem persists:
- Limiting the GPU power consumption of my NVIDIA RTX A4000 (from 140 W to 110 W).
- Turning off Intel Turbo Boost in the BIOS, as discussed in pytorch/pytorch GitHub issue #3022, "Reliably repeating pytorch system crash/reboot when using imagenet examples".
I’m also using WSL2 on Windows 11.