In PyCharm:
torch.cuda.is_available: True
torch.version.cuda: 12.6
torch.cuda.device_count: 1
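A minimal sketch of a check that prints these three values, in case anyone wants to reproduce it. The helper names `format_cuda_state` and `read_cuda_state` are illustrative only and are not taken from the actual project; the `torch` calls are the standard ones.

```python
def format_cuda_state(available, cuda_version, device_count):
    """Format the three values in the same layout as the output above."""
    return "\n".join([
        f"torch.cuda.is_available: {available}",
        f"torch.version.cuda: {cuda_version}",
        f"torch.cuda.device_count: {device_count}",
    ])


def read_cuda_state():
    """Ask PyTorch for the current CUDA state."""
    import torch  # imported here so the formatter also works without PyTorch

    return (
        torch.cuda.is_available(),   # True when a usable CUDA device is visible
        torch.version.cuda,          # CUDA version the PyTorch build targets
        torch.cuda.device_count(),   # number of visible CUDA devices
    )


if __name__ == "__main__":
    try:
        print(format_cuda_state(*read_cuda_state()))
    except ImportError:
        print("PyTorch is not installed in this interpreter")
```

Note that `torch.version.cuda` only reports the CUDA version PyTorch was built against, so it would not be expected to change when the GPU goes away.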
In the terminal, I run nvidia-smi and everything looks fine there too: driver 560, CUDA 12.6.
So everything seems to be in order. I start the training loop, and at epoch 8-10 (after about 15 minutes) everything collapses, and every check reports that CUDA does not exist at all:
return torch._C._cuda_getDeviceCount() > 0
torch.cuda.is_available: False
torch.version.cuda: 12.6
torch.cuda.device_count: 0
Process finished with exit code 0
In the terminal:
PS C:\Users\User> nvidia-smi
Unable to determine the device handle for GPU0000:01:00.0: GPU is lost. Reboot
the system to recover this GPU
I installed PyTorch with:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
After restarting the computer, CUDA shows up everywhere again, but it only works for a few epochs and then disappears. I have already reinstalled the drivers several times, as well as CUDA Toolkit 12.6 and the PyTorch library in different ways, and the result is always the same: CUDA dies after a few epochs and comes back for a while after restarting the PC. I am completely desperate and asking for help.