This was the first time I trained a network on a local Linux machine with 4 GPUs. I was following the tutorial on Data Parallelism to make my code utilize the multiple GPUs on my machine, and I set the device to “cuda:0”. However, when I ran the code, it completely froze my Linux system, so I had to reboot. After that, it always rebooted to a black screen, and I think PyTorch may have something to do with it.
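
For context, here is a minimal sketch of the kind of setup the Data Parallelism tutorial describes. It is not my exact code: the `nn.Linear` model and the tensor sizes are placeholders chosen only for illustration, and the CPU fallback is added so the snippet also runs on a machine without CUDA.

```python
import torch
import torch.nn as nn

# Placeholder model (assumption); the real network is not shown in this post.
model = nn.Linear(10, 5)

# Use the first GPU as the primary device, falling back to CPU if CUDA is unavailable.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Wrap the model in DataParallel only when more than one GPU is visible.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model.to(device)

# Placeholder batch of 8 samples (assumption); DataParallel splits it across GPUs.
inputs = torch.randn(8, 10).to(device)
outputs = model(inputs)
print(outputs.shape)
```

With `nn.DataParallel`, "cuda:0" is only the primary device: the batch is still scattered across all visible GPUs, so every card is exercised during the forward pass.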
Has anyone encountered similar issues?