nn.DataParallel freezes my Linux machine?

Zhiqiu_Lin · June 2, 2018, 4:09pm

Hey,

This is the first time I have trained a network on a local Linux machine with 4 GPUs. I was following the Data Parallelism tutorial to make my code utilize the multiple GPUs on my machine, and I set the device to “cuda:0”. However, when I ran the code, it completely froze my Linux system, so I had to reboot. Yet it always rebooted to a black screen, and I think PyTorch may have something to do with it.
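
For context, the pattern from the Data Parallelism tutorial looks roughly like the sketch below. This is only an illustration, not my exact code: `TinyNet`, the layer sizes, and the batch size are hypothetical placeholders, and the CPU fallback is added so the snippet also runs on a machine without GPUs.

```python
import torch
import torch.nn as nn


# Hypothetical stand-in model; the real network is not shown in this post.
class TinyNet(nn.Module):
    def __init__(self, in_features=5, out_features=2):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, x):
        return self.fc(x)


# "cuda:0" is the primary device, as described above; fall back to CPU
# when CUDA is unavailable so the sketch still runs.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = TinyNet()
# Wrap the model only when more than one GPU is visible. DataParallel
# splits each input batch along dim 0 across the visible GPUs.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model.to(device)

# Placeholder batch: 8 samples with 5 features each.
batch = torch.randn(8, 5).to(device)
output = model(batch)
print(output.shape)
```

To rule out a single faulty card, one option is to restrict which GPUs the process can see by setting the `CUDA_VISIBLE_DEVICES` environment variable (for example `CUDA_VISIBLE_DEVICES=0,1`) before launching the script.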

Has anyone encountered similar issues?

Thanks!