PyTorch does not use the GPU efficiently after reboot?

As crazy as it sounds, I just rebooted my computer and now my PyTorch program barely utilizes the GPU. Before the reboot it was several times faster and pushed the GPU hard enough that I had to set my fans to around 80%. Now I see the GPU used only for a brief moment during validation, but not during training: utilization sits at literally 0 to 1% (browser) while training. I did check, and the GPU is still detected and used by the code. All I did was a simple reboot, because a GUI application was stuck; I have not run any updates. It is a curious case. I can trace the usage with the NVIDIA GUI and with nvidia-smi / nvidia-smi dmon, and I have no idea why this happens.

I rebooted again, but it is still acting the same. This is in my code (I added the second line just to be doubly sure):

import os
from torch import cuda

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
cuda.set_device(0)
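As a quick sanity check that the code above is doing what's intended, here is a minimal sketch (my own, not from the original post) that pins the process to the first GPU and confirms where tensors actually land:

```python
import os

# CUDA_VISIBLE_DEVICES must be set before PyTorch initializes CUDA,
# so the process only ever sees the first physical GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

if torch.cuda.is_available():
    torch.cuda.set_device(0)
    name = torch.cuda.get_device_name(0)  # e.g. the Titan X mentioned below
    x = torch.randn(2, 2, device="cuda")  # confirm tensors land on the GPU
    device = str(x.device)                # "cuda:0"
else:
    device = "cpu"  # no GPU visible to this process

print(device)
```

If `device` prints `cuda:0`, the code is at least running on the GPU, and the slowness must come from somewhere else (e.g. a data-loading or CPU bottleneck).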

Why me :slight_smile:

Is the GPU memory in use during training? How much? nvidia-smi has a Memory-Usage field.

How much GPU-util does it use during validation and for how long?

You can use the command watch -n 0.1 nvidia-smi to re-run nvidia-smi every 0.1 seconds.
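As a cross-check from inside the process itself (a sketch of mine, assuming a standard PyTorch install), you can query PyTorch's own CUDA allocator. Note this only counts memory that this process's PyTorch has allocated or cached, not the card-wide Memory-Usage that nvidia-smi reports:

```python
import torch

# Query PyTorch's caching allocator for device 0.
if torch.cuda.is_available():
    allocated_mb = torch.cuda.memory_allocated(0) / 1e6  # tensors currently alive
    reserved_mb = torch.cuda.memory_reserved(0) / 1e6    # cached blocks held by PyTorch
else:
    allocated_mb = reserved_mb = 0.0

print(f"allocated: {allocated_mb:.1f} MB, reserved: {reserved_mb:.1f} MB")
```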

Reboot again ;)?

Memory is being used, yes, at 80%.

I just came back to update: the more epochs I run, the faster it gets. At this rate, I won’t have a problem for long. I take it the GPU is like a car engine on a cold morning, then?

I don’t know, to be honest. It takes a while for me to get things running, but that’s somewhere between 5–15 seconds, and that includes other overhead in my program.
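For what it’s worth, the “cold engine” intuition does match how CUDA behaves: the first iterations pay for context setup and, when cuDNN autotuning is enabled, for algorithm selection. A minimal sketch (a toy model of my own, not the poster’s code) that makes the warm-up visible:

```python
import time
import torch

# cuDNN benchmarking autotunes convolution algorithms per input shape;
# the first few iterations pay the tuning cost, later ones get faster.
torch.backends.cudnn.benchmark = True

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Conv2d(3, 16, kernel_size=3).to(device)
x = torch.randn(8, 3, 64, 64, device=device)

timings = []
for step in range(3):
    t0 = time.time()
    y = model(x)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for the kernel to finish before timing
    timings.append(time.time() - t0)
    print(f"iter {step}: {timings[-1]:.4f}s")
```

On a GPU, the first iteration is typically far slower than the rest, which would explain the speed-up over the first epochs.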

Glad it’s working better at least

Yes, thank you. I didn’t have this problem with Caffe on a 1080; I am now using a Titan X. It is now as fast as it used to be.