Server automatically reboots when cudnn.benchmark is set to True

When I train the ResNet-18 model with the PyTorch ImageNet example
https://github.com/pytorch/examples/blob/master/imagenet/main.py
there are two lines:
import torch.backends.cudnn as cudnn
cudnn.benchmark = True
When I run the program, the server with 4 GTX 1080 GPUs automatically reboots after several seconds. After I set the value to False, everything works fine.
So what do these lines mean, and what happens when the value is set to True?

This can happen if you have hardware power issues. cuDNN benchmark mode pushes the GPUs to their limits, and the resulting power draw might be tripping the power supply.
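To answer the "what does it mean" part: with cudnn.benchmark = True, the first time cuDNN sees a given convolution configuration (input shape and layer parameters), it times several candidate algorithms, caches the fastest one, and reuses it for later calls with the same configuration. With the flag off, an algorithm is picked heuristically without that timing step. The toy model below is only a conceptual sketch in plain Python, not cuDNN's actual implementation: the "algorithms" are two hypothetical ways to sum a list, and the list length stands in for the tensor shape.

```python
import time
from typing import Callable, Dict, Hashable, List, Tuple

Algorithm = Callable[[List[int]], int]


def sum_loop(data: List[int]) -> int:
    """Hypothetical candidate 1: explicit Python loop."""
    total = 0
    for x in data:
        total += x
    return total


def sum_builtin(data: List[int]) -> int:
    """Hypothetical candidate 2: built-in sum; same result, different speed."""
    return sum(data)


class ShapeKeyedAutotuner:
    """Toy model of benchmark mode: time every candidate the first time an
    input shape is seen, cache the fastest, and reuse it afterwards."""

    def __init__(self, candidates: Dict[str, Algorithm], benchmark: bool = True):
        self.candidates = candidates
        self.benchmark = benchmark
        self.cache: Dict[Hashable, str] = {}
        self.benchmark_runs = 0  # number of shapes that triggered a full trial

    def _pick(self, shape: Hashable, data: List[int]) -> str:
        if not self.benchmark:
            # Stand-in for a heuristic choice: a fixed default, no timing at all.
            return next(iter(self.candidates))
        if shape not in self.cache:
            # New shape: run every candidate once and remember the fastest.
            self.benchmark_runs += 1
            timings = {}
            for name, fn in self.candidates.items():
                start = time.perf_counter()
                fn(data)
                timings[name] = time.perf_counter() - start
            self.cache[shape] = min(timings, key=timings.__getitem__)
        return self.cache[shape]

    def run(self, data: List[int]) -> Tuple[str, int]:
        shape = len(data)  # stand-in for the tensor shape
        name = self._pick(shape, data)
        return name, self.candidates[name](data)


if __name__ == "__main__":
    tuner = ShapeKeyedAutotuner({"loop": sum_loop, "builtin": sum_builtin})
    tuner.run(list(range(1000)))  # first time this shape is seen: all candidates timed
    tuner.run(list(range(1000)))  # same shape: cached choice reused
    print(tuner.benchmark_runs)   # 1
```

The model also shows the trade-off: the extra work is paid once per new shape, so benchmark mode tends to pay off when input sizes stay fixed across iterations and can cost time when they keep changing.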

Thanks for your reply.
The server has 1 x Intel Core i7-6900K CPU, 4 x NVIDIA GTX 1080 GPUs, and a power supply unit rated at 1600 W.
Is the power supply sufficient? And how much speedup will I get if I set cudnn.benchmark=True?
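A rough first check for the power question is to add up nominal component ratings. The sketch below assumes NVIDIA's 180 W reference board power for each GTX 1080 and Intel's 140 W TDP for the i7-6900K; the 100 W for everything else is a hypothetical allowance, not a measured figure. Nominal ratings are averages and short transient spikes can exceed them, so this arithmetic can flag an obviously undersized supply but cannot prove that a given one is adequate.

```python
# Nominal power figures; OTHER_W is a hypothetical guess, not a measurement.
GPU_TDP_W = 180  # reference GTX 1080 board power
CPU_TDP_W = 140  # Intel Core i7-6900K TDP
OTHER_W = 100    # hypothetical: motherboard, RAM, drives, fans


def nominal_load(num_gpus: int, gpu_w: float, cpu_w: float, other_w: float) -> float:
    """Sum of nominal component power draws, in watts."""
    return num_gpus * gpu_w + cpu_w + other_w


def headroom(psu_w: float, load_w: float) -> float:
    """Fraction of the PSU rating left unused at the nominal load."""
    return (psu_w - load_w) / psu_w


load = nominal_load(4, GPU_TDP_W, CPU_TDP_W, OTHER_W)
print(f"nominal load: {load:.0f} W")                             # nominal load: 960 W
print(f"headroom on a 1600 W PSU: {headroom(1600, load):.0%}")   # 40%
```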

Hi, zed

Have you figured out how much speedup you get when setting cudnn.benchmark = True?
I'm running into the same problem.

Thank you!
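One way to answer the speedup question for a specific model is to time the same training step with the flag off and on. Below is a minimal sketch; the helper names are hypothetical, and compare_cudnn_benchmark assumes a CUDA-capable PyTorch install with the model and batch already on the GPU. torch is imported inside that function so the generic timing helpers can be used without it. Warm-up calls are excluded from timing so the one-off autotuning cost in benchmark mode does not skew the result, and torch.cuda.synchronize is called before reading the clock because CUDA kernels run asynchronously.

```python
import time
from statistics import median
from typing import Callable, Tuple


def median_step_time(step: Callable[[], None], warmup: int = 5, iters: int = 20,
                     sync: Callable[[], None] = lambda: None) -> float:
    """Median wall-clock seconds per call of `step`, after `warmup` untimed calls.

    `sync` runs before each clock read so pending asynchronous work is finished.
    """
    for _ in range(warmup):
        step()
    sync()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        sync()
        times.append(time.perf_counter() - start)
    return median(times)


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


def compare_cudnn_benchmark(model, batch, warmup: int = 5,
                            iters: int = 20) -> Tuple[float, float, float]:
    """Hypothetical helper: time forward+backward with cudnn.benchmark off, then on."""
    import torch
    import torch.backends.cudnn as cudnn

    def step() -> None:
        model.zero_grad()
        model(batch).sum().backward()

    results = {}
    for flag in (False, True):
        cudnn.benchmark = flag
        results[flag] = median_step_time(step, warmup, iters, torch.cuda.synchronize)
    return results[False], results[True], speedup(results[False], results[True])
```

Any speedup measured this way applies only to that model, batch size and GPU; it is not a general figure.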

Hi, Eric

I didn't measure how much speedup I could get, since I eventually fixed my problem.

I found that my code was generating wrong labels. I had set each label to a number from 1 to the number of classes, but PyTorch expects class labels in the range 0 to num_classes - 1. After fixing the labels, everything works fine now.
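For reference, losses such as nn.CrossEntropyLoss take targets as class indices in the range 0 to C - 1, where C is the number of classes. So with 1-based labels, the label equal to C is out of range. Below is a minimal plain-Python sketch of shifting labels and validating them up front; the function names are hypothetical. With a LongTensor of targets, the shift itself is simply target - 1.

```python
from typing import Iterable, List


def to_zero_based(labels: Iterable[int], num_classes: int) -> List[int]:
    """Convert 1-based labels (1..num_classes) to 0-based (0..num_classes-1).

    Raises ValueError on any label outside 1..num_classes.
    """
    out = []
    for y in labels:
        if not 1 <= y <= num_classes:
            raise ValueError(f"label {y} outside 1..{num_classes}")
        out.append(y - 1)
    return out


def check_zero_based(labels: Iterable[int], num_classes: int) -> None:
    """Raise ValueError if any label is not a valid 0-based class index."""
    bad = [y for y in labels if not 0 <= y < num_classes]
    if bad:
        raise ValueError(f"labels out of range 0..{num_classes - 1}: {bad}")


if __name__ == "__main__":
    print(to_zero_based([1, 5, 10], num_classes=10))  # [0, 4, 9]
```

Checking labels like this before training catches the off-by-one on the CPU with a readable error, instead of inside the loss function.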

I hope my reply helps.

Thank you very much!!