V100 is too slow for training

I have some issues with a V100 GPU. A tiny-yolov2 classification example takes around 25 seconds for one mini-batch (batch size 50). It consumes around 2.5 GB of GPU RAM, and GPU utilization hovers around 0%.
I’m using Windows Server 2012 with the V100, and everything was installed via an Anaconda environment.
Python = 3.7
PyTorch = 1.0

The same code runs in 2-2.2 seconds per mini-batch on a P2000 GPU and in about 1.8 seconds on a 1080. Both the P2000 and the 1080 run on Ubuntu 16.04 with Anaconda, PyTorch 1.0, and Python 3.7.
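
For reference, this is roughly how the per-mini-batch time can be measured (a simplified sketch with a placeholder conv layer instead of the actual tiny-yolov2 model; the synchronize calls matter because CUDA kernels run asynchronously):

```python
import time
import torch

device = torch.device("cuda")
# Placeholder model and input, just to show the timing pattern;
# substitute the real tiny-yolov2 model and data loader here.
model = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1).to(device)
batch = torch.randn(50, 3, 416, 416, device=device)

torch.cuda.synchronize()      # make sure pending GPU work is finished
start = time.time()
out = model(batch)
out.mean().backward()         # dummy loss, forward + backward
torch.cuda.synchronize()      # wait for the GPU before stopping the clock
print(f"one mini-batch: {time.time() - start:.3f} s")
```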

One odd issue is that on the Windows machine, nvidia-smi does not show the Python processes even when the GPU RAM is fully utilized by increasing the batch size.
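
For what it’s worth, the allocation can also be checked from inside PyTorch itself, independent of nvidia-smi (a quick sketch):

```python
import torch

# Confirms which device the process sees and how much memory
# PyTorch has allocated on it, even if nvidia-smi hides the process.
print(torch.cuda.get_device_name(0))
print(f"allocated: {torch.cuda.memory_allocated(0) / 1024**2:.1f} MiB")
print(f"peak allocated: {torch.cuda.max_memory_allocated(0) / 1024**2:.1f} MiB")
```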

This is somewhat similar to an existing topic, but I wanted to open a new post to get some attention.
Thanks for your help.

The issue was related to a BIOS option. It is solved now.

Could you detail the BIOS issue you faced? I am also seeing very slow training times on a V100 and have reinstalled the entire software stack, from the OS to cuDNN, to no effect. I would like to know what I can check in the BIOS to see whether everything is set properly.

In my case, there is a BIOS option for the system's performance mode; I selected high-performance mode. I think it configures the PCIe bus, but I don't remember the exact name of the setting.
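
If you want to verify the effect, a rough host-to-device copy benchmark like the sketch below (the 256 MiB buffer size and pinned memory are just assumptions for the test; numbers are only indicative) can show whether the PCIe link is performing as expected:

```python
import time
import torch

# Copy a 256 MiB pinned buffer to the GPU and report the throughput.
# A healthy x16 Gen3 link typically reaches on the order of 10 GiB/s or more.
x = torch.empty(256, 1024, 1024, dtype=torch.uint8).pin_memory()
x.cuda(non_blocking=True)        # warm-up transfer
torch.cuda.synchronize()

start = time.time()
y = x.cuda(non_blocking=True)
torch.cuda.synchronize()
elapsed = time.time() - start
print(f"host-to-device: {x.numel() / elapsed / 1024**3:.2f} GiB/s")
```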