Strange GPU usage when training a network

When I first trained the model, GPU usage was low and so was power consumption. I assumed the data could not be fed to the GPU fast enough, since the dataset needs some CPU-intensive pre-processing. So I wrote a C++ extension for the pre-processing that also uses multiple threads to speed it up.
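To make the setup concrete, the idea behind the extension is roughly the following (a minimal pure-Python sketch of the same pattern, with hypothetical function names, not the actual C++ code): the per-sample pre-processing for a batch is split across a thread pool instead of running serially.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def preprocess_patch(img):
    # hypothetical stand-in for the CPU-heavy per-sample work
    # (cropping, augmentation, normalization, HWC -> CHW, ...)
    patch = img[:48, :48].astype(np.float32) / 255.0
    return np.ascontiguousarray(patch.transpose(2, 0, 1))

def preprocess_batch(images, workers=8):
    # the C++ extension does the same thing, but with native threads
    # that are not limited by the Python GIL
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess_patch, images))

if __name__ == "__main__":
    imgs = [np.random.randint(0, 256, (96, 96, 3), dtype=np.uint8)
            for _ in range(16)]
    batch = preprocess_batch(imgs)
    print(len(batch), batch[0].shape)
```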

The C++ extension did increase the pre-processing throughput: GPU usage went up and power draw was much higher, but the total training time also went up.

My code is based on this repo: thstkdgus35/EDSR-PyTorch. I am on Windows, so I changed the multiprocessing data loading to multithreading, since multiprocessing is not available on Windows; a sketch of what I mean is below.
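The change is roughly this (a hedged sketch with illustrative names, not the repo's exact code): instead of multiprocessing workers, a single background thread prefetches batches into a queue that the training loop pulls from.

```python
import queue
import threading

def threaded_loader(batch_iter, prefetch=4):
    # replace multiprocessing workers with one prefetch thread that
    # keeps a small queue of ready batches ahead of the training loop
    q = queue.Queue(maxsize=prefetch)
    sentinel = object()

    def worker():
        for batch in batch_iter:
            q.put(batch)
        q.put(sentinel)  # signal end of the epoch

    threading.Thread(target=worker, daemon=True).start()
    while True:
        batch = q.get()
        if batch is sentinel:
            break
        yield batch

if __name__ == "__main__":
    # stand-in for the real image batches
    batches = (list(range(i, i + 4)) for i in range(0, 32, 4))
    for b in threaded_loader(batches):
        print(b)
```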

If I train a network like RCAN, the GPU usage is very low, and if I train SRCNN (a 7-layer CNN with around 0.7M parameters) there is almost no load at all.

I would like to know what is happening, and why higher GPU usage and higher power draw end up causing a longer total training time.