PyTorch 1.1.0 CPU usage is much higher than 1.0.1

I am running code that consumes much more CPU resources on 1.1.0 than on 1.0.1, while GPU usage stays at ~40% and the runtime is the same on both versions.
Any idea why this happens, and what to look at in order to make it consume fewer resources on 1.1.0?

In detail:

Originally the code was developed for 0.3.1 (link), and I upgraded it to 0.4.1 according to the migration guide.

Specifically (see the sketch after this list):
(1) changed all `volatile` usages to `with torch.no_grad():`,
(2) replaced `.data[0]` with `.item()`,
(3) replaced all `.cuda()` with `.to(device)`.
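For concreteness, here is a minimal before/after sketch of those three changes (a dummy model and data, not the actual project code):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)   # dummy model standing in for the real network
criterion = nn.CrossEntropyLoss()
batch = torch.randn(4, 10)
targets = torch.randint(0, 2, (4,))

# 0.3.1 style (pre-migration):
#   inputs = Variable(batch.cuda(), volatile=True)
#   loss_value = criterion(model(inputs), Variable(targets.cuda())).data[0]

# 0.4.1+ style (post-migration):
with torch.no_grad():                  # replaces volatile=True
    inputs = batch.to(device)          # replaces .cuda()
    loss_value = criterion(model(inputs), targets.to(device)).item()  # replaces .data[0]
```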

The exact same code uses much more CPU resources on 1.1.0 than on 0.4.1 or 1.0.1.
PyTorch was installed via Anaconda with the following commands:
```
conda install pytorch==1.1.0 torchvision cudatoolkit=10.0 -c pytorch   # v1.1.0
conda install pytorch==1.0.1 torchvision cudatoolkit=10.0 -c pytorch   # v1.0.1
conda install pytorch==0.4.1 torchvision cudatoolkit=10.0 -c pytorch   # v0.4.1
```

I am not the one who developed the code, and I am new to PyTorch (though experienced with TF), so I am unsure how to debug this problem, what to look at, or how to create a minimal example that reproduces it. I'll be thankful for any guidance here.
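The only idea I have so far is to run a small dummy workload under torch.autograd.profiler on each version and compare where the CPU time goes; would a probe along these lines be a reasonable start? (A sketch with a dummy model, not the real code.)

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(1024, 1024).to(device)          # dummy workload, not the real network
x = torch.randn(64, 1024, device=device)

print("intra-op threads:", torch.get_num_threads())  # compare this across versions

with torch.autograd.profiler.profile(use_cuda=torch.cuda.is_available()) as prof:
    for _ in range(100):
        model(x).sum().backward()

# Table of which ops dominate CPU time; run under each version and diff the output.
print(prof.key_averages().table(sort_by="cpu_time_total"))
```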

EDIT

  1. Following the advice in https://github.com/pytorch/pytorch/issues/20311, setting OMP_NUM_THREADS=1 for 1.1.0 reduces CPU usage, but makes training 7.5% slower than 1.0.1 (and 16% slower than 0.4.1); see the snippet after this list.

  2. For the same code, the total runtime on PyTorch 1.0.1 is ~8% slower than on 0.4.1 [notes: (a) this bullet refers to 1.0.1, not 1.1.0; (b) I repeated this measurement 3 times and it was consistent, with low variance].
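For reference, the thread cap from point 1 can also be set from inside the script via torch.set_num_threads, which controls the same intra-op thread pool as OMP_NUM_THREADS (a sketch; train.py is just a placeholder name, and I have not verified that the two behave identically on 1.1.0):

```python
# Equivalent to launching with the environment variable, e.g.:
#   OMP_NUM_THREADS=1 python train.py
import torch

torch.set_num_threads(1)                              # cap intra-op CPU parallelism
print("intra-op threads:", torch.get_num_threads())   # should now print 1
```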