Cholesky high CPU utilization in GPU mode

Hello everyone,

I have some fun with comparison of linear algebra in tensorflow and pytorch. One strange thing I notice for cholesky is that when I use GPU mode in Pytorch - CPU (all cores) is still utilized heavily alongside 99% of GPU, meanwhile TensorFlow Cholesky doesn’t use CPU much. It is a bit of concern, because the project which I implemented significantly faster in TensorFlow, it is too early to say that it actually faster, but first stage of debugging, using pytorch bottleneck, showed that torch.potrf takes ~some time to compute. What Pytorch does so that it overloads all CPU cores? How can I alleviate this problem?

Here is some results using script from here https://gist.github.com/awav/5511f7fdf2a92d7b417ddb4269cb9127

GPU: Nvidia 1080 Ti, 11Gb.
CPU: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, 4 cores.
RAM: 32GB
Pytorch: 0.5.0a0+41c08fe
Python: 3.6.6
CUDA: 9.0
Magma: 2.3.0

> python bench_cholesky.py XXX 10000 [10|100]

|              | cholesky (avg. sec, 100 times) | cholesky + grads (avg. sec, 10 times) |
|--------------|-------------------------------:|--------------------------------------:|
|torch         | 1.31742e+00                    | 2.25207e+01                           |
|tensorflow    | 1.13558e+00                    | 2.00329e+01                           |

CPU & GPU for TensorFlow and Pytorch runs:

Actually, Pytorch used ~99% of GPU all the time with some fluctuation, when TensorFlow stayed 100% till the end of a test.

Thanks!

Alright, nobody answered, but I found a workaround to restrict the number of used threads setting torch.set_num_threads(1). It doesn’t affect performance. Question is why pytorch spawns other threads?