CPU cores not working

Hello.

I am running some experiments in PyTorch with a Titan Xp. The problem is that PyTorch only uses one CPU core, even if I set n_workers=10, for example, in a DataLoader. I installed PyTorch with pip, and the version is 0.3.1.

lscpu gives:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel® Core™ i7-4790 CPU @ 3.60GHz


Have you tried: torch.set_num_threads(num_cores)?
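For instance, a minimal check (the value 4 here is arbitrary; pick your core count):

```python
import torch

# Set how many threads PyTorch's CPU kernels may use for intra-op
# parallelism (this is separate from DataLoader workers).
torch.set_num_threads(4)
print(torch.get_num_threads())  # prints 4
```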


Yes, I actually noticed the problem because one of my code's parameters is how many cores to use, and I use that function to set them. However, when I run top in a terminal, the process only gets up to 100% CPU usage.

Check this out; I am not sure whether PyTorch supports multi-core CPU computation yet.

Hi,

It is possible that even with 10 workers for data loading, you only use one core. It depends on what processing you do in the dataloader. If you don't do any fancy preprocessing, the workers only need to load the data, which barely takes any CPU.
If you're performing heavy CPU computation (with large matrices) and you see a single core used, you can use torch.get_num_threads() to check how many threads are available for compute in PyTorch.

In your particular case, why do you believe CPU usage should be higher? What is the GPU usage?
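As a rough illustration (the sizes are arbitrary), a single large CPU matrix multiply is the kind of workload that should spread across several cores, independently of the dataloader:

```python
import torch

print(torch.get_num_threads())  # threads available for CPU compute

# A big matmul on CPU is dispatched to the backend (MKL/OpenMP) and can
# run on several cores at once; watch `top` go above 100% while it runs.
a = torch.rand(2000, 2000)
b = torch.rand(2000, 2000)
c = a @ b
```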


Well, I used n_workers as an example, but I am referring to any parallelism that can be performed. In this case I don't do any data preprocessing, but the computation is really heavy.

I ran a check by training a convolutional variational autoencoder on MNIST. I had hoped that PyTorch would use CPU parallelism at some level other than just data loading. Is that true? I don't know whether, at the C level, there are loop unrolls or things like that.

torch.get_num_threads() gives me numbers greater than one.

If you use the GPU, then no computation really needs to be done by the CPU. You will see one core being used to run the Python code and queue up CUDA kernels, but that's it. Since Python can use at most one thread at a time and CUDA kernel launches are driven by Python, only one core is used.
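A small sketch of that pattern (sizes and iteration count are arbitrary; it falls back to CPU when no GPU is present):

```python
import torch

# When the tensors live on the GPU, the CPU's job is mostly launching
# kernels, so `top` shows roughly one busy core for the Python process.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.rand(512, 512, device=device)
for _ in range(10):
    y = x @ x.t()  # queued asynchronously when running on CUDA
if device.type == "cuda":
    torch.cuda.synchronize()  # wait for the queued kernels to finish
```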

Well, I cannot agree with you, because sometimes the same code executed on another machine uses more than one core. top output:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22854 jmarona+ 20 0 30.861g 1.379g 261656 R 279.0 8.9 2:32.98 python

lscpu output:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Model name: Intel® Core™ i3-7100 CPU @ 3.90GHz

And on another machine:
lscpu output:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Model name: Intel® Core™ i3-6100 CPU @ 3.70GHz

top output:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24971 jmarona+ 20 0 16.284g 1.468g 298328 R 278.1 9.6 834:10.23 python

In this case, n_workers is set to 0.

Does the following code use multiple cores?

import torch

# torch.bmm is a batched matrix multiply; on CPU the backend
# (OpenMP/MKL) can parallelize it across cores.
a = torch.rand(100, 1000, 1000)
b = torch.rand(100, 1000, 1000)

while True:
    c = torch.bmm(a, b)

If it still only uses one core, try upgrading to 0.4 via pip to check that it is not a build problem with the old version.
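A quick sanity check after upgrading (the output depends on your install):

```python
import torch

print(torch.__version__)        # the active PyTorch build
print(torch.get_num_threads())  # compute threads available to CPU kernels
```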

No. I just tried it.
But sometimes I see 100% of the CPU utilized, while with other code it is less than 2%. How do you explain that?


Hi,

I have the same problem on a Debian server. Eventually, I realized that the number of workers should be zero for the best performance, which contradicts the official guideline. A similar situation:
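A toy way to see this (a sketch; the dataset, batch size, and worker counts here are made up) is to time the same loop with different worker settings:

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# With trivial per-item work, worker processes add inter-process overhead,
# so num_workers=0 (loading in the main process) can actually be faster.
data = TensorDataset(torch.rand(1000, 32), torch.randint(0, 2, (1000,)))

for workers in (0, 2):
    loader = DataLoader(data, batch_size=64, num_workers=workers)
    start = time.time()
    for batch, labels in loader:
        pass
    print(f"num_workers={workers}: {time.time() - start:.3f}s")
```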


Best,
Sina

I ran into the same problem. I compiled the source code with OpenMP, and I found I can only use one core. Did you solve this problem?