PyTorch changes the process affinity

I am running a simple while loop for inference on the CPU. I want the inference to run on only 1 CPU, but it seems like PyTorch changes the process affinity: I can see from htop that work is being scheduled on CPUs I do not expect.

My Python program:

import os
import torch
from torchvision.models import resnet50

# Pin this process to CPU 0 only and confirm the affinity took effect.
os.sched_setaffinity(os.getpid(), {0})
print('Set PID-{} to affinity: {}'.format(os.getpid(), os.sched_getaffinity(os.getpid())))

# Run inference in a loop so the CPU usage is easy to observe in htop.
net = resnet50()
net.eval()
while True:
    with torch.no_grad():
        net(torch.randn(1, 3, 224, 224))

Output from affinity setting and my htop monitoring.

Is this a bug or misuse from my side?

Hi,

By default, we use multi-threading to speed up CPU computations. Maybe with torch.set_num_threads(1) you will get the behavior you want?
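Something like this is what I mean, just a sketch of where the call would go in your script (set_num_threads only caps PyTorch's intra-op thread pool, it does not touch the scheduler affinity itself):

import os
import torch
from torchvision.models import resnet50

# Cap PyTorch's intra-op thread pool at a single thread.
torch.set_num_threads(1)

# Keep the explicit affinity as well, so the single worker stays on CPU 0.
os.sched_setaffinity(os.getpid(), {0})

net = resnet50()
net.eval()
with torch.no_grad():
    while True:
        net(torch.randn(1, 3, 224, 224))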

Thanks for the reply. I have tried this approach, but unfortunately it still schedules the job on multiple CPUs.

Actually, your original code works fine for me:

I see. Would you like to share your CPU information from lscpu?

It’s a Skylake running on a CentOS-like machine.

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              24
On-line CPU(s) list: 0-23
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           24
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               94
Model name:          Intel Core Processor (Skylake)
Stepping:            3
CPU MHz:             2394.477
BogoMIPS:            4788.95
Virtualization:      VT-x
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
L3 cache:            16384K
NUMA node0 CPU(s):   0-23
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat

I see. The only difference is that the machine I am currently using has just 2 sockets, but more threads per core and more cores per socket.

In detail: 2 sockets, 22 cores per socket, and 2 threads per core, for a total of 88 CPUs.

That is surprising…
Does setting the affinity work with other libraries like numpy?

Yes, numpy works seamlessly with the affinity.

My simple program.

import os
import numpy as np

# Pin this process to CPUs 0-2 and confirm the affinity took effect.
os.sched_setaffinity(os.getpid(), {0, 1, 2})
print('Set PID-{} to affinity: {}'.format(os.getpid(),
                                          os.sched_getaffinity(os.getpid())))

# Matrix multiply in a loop so the CPU usage is easy to observe in htop.
while True:
    a = np.random.random_sample([100, 100])
    b = np.random.random_sample([100, 100])
    c = np.dot(a, b)

Without setting affinity.

With affinity 0.

With affinity 0, 1, 2.

Interestingly, that does not work on my machine for numpy: it uses multiple cores and changes the affinity :stuck_out_tongue:

I guess some BLAS libraries might not be following this.
You can try running different ops in the while loop, for example just the random number generation vs. only the dot product (see the sketch below).
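Something along these lines, just a sketch that checks the affinity after each iteration so you can see which operation resets it:

import os
import numpy as np

wanted = {0, 1, 2}
os.sched_setaffinity(os.getpid(), wanted)

# Pre-generate the inputs once so the loop body exercises only the dot product.
a = np.random.random_sample([100, 100])
b = np.random.random_sample([100, 100])

while True:
    c = np.dot(a, b)  # replace with np.random.random_sample([100, 100]) to test only the RNG
    current = os.sched_getaffinity(os.getpid())
    if current != wanted:
        # Something (presumably the BLAS backend) has reset the affinity.
        print('Affinity changed to:', current)
        break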

LOL, very interesting.

For numpy, here is a post about how to disable the OpenBLAS affinity-resetting behavior: https://stackoverflow.com/questions/15639779/why-does-multiprocessing-use-only-a-single-core-after-i-import-numpy
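If I read that post correctly, the workaround is to set the environment variable before numpy is imported. A sketch, assuming an OpenBLAS-backed numpy:

import os

# Must be set before importing numpy, otherwise OpenBLAS has already
# applied its own affinity during import.
os.environ['OPENBLAS_MAIN_FREE'] = '1'

import numpy as np

os.sched_setaffinity(os.getpid(), {0})
print(os.sched_getaffinity(os.getpid()))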

But, even with this, my issue still persists with PyTorch.

Well, if you use a binary package, we ship with MKL :wink: So maybe a similar MKL flag is needed?


MKL has extra handling for CPU parallelism. The following two things have always worked for me to restrict MKL usage to 1 core (a combined in-Python sketch follows below):

1. taskset 0x1 python myscript.py
2. OMP_NUM_THREADS=1 python myscript.py
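As a rough sketch of how to combine these from inside the script itself (the environment variables have to be set before torch is imported so the OpenMP/MKL runtime picks them up):

import os

# Must be set before importing torch so the OpenMP/MKL runtime
# initializes with a single thread.
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'

import torch

torch.set_num_threads(1)
os.sched_setaffinity(os.getpid(), {0})  # the in-Python equivalent of taskset 0x1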