PyTorch changes the process affinity

I am running a simple while loop for inference on the CPU. I want the inference to run on only 1 CPU, but it seems like PyTorch changes the process affinity: I can see from htop that work is being scheduled on CPUs I do not expect.

My Python program:

import os
import torch
from torchvision.models import resnet50

# Pin this process to CPU 0 only and confirm the affinity took effect.
os.sched_setaffinity(os.getpid(), {0})
print('Set PID-{} to affinity: {}'.format(os.getpid(), os.sched_getaffinity(os.getpid())))

# Run inference in a loop so the CPU usage is easy to observe in htop.
net = resnet50()
net.eval()
while True:
    with torch.no_grad():
        net(torch.randn(1, 3, 224, 224))

Output from affinity setting and my htop monitoring.

Is this a bug or misuse from my side?

Hi,

By default, we use multi-threading to speed up CPU computations. Maybe with torch.set_num_threads(1) you will get the behavior you want?
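Something like this is what I mean, just a sketch of where the call would go in your script (set_num_threads only caps PyTorch's intra-op thread pool, it does not touch the scheduler affinity itself):

import os
import torch
from torchvision.models import resnet50

# Cap PyTorch's intra-op thread pool at a single thread.
torch.set_num_threads(1)

# Keep the explicit affinity as well, so the single worker stays on CPU 0.
os.sched_setaffinity(os.getpid(), {0})

net = resnet50()
net.eval()
with torch.no_grad():
    while True:
        net(torch.randn(1, 3, 224, 224))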

Thanks for the reply. I have tried this approach, but unfortunately it still schedules the job on multiple CPUs.

Actually, your original code works fine for me:

I see. Would you like to share your CPU information from lscpu?

It’s a Skylake running on a CentOS-like machine.

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              24
On-line CPU(s) list: 0-23
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           24
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               94
Model name:          Intel Core Processor (Skylake)
Stepping:            3
CPU MHz:             2394.477
BogoMIPS:            4788.95
Virtualization:      VT-x
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
L3 cache:            16384K
NUMA node0 CPU(s):   0-23
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat

I see. The only difference is that the machine I am currently using has just 2 sockets, but more threads per core and more cores per socket.

In detail: 2 sockets, 22 cores per socket, and 2 threads per core, for a total of 88 CPUs.

That is surprising…
Does setting the affinity work with other libraries like numpy?

Yes, numpy works seamlessly with the affinity.

My simple program.

import os
import numpy as np

# Pin this process to CPUs 0-2 and confirm the affinity took effect.
os.sched_setaffinity(os.getpid(), {0, 1, 2})
print('Set PID-{} to affinity: {}'.format(os.getpid(),
                                          os.sched_getaffinity(os.getpid())))

# Matrix multiply in a loop so the CPU usage is easy to observe in htop.
while True:
    a = np.random.random_sample([100, 100])
    b = np.random.random_sample([100, 100])
    c = np.dot(a, b)

Without setting affinity.

With affinity 0.

With affinity 0, 1, 2.

Interestingly, that does not work on my machine for numpy: it uses multiple cores and changes the affinity :stuck_out_tongue:

I guess some BLAS libraries might not be following this.
You can try running different ops in the while loop, for example just the random number generation vs. only the dot product (see the sketch below).
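Something along these lines, just a sketch that checks the affinity after each iteration so you can see which operation resets it:

import os
import numpy as np

wanted = {0, 1, 2}
os.sched_setaffinity(os.getpid(), wanted)

# Pre-generate the inputs once so the loop body exercises only the dot product.
a = np.random.random_sample([100, 100])
b = np.random.random_sample([100, 100])

while True:
    c = np.dot(a, b)  # replace with np.random.random_sample([100, 100]) to test only the RNG
    current = os.sched_getaffinity(os.getpid())
    if current != wanted:
        # Something (presumably the BLAS backend) has reset the affinity.
        print('Affinity changed to:', current)
        break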

LOL, very interesting.

For numpy, here is a post about how to disable the OpenBLAS affinity-resetting behavior: https://stackoverflow.com/questions/15639779/why-does-multiprocessing-use-only-a-single-core-after-i-import-numpy
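If I read that post correctly, the workaround is to set the environment variable before numpy is imported. A sketch, assuming an OpenBLAS-backed numpy:

import os

# Must be set before importing numpy, otherwise OpenBLAS has already
# applied its own affinity during import.
os.environ['OPENBLAS_MAIN_FREE'] = '1'

import numpy as np

os.sched_setaffinity(os.getpid(), {0})
print(os.sched_getaffinity(os.getpid()))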

But, even with this, my issue still persists with PyTorch.

Well, if you use a binary package, we ship with MKL :wink: So maybe a similar MKL flag is needed?


MKL has extra handling for CPU parallelism. The following two things have always worked for me to restrict MKL usage to 1 core (a combined in-Python sketch follows below):

1. taskset 0x1 python myscript.py
2. OMP_NUM_THREADS=1 python myscript.py
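As a rough sketch of how to combine these from inside the script itself (the environment variables have to be set before torch is imported so the OpenMP/MKL runtime picks them up):

import os

# Must be set before importing torch so the OpenMP/MKL runtime
# initializes with a single thread.
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'

import torch

torch.set_num_threads(1)
os.sched_setaffinity(os.getpid(), {0})  # the in-Python equivalent of taskset 0x1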