This is on the extreme end of the weird software issues I've ever had to deal with. Looking for help!
Consider this Python script:
import psutil
import cv2

def run():
    curr_process = psutil.Process()
    curr_process.cpu_affinity([3])
    print('Before', curr_process.cpu_affinity())
    import torch
    print('After', curr_process.cpu_affinity())

import multiprocessing
multiprocessing.Process(target=run).start()
You would think that importing PyTorch has nothing to do with CPU affinity and that the output before and after would be the same, but no. The output of this script on my machine is:
Before [3]
After [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
The issue, of course, arose in a big application, and it took considerable effort to boil it down to this minimal example. I found that reproducing it requires a key condition: PyTorch and OpenCV must be imported from different processes, one before the fork and one after. It does not reproduce if I import both in the child process, or both in the main process, but it does reproduce if I swap cv2 and torch in this script.
First of all, what does importing a library have to do with CPU affinity? Perhaps this is some kind of optimization attempt by the library authors that went rogue?
Then, how do I fix something like this? I could, of course, import everything before the fork as a quick hacky workaround, but I'd like to get to the bottom of this.
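(For what it's worth, another stopgap I'm considering is simply re-applying the affinity right after the offending import, since the mask can always be set again. A minimal sketch using the stdlib counterpart of psutil's cpu_affinity() — os.sched_setaffinity, Linux-only — though this would only paper over the problem, not explain it:)

```python
import os

def pin_to_cpus(cpus):
    """Re-apply the CPU affinity of the current process and return it.

    os.sched_setaffinity is the stdlib (Linux-only) equivalent of
    psutil's Process.cpu_affinity(); calling it again after the import
    that clobbered the mask restores the desired pinning.
    """
    os.sched_setaffinity(0, cpus)           # pid 0 = current process
    return sorted(os.sched_getaffinity(0))  # read the mask back to verify
```

(Calling something like pin_to_cpus([3]) right after the import torch line inside run() should restore the pinning in my example above, but it still leaves the "why" unanswered.)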
My library versions (both installed from the Conda repo):
opencv=4.1.0=py37h3aa1047_6
pytorch=1.4.0=py3.7_cuda10.1.243_cudnn7.6.3_0
Also posted on SO: https://stackoverflow.com/questions/60963839/importing-opencv-after-importing-pytorch-messes-with-cpu-affinity