Issue with multiprocessing semaphore tracking


#1

Hi

I am facing an issue with semaphore tracking while using PyTorch multiprocessing over multiple GPUs. The following warning message is printed many times on every run, and it slows down execution substantially (about 8 times slower).

```
/anaconda3/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown
  len(cache))
```

I didn't face this issue a couple of days ago and only started seeing it after updating PyTorch to 0.4.1, so I downgraded to 0.3.1 (hoping the new version was the cause), but the warning and the slowdown persist. I really need to understand the cause of this leak and how to fix it; my code is already wrapped inside an `if __name__ == '__main__':` guard, which is one of the solutions commonly suggested for this warning.
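For reference, this is roughly how my script is structured; a minimal sketch, where `run` and the process count are placeholders rather than my actual code:

```python
import torch.multiprocessing as mp

def run(rank):
    # placeholder worker: each process would do its share of the GPU work here
    print("worker %d running" % rank)

if __name__ == '__main__':
    # the guard keeps child processes from re-executing the spawning code
    mp.set_start_method('spawn')
    workers = [mp.Process(target=run, args=(i,)) for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```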

I would really appreciate any suggestions on this. Please let me know what details I should provide.

Thanks


(Michael Petrochuk) #2

I have the same issue, and I don't know where it is originating from either! I'm using DataLoader with DistributedDataParallel.

Ditto with my code wrapped inside `if __name__ == '__main__':`.
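For what it's worth, the loader side of my setup looks roughly like this; a minimal sketch with a stand-in dataset, and with the DistributedDataParallel/process-group setup omitted:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == '__main__':
    # stand-in dataset; the real one feeds a DistributedDataParallel model
    dataset = TensorDataset(torch.randn(64, 10))
    # each worker is a separate process; the shared queues and locks they use
    # are the kind of resource the semaphore tracker is complaining about
    loader = DataLoader(dataset, batch_size=8, num_workers=4)
    for (batch,) in loader:
        pass
```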


(Kobayashi Ittoku) #3

I encountered a similar problem with multiprocessing and for loops on CPU. It's not a fundamental solution, but the -W option works for me, e.g. `python -W ignore main.py`. This makes training stable in my environment.

I tried using the warnings module to ignore the warning, but it didn't work. I think this is because the semaphore tracker's process is spawned internally, so filters set in the parent interpreter never reach it.
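If that's right, -W works because the tracker process is started with the parent interpreter's flags. Setting the PYTHONWARNINGS environment variable before anything is spawned should have the same effect, since child processes inherit the environment; a sketch of the idea:

```python
import os

# must be set before multiprocessing spawns anything: the semaphore tracker
# is a child process, so it inherits this environment variable, whereas
# warnings.filterwarnings() in the parent never reaches it
os.environ.setdefault("PYTHONWARNINGS", "ignore")

import torch.multiprocessing as mp  # imported after the variable is set
```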

My environment: Linux, python=3.7, pytorch-cpu=0.4.1, start_method=spawn