I am facing an issue with semaphore tracking while using PyTorch multiprocessing over multiple GPUs. I get the following warning multiple times on every run, and it is slowing down code execution substantially (about 8 times slower).
/anaconda3/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown
len(cache))
I didn't face this issue a couple of days ago and started facing it when I updated PyTorch to 0.4.1, so I downgraded to 0.3.1 (hoping the problem was my new PyTorch version). But I am still facing the warning and the slowdown. I really need to understand the cause of this leakage and how I can fix it; my code is already wrapped inside if __name__ == '__main__':, which is one of the solutions commonly suggested for this warning.
I would really appreciate any suggestions. Please let me know what details I should provide.
I encountered a similar problem with multiprocessing and for loops on the CPU. It's not a fundamental solution, but the -W option works for me, e.g. python -W ignore main.py. This makes training stable in my environment.
I tried using the warnings module to ignore the warning, but it didn't work. I think this is because the semaphore tracker processes are spawned internally.
My environment: Linux, python=3.7, pytorch-cpu=0.4.1, start_method=spawn.
I ran into the same issue with the warning (when running some preprocessing on the GPU), though so far I haven't seen a slowdown in the running time of the code.
python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown
len(cache))
The message seems harmless, but it is pretty annoying since it is displayed whenever multiprocessing takes place.
One way to hide this warning is to ignore it using the Python warnings module. Redirecting stderr may also be a temporary solution.
However, warnings.filterwarnings() does NOT work for this particular warning (and probably for all warnings emitted by multiprocessing.semaphore_tracker); I don't really understand why. I tried different ways using regular expressions, but nothing seems to work.
Things that work:
python -W ignore your_script.py ignores ALL the warnings within your script (not recommended).
python -W ignore:semaphore_tracker:UserWarning your_script.py or python -W 'ignore:semaphore_tracker:UserWarning' your_script.py seems to ignore exactly the above warning. The same can be done with the environment variable PYTHONWARNINGS, which accepts the same filter syntax as -W: PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning' python your_script.py.
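A sketch of why the environment-variable route can work where warnings.filterwarnings() does not: the filter has to reach the internally spawned tracker process, and child processes inherit the parent's environment, so PYTHONWARNINGS can also be set programmatically before any workers start:

```python
import os

# warnings.filterwarnings() only affects the current interpreter, but the
# semaphore tracker runs in a separate process spawned by multiprocessing.
# Environment variables are inherited by child processes, so setting
# PYTHONWARNINGS before any workers are created applies the filter in the
# tracker process too. The filter string uses the same syntax as -W.
os.environ["PYTHONWARNINGS"] = "ignore:semaphore_tracker:UserWarning"
```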
If the issue was raised because you tried to use multiprocessing with CUDA, it is better to ignore it as mentioned above and move on (see here and here).
I am actually also interested in understanding this error, since I have a similar one using tensorflow, and it uses a lot of memory in my case (too much for my workers which die).
I'm getting the same error. My latest run alone had 5779 leaked semaphores.
I’m not using PyTorch MP manually; fairseq is using it.
Fairseq master with PyTorch 1.9.
Same issue: I am getting 6 leaked semaphores, and I am not using multiprocessing explicitly. The mind-boggling thing is that sometimes it happens and sometimes it doesn't, and I cannot find any pattern for what triggers it. The memory doesn't seem…
I use PyTorch 1.9.0 and DGL 0.7.1 and I’m not sure which one is causing the warning to arise. Would greatly appreciate if anyone had any insight.
I got the same issue. It seems that the DataLoader has to be constructed under if __name__ == '__main__': or in a function. Whenever I move it to a function in a class, it still has this issue…
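A minimal sketch of the guard requirement, shown here with the standard-library multiprocessing module (torch.multiprocessing wraps it, and the same principle applies to DataLoader workers): under the 'spawn' start method, each child re-imports the main module, so any process creation at module level would re-execute in every child.

```python
import multiprocessing as mp

def square(x):
    return x * x

# If the Pool were created at module level (outside the guard), every
# spawned worker would re-import this file and try to create its own
# Pool, recursing endlessly. Keeping process creation under the guard
# (or inside a function called from it) avoids that.
if __name__ == '__main__':
    mp.set_start_method('spawn', force=True)
    with mp.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```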
I found a simple example which leads to "UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown".
I think it may be helpful to someone, since using barriers with multiprocessing should be pretty common:
import torch.multiprocessing as mp

def foo(barrier=mp.Barrier(1)):
    print("default barrier leads to leak")

# def foo(barrier=None):
#     print("default None does not lead to a leak")

if __name__ == '__main__':
    processes = []
    bar = mp.Barrier(1)
    mp.set_start_method('spawn', force=True)
    for rank in range(1):
        p = mp.Process(target=foo, args=(bar,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
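The leak in the example above comes from Python evaluating default argument values once, at function definition time: mp.Barrier(1) in the signature allocates a semaphore in every process that imports the module (each spawned child re-imports it), and those orphaned semaphores are what the resource tracker reports. A sketch of the usual fix, using the standard-library multiprocessing module (the helper name safe_foo is made up for illustration):

```python
import multiprocessing as mp

# Using None as the default and creating the Barrier only when needed
# avoids allocating a semaphore at import time in every spawned child.
def safe_foo(barrier=None):
    if barrier is None:
        barrier = mp.Barrier(1)  # created lazily, inside the running process
    barrier.wait()               # parties=1, so this returns immediately
    return "no leak"

if __name__ == '__main__':
    print(safe_foo())  # no leak
```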