Interactive debug in distributed.launch

Hi, is there any interactive way to debug in distributed launch?
In PyTorch 0.4.1 I used pdb with DataParallel. However, distributed launch seems to split the job into multiple processes, and when I type something in pdb, my input gets split across those processes, which is not what I expected.

So I’m wondering if there is any method that can help me do interactive debugging in distributed launch?

This happens because all processes share the same input file descriptor. When you type a character, whichever process reads it first gets it, and the others never see it. This makes interactive debugging almost impossible. What you can try, in lieu of a proper solution, is to close the input descriptor by running sys.stdin.close() on the ranks where you don’t want to run pdb.
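
A minimal sketch of that workaround, in case it helps. It assumes the launcher exports each process's rank in the RANK environment variable. If your script gets its rank some other way, substitute that. The helper name keep_stdin_on_rank is made up for this example and is not part of PyTorch.

```python
import os
import pdb
import sys


def keep_stdin_on_rank(debug_rank=0):
    """Close sys.stdin on every rank except debug_rank.

    Returns True on the rank that keeps stdin, False elsewhere.
    Assumption: the rank is available in the RANK environment variable.
    """
    rank = int(os.environ.get("RANK", "0"))
    if rank != debug_rank:
        # This rank should not compete for keyboard input.
        sys.stdin.close()
        return False
    return True


if __name__ == "__main__":
    if keep_stdin_on_rank(debug_rank=0):
        # Only the chosen rank drops into the debugger.
        pdb.set_trace()
```

Call the helper once near the start of the script, before any breakpoint is reached. Keep in mind that if the debugged rank is paused while the other ranks are waiting on it in a collective operation, those ranks will block until you continue.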