How to shut down all processes with 'Ctrl + C' when using PyTorch Distributed DataParallel training?

I was using DistributedDataParallel to train a model. I ran my code as two processes on two GPUs (one process per GPU). After I pressed Ctrl+C in the terminal, one process shut down but the other kept running, as I could see with top or nvidia-smi. How can I shut down all the processes from my terminal?

How did you launch the two processes? Did you use torch.distributed.launch or some other mechanism?

@pritamdamania87 Yes, I use python -m torch.distributed.launch to run my code. When I press Ctrl+C to stop the training, some processes are not closed, and I have to kill them manually.

This is a common issue with multiprocessing programs in general, not just PyTorch. The simplest workaround is to press Ctrl+C repeatedly until all processes have terminated.
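If you control the launcher yourself, another option is to have the parent process clean up its workers when it receives Ctrl+C. Below is a minimal sketch of that idea, not what torch.distributed.launch does internally. It is POSIX-only and uses only the standard library; the helper names (launch_workers, terminate_workers, run) are hypothetical, and the worker commands are whatever you would normally start by hand.

```python
import os
import signal
import subprocess


def launch_workers(commands):
    """Start each command in its own session (hypothetical helper).

    With start_new_session=True the child becomes a process-group leader,
    so its pid doubles as the group id we can signal later.
    """
    return [subprocess.Popen(cmd, start_new_session=True) for cmd in commands]


def terminate_workers(procs, timeout=5.0):
    """Send SIGTERM to every worker's process group, then SIGKILL stragglers."""
    for p in procs:
        if p.poll() is None:
            try:
                os.killpg(p.pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # the group is already gone
    for p in procs:
        try:
            p.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            try:
                os.killpg(p.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass
            p.wait()


def run(commands):
    """Launch all workers and tear them all down if the user presses Ctrl+C."""
    procs = launch_workers(commands)
    try:
        for p in procs:
            p.wait()
    except KeyboardInterrupt:
        terminate_workers(procs)
    return [p.returncode for p in procs]
```

Because each worker runs in its own session, the terminal's Ctrl+C reaches only the parent, which then signals every worker group explicitly. You would call run() with one command list per rank, for example the same python invocation with a different rank argument for each GPU.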
