I was using Distributed DataParallel to train a model. I ran my code as two processes on two GPUs (one process per GPU). After I pressed Ctrl+C in the terminal, one process shut down but the other one remained running, which I could observe with `top` or `nvidia-smi`. How can I shut down all the processes from my terminal?
How to shut down all processes with 'Ctrl + C' when using PyTorch Distributed DataParallel training?
How did you launch the two processes? Did you use `torch.distributed.launch` or some other mechanism?
@pritamdamania87 Yes, I use `python -m torch.distributed.launch` to run my code. And when I press Ctrl+C to shut down the training, some processes are not closed; I have to kill them manually.
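For killing the leftover workers manually, one option is to match them by the training script's name. This is just a sketch: `train.py` is a placeholder for whatever script you actually launched, and `pkill -f` matches against the full command line, so make the pattern specific enough not to hit unrelated processes.

```shell
# List the surviving worker processes first (pattern is a placeholder).
pgrep -af train.py

# Then terminate them; add -9 (SIGKILL) only if plain SIGTERM is ignored.
pkill -f train.py
```

Checking with `nvidia-smi` afterwards confirms that the GPU memory held by the stragglers has been released.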
This is an issue with many multiprocessing workloads, not just PyTorch. The best you can do is press Ctrl+C repeatedly until all processes have terminated, or kill the remaining ones manually.