Ctrl+C to stop distributed training

Ardeal · July 12, 2021, 1:25am

Hi,

I tried two methods to start distributed training:
Method 1:
Call torch.multiprocessing.spawn to start n processes on one computer with multiple GPUs.

Method 2:
Call torch.distributed.launch to start n processes on one computer with multiple GPUs.

If I use method 1 and press Ctrl+C to stop the code, the sub-processes do not stop.
If I use method 2 and press Ctrl+C to stop the code, the sub-processes stop.
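
For the torch.multiprocessing.spawn case, one possible approach is to call spawn with join=False, wait on the returned context, and terminate the workers when KeyboardInterrupt reaches the parent. This is only a minimal sketch, not a confirmed fix for the behavior described above: train_worker, stop_workers, and main are hypothetical names, and the training loop body is a placeholder.

```python
import time


def train_worker(rank, world_size):
    # Hypothetical placeholder for the real per-GPU training loop.
    while True:
        time.sleep(1.0)


def stop_workers(processes, grace=5.0):
    """Terminate every live process, then kill any that outlive the grace period."""
    for p in processes:
        if p.is_alive():
            p.terminate()
    for p in processes:
        p.join(grace)
        if p.is_alive():
            p.kill()
            p.join()


def main(world_size=2):
    # Imported here so stop_workers can be reused without torch installed.
    import torch.multiprocessing as mp

    # join=False makes spawn return a context object instead of blocking.
    ctx = mp.spawn(train_worker, args=(world_size,), nprocs=world_size, join=False)
    try:
        # join(timeout) returns True once all workers have exited.
        while not ctx.join(timeout=1.0):
            pass
    except KeyboardInterrupt:
        # ctx.processes holds the worker process objects started by spawn.
        stop_workers(ctx.processes)
```

To try it, call main() under an `if __name__ == "__main__":` guard, which the spawn start method needs so that workers do not re-run the launch code when they import the script.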

My questions are:

For method 1, how can I stop the sub-processes from Python code?
For method 1, could the start code be run from Python code?

#!/bin/bash
NUM_PROC=$1
shift
python3 -m torch.distributed.launch --master_port=44145 --nproc_per_node=$NUM_PROC train.py "$@"
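
The shell launcher above could also be driven from Python with the standard subprocess module. The sketch below rebuilds the same command line and waits for it, assuming the same port and train.py path as the script; build_launch_cmd and launch are hypothetical helper names, and sys.executable stands in for python3.

```python
import subprocess
import sys


def build_launch_cmd(num_proc, train_args, master_port=44145):
    """Build the same command line as the shell script, as an argument list."""
    return [
        sys.executable,  # current interpreter, used here in place of "python3"
        "-m", "torch.distributed.launch",
        f"--master_port={master_port}",
        f"--nproc_per_node={num_proc}",
        "train.py",
        *train_args,
    ]


def launch(num_proc, train_args, grace=10.0):
    """Run the launcher and return its exit code, cleaning up on Ctrl+C."""
    proc = subprocess.Popen(build_launch_cmd(num_proc, train_args))
    try:
        return proc.wait()
    except KeyboardInterrupt:
        # A terminal Ctrl+C is normally delivered to the launcher as well,
        # so give it some time to shut its workers down on its own.
        try:
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            # Fallback if the launcher is still running after the grace period.
            proc.terminate()
            return proc.wait()
```

For example, launch(4, ["--epochs", "3"]) would correspond to running the script with 4 as its first argument, followed by --epochs 3 (the training arguments here are made up for illustration).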