Any best practice for debugging a distributed PyTorch training script?

Is there a best practice for debugging a distributed training script train.py step by step? For example, the script is launched with:

python -m torch.distributed.launch --nproc_per_node=2 train.py --input1 --input2
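
Would something like the following single-process reduction be a sensible way to step through train.py in PyCharm? This is just a sketch I put together: the environment variables are the ones torch.distributed.launch normally exports for each worker, and the gloo backend and port 29500 are placeholder choices on my part.

import os

# Set the rendezvous variables that torch.distributed.launch would
# normally export for each worker process.
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"  # placeholder port
os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"
os.environ["LOCAL_RANK"] = "0"

import torch.distributed as dist

# A single-process "group": the script runs directly under the PyCharm
# debugger, with no launcher spawning subprocesses.
dist.init_process_group(backend="gloo", rank=0, world_size=1)

# ... call the training entry point from train.py here ...

dist.destroy_process_group()

This only covers the single-process case, though. With --nproc_per_node=2 the launcher spawns child processes, and I am not sure how to make PyCharm's debugger follow them, which is the heart of my question.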

I am using PyCharm as my IDE. Any help would be appreciated.