Stuck in clip_grad_norm_ and optimizer.step() when using multiple GPUs

Hi, I've run into a problem when training my model on multiple GPUs on a single node with nn.parallel.DistributedDataParallel. The PyTorch version is 0.4.1.
I use the following command to run the program:

export NGPU=3;
python -m torch.distributed.launch --nproc_per_node=$NGPU train.py

The program gets stuck in clip_grad_norm_() and optimizer.step(). When I check GPU utilization with nvidia-smi, all three GPUs sit at 100%.
However, when I set NGPU=1, the program runs correctly.
What’s the reason? How can I fix it? Thanks!