Multi-node, multi-GPU system

Hi, I am trying to use multiple GPUs while running my code. I tried the code provided in the PyTorch documentation, but it's not working.
Could anyone please take a look at this?
The thing is, I was able to run the program on multiple GPUs across multiple nodes using DistributedDataParallel, but the gradients do not seem to be synchronized, as the accuracy and loss are all zero after the first epoch.

Full code is presented here:

Hi @shrutishrestha, I looked into your code. If you want to use the NCCL process group, it's better to set the CUDA device of each process to its local rank via torch.cuda.set_device.

Thanks for your reply @wanchaol. When I tried to use NCCL, it gave me an error, so I used Gloo instead.

What environment are you running your code in? Gloo is meant for CPU training.
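If it helps, a common pattern is to pick the backend based on whether CUDA is visible; the helper name here is mine, not from the thread:

```python
import torch

def pick_backend() -> str:
    # NCCL for GPU tensors; Gloo as the CPU fallback
    # (Gloo also works with GPUs, but NCCL is the usual choice there).
    return "nccl" if torch.cuda.is_available() else "gloo"
```

You would then pass the result to `torch.distributed.init_process_group(pick_backend(), ...)`.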

I am running my code on GPUs. I will check whether NCCL works for me.
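One quick way to check before switching backends, assuming a reasonably recent PyTorch build:

```python
import torch
import torch.distributed as dist

# Are any CUDA devices visible to this process?
print(torch.cuda.is_available())
# Was this PyTorch build compiled with NCCL support?
print(dist.is_nccl_available())
```

If both print True, the NCCL backend should be usable; otherwise the earlier NCCL error likely comes from the environment rather than the training code.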