Hi, I am trying to use multiple GPUs while running my code. I tried the code provided in the PyTorch documentation, but it's not working.
Could anyone please take a look at this?
The thing is, I was able to run the program on multiple GPUs across multiple nodes using DistributedDataParallel. But the gradients don't seem to be collected, since the accuracy and loss are all zero after the first epoch.
Hi @shrutishrestha, I looked into your code. If you want to use the NCCL process group, you should set the CUDA device of each process to its local rank via torch.cuda.set_device.
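A minimal sketch of what that setup can look like when launched with torchrun (which exports `LOCAL_RANK` for each process); the function names here are illustrative, not from the original code:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_distributed() -> int:
    """Pin this process to one GPU and join the NCCL process group."""
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK in the environment.
    local_rank = int(os.environ["LOCAL_RANK"])
    # Call set_device *before* init_process_group so that NCCL collectives
    # and any plain .cuda() calls in this process target the right GPU.
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    return local_rank

def wrap_model(model: torch.nn.Module, local_rank: int) -> DDP:
    """Move the model to this process's GPU and wrap it in DDP."""
    model = model.cuda(local_rank)
    # device_ids must match the device the process was pinned to,
    # otherwise gradient all-reduce can silently misbehave.
    return DDP(model, device_ids=[local_rank])

if __name__ == "__main__":
    local_rank = setup_distributed()
    model = torch.nn.Linear(10, 2)
    ddp_model = wrap_model(model, local_rank)
    # ... training loop here ...
    dist.destroy_process_group()
```

Without the set_device call, every process defaults to cuda:0, which is a common cause of gradients not syncing correctly across ranks.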