I recently converted my code for a Wasserstein GAN with gradient penalty (WGAN-GP) from PyTorch 0.4.1 to 1.0, and discovered that when the number of GPUs is greater than 1, the loss gradually increases to infinity. This only occurs when I'm using multiple GPUs.
After some hours of debugging, I found that training works fine if I switch the optimizer to SGD when using multiple GPUs, but multi-GPU training does not work with Adam or RMSprop.
I'm using nn.DataParallel in both the single- and multi-GPU cases.
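For reference, my critic update looks roughly like this. This is a simplified, CPU-runnable sketch rather than my actual code: `Critic`, the layer sizes, and the hyperparameters are placeholders, and `nn.DataParallel` simply falls back to a plain forward pass on a machine with no GPUs.

```python
# Simplified WGAN-GP critic step (placeholder model and hyperparameters).
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Toy critic standing in for the real network."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x)

def gradient_penalty(critic, real, fake):
    # Score random interpolations between real and fake samples,
    # then penalize critic gradients whose norm deviates from 1.
    eps = torch.rand(real.size(0), 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads, = torch.autograd.grad(scores.sum(), interp, create_graph=True)
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

torch.manual_seed(0)
critic = nn.DataParallel(Critic())  # same wrapper for single- and multi-GPU
opt = torch.optim.Adam(critic.parameters(), lr=1e-4, betas=(0.0, 0.9))

real, fake = torch.randn(8, 16), torch.randn(8, 16)
loss = (critic(fake).mean() - critic(real).mean()
        + 10.0 * gradient_penalty(critic, real, fake))
opt.zero_grad()
loss.backward()
opt.step()
```

On a single GPU (or CPU) this converges as expected; the divergence only shows up once `DataParallel` actually splits the batch across devices.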
Is there anyone else experiencing this issue?