Adam/RMSprop not working with PyTorch 1.0 multi-GPU?

I recently converted my code for a Wasserstein-GP GAN from PyTorch 0.4.1 to 1.0 and discovered that when the number of GPUs is greater than 1, the loss gradually increases to infinity. This only occurs when I'm using multiple GPUs.

After some hours of debugging, I found that training works fine on multiple GPUs if I switch the optimizer to SGD, but multi-GPU training does not work with Adam/RMSprop.

I'm using nn.DataParallel for both the single-GPU and multi-GPU runs.
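For context, here is a minimal sketch of the setup described above (the model, dimensions, and learning rate are placeholders, not my actual GAN code). The point is that the wrapper and optimizer are created the same way for single- and multi-GPU runs; only the number of visible devices changes:

```python
import torch
import torch.nn as nn

# Placeholder module standing in for the critic/generator in the real code.
class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 1)

    def forward(self, x):
        return self.fc(x)

model = SmallNet()
# Same wrapper in both cases; with >1 visible GPU, DataParallel replicates
# the module and splits the batch along dim 0. With 0 or 1 GPUs it just
# forwards through the underlying module.
model = nn.DataParallel(model)

# Swapping this for torch.optim.SGD is the workaround mentioned above.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x = torch.randn(4, 8)
loss = model(x).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```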

Is there anyone else experiencing this issue?