Backpropagation across multiple GPUs not working


I have a simple (but large) encoder-decoder network. I have 2 GPUs, and I have put the encoder on one GPU and the decoder on the other. I have done this using .cuda(0) and .cuda(1) on the various modules and Variables. The forward pass works perfectly.

However, when I call loss.backward() it blows up with:

Traceback (most recent call last):
  File "/home/mpeyrard/Workspace/nmt/", line 66, in <module>
    train_nmt(args.train, args.vocabulary)
  File "/home/mpeyrard/Workspace/nmt/", line 40, in train_nmt
  File "/home/mpeyrard/anaconda3/envs/nmt/lib/python3.6/site-packages/torch/autograd/", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/mpeyrard/anaconda3/envs/nmt/lib/python3.6/site-packages/torch/autograd/", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: arguments are located on different GPUs at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/THC/generated/../generic/

Is this supported?

I figured it out; it was a silly mistake. I was doing embedding lookups on both GPUs, but the embeddings lived on only one of them. I was moving the embeddings between GPUs by hand for the forward pass, but the backward pass does not do that on my behalf, so it blew up with the error above.
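
For anyone who hits the same error, here is a minimal sketch of the layout that avoids it. It is not the original code: the Encoder and Decoder classes, the sizes, and the random batches are all made up for illustration, and it uses the newer torch.device / .to() API rather than .cuda(0) / .cuda(1). The idea is that each side owns its own embedding table on its own device, and only activations cross between GPUs through .to(), which autograd can differentiate through. It falls back to the CPU when two GPUs are not available, so it should still run anywhere.

```python
import torch
import torch.nn as nn

# Assumption: use two GPUs when present, otherwise run everything on the CPU.
if torch.cuda.device_count() >= 2:
    enc_dev, dec_dev = torch.device("cuda:0"), torch.device("cuda:1")
else:
    enc_dev = dec_dev = torch.device("cpu")

VOCAB, HIDDEN = 100, 32  # illustrative sizes only


class Encoder(nn.Module):
    """Hypothetical encoder that owns its own embedding table."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)

    def forward(self, tokens):
        _, state = self.rnn(self.embed(tokens))
        return state


class Decoder(nn.Module):
    """Hypothetical decoder with a separate embedding table."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens, state):
        output, _ = self.rnn(self.embed(tokens), state)
        return self.out(output)


# Parameters stay on one device for their whole lifetime.
encoder = Encoder().to(enc_dev)
decoder = Decoder().to(dec_dev)

# Random token ids standing in for a real batch, created on the device
# of the module that will consume them.
src = torch.randint(0, VOCAB, (4, 7), device=enc_dev)
tgt = torch.randint(0, VOCAB, (4, 5), device=dec_dev)

# Only the activation crosses devices; .to() is tracked by autograd, so
# gradients flow back to the encoder's device during backward().
state = encoder(src).to(dec_dev)
logits = decoder(tgt, state)

loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), tgt.reshape(-1))
loss.backward()
```

After loss.backward(), both encoder.embed.weight.grad and decoder.embed.weight.grad should be populated, each on the same device as its parameter. If the two sides really need to share one embedding table, one option is to keep the table on a single device, do every lookup there, and move the looked-up vectors (not the weights) to the other device with .to().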