Multiple Models in DataParallel: RuntimeError: arguments are located on different GPUs

The following is a traceback. I think at the moment that the error comes from having two models, a generator and a discriminator, each wrapped with DataParallel separately, which leads to the failure at D(G(z)).

What’s the way to fix this?

  File "/home/jerin/code/fairseq/fairseq/models/", line 207, in forward
    x = self.embed_tokens(src_tokens)
  File "/home/jerin/.local/lib/python3.5/site-packages/torch/nn/modules/", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jerin/.local/lib/python3.5/site-packages/torch/nn/modules/", line 110, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/jerin/.local/lib/python3.5/site-packages/torch/nn/", line 1110, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generic/
Exception ignored in: <bound method tqdm.__del__ of | epoch 000:   0%| | 0/3750 [00:08<?, ?it/s]>

I ran into the same problem. Have you fixed it?

I was trying to switch models between GPUs. I solved this by putting all the models in one module that inherits from nn.Module and applying DataParallel to that wrapper. In the wrapper's forward method, I call the individual models (generator and discriminator).
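A minimal sketch of that fix. The Generator and Discriminator bodies here are hypothetical stand-ins (simple linear layers); the point is the wrapper module, which lets DataParallel replicate both models together so G's output and D's weights land on the same GPU in each replica:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in models for illustration only.
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(16, 32)

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(32, 1)

    def forward(self, x):
        return self.net(x)

class GAN(nn.Module):
    """Wrapper holding both models, so a single DataParallel call
    replicates G and D as one unit onto each GPU."""
    def __init__(self):
        super().__init__()
        self.G = Generator()
        self.D = Discriminator()

    def forward(self, z):
        fake = self.G(z)       # G and D live on the same device per replica,
        return self.D(fake)    # so D(G(z)) no longer crosses GPUs

model = nn.DataParallel(GAN())        # wrap the combined module once
out = model(torch.randn(8, 16))       # shape: (8, 1)
```

On a machine without CUDA devices, DataParallel simply falls through to the wrapped module, so the same code runs on CPU.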

OK, I only have one model, but I get the same embedding error. Thank you anyway.

I get the same error when I try to use multiple embeddings.

I also use just one embedding and one model, but I still get the error. Did you fix the problem?
I load the embedding from a numpy array, but the error still occurs.

When I apply DataParallel, I have the same problem with multiple models passing embedding weights to each other.