Arguments located on different GPUs error


I am getting an “arguments are located on different GPUs” error when I try to run the following code snippet on multiple GPUs using nn.DataParallel:

import torch
import torch.nn as nn
import torch.optim as optim

data = [{}, {}, {}, {}]
labels = [0, 0, 0, 0]

class LSTM(nn.Module):

    def __init__(self):
        super(LSTM, self).__init__()
        self.W = nn.Linear(60, 5)
        self.dropout_layer = nn.Dropout(p=0.3)
        self.softmax = nn.LogSoftmax(dim=1)
        self.embeddings = nn.Embedding(50, 10)
        self.lstm = nn.LSTM(310, 60, 2)

    def embed_path(self, path):
        edge, count = path
        inputs = torch.Tensor([[edge]]).long().to(device)
        embed = torch.flatten(self.dropout_layer(self.embeddings(inputs)))
        output, _ = self.lstm(lstm_inp.view(-1, 1, 310))
        return output * count

    def forward(self, data):
        for el in data:
            if not el:
                el[0] = 1
        h = torch.Tensor([]).to(device)
        for path in data:
            lstm_output = self.embed_path(path).view(1,-1)
            probabilities = self.softmax(self.W(lstm_output))
            h = torch.cat((h, probabilities.view(1,-1)), 0)
        return h

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

lstm = nn.DataParallel(LSTM()).to(device)
criterion = nn.NLLLoss()
optimizer = optim.Adam(lstm.parameters(), lr=0.001)

for epoch in range(3):

    outputs = lstm(data)
    loss = criterion(outputs, torch.LongTensor(labels).to(device))

    print ("Loss: ", loss.item())

(Please note that the above code snippet is an extremely simplified version of my original code, so some things may not be immediately clear. If so, please let me know and I would be happy to elaborate.)

Also, I did go through the post here. It recommends sending the input to “device” in the training loop rather than in the forward method. But for me, that is not possible, since I am converting the input (which is a numpy.ndarray of dictionaries (or tuples, in the simplified code snippet)) to embeddings, which cannot be done in the training loop.

If I don’t send the resultant embeddings to “device”, I get a device mismatch error (“Expected object of device type cuda but got device type cpu”). And if I do, I get the “arguments are located on different GPUs” error. As a result, I am stuck and cannot figure out a way to resolve this.

Any help would be greatly appreciated!


Which two arguments trigger the “arguments are located on different GPUs” error? Posting the full stack trace, along with all the lines that push the variables and the model to the GPU, would also help immensely.
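One quick way to collect that information is to print the device of each tensor involved right before the failing call. The sketch below is only an illustration: report_devices is a hypothetical helper (not part of PyTorch) and Probe is a made-up module. It relies only on the .device attribute that every tensor has.

```python
import torch
import torch.nn as nn

def report_devices(**tensors):
    """Map each given name to the device its tensor lives on (hypothetical helper)."""
    return {name: str(t.device) for name, t in tensors.items()}

class Probe(nn.Module):  # made-up module, for illustration only
    def __init__(self):
        super().__init__()
        self.embeddings = nn.Embedding(50, 10)

    def forward(self, inputs):
        # Under nn.DataParallel this runs once per replica, so the printed
        # devices show where each replica's weights and inputs live.
        print(report_devices(weight=self.embeddings.weight, inputs=inputs))
        return self.embeddings(inputs)

probe = Probe()
out = probe(torch.tensor([1, 2, 3]))
devices = report_devices(weight=probe.embeddings.weight, out=out)
```

If the two devices printed inside a replica differ (for example cuda:0 for inputs and cuda:1 for weight), that pair is the likely source of the error.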


Thanks for getting back to me!

Fortunately, I was able to resolve this issue on my own a few minutes ago. I simply moved the conversion of the training data to tensors into the training loop, outside the “forward” function, while keeping everything else the same.

For others who may face the same or similar issues in the future, a tip: move the conversion of input data to tensors into the training loop, and push all tensors created in the “forward” function to “device”, as usual.
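A minimal sketch of that arrangement, under some assumptions: PathClassifier, the (edge, count) pairs and the layer sizes are hypothetical stand-ins, not the original model. The inputs are converted to tensors in the training loop so nn.DataParallel can split them across GPUs. As one variant of the tip, tensors created inside forward take their device from the input rather than from a global “device”, so each replica stays on its own GPU.

```python
import torch
import torch.nn as nn

class PathClassifier(nn.Module):  # hypothetical stand-in for the model above
    def __init__(self):
        super().__init__()
        self.embeddings = nn.Embedding(50, 10)
        self.lstm = nn.LSTM(10, 60, 2)
        self.W = nn.Linear(60, 5)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, edges, counts):
        # edges: (batch,) long tensor; counts: (batch,) float tensor.
        # DataParallel has already placed both on this replica's device.
        batch = edges.size(0)
        embed = self.embeddings(edges).unsqueeze(0)  # (1, batch, 10)
        # Tensors created here follow the input's device, not a global one.
        h0 = torch.zeros(2, batch, 60, device=edges.device)
        c0 = torch.zeros(2, batch, 60, device=edges.device)
        output, _ = self.lstm(embed, (h0, c0))       # (1, batch, 60)
        output = output.squeeze(0) * counts.unsqueeze(1)
        return self.softmax(self.W(output))          # (batch, 5)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = PathClassifier().to(device)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

# Made-up raw data, converted to tensors here, outside forward.
paths = [(3, 2.0), (7, 1.0), (1, 4.0), (9, 3.0)]
labels = [0, 1, 2, 3]
edges = torch.tensor([e for e, _ in paths], dtype=torch.long, device=device)
counts = torch.tensor([c for _, c in paths], dtype=torch.float, device=device)
targets = torch.tensor(labels, dtype=torch.long, device=device)

criterion = nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(3):
    optimizer.zero_grad()
    outputs = model(edges, counts)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()
    print("Loss:", loss.item())
```

The same script runs on CPU, on a single GPU, or on several GPUs, because nothing inside forward refers to a fixed device.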

Thanks again!