RuntimeError: Assertion `THCTensor_(checkGPU)(state, 3, dst, src, indices)' failed

I have trained a simple sequence-to-sequence model and saved it to a file. Now, when I load the model and try to do the testing, I get the following error.

RuntimeError: Assertion `THCTensor_(checkGPU)(state, 3, dst, src, indices)' failed.

I get the error at the following line:

embedded = self.embedding(input).view(1, 1, -1)

I am guessing the problem is not in that line but rather in the saving and loading process. I am running the code on the same server, on the same GPU where I trained the model.

Does anyone know what the error means? Any suggestion to resolve this issue?

I can’t think of what is going on. Any chance you can give a script that reproduces this problem?

I don’t have a small script to share, but here are the saving and loading functions.

import glob
import os

import torch


def save_model(model, loss, epoch, tag):
    snapshot_prefix = os.path.join(args.save_path, tag)
    snapshot_path = snapshot_prefix + '_loss_{:.6f}_epoch_{}'.format(loss, epoch)
    torch.save(model, snapshot_path)
    # Delete older snapshots that share the same prefix.
    for f in glob.glob(snapshot_prefix + '*'):
        if f != snapshot_path:
            os.remove(f)


def load_model(tag):
    filename = os.path.join(args.save_path, tag)
    # model = torch.load(filename, map_location=lambda storage, location: storage.cuda(args.gpu))
    model = torch.load(filename)
    return model

I am also sharing the first part of the evaluation function and the forward function of the encoder.

def evaluate(encoder, decoder, dictionary, sentence):
    """Generates word sequence and their attentions"""
    input_variable = helper.sequence_to_variable(dictionary, sentence)
    input_length = input_variable.size()[0]
    encoder_hidden = encoder.init_weights(1)

    encoder_outputs = Variable(torch.zeros(args.max_length, encoder.hidden_size))
    if args.cuda:
        encoder_outputs = encoder_outputs.cuda()

    for ei in range(input_length):
        encoder_output, encoder_hidden = encoder(input_variable[ei], encoder_hidden)
        encoder_outputs[ei] = encoder_output[0][0]

Forward function of the encoder (the error is raised on its very first line):

def forward(self, input, hidden):
    """"Defines the forward computation of the encoder"""
    embedded = self.embedding(input).view(1, 1, -1)
    embedded = self.drop(embedded)
    output = embedded
    for i in range(self.n_layers):
        output, hidden = self.rnn(output, hidden)
        output = self.drop(output)
    return output, hidden

I have no problem training and saving the model. Whenever I try to load the saved model, I get this error. The encoder and decoder are the same as described here -

I have been stuck on this for more than a day and have tried different things to find the cause, without success. Any help would be highly appreciated. Thanks.

I have updated PyTorch and tried running again. After updating, I get the following error.

RuntimeError: arguments are located on different GPUs at /py/conda-bld/pytorch_1490979338030/work/torch/lib/THC/generic/

Now I guess the problem is this: when I trained the model, I didn’t select a specific GPU, so PyTorch used all available GPUs, and now when I load the trained model and run it, it gives me this error.

Is my understanding correct? Would it be possible to add some support in PyTorch to avoid these kinds of situations? Or do we need to set the GPU devices (using CUDA_VISIBLE_DEVICES) during training and then load the trained model on the same GPU for testing?

The problem isn’t that PyTorch uses all GPUs by default (that’s not true; everything is put on device 0 by default), but that you scattered some tensors across different GPUs and then tried to do operations on them. If you want to do that, you need to insert .cuda(destination) calls to move tensors to the appropriate device.
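As a minimal sketch of that advice, you can move every tensor onto one destination device before combining them. The helper `to_common_device` is hypothetical, and `tensor.to(device)` is used because it does the same job as `.cuda(destination)` while also working when the destination is the CPU:

```python
import torch


def to_common_device(*tensors, device=None):
    """Hypothetical helper: move all tensors to `device`.

    Defaults to the first tensor's device; for a GPU destination this is
    equivalent to calling .cuda(destination) on each tensor.
    """
    if device is None:
        device = tensors[0].device
    return tuple(t.to(device) for t in tensors)


# Use GPU 0 if present, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

weight = torch.randn(10, 4, device=device)  # e.g. an embedding table
indices = torch.tensor([1, 3, 5])           # created on the CPU by default

# Align the indices with the table before indexing into it.
weight, indices = to_common_device(weight, indices)
rows = weight[indices]
print(rows.shape, rows.device)
```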

Thanks for your reply. If everything is put on device 0 by default, why am I getting that error? I didn’t set any CUDA device explicitly while running. By the way, I have two questions.

  1. Suppose I am running on two different GPU models (GeForce and Titan-X) and also using DataParallel as in the PyTorch examples. Do I need to make sure that I run on the same type of GPU to avoid unusual problems?

  2. Say I have trained my model on a GeForce GPU, saved it using state_dict(), and then loaded the model on a different GPU (Titan-X). Am I going to face any problem?

Sometimes I get weird errors because of these GPU issues and I don’t know how to solve them. It would be very helpful if someone could write some notes or a blog post on the mistakes one can make in these cases.

No idea, you must be using more devices somehow.

  1. Yes, it’s OK, but note that it will run at the pace of the slowest GPU.
  2. No, it should be alright.
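A minimal sketch of moving a `state_dict` between devices, assuming a small stand-in model (`nn.Linear`, not the thread’s encoder) and an in-memory buffer instead of a file. Passing `map_location` to `torch.load` remaps every saved tensor onto the device you choose, so weights saved on one GPU do not insist on coming back on that exact GPU:

```python
import io

import torch
import torch.nn as nn

# Stand-in model; in practice this would be the trained encoder/decoder.
model = nn.Linear(4, 2)

# Save only the parameters (state_dict) to an in-memory buffer.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Target device for testing: one GPU, or the CPU if none is present.
target = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# map_location remaps all saved tensors onto `target`.
state = torch.load(buffer, map_location=target)

restored = nn.Linear(4, 2)
restored.load_state_dict(state)
restored.to(target)

print({p.device for p in restored.parameters()})
```

Saving the `state_dict` rather than the whole pickled module also keeps the checkpoint independent of the module’s class definition being importable in the exact same form.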

RuntimeError: arguments are located on different GPUs
I came across this problem in the seq2seq decoder part of my project. I cannot find where the problem is. Can you give me a hint?

decoder_input = inputs[:, :-1]
decoder_output, decoder_hidden, attn = self.forward_step(decoder_input, decoder_hidden, encoder_outputs,
indices = torch.arange(inputs.size(0))
if torch.cuda.is_available():
    indices =
    probabilities =
for t in range(1, probabilities.size(1)):
    inputs =

    probabilities[indices, t] = decoder_output[indices, t-1, inputs[indices, t].view(-1)].exp().view(-1)

Here is the whole relevant code.
Thank you so much!

    def forward(self, inputs=None, encoder_hidden=None, encoder_outputs=None,
                    function=F.log_softmax, teacher_forcing_ratio=0, sample=False):
        ret_dict = dict()
        if self.use_attention:
            ret_dict[DecoderRNN.KEY_ATTN_SCORE] = list()

        inputs, batch_size, max_length = self._validate_args(inputs, encoder_hidden, encoder_outputs,
                                                             function, teacher_forcing_ratio)
        decoder_hidden = self._init_state(encoder_hidden)

        use_teacher_forcing = True if random.random() < teacher_forcing_ratio else False

        decoder_outputs = []
        sequence_symbols = []
        lengths = np.array([max_length] * batch_size)

        def decode(step, step_output, step_attn, sample=False):
            if self.use_attention:
            if sample:
                symbols = torch.multinomial(torch.exp(decoder_outputs[-1]), 1)
                probs = torch.exp(decoder_outputs[-1])[np.arange(symbols.size(0)), symbols.squeeze(1)]
                symbols = decoder_outputs[-1].topk(1)[1]

            eos_batches =
            if eos_batches.dim() > 0:
                eos_batches = eos_batches.cpu().view(-1).numpy()
                update_idx = ((lengths > step) & eos_batches) != 0
                lengths[update_idx] = len(sequence_symbols)
            if sample:
                return symbols, probs
            return symbols

        # Manual unrolling is used to support random teacher forcing.
        # If teacher_forcing_ratio is True or False instead of a probability, the unrolling can be done in graph
        if use_teacher_forcing:
            probabilities = torch.ones(inputs.size(0), max_length + 1)
            samples_sent = torch.ones(inputs.size(0), max_length + 1) * self.sos_id
            hiddens = torch.zeros(max_length + 1, 2, batch_size, self.hidden_size)
            hiddens[0] = decoder_hidden
            decoder_input = inputs[:, :-1]
            decoder_output, decoder_hidden, attn = self.forward_step(decoder_input, decoder_hidden, encoder_outputs,
            indices = torch.arange(inputs.size(0))
            if torch.cuda.is_available():
                indices =
                probabilities =
            for t in range(1, probabilities.size(1)):

                probabilities[indices, t] = decoder_output[indices, t-1, inputs[indices, t].view(-1)].exp().view(-1)

Are you using nn.DataParallel or a single GPU?
Which line of code throws this error?

Thank you so much.
Yes, I wrapped my model in nn.DataParallel.
The problem is at

probabilities[indices, t] = decoder_output[indices, t-1, inputs[indices, t].view(-1)].exp().view(-1)

The full error message is:

RuntimeError: arguments are located on different GPUs at /opt/conda/conda-bld/pytorch_1535490206202/work/aten/src/THC/generic/

The encoder of the seq2seq model works well; the problem is in the decoder. I guess the encoder takes only one input, which is not the case for the decoder. Here is the forward function of the seq2seq class.

    def forward(self, input_variable, input_lengths=None, target_variable=None,
                teacher_forcing_ratio=0, sample=False):
        encoder_outputs, encoder_hidden = self.encoder(input_variable, input_lengths)
        result = self.decoder(inputs=target_variable,

Thanks for the information!
You are currently pushing some tensors manually to a specific device:

if torch.cuda.is_available():
    indices =
    probabilities =

I’m not sure how device is defined, but I assume it’s the default GPU.
If you are using nn.DataParallel, you’ll get replicas on each GPU, so the device won’t be static inside your forward method anymore.
Try replacing each .to(device) call with a more generic one that uses the device of another tensor or parameter:

indices =
probabilities = ...
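A minimal sketch of that suggestion, using a hypothetical toy module (`ToyDecoderStep`) in place of the real decoder. Inside `forward`, new tensors take their device from an input (`inputs.device`), so each `nn.DataParallel` replica builds them on its own GPU instead of a hard-coded `device`. The shapes loosely mirror the `probabilities[indices, t]` gather from the question, but the module itself is an assumption:

```python
import torch
import torch.nn as nn


class ToyDecoderStep(nn.Module):
    """Hypothetical stand-in for the decoder's teacher-forcing gather."""

    def __init__(self, vocab_size=7, hidden_size=5):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden, inputs):
        # hidden: (batch, steps - 1, hidden_size); inputs: (batch, steps) token ids
        log_probs = torch.log_softmax(self.proj(hidden), dim=-1)
        batch, steps = inputs.size(0), inputs.size(1)

        # Build new tensors on the device of the incoming data. Under
        # nn.DataParallel each replica sees a different device here.
        indices = torch.arange(batch, device=inputs.device)
        probabilities = torch.ones(batch, steps, device=inputs.device)

        for t in range(1, steps):
            # Probability assigned to the reference token at step t.
            targets = inputs[indices, t]
            probabilities[indices, t] = log_probs[indices, t - 1, targets].exp()
        return probabilities


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(ToyDecoderStep()).to(device)

hidden = torch.randn(4, 3, 5, device=device)
inputs = torch.randint(0, 7, (4, 4), device=device)
probs = model(hidden, inputs)
print(probs.shape)
```

Because nothing inside `forward` refers to a global `device`, the same code runs unchanged on one GPU, several GPUs, or the CPU.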

Thank you so much.
As you said,
I changed the code:

indices =
probabilities =

I am a little confused: the “probabilities =” line also takes the variable “inputs” as input. Should I add the following code?

inputs =

No, inputs should already be split among the GPUs, so you could alternatively use:

probabilities =

The problem is that you are creating new tensors without specifying the current device the model and data are placed on:

probabilities = torch.ones(inputs.size(0), max_length + 1)

To get the current device, you can use the device of any input tensor or of one of the model’s parameters.
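When no input tensor is handy (for example, when initialising a hidden state), the device can be read from one of the module’s parameters. A minimal sketch with a hypothetical `ToyEncoder`; it reads a specific parameter (`self.embedding.weight.device`) rather than `next(self.parameters()).device`, because in recent PyTorch releases the replicas created by `nn.DataParallel` may not expose their parameters through `.parameters()`. `tensor.new_ones` is shown as an alternative that inherits both device and dtype from an existing float tensor:

```python
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Hypothetical encoder showing device-aware tensor creation."""

    def __init__(self, vocab_size=10, hidden_size=6):
        super().__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def init_hidden(self, batch_size):
        # No input tensor is available here, so read the device from a
        # specific parameter rather than relying on a global `device`.
        device = self.embedding.weight.device
        return torch.zeros(1, batch_size, self.hidden_size, device=device)

    def forward(self, tokens):
        # tokens: (batch, seq_len) token ids on the same device as the model
        embedded = self.embedding(tokens)
        output, hidden = self.rnn(embedded, self.init_hidden(tokens.size(0)))
        # new_ones copies device and dtype from `output` (a float tensor),
        # analogous to the torch.ones(...) call used for `probabilities`.
        scores = output.new_ones(tokens.size(0), tokens.size(1) + 1)
        return output, hidden, scores


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
encoder = ToyEncoder().to(device)
tokens = torch.randint(0, 10, (2, 3), device=device)
output, hidden, scores = encoder(tokens)
print(output.device, hidden.device, scores.device)
```

Calling `new_ones` on an integer tensor such as `inputs` would produce an integer tensor, which is why a float tensor is used as the template here.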

Sorry, I have run into a new problem and need your help.
I changed the code as follows:

indices =
probabilities =

Then the code runs. However, I found that only GPU 0 has a non-zero GPU-Util; the other GPUs have non-zero Memory-Usage but zero GPU-Util.
So is the data copied to the different GPUs but the model is not? Is there any command that can help me check whether the model has already been copied?
Some details:
1) I set device = torch.device('cuda:0') at the top of the code.
2) I set up nn.DataParallel like this:

actor = (nn.DataParallel(actor)).to(device)

Then I use Generator.module.sample(context, reply, TF=TF) (sample comes from a common seq2seq model) to get my results.
3) When I run the code, I use the command “CUDA_VISIBLE_DEVICES=0,1,2,3 python”
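One thing worth checking in this setup: `nn.DataParallel` only scatters inputs and runs replicas when the wrapper itself is called (which goes through its `forward`). Calling a method on `.module` directly, as in `Generator.module.sample(...)`, runs the original, unreplicated module on a single device. A minimal diagnostic sketch, assuming a hypothetical `Probe` module; the `sample` keyword is illustrative and stands in for routing the sampling logic through `forward` instead of calling it on `.module`:

```python
import torch
import torch.nn as nn


class Probe(nn.Module):
    """Hypothetical module that reports where each forward call runs."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)

    def forward(self, x, sample=False):
        # With DataParallel on N GPUs this prints once per replica,
        # each with its own slice of the batch.
        print(f"forward on {x.device} with batch {x.size(0)}")
        out = self.linear(x)
        return out.softmax(dim=-1) if sample else out


print("visible GPUs:", torch.cuda.device_count())

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(Probe()).to(device)
x = torch.randn(16, 8, device=device)

# Parallel: calling the wrapper goes through DataParallel.forward.
y = model(x, sample=True)

# Not parallel: .module is the original Probe on a single device.
z = model.module(x, sample=True)
```

With several visible GPUs you would expect one “forward on cuda:N” line per GPU for `model(x, ...)`, but only a single line for `model.module(x, ...)`.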