Getting KeyError: 'torch.FloatTensor' when trying to train on GPU

My code runs fine on the CPU, but when I try to run it on the GPU I get the following stack trace:

Traceback (most recent call last):
  File "run_experiments.py", line 123, in <module>
    train_model(model, train_loader, nb_batches, optimizer, criterion, **vars(args))
  File "run_experiments.py", line 54, in train_model
    predictions, _ = model.forward(inputs, targets[:, :-1, :])
  File "/net/if1/ab3cb/grad_stuff/vislang/project/Visual-Story-Telling/source_code/model_zoo/seq2seq.py", line 41, in forward
    _, context_vec = self.encoder(inputs, hidden_init)
  File "/if1/ab3cb/miniconda3/envs/fast_torch/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/net/if1/ab3cb/grad_stuff/vislang/project/Visual-Story-Telling/source_code/model_zoo/encoder.py", line 36, in forward
    output, hidden_state = self.gru(inputs, hidden_init)
  File "/if1/ab3cb/miniconda3/envs/fast_torch/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/if1/ab3cb/miniconda3/envs/fast_torch/lib/python3.5/site-packages/torch/nn/modules/rnn.py", line 91, in forward
    output, hidden = func(input, self.all_weights, hx)
  File "/if1/ab3cb/miniconda3/envs/fast_torch/lib/python3.5/site-packages/torch/nn/_functions/rnn.py", line 327, in forward
    return func(input, *fargs, **fkwargs)
  File "/if1/ab3cb/miniconda3/envs/fast_torch/lib/python3.5/site-packages/torch/autograd/function.py", line 202, in _do_forward
    flat_output = super(NestedIOFunction, self)._do_forward(*flat_input)
  File "/if1/ab3cb/miniconda3/envs/fast_torch/lib/python3.5/site-packages/torch/autograd/function.py", line 224, in forward
    result = self.forward_extended(*nested_tensors)
  File "/if1/ab3cb/miniconda3/envs/fast_torch/lib/python3.5/site-packages/torch/nn/_functions/rnn.py", line 269, in forward_extended
    cudnn.rnn.forward(self, input, hx, weight, output, hy)
  File "/if1/ab3cb/miniconda3/envs/fast_torch/lib/python3.5/site-packages/torch/backends/cudnn/rnn.py", line 239, in forward
    fn.hx_desc = cudnn.descriptor(hx)
  File "/if1/ab3cb/miniconda3/envs/fast_torch/lib/python3.5/site-packages/torch/backends/cudnn/__init__.py", line 304, in descriptor
    descriptor.set(tensor)
  File "/if1/ab3cb/miniconda3/envs/fast_torch/lib/python3.5/site-packages/torch/backends/cudnn/__init__.py", line 110, in set
    self, _typemap[tensor.type()], tensor.dim(),
KeyError: 'torch.FloatTensor'

My training code is as follows:

def train_model(model, train_loader, nb_batches, optimizer, criterion, **kwargs):

    running_loss = 0

    for epoch in range(kwargs["epochs"]):

        iters = 0
        for inputs, targets in tqdm(train_loader, total=nb_batches):

            # process the inputs from the data loader to make them compatible
            # with the PyTorch graph
            inputs, targets = torch.from_numpy(inputs).float(), torch.from_numpy(targets).float()

            # convert to cuda tensors if a GPU is available
            if torch.cuda.is_available():
                inputs, targets = inputs.cuda(), targets.cuda()

            inputs, targets = Variable(inputs), Variable(targets)
            #clear out the gradients buffer
            optimizer.zero_grad()

            predictions, _ = model(inputs, targets[:, :-1, :])
            loss = criterion(predictions, targets[:, 1:, :])
            loss.backward()
            optimizer.step()

            running_loss += loss.data[0]

            # if iters % 10 == 0:
            #     print("Loss at {} iteration: {}".format(iters+1, running_loss/(iters+1)))

            if iters > nb_batches:
                break

            iters += 1

#define the model, optimizer and criterion
model = Seq2Seq(args.embed_size, args.hidden_size)

if torch.cuda.is_available():
    model = model.cuda()

optimizer = optim.SGD(model.parameters(), lr=args.lr)
criterion = nn.KLDivLoss()

train_model(model, train_loader, nb_batches, optimizer, criterion, **vars(args))

I have the model definition in this gist. Additionally, I recently upgraded my PyTorch version, due to slow loading of the GPU, by following this topic. I have just started using PyTorch on the GPU, so any help in figuring this out would be appreciated.

Did you check what the type of hidden_init is before executing the following line in the encoder?

output, hidden_state = self.gru(inputs, hidden_init)

I can see you have converted the inputs to cuda tensors, so the only thing that could be causing the problem is probably hidden_init (I guess).
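
For example, something along these lines inside the encoder's forward (just a sketch; the names are guesses, since the actual encoder code is in the gist):

# check the device of the initial hidden state before the GRU call
print(hidden_init.data.type())    # expect 'torch.cuda.FloatTensor' on the GPU

# if it prints 'torch.FloatTensor', the hidden state was created on the CPU;
# one possible fix is to move it onto the GPU before calling the GRU
if torch.cuda.is_available():
    hidden_init = hidden_init.cuda()
output, hidden_state = self.gru(inputs, hidden_init)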

Thanks for that tip. That gets rid of the KeyError, but now I receive the following error (stack trace below):

Traceback (most recent call last):
  File "run_experiments.py", line 118, in <module>
    train_model(model, train_loader, nb_batches, optimizer, criterion, **vars(args))
  File "run_experiments.py", line 47, in train_model
    loss.backward()
  File "/if1/ab3cb/miniconda3/envs/fast_torch/lib/python3.5/site-packages/torch/autograd/variable.py", line 146, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
  File "/if1/ab3cb/miniconda3/envs/fast_torch/lib/python3.5/site-packages/torch/nn/_functions/linear.py", line 22, in backward
    grad_input = torch.mm(grad_output, weight)
TypeError: torch.mm received an invalid combination of arguments - got (torch.FloatTensor, torch.cuda.FloatTensor), but expected one of:
 * (torch.SparseFloatTensor mat1, torch.FloatTensor mat2)
      didn't match because some of the arguments have invalid types: (torch.FloatTensor, torch.cuda.FloatTensor)
 * (torch.FloatTensor source, torch.FloatTensor mat2)
      didn't match because some of the arguments have invalid types: (torch.FloatTensor, torch.cuda.FloatTensor)

I don't understand why the loss.backward() operation expects anything to be a non-cuda tensor when I have converted my model and its associated parameters to cuda tensors. Am I missing something here?
PS: I have updated the gists to reflect the new code.

Maybe somewhere you cast a Variable to cuda rather than casting the Variable.data to cuda.
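
For example (illustrative only, not your actual code), .cuda() on a Variable is not in-place, so a pattern like this silently keeps the CPU copy in the graph:

v = Variable(torch.zeros(3, 4))
v.cuda()                                  # returns a GPU copy that is discarded
print(v.data.type())                      # still 'torch.FloatTensor'

v = Variable(torch.zeros(3, 4).cuda())    # cast the underlying tensor first
print(v.data.type())                      # 'torch.cuda.FloatTensor'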

I have changed my code so that I convert all tensors to cuda before wrapping them in Variables, but now I am getting a different error (stack trace below):

Traceback (most recent call last):
  File "run_experiments.py", line 122, in <module>
    train_model(model, train_loader, nb_batches, optimizer, criterion, **vars(args))
  File "run_experiments.py", line 50, in train_model
    loss = criterion(predictions, targets[:, 1:, :])
  File "/if1/ab3cb/miniconda3/envs/fast_torch/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/if1/ab3cb/miniconda3/envs/fast_torch/lib/python3.5/site-packages/torch/nn/modules/loss.py", line 36, in forward
    return backend_fn(self.size_average, weight=self.weight)(input, target)
  File "/if1/ab3cb/miniconda3/envs/fast_torch/lib/python3.5/site-packages/torch/nn/_functions/thnn/auto.py", line 41, in forward
    output, *self.additional_args)
TypeError: FloatDistKLDivCriterion_updateOutput received an invalid combination of arguments - got (int, torch.FloatTensor, torch.cuda.FloatTensor, torch.FloatTensor, bool), but expected (int state, torch.FloatTensor input, torch.FloatTensor target, torch.FloatTensor output, bool sizeAverage)

I have edited the code to reflect the new changes.

One quick question: when we cast a Variable to cuda, I thought Variable.data was also cast to cuda, but from your statement it doesn't seem so. Can you explain why? Since a Variable is essentially a wrapper around a tensor, casting the Variable should cast everything it wraps as well (I guess).

I think you need:

criterion = nn.KLDivLoss().cuda()

(I have to check, but I think it has weights.)

Just tried that. Still getting the same error unfortunately!

If you have a script I can run (preferably small, around 30 lines), I can debug it for you. Otherwise, I don't know, just go into pdb, set breakpoints in a few places, and see where the FloatTensor (not cuda.FloatTensor) is coming from…
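
Something like this quick check (just a sketch, using the names from your snippet) might also help narrow it down:

# print any model parameters that are still plain (CPU) FloatTensors
for p in model.parameters():
    if 'cuda' not in p.data.type():
        print('still on CPU:', p.size(), p.data.type())

# and right before the criterion call, check both sides of the loss
print(predictions.data.type(), targets.data.type())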

@smth I have updated the gist with a relatively small driver script to reproduce the error. Let me know if you run into any problems. Appreciate your help!

I tried to take a look at this today (sorry for the delay), but your gist is still missing model_utils.

Yes, and I also forgot to update the thread. I have solved the problem: I had a time-distributed wrapper in the model_utils script and had forgotten to move the tensors to cuda there. Thanks everyone for all the help. Appreciate it!
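
For anyone who hits the same thing, the fix boiled down to making sure tensors created inside that wrapper end up on the GPU too, roughly like this (illustrative names, not the exact wrapper code):

# buggy: a freshly created tensor like this always lives on the CPU
hidden = Variable(torch.zeros(batch_size, hidden_size))

# fixed: allocate it from the input tensor so it inherits the input's type
# (and device), or call .cuda() on it when the input is a cuda tensor
hidden = Variable(inputs.data.new(batch_size, hidden_size).zero_())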
