Question about CUDA for LSTM

I read the LSTM tutorial and tried to run it with CUDA, but it failed, and I do not know how to track down the bug.
Here is the code that I changed.

model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix), len(tag_to_ix))

loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# See what the scores are before training
# Note that element i,j of the output is the score for tag j for word i.

for epoch in range(10):  # the tutorial runs 300 epochs on this toy data; 10 is enough to reproduce the error

    for sentence, tags in training_data:
        # Step 1. Remember that Pytorch accumulates gradients.
        # We need to clear them out before each instance
        model.zero_grad()

        # Also, we need to clear out the hidden state of the LSTM,
        # detaching it from its history on the last instance.
        model.hidden = model.init_hidden()

        # Step 2. Get our inputs ready for the network, that is, turn them into
        # Variables of word indices.
        sentence_in = prepare_sequence(sentence, word_to_ix)

        targets = prepare_sequence(tags, tag_to_ix)
        # Step 3. Run our forward pass.
        tag_scores = model(sentence_in)

        # Step 4. Compute the loss, gradients, and update the parameters by
        #  calling optimizer.step()
        loss = loss_function(tag_scores, targets)
        loss.backward()
        optimizer.step()

This is the error I get:

Traceback (most recent call last):
  File "", line 169, in <module>
    tag_scores = model(sentence_in)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "", line 121, in forward
    lstm_out, self.hidden = self.lstm(embeds.view(len(sentence), 1, -1), self.hidden)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/", line 162, in forward
    output, hidden = func(input, self.all_weights, hx)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/", line 351, in forward
    return func(input, *fargs, **fkwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/", line 284, in _do_forward
    flat_output = super(NestedIOFunction, self)._do_forward(*flat_input)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/", line 306, in forward
    result = self.forward_extended(*nested_tensors)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/", line 293, in forward_extended
    cudnn.rnn.forward(self, input, hx, weight, output, hy)
  File "/usr/local/lib/python3.5/dist-packages/torch/backends/cudnn/", line 242, in forward
    fn.hx_desc = cudnn.descriptor(hx)
  File "/usr/local/lib/python3.5/dist-packages/torch/backends/cudnn/", line 310, in descriptor
  File "/usr/local/lib/python3.5/dist-packages/torch/backends/cudnn/", line 116, in set
    self, _typemap[tensor.type()], tensor.dim(),
KeyError: 'torch.FloatTensor'

From the trace it looks like the hidden state of the LSTM is a CPU tensor: the KeyError on 'torch.FloatTensor' is raised while cuDNN builds the descriptor for hx. It should be a CUDA tensor for this to work.
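
A quick way to confirm this is to print the type of each tensor before the forward pass. Here is a minimal sketch, assuming a hidden state created without any device argument. The shape and the names h0 and c0 are illustrative and not taken from the post.

```python
import torch

# Hypothetical stand-in for what init_hidden() returns: (h_0, c_0),
# created without any device argument, so both live on the CPU.
hidden = (torch.zeros(1, 1, 6), torch.zeros(1, 1, 6))

for name, t in zip(("h0", "c0"), hidden):
    # type() gives the same string that the KeyError in the trace reports;
    # a CUDA float tensor would report 'torch.cuda.FloatTensor' instead.
    print(name, t.type(), "is_cuda =", t.is_cuda)
```

If the model's parameters report a CUDA type and the hidden state does not, the mismatch is the likely cause.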

In LSTMTagger, make sure the hidden states are on the GPU as well by calling .cuda() on them.
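
As a rough sketch of that advice (not the tutorial's exact code): the version below creates the hidden state on whatever device the model's parameters live on, so it follows the model when it is moved. It uses the newer torch.device / .to(device) idiom rather than calling .cuda() on each tensor, and it falls back to the CPU when no GPU is present. The dimensions and the fake input are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Fall back to the CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class LSTMTagger(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)
        self.hidden = None  # set by init_hidden() before each sentence

    def init_hidden(self):
        # Create (h_0, c_0) on the same device as the weights, so the
        # hidden state is a CUDA tensor whenever the model is on the GPU.
        dev = next(self.parameters()).device
        return (torch.zeros(1, 1, self.hidden_dim, device=dev),
                torch.zeros(1, 1, self.hidden_dim, device=dev))

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, self.hidden = self.lstm(
            embeds.view(len(sentence), 1, -1), self.hidden)
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        return F.log_softmax(tag_space, dim=1)


# Illustrative sizes only: 9 words in the vocabulary, 3 tags.
model = LSTMTagger(6, 6, vocab_size=9, tagset_size=3).to(device)

# The inputs have to be on the same device as well.
sentence_in = torch.tensor([0, 1, 2, 3, 4], dtype=torch.long, device=device)

model.hidden = model.init_hidden()
tag_scores = model(sentence_in)
print(tag_scores.shape, tag_scores.device)
```

The same applies to sentence_in and targets in the training loop: whatever prepare_sequence returns also has to be moved to the GPU before it is passed to the model or the loss.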


Thank you very much. It’s very helpful.

OHH thank YOU !!! You saved me man.