LSTM error: Unable to classify sentences into classes

I am trying to classify sentences into multiple classes. Each sentence is converted into a tensor after creating word2index.

The input tensor is a sentence. After converting each word to its index, the sentence looks like this: tensor([1407995, 937957, 936279, 904725, 682273, 1291222, 523149, 566120, 913504])

The issue I am getting is ValueError: Expected input batch_size (9) to match target batch_size (1).
I think this is because my output (target label) is only a single label (whose index I have from the word2index dictionary), but my input is the tensor of indices for each word in the sentence, and the sentence length is 9.
The error occurs at loss = loss_cf(label_pred, target_label)
where loss_cf = nn.NLLLoss()
The output of the fully connected layer after the LSTM is:
class_val = self.softmax(fc_layer)
Also (fc_layer): Linear(in_features=64, out_features=100, bias=True)

There are 100 different classes.
I am using an LSTM, and I think the issue has to do with how I encode the target label, which in this case is a single index value for the particular target label. I need help figuring this out.
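For context, here is a minimal sketch that reproduces this kind of mismatch. The sizes, variable names, and word indices are illustrative assumptions, not the poster's actual code: a unidirectional LSTM over a 9-word sentence produces one output per time step, so flattening the whole output gives 9 rows of class scores against a single target label.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not the poster's real values).
vocab_size, embed_dim, hidden_dim, num_classes = 10000, 128, 32, 100

embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim)          # batch_first=False (default)
fc_layer = nn.Linear(hidden_dim, num_classes)
log_softmax = nn.LogSoftmax(dim=1)
loss_cf = nn.NLLLoss()

sentence = torch.tensor([5, 12, 7, 99, 3, 42, 8, 17, 60])  # 9 word indices
target_label = torch.tensor([4])                           # one class index

embeds = embedding(sentence).unsqueeze(1)      # (9, 1, 128): seq_len, batch, embed
lstm_out, _ = lstm(embeds)                     # (9, 1, 32): one output per time step
label_pred = log_softmax(fc_layer(lstm_out.view(-1, hidden_dim)))  # (9, 100)

# loss_cf(label_pred, target_label) now raises:
# ValueError: Expected input batch_size (9) to match target batch_size (1)
```

nn.NLLLoss treats the first dimension of its input as the batch, so 9 predicted rows cannot be matched against a single target.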

Without the full code, it’s difficult to pinpoint the exact problem. From what I can see, there are several issues that strike me as odd:

  • The error is about the batch size, not the sequence length. Your example sequence has 9 values. Do you use batches for training, and if so, what is their shape?
  • LSTM has the setting batch_first=True/False (default: False). Depending on the chosen value, you have to make sure that your batch has the right shape. I assume you have an Embedding layer as well.
  • The values in your example tensor are very high. Do you really have a vocabulary of 1.4 million words or more? How did you create your word2index? What is the shape of your embedding layer?
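On the last point: a common way to keep indices small is to build word2index from the training corpus only, so every index stays below the embedding's vocabulary size. This is a hedged sketch with hypothetical helper names, not the poster's actual preprocessing:

```python
from collections import Counter

def build_word2index(sentences, min_freq=1):
    """Build a compact word->index map from a list of sentence strings."""
    counts = Counter(word for s in sentences for word in s.split())
    word2index = {"<unk>": 0}          # reserve index 0 for unknown words
    for word, freq in counts.items():
        if freq >= min_freq:
            word2index[word] = len(word2index)
    return word2index

X_train = ["the cat sat", "the dog ran"]   # toy corpus for illustration
w2i = build_word2index(X_train)
# The Embedding layer's num_embeddings must be at least len(w2i).
```

With this scheme the largest index is len(w2i) - 1, so indices in the millions would suggest the mapping was built over a much larger vocabulary than the embedding layer expects.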

The issue got solved. I was supposed to use only the output of the last LSTM time step, i.e., lstm_out[-1].
The exact code statement is fully_conn(lstm_out[-1].view(-1, self.hidden_dim*2))
Hopefully this will be of help to someone.
Yes, and to answer your questions: batch_first=True, batch_size=1.
Word2index was created using the entire vocabulary in X_train. Each word has an embedding dimension of 128; I used a pretrained embedding.
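For anyone hitting the same error, here is a self-contained sketch of that fix with illustrative sizes (a bidirectional LSTM with hidden size 32, matching the in_features=64 linear layer mentioned above). Note that lstm_out[-1] selects the last time step only when the sequence is the first dimension (batch_first=False); with batch_first=True the equivalent slice is lstm_out[:, -1, :].

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not the poster's real values).
vocab_size, embed_dim, hidden_dim, num_classes = 10000, 128, 32, 100

embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, bidirectional=True)  # output dim = 2*32 = 64
fully_conn = nn.Linear(hidden_dim * 2, num_classes)
log_softmax = nn.LogSoftmax(dim=1)
loss_cf = nn.NLLLoss()

sentence = torch.tensor([5, 12, 7, 99, 3, 42, 8, 17, 60])  # 9 word indices
target_label = torch.tensor([4])                           # one class index

embeds = embedding(sentence).unsqueeze(1)   # (9, 1, 128): seq first, batch_first=False
lstm_out, _ = lstm(embeds)                  # (9, 1, 64)

# Keep only the last time step, so there is one prediction per sentence.
label_pred = log_softmax(fully_conn(lstm_out[-1].view(-1, hidden_dim * 2)))  # (1, 100)
loss = loss_cf(label_pred, target_label)    # batch sizes now match: 1 vs 1
```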