ValueError: Expected input batch_size (210) to match target batch_size (21)

I have seen this error referenced here on discuss.pytorch.org before, but the proposed solution did not fit my needs.

This is my first project with PyTorch. I would have started with something a little simpler and learned from smaller examples, but I really, really need to make this one project work, so I chose PyTorch.

The goal: have a text classifier that extracts intents from sentences.

I started by using word2vec from the gensim library, and I think I imported the weights, vocab, and indexes correctly.
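
For reference, this is roughly how I imported them (a sketch, not my exact code; the path is a placeholder, and I'm assuming gensim 3.x, where KeyedVectors exposes .vocab and .vectors):

import torch
from gensim.models import KeyedVectors

# Load the pretrained word2vec vectors (the path is a placeholder)
model   = KeyedVectors.load_word2vec_format('word2vec_300d.bin', binary=True)
weights = torch.FloatTensor(model.vectors)   # [vocab_size, 300]; fed to nn.Embedding.from_pretrained below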

Current situation / code

class IntentLSTM(nn.Module):
    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, drop_prob=0.5):    
        #Initialize the model by setting up the layers
        super().__init__()
        self.output_size  = output_size
        self.n_layers     = n_layers
        self.hidden_dim   = hidden_dim
        
        #Embedding and LSTM layers
        self.embedding  = nn.Embedding.from_pretrained(weights)
        self.lstm       = nn.LSTM(embedding_dim, hidden_dim, n_layers, dropout=drop_prob, batch_first=True)
        self.label      = nn.Linear(hidden_dim, output_size)
        self.dropout    = nn.Dropout(0.3)
        self.softmax    = nn.LogSoftmax(dim=1)
    
    def forward(self, x, hidden):
        #Embedding and LSTM output
        embedd           = self.embedding(x)            # [batch, seq_len, embedding_dim]
        lstm_out, hidden = self.lstm(embedd, hidden)    # [batch, seq_len, hidden_dim]
        lstm_out  = lstm_out.contiguous().view(-1, self.hidden_dim)  # [batch * seq_len, hidden_dim]
        out       = self.dropout(lstm_out)
        sig_out   = self.softmax(lstm_out)              # note: `out` is computed but never used

        return sig_out, hidden

    def init_hidden(self, batch_size):
        #Not shown in my original snippet: standard zero-init for (h, c),
        #as called in the training loop below
        weight = next(self.parameters()).data
        return (weight.new_zeros(self.n_layers, batch_size, self.hidden_dim),
                weight.new_zeros(self.n_layers, batch_size, self.hidden_dim))

The following are the params used to instantiate a model, along with the criterion:

vocab_size    = len(model.vocab)
output_size   = 4
embedding_dim = 300
hidden_dim    = 10
n_layers      = 2

net = IntentLSTM(vocab_size, output_size, embedding_dim, hidden_dim, n_layers)
for param in net.parameters():
     param.requires_grad = True


criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)

The reason for embedding_dim=300 is that each word in the word2vec model has 300 features. The hidden_dim=10 is due to each sentence being 10 words (ints) long. The output_size=4 is because I have 4 possible classes (for classification, but maybe this is wrong too).

The problem occurs next (at the criterion call):

for e in range(epochs):
    # initialize hidden state
    h = net.init_hidden(batch_size)

    # batch loop
    for inputs, labels in train_loader:
        counter += 1

        # detach the hidden state from its history so we don't backprop through the whole past
        h = tuple([each.data for each in h])
        net.zero_grad()

        output, h = net(inputs.squeeze(), h)
        loss      = criterion(output.squeeze(), labels.long())   # <- this is where the error is raised
        loss.backward()
        # `clip_grad_norm_` helps prevent the exploding gradient problem in RNNs / LSTMs.
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()

The error:
ValueError: Expected input batch_size (210) to match target batch_size (21).

Now, I have tried everything I've read, and I can't seem to understand how and why the tensors' sizes change, but I guess (for now) that this is the problem.

This is a print log from the forward function:

embedd.size():
torch.Size([21, 10, 300])
lstm_out.size() after self.lstm(embedd, hidden):
torch.Size([21, 10, 10])
lstm_out.size() after lstm_out.contiguous().view(-1, self.hidden_dim):
torch.Size([210, 10])
out.size() after self.dropout(lstm_out):
torch.Size([210, 10])
sig_out.size() after self.softmax(lstm_out):
torch.Size([210, 10])

Why is it that the input batch_size does not match the target batch_size?

Thank you so very much for any help provided.

Cheers, Sousa.

The output of your LSTM layer has the shape [batch_size, seq_length, features].
In this line of code:

lstm_out  = lstm_out.contiguous().view(-1, self.hidden_dim)

you collapse the sequence length into the batch dimension, which creates the new shape of [210, 10] (see the small sketch below).
Based on the error message, it looks like your target has a batch size of 21, and I'm currently not sure how your use case is supposed to work.
Would you like your model to output a single prediction for each sequence, or one prediction for each time step?
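
Here is a small standalone sketch with dummy tensors (the numbers just mirror your shapes) showing that collapse:

import torch

lstm_out = torch.randn(21, 10, 10)              # [batch_size, seq_length, hidden_dim]
flat     = lstm_out.contiguous().view(-1, 10)   # collapse batch and sequence dims
print(flat.shape)                               # torch.Size([210, 10])

# Each of the 21 sequences now contributes 10 rows, so the loss function sees
# 210 "samples", while the target still holds one label per sequence (21).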

I very much appreciate your answer, @ptrblck.

If I don't collapse the sequence length into the batch dimension, I'm stuck with a tensor of shape [21, 10, 10].

So, each batch contains 21 sentences of 10 words (encoded as ints), hence the [21, 10].

If I avoid the view call, the output from forward will be a tensor of shape [21, 10, 10], and I get another error:

ValueError: Expected target size (21, 10), got torch.Size([21])
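
(From what I can tell from the docs, nn.CrossEntropyLoss expects either an [N, C] input with an [N] target, or an [N, C, d1] input with an [N, d1] target, so a 3-d input needs one label per time step. A quick sanity check with dummy tensors, unrelated to my actual data:)

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# 2-d input: one prediction per sample -> target shape [N]
loss = criterion(torch.randn(21, 4), torch.randint(0, 4, (21,)))          # works

# 3-d input: one prediction per time step -> target shape [N, d1]
loss = criterion(torch.randn(21, 4, 10), torch.randint(0, 4, (21, 10)))   # works
# criterion(torch.randn(21, 4, 10), torch.randint(0, 4, (21,)))
# -> ValueError: Expected target size (21, 10), got torch.Size([21])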

For a clearer picture (although the code is a Frankenstein's homage), here is the full code:
https://pastebin.com/WHnMU0k4

Cheers,
Sousa

Your explanation is correct, and the right “fix” depends on your use case.
E.g. if you would like to predict a class for each word, your target should also contain 210 labels.
Since it only contains 21, I’m not sure what your actual use case is and what your prediction represents.
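
In that per-word scenario you would flatten the targets the same way the output was flattened, e.g. (dummy tensors again):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

output  = torch.randn(210, 4)              # flattened predictions: [21 * 10, n_classes]
targets = torch.randint(0, 4, (21, 10))    # one label per word:    [batch, seq_len]
loss    = criterion(output, targets.view(-1))   # 210 labels match the 210 predictions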

I’m sorry for not clarifying that before.

What I am looking for is to determine the correct label (each label corresponds to a predefined intent) by feeding the LSTM with sentences.

So, if I feed it a sentence like ["do you serve lobster"], it should return the corresponding classification: "show_menu" (returning an int, 2 in this case), which is somewhat of an intent for a simple chatbot. The labels tensor contains only values from 0 to 3 (4 possibilities).

I don't mean to bother you, @ptrblck, as you seem very active on the board, and I commend you for all your dedication to helping others here, but can you point me in the right direction on how to fix this issue?

I'll explain the purpose of the NN a bit better:

  • Feed it an encoded sentence: [123, 412, 0, 0, 0, 0, 0, 0, 0, 0] (size 10, padded with zeros)
  • Get a numerical classification for each sentence in the batch: 3 (3 corresponds to show_menu); see the sketch below
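
For what it's worth, here is my current guess at how the forward pass would have to change, based on the replies so far (a rough sketch, assuming the last time step's output can stand in for the whole sentence, and that nn.CrossEntropyLoss wants raw logits, i.e. no LogSoftmax):

def forward(self, x, hidden):
    embedd           = self.embedding(x)            # [batch, seq_len, 300]
    lstm_out, hidden = self.lstm(embedd, hidden)    # [batch, seq_len, hidden_dim]
    last   = lstm_out[:, -1, :]                     # [batch, hidden_dim]: one vector per sentence
    out    = self.dropout(last)
    logits = self.label(out)                        # [batch, output_size] = [21, 4]
    return logits, hidden                           # fed straight into nn.CrossEntropyLoss

If that is right, the output batch size (21) would finally match the target batch size (21), but I would appreciate confirmation.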

If the rules allow it, I'll be more than willing to pay anyone who can help me solve the issue, or to donate to any cause of their choice.