nn.NLLLoss() ValueError: dimension mismatch

Hi there,

I am working on a sentiment analysis project with the SST-1 dataset using the torchtext library. As a baseline, I want to build a vanilla softmax classifier: a single linear layer followed by log-softmax, trained with negative log-likelihood loss.

I read the docs of nn.NLLLoss(), and (I think) I understand what it does: it expects a tensor of size [minibatch, classes] as the input and a tensor of size [minibatch] holding the class indices as the target, so that each batch item gets categorized into one of the classes.
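In other words, on a toy example I'd expect something like this to work (random values, 10 items and 6 classes just to mirror my setup):

import torch
import torch.nn as nn
import torch.nn.functional as F

scores = torch.randn(10, 6)               # raw outputs: [minibatch, classes]
log_probs = F.log_softmax(scores, dim=1)  # input to NLLLoss
target = torch.randint(0, 6, (10,))       # target: [minibatch] of class indices
loss = nn.NLLLoss()(log_probs, target)
print(loss.item())                        # a single scalar, as expected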

Now this is where I hit the wall. I computed the log probabilities with log_softmax and got a tensor of shape (minibatch, classes) (here torch.Size([10, 6])), which I used as the input; then I used the tensor of labels (torch.Size([10])) as the target.

However, I still got the following:
ValueError: Expected input batch_size (10) to match target batch_size (4).

Could you please point out where things went wrong and how I could fix it? Any help is greatly appreciated!

Below is the relevant code snippet:
(Python 3.5.6, PyTorch 1.0.0, torchtext 0.3.1)

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchtext
from torch.autograd import Variable

TEXT = torchtext.data.Field()
LABEL = torchtext.data.Field(sequential=False, is_target=True)
train, val, test = torchtext.datasets.SST.splits(
    TEXT, LABEL, fine_grained=True)
TEXT.build_vocab(train)
LABEL.build_vocab(train)
train_iter, val_iter, test_iter = torchtext.data.BucketIterator.splits(
    (train, val, test), batch_size=10, device=-1, repeat=False)

class Softmax(nn.Module):
    def __init__(self, vocab_size, n_classes, batch_size):  # batch_size is currently unused
        super(Softmax, self).__init__()
        self.linear = nn.Linear(vocab_size, n_classes, bias=True)

    def forward(self, x):
        out = self.linear(x)                   # [batch, n_classes]
        log_probs = F.log_softmax(out, dim=1)  # log-probabilities for NLLLoss
        return log_probs

def get_one_hot(batch, batch_size, vocab_size):
    # Build a (batch_size, vocab_size) bag-of-words tensor: for every batch item,
    # set the positions of its word indices to 1.
    new_tensor = torch.zeros(batch_size, vocab_size)
    word_indices = torch.transpose(batch.text, 0, 1)  # [batch, seq_len]
    for batch_item, word_ix in enumerate(word_indices):
        new_tensor[batch_item][word_ix] = 1
    return new_tensor

def train_softmax(train_iter):
    losses = []
    model = Softmax(VOCAB_SIZE, N_CLASSES, BATCH_SIZE)
    loss_fn = nn.NLLLoss()
    optimizer = optim.SGD(model.parameters(), LEARNING_RATE)

    for epoch in range(EPOCHS):
        epoch_loss = 0
        for batch in train_iter:
            
            x = Variable(torch.FloatTensor(get_one_hot(batch, BATCH_SIZE, VOCAB_SIZE)), requires_grad=True)
            y = Variable(batch.label)
            
            model.zero_grad()
            log_probs = model.forward(x)
            loss = loss_fn(log_probs, y) # here is the problem
            loss.backward()
            optimizer.step()

            epoch_loss += loss.item()
        losses.append(epoch_loss)
    return model, losses

Can you mention the constant values too?

BATCH_SIZE, VOCAB_SIZE, N_CLASSES

Unrelated to this question, some minor suggestions:

  • Use log_probs = model(x) instead of log_probs = model.forward(x), so that the necessary hooks are in place.
  • In PyTorch 1.0 you no longer need Variable; you can use the tensors directly. (A short sketch of both changes is below.)
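For example, the body of your batch loop could then look roughly like this (same names as in your snippet, so just a sketch):

x = get_one_hot(batch, BATCH_SIZE, VOCAB_SIZE)  # already a FloatTensor, no Variable needed
y = batch.label

model.zero_grad()
log_probs = model(x)          # __call__ runs forward() plus any registered hooks
loss = loss_fn(log_probs, y)
loss.backward()
optimizer.step()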

Yeah, sure:

BATCH_SIZE = train_iter.batch_size = 10
VOCAB_SIZE = len(TEXT.vocab) = 18282
N_CLASSES = len(LABEL.vocab) = 6

Actually, I tried to train it with different batch sizes, but when I plugged in anything greater than 4, the ValueError mentioned above occurred.

Thanks for the suggestions though!

Another thing I noticed:
if I try to include the subtrees in the training (that is, modifying the following line, while leaving everything else unchanged):

train, val, test = torchtext.datasets.SST.splits(
    TEXT, LABEL, train_subtrees=True, fine_grained=True)

I get the following:
ValueError: Expected input batch_size (10) to match target batch_size (2).

So I don't know whether the error is due to something peculiar to the SST dataset or to torchtext; I just couldn't wrap my head around it. If anyone has encountered something similar, what could be the root of the problem?

UPDATE
As I was iterating over the training set, I realized that the last batch contains only 4 labels instead of the expected 10. Meanwhile, get_one_hot always builds an input with the fixed BATCH_SIZE of 10 rows, so for that last batch the input batch size (10) no longer matches target.size(0) (4), and that is what raised the ValueError.
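For anyone hitting the same thing, printing the label size per batch is enough to spot it (a quick sketch using the same train_iter and BATCH_SIZE as above):

for i, batch in enumerate(train_iter):
    if batch.label.size(0) != BATCH_SIZE:
        print(i, batch.label.size(0))   # the last batch has only 4 labels instead of 10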

Take-home message: Know thy dataset inside out 🙂
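For completeness, a minimal adjustment that keeps the input and target in sync would be to size the one-hot tensor from the actual batch instead of the fixed BATCH_SIZE (again just a sketch of the loop body):

actual_bs = batch.label.size(0)                 # 10 for full batches, 4 for the last one
x = get_one_hot(batch, actual_bs, VOCAB_SIZE)   # input: [actual_bs, VOCAB_SIZE]
y = batch.label                                 # target: [actual_bs]
log_probs = model(x)
loss = loss_fn(log_probs, y)                    # batch sizes now match

Alternatively, one could simply skip any batch whose label size differs from BATCH_SIZE.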
