ValueError: Expected input batch_size (10) to match target batch_size (5)

Hi,

I’ve seen a lot of threads about similar issues, but unfortunately none of them helped me resolve mine.

I’m trying to generate news headlines with a character-level RNN.
I have the following model:

class HeadlineModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(HeadlineModel, self).__init__()
        self.gru = nn.GRU(input_size, hidden_size)
        self.fc1 = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=2)

    def forward(self, data, hidden):
        x, h = self.gru(data, hidden)  # x: [seq_len, batch_size, hidden_size]
        x = self.fc1(x)                # [seq_len, batch_size, output_size]
        y_hat = self.softmax(x)        # log-probabilities over the 96 characters

        return y_hat, h

My input shape is 10x5x96: a sequence length of 10, a current batch size of 5, and 96 characters overall that I’m one-hot encoding. The output is of course the same size.
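
Just to make the shapes concrete, here’s a minimal sketch of how such an input could be produced by one-hot encoding (random, illustrative values, not my actual data pipeline):

import torch
import torch.nn.functional as F

seq_len, batch_size, n_chars = 10, 5, 96
char_indices = torch.randint(0, n_chars, (seq_len, batch_size))  # [10, 5]
one_hot = F.one_hot(char_indices, num_classes=n_chars).float()   # [10, 5, 96]
print(one_hot.shape)  # torch.Size([10, 5, 96])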

Here’s my training loop:

criterion = nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters())
h0 = torch.zeros(1, batch_size, hidden_size)  # [num_layers, batch_size, hidden_size]

for sample, target in train_loader:

    optimizer.zero_grad()
    h = h0

    h = h.to(device)
    sample = sample.permute(1, 0, 2)  # [batch, seq, features] -> [seq, batch, features]
    sample = sample.to(device)
    target = target.to(device)

    y_hat, h = model(sample, h)
    loss = criterion(y_hat, target)   # y_hat: [10, 5, 96], target: [5, 10]
    loss.backward()
    optimizer.step()

My target size is 5x10. I read that NLLLoss() expects class indices (i.e. not one-hot-encoded), so, since my sequence length is 10, for each sample in the batch I’m feeding it 10 such indices, hence the size of the target.
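
Just to be explicit, here’s a minimal sketch of what my target looks like (random, illustrative values, not my real data):

import torch

batch_size, seq_len, n_chars = 5, 10, 96
# character indices, not one-hot encoded
target = torch.randint(0, n_chars, (batch_size, seq_len))  # [5, 10]
print(target.dtype, target.shape)  # torch.int64 torch.Size([5, 10])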

I get the following exception:

Traceback (most recent call last):
  File "H:/PycharmProjects/pythonProject/HeadlinerGenerator.py", line 117, in <module>
    loss = criterion(y_hat, target)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\loss.py", line 213, in forward
    return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\functional.py", line 2261, in nll_loss
    raise ValueError('Expected input batch_size ({}) to match target batch_size ({}).'
ValueError: Expected input batch_size (10) to match target batch_size (5).

Process finished with exit code 1

I don’t get why it thinks that the input batch-size is 10.
I’ve read the documentation for NLLLoss() a dozen times now but I’m not getting it.

What does my target need to look like?
Does it actually have to contain class-indices (which in my case are character-indices)?

I hope someone is willing to help me out here, I’m lost.

@ptrblck I’m seeing that you’re being very helpful around here. Thank you for that. May I bother you with this? Do you think you can help me out here? If not, that’s cool, I get it :slight_smile:

The nn.GRU module expects the inputs as [seq_len, batch_size, features] as described in the docs by default, while other layers, such as nn.Linear, expect dim0 to be the batch dimension.
You could either permute the output of the GRU before passing it to self.fc1 or use batch_first=True while creating the nn.GRU module and pass the input as [batch_size, seq_len, features].
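
Something like this minimal sketch of the batch_first=True approach should work (hidden_size=128 is just an example value, not taken from your code):

import torch
import torch.nn as nn

class HeadlineModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # batch_first=True -> the GRU consumes and returns [batch_size, seq_len, features]
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        self.fc1 = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=2)

    def forward(self, data, hidden):
        x, h = self.gru(data, hidden)  # x: [batch_size, seq_len, hidden_size]
        x = self.fc1(x)                # [batch_size, seq_len, output_size]
        return self.softmax(x), h

model = HeadlineModel(96, 128, 96)
h0 = torch.zeros(1, 5, 128)      # hidden state stays [num_layers, batch_size, hidden_size]
sample = torch.randn(5, 10, 96)  # [batch_size, seq_len, features] - no permute needed
y_hat, h = model(sample, h0)
print(y_hat.shape)               # torch.Size([5, 10, 96])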

Thank you for responding.

This exception is thrown when calculating the loss, though. So you think what I’m passing to the loss function is correct, but the output of my network isn’t?

I’ll try to change my code accordingly and see if it helps.

/edit: Sadly, I’m still getting an exception, just a slightly different one:

ValueError: Expected target size (5, 96), got torch.Size([5, 10])

The error makes sense to me. I understand that I’m passing one-hot-encoded values (96) as input and non-one-hot-encoded indices as target, but from everything I’ve read, I thought that’s exactly what I’m supposed to be doing, so I have no clue how to resolve it.
Is this assumption not correct?

I’m really lost here.

/Edit:

Okay, I think I found the solution:

loss = criterion(y_hat.permute(0, 2, 1), target)

No more exception.

Is this correct?
What happens to the third dimension of the input? Is it just ignored?

nn.CrossEntropyLoss for a multi-class classification on a temporal sequence expects the model output in the shape [batch_size, nb_classes, seq_len] containing logits, while the target should be passed as [batch_size, seq_len] containing the class indices in the range [0, nb_classes-1].
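
Here is a small example using random values to show the shapes only (nn.NLLLoss, which you are using, has the same shape requirement but expects log-probabilities instead of logits):

import torch
import torch.nn as nn

batch_size, nb_classes, seq_len = 5, 96, 10

criterion = nn.NLLLoss()
# model output is typically [batch_size, seq_len, nb_classes] ...
log_probs = torch.randn(batch_size, seq_len, nb_classes).log_softmax(dim=2)
target = torch.randint(0, nb_classes, (batch_size, seq_len))

# ... so permute it to [batch_size, nb_classes, seq_len] before the loss
loss = criterion(log_probs.permute(0, 2, 1), target)
print(loss)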
