Hi,
I’ve seen a lot of threads about similar issues, but unfortunately none of them helped me resolve mine.
I’m trying to generate news headlines with a character-level RNN.
I have the following model:
import torch
import torch.nn as nn


class HeadlineModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(HeadlineModel, self).__init__()
        self.gru = nn.GRU(input_size, hidden_size)
        self.fc1 = nn.Linear(hidden_size, output_size)
        # Log-probabilities over the character dimension (last axis)
        self.softmax = nn.LogSoftmax(dim=2)

    def forward(self, data, hidden):
        x, h = self.gru(data, hidden)
        x = self.fc1(x)
        y_hat = self.softmax(x)
        return y_hat, h
My input shape is 10x5x96: the sequence length is 10, the current batch size is 5, and there are 96 distinct characters, which I one-hot encode. The model output has the same shape.
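To make the shapes concrete, here is a minimal sketch of the input layout. The random indices are placeholder data standing in for my real headlines, and the variable names are only for illustration.

```python
import torch
import torch.nn.functional as F

seq_len, batch_size, n_chars = 10, 5, 96

# Random character indices stand in for real headlines (placeholder data).
indices = torch.randint(0, n_chars, (seq_len, batch_size))

# One-hot encode: every index becomes a 96-dimensional vector.
one_hot = F.one_hot(indices, num_classes=n_chars).float()

print(one_hot.shape)  # torch.Size([10, 5, 96])
```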
Here’s my training loop:
criterion = nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters())
h0 = torch.zeros(1, batch_size, hidden_size)

for sample, target in train_loader:
    optimizer.zero_grad()
    h = h0
    h = h.to(device)
    sample = sample.permute(1, 0, 2)
    sample = sample.to(device)
    target = target.to(device)
    y_hat, h = model(sample, h)
    loss = criterion(y_hat, target)
    loss.backward()
    optimizer.step()
My target has shape 5x10. I read that NLLLoss() expects class indices (i.e. not one-hot vectors), so with a sequence length of 10 I pass 10 such indices for each of the 5 samples in the batch, which gives that target shape.
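For illustration, this is roughly what such a target looks like. It is only a sketch with placeholder data: it assumes the loader yields batch-first one-hot tensors and recovers the indices with argmax, which may differ from how a real dataset builds them.

```python
import torch
import torch.nn.functional as F

batch_size, seq_len, n_chars = 5, 10, 96

# Placeholder character indices in batch-first layout (batch, seq).
indices = torch.randint(0, n_chars, (batch_size, seq_len))
one_hot = F.one_hot(indices, num_classes=n_chars)

# argmax over the character axis recovers one class index per position.
target = one_hot.argmax(dim=2)

print(target.shape)  # torch.Size([5, 10])
```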
I get the following exception:
Traceback (most recent call last):
  File "H:/PycharmProjects/pythonProject/HeadlinerGenerator.py", line 117, in <module>
    loss = criterion(y_hat, target)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\loss.py", line 213, in forward
    return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\functional.py", line 2261, in nll_loss
    raise ValueError('Expected input batch_size ({}) to match target batch_size ({}).'
ValueError: Expected input batch_size (10) to match target batch_size (5).
Process finished with exit code 1
I don’t get why it thinks that the input batch-size is 10.
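To rule out my data pipeline, here is a minimal sketch that should hit the same batch-size check using only random tensors of the shapes described above. The tensors are stand-ins, not my real model output, and the exact exception type and message may vary between PyTorch versions, so the sketch catches both ValueError and RuntimeError.

```python
import torch
import torch.nn as nn

seq_len, batch_size, n_chars = 10, 5, 96

# Stand-in for the model output: log-probabilities of shape (seq, batch, chars).
y_hat = torch.log_softmax(torch.randn(seq_len, batch_size, n_chars), dim=2)
# Stand-in for the target: class indices of shape (batch, seq).
target = torch.randint(0, n_chars, (batch_size, seq_len))

criterion = nn.NLLLoss()
try:
    criterion(y_hat, target)
    error_message = None
except (ValueError, RuntimeError) as err:
    # Keep the message so it can be compared with the traceback above.
    error_message = str(err)

print(error_message)
```

So the mismatch seems to come purely from how these two shapes are passed to NLLLoss(), not from the data itself.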
I’ve read the documentation for NLLLoss() a dozen times now but I’m not getting it.
What does my target need to look like?
Does it actually have to contain class-indices (which in my case are character-indices)?
I hope someone is willing to help me out here, I’m lost.