ValueError: Expected target size (128, 10000), got torch.Size([128, 1])

Hi everyone,

After two hours of debugging, I still can’t find the reason for the error I’m getting: ValueError: Expected target size (128, 10000), got torch.Size([128, 1])

I thought the code was pretty straightforward, and it even resembles one of the tutorials, but I still get an error.

The code is:

import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data
import matplotlib.pyplot as plt
import nltk

class FeedForwardLM(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_size):
        super(FeedForwardLM, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.fc1 = nn.Linear(2 * embedding_dim, hidden_size)
        self.fc2 = nn.Linear(hidden_size, vocab_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, w1, w2):
        # embed the two context words and concatenate their embeddings
        mw1 = self.embedding(w1)
        mw2 = self.embedding(w2)
        m = torch.cat([mw1, mw2], dim=2)
        out = torch.tanh(self.fc1(m))
        out = self.fc2(out)
        out = self.softmax(out)
        return out



dataset = IndexLMDataset(train)
data_loader = data.DataLoader(dataset, batch_size=128, shuffle=True)
losses = []
loss_fn = nn.NLLLoss()
model = FeedForwardLM(10000, 300, 512).to(device)
optimizer = optim.Adam(model.parameters())
for _ in range(100):
    total_loss = 0
    for batch in data_loader:
        (w1, w2), y = batch
        model.zero_grad()
        yhat = model(w1, w2)
        loss = loss_fn(yhat, y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()  # .item() keeps a plain float instead of a graph-attached tensor
    losses.append(total_loss)
torch.save(model.state_dict(), "feedforward.model")
plt.plot(losses);

and the code for the dataset is:

class IndexLMDataset(data.Dataset):
    def __init__(self, train):
        self._trigrams = list(nltk.trigrams(train))[:1000]

    def __len__(self):
        return len(self._trigrams)

    def __getitem__(self, index):
        w1, w2, w3 = self._trigrams[index]
        # map each word to its vocabulary index, falling back to UNK
        w1 = torch.tensor([word2idx.get(w1, word2idx['UNK'])], dtype=torch.long).to(device)
        w2 = torch.tensor([word2idx.get(w2, word2idx['UNK'])], dtype=torch.long).to(device)
        w3 = torch.tensor([word2idx.get(w3, word2idx['UNK'])], dtype=torch.long).to(device)
        return (w1, w2), w3

Here, train is just a list of words, and nltk.trigrams(train) returns triplets of consecutive words, so nltk.trigrams(['The', 'quick', 'brown', 'fox']) returns [('The', 'quick', 'brown'), ('quick', 'brown', 'fox')].
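
In case the shapes matter, this is what a single item from the dataset looks like (a quick check, with word2idx, device and train defined elsewhere in my notebook):

dataset = IndexLMDataset(train)
(w1, w2), y = dataset[0]
print(w1.shape, w2.shape, y.shape)
# torch.Size([1]) torch.Size([1]) torch.Size([1])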

Any help is appreciated, as always!

The error message looks a bit strange.
It seems your target has the shape [batch_size, 1].
Could you remove dim1 using y = y.squeeze() and try it again?
Here is a simple dummy code where you can see the shapes:

import torch
import torch.nn as nn

vocab_size = 10000
model = nn.Sequential(
    nn.Linear(vocab_size, vocab_size),
    nn.LogSoftmax(dim=1)
)
criterion = nn.NLLLoss()

batch_size = 10
x = torch.randn(batch_size, vocab_size)
target = torch.randint(0, vocab_size, (batch_size, ))

print(x.shape)
> torch.Size([10, 10000])
print(target.shape)
> torch.Size([10])

output = model(x)
loss = criterion(output, target)
loss.backward()

unsqueeze didn’t work… Doing y.view(-1) worked in a similar model, where the only difference was using one-hot vectors instead of an embedding… Weird :\ It works if I’m not using batches, but that’s no way to work…

Oh I have a typo. I meant y.squeeze().
Could you try that again?
Sorry for the confusion. I’ll edit my post!
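
In case it helps, a minimal shape check (just a sketch, assuming the target really comes out of the DataLoader as [128, 1]; squeeze and view(-1) should give the same result here):

import torch

y = torch.randint(0, 10000, (128, 1))  # target as it comes out of the DataLoader
print(y.shape)             # torch.Size([128, 1])
print(y.squeeze().shape)   # torch.Size([128])
print(y.view(-1).shape)    # torch.Size([128])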

This time the error is Expected target size (128, 10000), got torch.Size([128]).

Could you compare your code and shapes to this:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(128, 10000, requires_grad=True)
output = F.log_softmax(x, dim=1)
target = torch.randint(0, 10000, (128, ))
criterion = nn.NLLLoss()
loss = criterion(output, target)

output is torch.Size([128, 10000]), while target is torch.Size([128]).

I think I got it! The shape of yhat was torch.Size([128, 1, 10000]) instead of torch.Size([128, 10000])! Squeezing yhat did the trick! Or at least I’m not getting an error anymore :slight_smile:
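
For anyone who finds this later, here is a sketch of one way to make the shapes line up (keeping the dataset as it is, so the inputs still arrive as [128, 1]; note that the squeeze should happen before the LogSoftmax(dim=1), otherwise the softmax runs over the leftover singleton dimension instead of the vocabulary):

# forward from the first post, with the extra dimension removed before the softmax
def forward(self, w1, w2):
    mw1 = self.embedding(w1)            # [128, 1, 300]
    mw2 = self.embedding(w2)            # [128, 1, 300]
    m = torch.cat([mw1, mw2], dim=2)    # [128, 1, 600]
    out = torch.tanh(self.fc1(m))       # [128, 1, 512]
    out = self.fc2(out).squeeze(1)      # [128, 10000]
    return self.softmax(out)            # LogSoftmax(dim=1) now normalizes over the vocab

# and in the training loop:
yhat = model(w1, w2)                    # [128, 10000]
loss = loss_fn(yhat, y.squeeze(1))      # target: [128]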