nn.CrossEntropyLoss using ignore_index decreases accuracy by 18%

Hi everyone,

As a toy project, I coded a model to classify whether a sentence is more likely to have been written by Lewis Carroll or Mary Shelley, so I fed my model the sentences of “Alice in Wonderland” and “Frankenstein” with the correct labels.

As with every sequence problem, I had to pad shorter sentences with 0 (I used a maximum length of 30). Now comes the weird part: with nn.CrossEntropyLoss() I reached 89% accuracy on the test set, but with nn.CrossEntropyLoss(ignore_index=0) the test accuracy dropped to 71%. I can’t figure out why this happens.
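
For context, the padding step looks roughly like this (just an illustrative sketch; the helper name encode is made up and my real preprocessing differs in the details):

import torch

MAX_LEN = 30
words2idx = {"<PAD>": 0, "<UNK>": 1}  # the real vocabulary continues from index 2

def encode(sentence, max_len=MAX_LEN):
    # Map each token to its index, falling back to <UNK> for unseen words
    ids = [words2idx.get(tok, words2idx["<UNK>"]) for tok in sentence.split()][:max_len]
    # Right-pad with the <PAD> index (0) up to the maximum length
    ids += [words2idx["<PAD>"]] * (max_len - len(ids))
    return torch.tensor(ids, dtype=torch.long)

batch = torch.stack([encode("alice was beginning to get very tired"),
                     encode("it was on a dreary night of november")])
print(batch.shape)  # torch.Size([2, 30])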

Here’s my model for reference:

import torch.nn as nn


class BookClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim=100, hidden_size=100, num_layers=2, dropout_p=0.2):
        super().__init__()
        # Embedding with index 0 reserved for padding (that vector is kept at zero and not updated)
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        # 2-layer bidirectional LSTM over the embedded sequence
        self.bilstm = nn.LSTM(input_size=embedding_dim,
                              hidden_size=hidden_size,
                              num_layers=num_layers,
                              batch_first=True,
                              dropout=dropout_p,
                              bidirectional=True)
        # Concatenated forward/backward hidden states -> 2 author classes
        self.fc = nn.Linear(2 * hidden_size, 2)

    def forward(self, x):
        embedded = self.embedding(x)           # (batch, seq_len, embedding_dim)
        out, _ = self.bilstm(embedded)         # (batch, seq_len, 2 * hidden_size)
        return self.fc(out[:, -1, :])          # logits from the output at the last time step

I used optim.Adam(model.parameters()) as the optimizer, together with the usual training loop.
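
For completeness, that loop looks roughly like this (sketch only; train_loader and the epoch count are placeholders for my actual setup):

import torch.optim as optim

model = BookClassifier(vocab_size=len(words2idx))
criterion = nn.CrossEntropyLoss()          # vs. nn.CrossEntropyLoss(ignore_index=0)
optimizer = optim.Adam(model.parameters())

for epoch in range(10):                    # placeholder epoch count
    model.train()
    for inputs, labels in train_loader:    # placeholder DataLoader; inputs: (batch, 30) token ids, labels: 0 or 1
        optimizer.zero_grad()
        logits = model(inputs)             # (batch, 2)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()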

Any insights are appreciated :slight_smile:

Are any valid word indices using the value 0?
If so, the loss function would ignore not only the padding values but also all words with index 0, which might decrease the model’s performance.
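
For reference, ignore_index is applied to the target values passed to the criterion; here is a minimal sketch showing that every target entry equal to that index is simply dropped from the loss:

import torch
import torch.nn as nn

logits = torch.randn(4, 2)                 # 4 samples, 2 classes
targets = torch.tensor([0, 1, 0, 1])

plain = nn.CrossEntropyLoss()
ignoring = nn.CrossEntropyLoss(ignore_index=0)

print(plain(logits, targets))     # mean over all 4 targets
print(ignoring(logits, targets))  # mean over only the targets that are not 0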

No, my words2idx dictionary is {"<PAD>": 0, "<UNK>": 1, ... } and everything from index 2 onward is a word from the training corpus.