Hi everyone,
As a toy project, I built a model that classifies whether a sentence is more likely written by Lewis Carroll or Mary Shelley, so I fed it the sentences of “Alice in Wonderland” and “Frankenstein” with the correct labels.
As with every sequence problem, I had to pad shorter sentences with 0 (I used a maximum length of 30). Now comes the weird part: with nn.CrossEntropyLoss() I achieved 89% accuracy on the test set, but with nn.CrossEntropyLoss(ignore_index=0) my test accuracy fell to 71%. I can't figure out why this happens.
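To be concrete, here is roughly how I pad and set up the two losses I compared (the pad helper and its name are just for illustration; 0 is my padding index):

import torch.nn as nn

MAX_LEN = 30
PAD_IDX = 0

def pad_sentence(token_ids, max_len=MAX_LEN, pad_idx=PAD_IDX):
    # Truncate long sentences and right-pad short ones with the padding index.
    token_ids = token_ids[:max_len]
    return token_ids + [pad_idx] * (max_len - len(token_ids))

# The two loss configurations I compared:
criterion_a = nn.CrossEntropyLoss()                 # -> 89% test accuracy
criterion_b = nn.CrossEntropyLoss(ignore_index=0)   # -> 71% test accuracy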
Here’s my model for reference:
import torch.nn as nn

class BookClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim=100, hidden_size=100, num_layers=2, dropout_p=0.2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.bilstm = nn.LSTM(input_size=embedding_dim,
                              hidden_size=hidden_size,
                              num_layers=num_layers,
                              batch_first=True,
                              dropout=dropout_p,
                              bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, 2)  # 2 directions in, 2 classes out (Carroll vs. Shelley)

    def forward(self, x):
        embedded = self.embedding(x)    # (batch, seq_len, embedding_dim)
        out, _ = self.bilstm(embedded)  # (batch, seq_len, 2 * hidden_size)
        return self.fc(out[:, -1, :])   # logits from the last time step
I used optim.Adam(model.parameters()) as the optimizer, with the usual training loop.
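For completeness, the loop is nothing special and looks roughly like this (train_loader, num_epochs, and vocab_size are placeholders for my actual setup):

import torch.optim as optim

model = BookClassifier(vocab_size)
optimizer = optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()  # or nn.CrossEntropyLoss(ignore_index=0)

for epoch in range(num_epochs):
    model.train()
    for inputs, labels in train_loader:   # inputs: (batch, 30) padded ids, labels: 0 or 1
        optimizer.zero_grad()
        logits = model(inputs)            # (batch, 2)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()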
Any insights are appreciated.