I’m using a very simple RNN-based binary classifier for short text documents. As far as I can tell, it works reasonably fine. The loss goes down nicely and the accuracy goes up over 80% (it plateaus after 30-40 epochs; I’m doing 100). The forward
method of the classifier looks like this – the input batch X is sorted w.r.t. their length, but I don’t utilize it here:
def forward(self, X_sorted, X_length_sorted, method='last_step'):
    # (batch, seq_len) -> (batch, seq_len, emb_dim)
    X = self.word_embeddings(X_sorted)
    # The GRU expects (seq_len, batch, emb_dim) since batch_first=False
    X = torch.transpose(X, 0, 1)
    X, self.hidden = self.gru(X, self.hidden)
    # Keep only the output of the last time step: (batch, hidden_dim)
    X = X[-1]
    # A series of fully connected layers
    for l in self.linears:
        X = l(X)
    return F.log_softmax(X, dim=1)
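To double-check what X[-1] gives me here, I ran a tiny standalone snippet (the sizes are made up and just mirror the shapes in my forward):

import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, max_len, emb_dim, hidden_dim = 4, 7, 8, 16
gru = nn.GRU(emb_dim, hidden_dim)  # batch_first=False, as in my model

# Fake embedded batch, already transposed to (seq_len, batch, emb_dim)
X = torch.randn(max_len, batch_size, emb_dim)
out, h = gru(X)

print(out.shape)      # torch.Size([7, 4, 16]) -> one output per time step
print(out[-1].shape)  # torch.Size([4, 16])   -> output at the last time step
print(torch.allclose(out[-1], h[-1]))  # True for a single-layer, unidirectional GRU

So without packing, X[-1] is simply the final hidden state for every sequence in the batch, padding steps included.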
Naturally, the lengths of the sequences vary, from a minimum length of 5 up to some maximum. Now I wanted to see how the packing and padding of the sequences works. I therefore modified the forward
method as follows:
def forward(self, X_sorted, X_length_sorted, method='last_step'):
    X = self.word_embeddings(X_sorted)
    X = torch.transpose(X, 0, 1)
    # Pack the padded batch so the GRU skips the padding time steps
    X = nn.utils.rnn.pack_padded_sequence(X, X_length_sorted)
    X, self.hidden = self.gru(X, self.hidden)
    # Unpack back to a padded tensor: (max_seq_len, batch, hidden_dim)
    X, output_lengths = nn.utils.rnn.pad_packed_sequence(X)
    X = X[-1]
    # A series of fully connected layers
    for l in self.linears:
        X = l(X)
    return F.log_softmax(X, dim=1)
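To see what the pack/pad round trip actually returns, I ran another small toy example with two sequences of different lengths (the values are arbitrary):

import torch
import torch.nn as nn

emb_dim, hidden_dim = 3, 5
gru = nn.GRU(emb_dim, hidden_dim)

# Two sequences of lengths 4 and 2, padded to length 4: (seq_len, batch, emb_dim)
X = torch.randn(4, 2, emb_dim)
X[2:, 1] = 0.0  # zero out the padding positions of the shorter sequence
lengths = torch.tensor([4, 2])

packed = nn.utils.rnn.pack_padded_sequence(X, lengths)
out, h = gru(packed)
out, out_lengths = nn.utils.rnn.pad_packed_sequence(out)

print(out.shape)  # torch.Size([4, 2, 5])
print(out[-1])    # row 0: real last output; row 1: all zeros (beyond its length)
print(h[-1])      # the hidden state at each sequence's *true* last step

If I read this correctly, pad_packed_sequence fills everything past a sequence’s length with zeros, so X[-1] in my modified forward returns zero vectors for all but the longest sequences in a batch.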
The network still trains, but I’ve noticed some differences:
- Each epoch takes about 10-15% longer to process.
- The loss goes down much more slowly (using the same learning rate).
- The accuracy goes up to only about 70% (it plateaus after 30-40 epochs; I’m doing 100).
I also tried changing nn.NLLLoss() to nn.NLLLoss(ignore_index=0), with 0 being the padding index. Again, it trains, but the loss goes down almost crazily fast (even with a much smaller learning rate) and the accuracy won’t change at all. I still have the feeling that the calculation of the loss is the issue.
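To understand what ignore_index actually does to the loss, I tried a minimal example; as far as I can tell, it ignores target values, not input positions:

import torch
import torch.nn as nn
import torch.nn.functional as F

log_probs = F.log_softmax(torch.randn(4, 2), dim=1)  # 4 documents, 2 classes
targets = torch.tensor([0, 1, 0, 1])                 # binary class labels

plain = nn.NLLLoss()(log_probs, targets)
ignoring = nn.NLLLoss(ignore_index=0)(log_probs, targets)
only_ones = nn.NLLLoss()(log_probs[targets == 1], targets[targets == 1])

print(plain)                                # averaged over all 4 documents
print(torch.allclose(ignoring, only_ones))  # True: class-0 targets are simply skipped

If that’s right, then with binary labels 0/1 the ignore_index=0 variant just drops every class-0 document from the loss, which has nothing to do with padding tokens.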
In short, it kind of works in the sense that the networks train, but I fail to properly interpret the results. Am I missing something here, or are these the expected results?