I’m trying to train a bidirectional LSTM for multi-label text classification. I’m using pre-trained word2vec vectors to represent words. My input consists of indices into the word embedding matrix (padded with 0s) and the lengths of the sequences, sorted in decreasing order. My model looks like this:
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class EmailLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, num_layers, dropout, weights):
        super(EmailLSTM, self).__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding.from_pretrained(weights, padding_idx=0)
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True,
                            bidirectional=True, dropout=dropout)
        self.fc = nn.Linear(hidden_size * 2, num_classes)

    def init_hidden(self, batch_size):
        # Zero-initialized hidden and cell states; the first dim is
        # num_layers * 2 because the LSTM is bidirectional
        hidden = (torch.zeros(self.num_layers * 2, batch_size, self.hidden_size),
                  torch.zeros(self.num_layers * 2, batch_size, self.hidden_size))
        return hidden

    def forward(self, X, lengths):
        text_emb = self.embedding(X)
        hidden = self.init_hidden(X.shape[0])  # batch size
        packed_input = pack_padded_sequence(text_emb, lengths, batch_first=True)
        packed_output, (h_n, _) = self.lstm(packed_input, hidden)
        forward_hidden = h_n[-2, :, :]   # forward hidden state at the last layer
        backward_hidden = h_n[-1, :, :]  # backward hidden state at the last layer
        combined_hidden = torch.cat((forward_hidden, backward_hidden), dim=1)
        logits = self.fc(combined_hidden)
        return logits
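For context, I call the model roughly like this (the dimensions and batch contents below are made up for illustration):

import torch

vocab_size, embed_dim, hidden_size, num_classes = 10000, 300, 128, 5
weights = torch.randn(vocab_size, embed_dim)  # stand-in for my w2v embedding matrix

model = EmailLSTM(input_size=embed_dim, hidden_size=hidden_size,
                  num_classes=num_classes, num_layers=2, dropout=0.5,
                  weights=weights)

# Batch of 3 sequences: 0-padded indices, lengths sorted in decreasing order
X = torch.tensor([[4, 8, 15, 16],
                  [23, 42, 0, 0],
                  [7, 0, 0, 0]])
lengths = torch.tensor([4, 2, 1])

logits = model(X, lengths)  # shape: (3, num_classes)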
The loss function I’m using is BCEWithLogitsLoss, and I pass the pos_weight parameter to it, calculated according to the docs (see the sketch after my questions below). My issue is that the validation loss doesn’t seem to drop during training, so I’m not sure whether my architecture and loss function are correct. Therefore, my questions are:
- Is this the correct way to deal with variable-length inputs?
- Does my architecture look appropriate for the task?
- Is the loss function appropriate for the task?
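For reference, I compute pos_weight per class as the ratio of negative to positive examples, following the example in the BCEWithLogitsLoss docs; train_labels below is a made-up name for my (num_samples, num_classes) multi-hot label tensor:

import torch

# train_labels: (num_samples, num_classes) tensor of 0s and 1s (illustrative name)
pos_counts = train_labels.sum(dim=0)             # positives per class
neg_counts = train_labels.shape[0] - pos_counts  # negatives per class
pos_weight = (neg_counts / pos_counts.clamp(min=1)).float()  # docs: neg / pos

criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)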
Any help would be much appreciated!