LSTM: Validation accuracy not improving

Hi,

I know this problem has been addressed many times, but I cannot find an answer, so I’m trying again. I’m building an LSTM classifier to predict a class based on a text. The issue is that my validation accuracy stagnates around 35%.

I’m wondering whether it’s my model or my data preparation that isn’t working. Does my model look correct to you, or am I missing something? Thanks!

import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, batch_size):
        super(LSTMClassifier, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.label_size = label_size
        self.num_layers = 1
        self.batch_size = batch_size

        self.emb = nn.Embedding(self.vocab_size, self.embedding_dim, padding_idx=padding_idx)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)#,batch_first=True)
        self.fc = nn.Linear(hidden_dim, label_size)

    def zero_state(self, batch_size): 
        # implement the function, which returns an initial hidden state.
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_dim).cuda()
        c0 = torch.zeros(self.num_layers, batch_size, self.hidden_dim).cuda()
        return h0, c0
    
    def forward(self, x):
        x = self.emb(x)
        batch_size = x.size(1)
        h0, c0 = self.zero_state(batch_size)
        out, (hn, cn) = self.lstm(x, (h0, c0))
        out = self.fc(hn[-1, :, :])
        return out

Pro tip: You don’t have to initialize the hidden state to zeros in LSTMs. PyTorch does that automatically when you don’t pass (h0, c0).
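For instance, a minimal sketch (made-up sizes) showing that omitting the initial state gives the same result as passing explicit zeros:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16)
x = torch.randn(5, 3, 8)  # (seq_len, batch, input_size), batch_first=False

# No (h0, c0) passed: PyTorch defaults both to zeros
out, (hn, cn) = lstm(x)
print(out.shape)  # torch.Size([5, 3, 16])
print(hn.shape)   # torch.Size([1, 3, 16]) -> (num_layers, batch, hidden)
```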

That network looks fine imo. Maybe try changing the embedding size, the number of stacked layers, and input_size.

You’re passing the last hidden state of the LSTM. Instead you could use the output value from the last time step, or average the output values over all time steps.
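As a rough sketch of those two alternatives (made-up sizes, batch_first=False, assuming unpadded equal-length sequences):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16)  # batch_first=False
x = torch.randn(5, 3, 8)                      # (seq_len, batch, input)
out, (hn, cn) = lstm(x)                       # out: (seq_len, batch, hidden)

last_step = out[-1]          # (batch, hidden): output at the final time step
mean_pool = out.mean(dim=0)  # (batch, hidden): average over all time steps

# For a single-layer, unidirectional LSTM with no padding, the output at the
# last time step equals the last hidden state hn[-1]
print(torch.allclose(last_step, hn[-1]))  # True
```

With padded batches the two differ, since `out[-1]` would read padding positions for shorter sequences; mean pooling would also need a mask in that case.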

What’s your train accuracy?

From a quick look at your code:

  • The shape of x after self.emb(x) should be (batch_size, seq_len, embed_dim)
  • So batch_size = x.size(1) gives you the wrong value (the sequence length, not the batch size)
  • You do NOT use batch_first=True, but the batch is in fact the first dimension of x (this is probably why the network throws no error despite the wrong batch_size)
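A quick shape check (hypothetical sizes) illustrates the point:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(100, 8)
x = torch.randint(0, 100, (4, 10))  # (batch_size=4, seq_len=10)
e = emb(x)
print(e.shape)    # torch.Size([4, 10, 8]) -> (batch, seq_len, embed_dim)
print(e.size(1))  # 10 -> seq_len, not the batch size
```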

There are two ways to fix this:

(1) batch_first=False

 def forward(self, x):
    batch_size = x.size(0)
    x = self.emb(x)        # shape = (batch_size, seq_len, embed_dim)
    x = x.transpose(0, 1)  # shape = (seq_len, batch_size, embed_dim)
    h0, c0 = self.zero_state(batch_size)
    out, (hn, cn) = self.lstm(x, (h0, c0))
    out = self.fc(hn[-1, :, :])
    return out

(2) batch_first=True (here you also need to pass batch_first=True to nn.LSTM in __init__)

 def forward(self, x):
    batch_size = x.size(0)
    x = self.emb(x)  # shape = (batch_size, seq_len, embed_dim)
    h0, c0 = self.zero_state(batch_size)
    out, (hn, cn) = self.lstm(x, (h0, c0))
    out = self.fc(hn[-1, :, :])  # hn is (num_layers, batch, hidden) either way
    return out
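To sanity-check either fix, a condensed CPU version of the model (hypothetical sizes, initial state left to PyTorch's zero default) should produce logits of shape (batch_size, label_size):

```python
import torch
import torch.nn as nn

class Clf(nn.Module):  # condensed sketch of the model above
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(100, 8, padding_idx=0)
        self.lstm = nn.LSTM(8, 16, batch_first=True)  # approach (2)
        self.fc = nn.Linear(16, 5)

    def forward(self, x):
        x = self.emb(x)               # (batch, seq_len, embed_dim)
        out, (hn, cn) = self.lstm(x)  # h0/c0 default to zeros
        return self.fc(hn[-1])        # (batch, label_size)

model = Clf()
logits = model(torch.randint(0, 100, (4, 10)))  # batch of 4, seq_len 10
print(logits.shape)  # torch.Size([4, 5])
```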

@theairbend3r

Thanks for the tips and the recommendations. My training accuracy also stops improving. After around 20 epochs I get the following performance, and it never goes beyond:

Epoch 0 | Train loss 1.587 | Train acc 0.263 | Valid loss 1.575 | Valid acc 0.280
Epoch 27 | Train loss 1.073 | Train acc 0.553 | Valid loss 1.596 | Valid acc 0.357

I tried changing the hyperparameters, but nothing helped.

@vdw thanks for the solutions. I’ve made the changes, but the performance stays the same. I’ll dig into it; maybe my data are in the wrong shape.