Multi-layered bidirectional LSTM doesn't learn very well

Apart from all your questions, I don’t see that you initialize the hidden state of your LSTM layer in each iteration; see this post. There should be something like:

for i, inputs in enumerate(X_train):
    # Re-initialize the hidden state (h_0, c_0) for each batch
    model.hidden = model.init_hidden(batch_size)
    ...

and your model class needs a method init_hidden like:

def init_hidden(self, batch_size):
    # Both h_0 and c_0 have shape (num_layers * num_directions, batch_size, hidden_size);
    # directions_count is 2 for a bidirectional LSTM and 1 otherwise
    return (torch.zeros(self.num_layers * self.directions_count, batch_size, self.rnn_hidden_dim).to(self.device),
            torch.zeros(self.num_layers * self.directions_count, batch_size, self.rnn_hidden_dim).to(self.device))

I just copied this from my own code, so you would need to adapt it to your requirements.
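
For completeness, here is a minimal sketch of how the forward pass could consume that hidden state; the attribute names self.lstm and self.fc and the batch_first=True layout are assumptions from my setup, so rename them to match your model:

def forward(self, inputs):
    # inputs: (batch_size, seq_len, input_dim), assuming batch_first=True (an assumption here)
    # Feed the freshly initialized (h_0, c_0) into the LSTM;
    # it returns the outputs for all time steps plus the updated hidden state
    lstm_out, self.hidden = self.lstm(inputs, self.hidden)
    # Classify based on the output of the last time step
    return self.fc(lstm_out[:, -1, :])

Without the re-initialization in the loop above, the hidden state of the previous batch would leak into the next one, and the backward pass can even fail because the old state is still attached to a computation graph that has already been freed.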