Very slow training on GPU for LSTM NLP multiclass classification

Hi Chris,

thank you.
I have checked and the time increases from batch to batch.
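For reference, this is a minimal sketch of the kind of per-batch timing I mean (the module and tensor sizes are just placeholders); torch.cuda.synchronize() is called before reading the clock, otherwise the asynchronous GPU work would not be included in the measurement:

import time
import torch

# Placeholder module and input, only to show the timing pattern
lstm = torch.nn.LSTM(100, 64, batch_first=True).to("cuda")
x = torch.randn(32, 50, 100, device="cuda")

torch.cuda.synchronize()   # wait for any pending GPU work
start = time.time()
out, _ = lstm(x)
torch.cuda.synchronize()   # make sure the LSTM call has actually finished
print(f"{time.time() - start:.4f}s")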

Regarding resetting the hidden state, there is a post on the PyTorch forum ("hidden cell state") which references the docs: nn.LSTM takes your full sequence (rather than chunks), automatically initializes the hidden and cell states to zeros, runs the LSTM over the full sequence (updating the state along the way), and returns the full list of outputs plus the final hidden/cell state.
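As I understand that behaviour, the following minimal sketch (sizes are just placeholders) should give the same result whether or not explicit zero states are passed:

import torch
import torch.nn as nn

# Placeholder sizes: input_size=100, hidden_size=64, one layer, batch_first
lstm = nn.LSTM(input_size=100, hidden_size=64, num_layers=1, batch_first=True)
x = torch.randn(32, 50, 100)   # (batch, seq_len, input_size)

# No (h0, c0) passed: nn.LSTM initializes hidden and cell states to zeros
out, (ht, ct) = lstm(x)

# Passing explicit zero states gives the same result
zeros = torch.zeros(1, 32, 64)
out2, _ = lstm(x, (zeros, zeros))
print(torch.allclose(out, out2))   # True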
I have tried adding:

def forward(self, x, l):
    x = self.embeddings(x)
    x = self.dropout(x)

    # Initialize hidden and cell states with zeros on the same device as the input
    # (plain zero tensors, no requires_grad_/detach needed)
    h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)
    c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)

    lstm_out, (ht, ct) = self.lstm(x, (h0, c0))

    # ht[-1] is the last layer's final hidden state -> (batch, hidden_dim)
    return self.linear(ht[-1])

Nevertheless, the time per batch still increases.

What else could cause the time to increase from batch to batch? I suspect a related problem is that the model has low training and validation accuracy (it just fluctuates around a constant value).

Thank you