Why doesn't my LSTM model train?

I am currently working on a regression-style classification model, but my loss stays nearly constant and never decreases. Is there a problem with my model?


import torch
import torch.nn as nn
import torch.optim as optim


class LSTM(nn.Module):
    def __init__(self):
        super(LSTM, self).__init__()

        embedding_dimension = 40
        pre_trained_embedding = torch.FloatTensor(TEXT.vocab.vectors)
        self.target_dimension = 1  # label: 1 or 0
        self.embedding_layer = nn.Embedding.from_pretrained(pre_trained_embedding)
        self.hidden_dimension = 200
        self.lstm = nn.LSTM(embedding_dimension, self.hidden_dimension,
                            num_layers=1, batch_first=True)
        n_lstm_out = 200
        self.dense_layer = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(n_lstm_out, 100),
            nn.Dropout(0.5),
            nn.Linear(100, self.target_dimension),
            nn.Sigmoid(),
        )

        self.optimizer = optim.SGD(self.parameters(), lr=0.000001)
        self.loss = nn.MSELoss()

    def learn(self, x_train, y_train):
        self.optimizer.zero_grad()
        embedding_out = self.embedding_layer(x_train)   # (batch, seq_len, embedding_dim)
        lstm_out, _ = self.lstm(embedding_out)          # (batch, seq_len, hidden_dim)
        prediction_out = self.dense_layer(lstm_out)     # (batch, seq_len, 1)
        mse_input = prediction_out[:, 0, :]             # prediction at the first time step
        mse_input = mse_input.type(torch.float)
        mse_target = y_train.type(torch.float)
        loss = self.loss(mse_input, mse_target)
        loss.backward()
        self.optimizer.step()

        return loss

You should be using a forward function. Refer to the examples available on the PyTorch forum and in blogs online.
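For reference, here is a minimal sketch of that structure, keeping your layer sizes. TEXT.vocab.vectors is taken from your code, while train_iterator and the learning rate are just placeholders for illustration:

import torch
import torch.nn as nn
import torch.optim as optim

class LSTMClassifier(nn.Module):
    def __init__(self, pre_trained_embedding, embedding_dim=40, hidden_dim=200):
        super().__init__()
        self.embedding_layer = nn.Embedding.from_pretrained(pre_trained_embedding)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=1, batch_first=True)
        self.dense_layer = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(hidden_dim, 100),
            nn.Dropout(0.5),
            nn.Linear(100, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        embedded = self.embedding_layer(x)            # (batch, seq_len, embedding_dim)
        lstm_out, _ = self.lstm(embedded)             # (batch, seq_len, hidden_dim)
        return self.dense_layer(lstm_out[:, -1, :])   # (batch, 1)

# optimizer and loss live outside the model; the training loop calls forward()
model = LSTMClassifier(torch.FloatTensor(TEXT.vocab.vectors))
optimizer = optim.SGD(model.parameters(), lr=1e-3)    # placeholder learning rate
criterion = nn.MSELoss()

for x_batch, y_batch in train_iterator:               # hypothetical batch iterator
    optimizer.zero_grad()
    prediction = model(x_batch).squeeze(1)             # (batch,)
    loss = criterion(prediction, y_batch.float())
    loss.backward()
    optimizer.step()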


I already use a forward function; this is just a summary of my model. The training steps run without any error, but the loss stays almost constant instead of decreasing. Can you see anything inappropriate in my LSTM code?

With batch_first=True, the shape of lstm_out should be (batch_size, seq_len, num_directions * hidden_size). Since you use a unidirectional LSTM, that is (batch_size, seq_len, 200). You then push this through the dense layers, so prediction_out should be (batch_size, seq_len, 1).

That means that prediction_out[:, 0, :] takes the output of the first time step for each sequence. You probably want the last time step instead, i.e., prediction_out[:, -1, :].
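A quick toy check of those shapes (hypothetical batch size 4 and sequence length 7, matching your 40-dim embeddings and 200-dim hidden state):

import torch
import torch.nn as nn

lstm = nn.LSTM(40, 200, batch_first=True)
dense = nn.Sequential(nn.Linear(200, 100), nn.Linear(100, 1), nn.Sigmoid())

lstm_out, _ = lstm(torch.randn(4, 7, 40))
prediction_out = dense(lstm_out)
print(prediction_out.shape)            # torch.Size([4, 7, 1]) -> one prediction per time step
print(prediction_out[:, 0, :].shape)   # torch.Size([4, 1])    -> first time step
print(prediction_out[:, -1, :].shape)  # torch.Size([4, 1])    -> last time step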

Since you throw away all the other hidden states, I'm not sure why you push all of them through the dense layer in the first place. I would simply do:

lstm_out, (h, c) = self.lstm(embedding_out)
prediction_out = self.dense_layer(h[-1])   # h[-1]: (batch_size, hidden_dim)

Here h[-1] is the last hidden state, last w.r.t. the sequence as well as w.r.t. the last layer (in case you had more than one). h[-1] should have the shape (batch_size, hidden_dim), which I assume is what you want.
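For a single-layer, unidirectional LSTM you can verify that h[-1] is exactly the output at the last time step of lstm_out (toy sizes again, just to illustrate):

import torch
import torch.nn as nn

lstm = nn.LSTM(40, 200, num_layers=1, batch_first=True)
lstm_out, (h, c) = lstm(torch.randn(4, 7, 40))

# h has shape (num_layers * num_directions, batch, hidden_dim); for a
# single-layer, unidirectional LSTM, h[-1] matches the last slice of lstm_out.
print(h.shape)                                    # torch.Size([1, 4, 200])
print(h[-1].shape)                                # torch.Size([4, 200])
print(torch.allclose(h[-1], lstm_out[:, -1, :]))  # True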

There might be other issues with your code, but that's what immediately stuck out to me.

Thanks a lot :)) I missed that part.