RNN model: training loss does not decrease

I’m trying to train a simple RNN model (plain RNN or LSTM). My model is this:

import torch
import torch.nn as nn

class Sequence(nn.Module):

    def __init__(self, input_dim, embedding_dim, hidden_dim, 
                 num_classes):
        super(Sequence, self).__init__()

        self.n_layers = 1
        self.hidden_size = hidden_dim

        self.embedding = nn.Embedding(input_dim, embedding_dim)

        self.encoder = nn.LSTM(embedding_dim, hidden_dim, 
                              num_layers=self.n_layers, dropout=0, 
                              batch_first=True)

        self.classify = nn.Linear(hidden_dim, num_classes)

    def forward(self, smi, hidden):
        # smi: (batch_size, seq_len) tensor of integer-encoded characters
        smi = self.embedding(smi.long())

        smi, hidden = self.encoder(smi, hidden)

        # classify from the LSTM output at the last time step
        out = self.classify(smi[:, -1, :])

        return out, hidden

    def init_weights(self):
        nn.init.xavier_uniform_(self.embedding.weight)

        nn.init.xavier_uniform_(self.classify.weight)
        nn.init.constant_(self.classify.bias, 0)

        for name, param in self.encoder.named_parameters():
            if 'weight' in name:
                nn.init.orthogonal_(param)

            elif 'bias' in name:
                nn.init.constant_(param, 0)
                # set the forget-gate bias (second quarter of PyTorch's
                # (i, f, g, o) bias layout) to 1
                f_gate = param[int(0.25 * len(param)):int(0.5 * len(param))]
                nn.init.constant_(f_gate, 1)

    def init_hidden(self, bsz, device):
        return (torch.zeros(self.n_layers, bsz, self.hidden_size).to(device), 
                torch.zeros(self.n_layers, bsz, self.hidden_size).to(device))

I don’t think I have to explain much. The input is an array of integers (I encode each character of each sequence as an integer, so a string of N characters becomes an array of N integers): I pass it to an embedding layer, then an LSTM, and finally a Linear layer for classification (it’s a multi-class classification task).
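
For reference, this is roughly how I build the integer sequences (the character vocabulary, padding index, and the encode helper here are only illustrative, not my exact preprocessing; the max length of 70 matches my input size):

import torch

# hypothetical character set; in practice the vocabulary is built from the data
chars = "CNOPSclnops()[]=#123456789"
vocab = {ch: i + 1 for i, ch in enumerate(chars)}
PAD = 0  # index 0 reserved for padding

def encode(seq, max_len=70):
    # map each character to its integer id, then pad/truncate to max_len
    ids = [vocab[ch] for ch in seq[:max_len]]
    ids += [PAD] * (max_len - len(ids))
    return torch.tensor(ids)

batch = torch.stack([encode("CCO"), encode("c1ccccc1")])  # (batch_size, 70)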

The problem is that the training loss is not decreasing; it just fluctuates around the same value. I checked the weights and I can’t see anything strange.

First, does anyone notice anything wrong in the model itself? It’s the first time I’m using these models and I’m not sure about the inputs. The input to the model has size torch.Size([batch_size, 70]), the output of forward is torch.Size([batch_size, num_classes]), and the labels have size torch.Size([batch_size]).
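
For reference, this is the sanity check I do on the shapes (a minimal sketch with made-up sizes, assuming nn.CrossEntropyLoss as the criterion):

import torch
import torch.nn as nn

batch_size, num_classes = 32, 10
out = torch.randn(batch_size, num_classes)              # model output (logits)
labels = torch.randint(0, num_classes, (batch_size,))   # integer class labels

criterion = nn.CrossEntropyLoss()
loss = criterion(out, labels)  # (N, C) logits with (N,) labels
print(loss.item())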

Another thing I still don’t get is the hidden tensor that is passed to an RNN model: is it useful or not? Why do some people use it and others don’t?

I’m not sure why your loss would not decrease. Depending on your machine and your initialization, an LSTM can take a very long time to learn! I remember once seeing no improvement for hours.

The hidden argument passed to LSTM is a tuple with the initial tensors (h, c). If you do not give this argument, they will be initialized to zero, which is fine.
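
For example (a minimal standalone sketch, not using your model):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, batch_first=True)
x = torch.randn(4, 70, 8)  # (batch, seq_len, input_size)

# without an initial state: h_0 and c_0 default to zeros
out, (h_n, c_n) = lstm(x)

# with an explicit initial state; useful if you want to carry the state
# over between calls (e.g. truncated BPTT on long sequences)
h_0 = torch.zeros(1, 4, 16)
c_0 = torch.zeros(1, 4, 16)
out, (h_n, c_n) = lstm(x, (h_0, c_0))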

Lowering the learning rate, or clipping gradients to a maximum norm smaller than 1.0, may help.
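
For example (a minimal sketch with a toy model and hypothetical values; adjust the learning rate and clipping threshold to your setup):

import torch
import torch.nn as nn

# toy model and batch, just to show where the clipping goes in the loop
model = nn.Linear(10, 3)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lower lr than the default 1e-3

x = torch.randn(8, 10)
y = torch.randint(0, 3, (8,))

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
# clip the total gradient norm to something smaller than 1.0, e.g. 0.5
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
optimizer.step()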