Loss doesn't decrease, model analysis help!

Hi all,

I have built an LSTM model for binary class detection, and I am training with different hyperparameters but the loss just does not decrease.

I am feeding several time series through an LSTM to predict the positive class in an output such as [0,0,0,…1,0,0].

I am not sure what else I am missing. If anyone could give me some advice, it would be helpful.

Thanks,

Here is my model,
self.lstm = nn.LSTM(self.input_size, self.hidden_size, self.num_layers, dropout = self.drop_out, batch_first=True)

def forward(self, x):
    hidden, cell = self.init_hidden()
    out, (hn, cn) = self.lstm(x, (hidden, cell))
    return out[:,:,-1]

def init_hidden(self):
    weight = next((self.parameters())).data
    
    hidden, cell = (weight.new(self.num_layers, self.batch_size, self.hidden_size).zero_().to(self.device),
                    weight.new(self.num_layers, self.batch_size, self.hidden_size).zero_().to(self.device))
    return hidden, cell

And here is the training loop:

def train(self,input_size, batch_size,seq_length,hidden_size,num_layers,lr, epoch):

  self.model.train()

  for batch_idx, (data, target) in enumerate(self.train_loader):
      self.optimizer.zero_grad()
      data, target = data.to(self.device), target.to(self.device)
      predictions = self.model(data.float())
      self.globaliter += 1
      loss = self.criterion(predictions.float(), target.float())
      loss.backward()

      self.optimizer.step()

      pred = predictions.detach()
      pred[pred >0.5] = 1
      pred[pred<=0.5] = 0

      correct_indx = pred[target == 1]
      out_size = correct_indx[correct_indx == 1]
      accuracy = out_size.shape[0] / target.shape[0] 

      if batch_idx % self.log_interval == 0:
          print('\nTrain Epoch: {}\tLoss: {:.6f}\tAccuracy: {} '.format(epoch, loss.item(), accuracy * 100))
            
          with self.train_summary_writer.as_default():
              summary.scalar('loss', loss.item(), step=self.globaliter)
              summary.scalar('accuracy', accuracy, step=self.globaliter)

Could you give a detailed implementation of your model? One possibility is a shape mismatch between the ground-truth labels and the predictions.
But most likely you are missing a linear layer that converts the LSTM's hidden output into a class prediction.

Try the following:


class Model(nn.Module):

    def __init__(self, input_size, output_size, hidden_size):
        super().__init__()

        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        out, (hn, cn) = self.lstm(x)
        # out shape: [bs, seq, hidden_size]

        logits = self.linear(out)
        return logits

If you do not pass an initial state, nn.LSTM initializes the hidden and cell states to zeros by default, so init_hidden is not needed.
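As a quick check of that default, here is a minimal sketch comparing a call without an initial state to one with explicit zero states. The sizes (4 features, 8 hidden units, 2 layers, batch of 3, sequence length 5) are made up for illustration and are not taken from your config.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for this illustration.
lstm = nn.LSTM(input_size=4, hidden_size=8, num_layers=2, batch_first=True)
x = torch.randn(3, 5, 4)  # [batch, seq, features]

# No initial state passed: nn.LSTM uses zeros for both h0 and c0.
out_default, _ = lstm(x)

# Explicit zero states, shaped [num_layers, batch, hidden_size].
h0 = torch.zeros(2, 3, 8)
c0 = torch.zeros(2, 3, 8)
out_explicit, _ = lstm(x, (h0, c0))

same = torch.allclose(out_default, out_explicit)
print(out_default.shape, same)
```

Relying on the default also avoids tying the state to a fixed self.batch_size, which can break if the last batch from the loader happens to be smaller.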

@quanguet

Thank you for your reply.

I am not sure what you mean by a detailed implementation. However, I left out my loss function and optimizer:

        self.model = Network(config).float()
        self.model = self.model.to(self.device)
        print(self.model)
        # Optimizer and Loss
        self.optimizer = optim.Adam(self.model.parameters(), lr=lr)
        self.pos_weight_factor = torch.tensor(120)
        # self.criterion = nn.BCELoss(reduction='none').to(self.device)
        self.criterion = nn.BCEWithLogitsLoss(pos_weight= self.pos_weight_factor)

What is the purpose of the linear layer on top of the hidden output? With BCEWithLogitsLoss, I thought I would pass out directly as the predictions in loss = self.criterion(predictions.float(), target.float()), because BCEWithLogitsLoss has an internal sigmoid function.

If you need any more information, please let me know

In fact, out contains the hidden state of the last LSTM layer at every time step, and we use this hidden state to make a prediction by forwarding it through the Linear layer. Even if you only have one label, the Linear layer should be nn.Linear(hidden_size, 1). On the other hand, when you extract out[:, :, -1], you return only the last element of the hidden state at each time step. That may not be enough to learn anything.
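To make the indexing concrete, here is a small shape sketch. The sizes are hypothetical, and it assumes a single-layer, unidirectional LSTM with batch_first=True.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes for illustration only.
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
linear = nn.Linear(8, 1)  # hidden_size -> one logit for a single binary label

x = torch.randn(3, 5, 4)        # [batch, seq, features]
out, (hn, cn) = lstm(x)         # out: [3, 5, 8]

last_unit = out[:, :, -1]       # [3, 5]: only the last hidden unit, at every step
last_step = out[:, -1, :]       # [3, 8]: the full hidden vector at the final step
logits = linear(out)            # [3, 5, 1]: one logit per time step

print(out.shape, last_unit.shape, last_step.shape, logits.shape)
```

So out[:, :, -1] discards 7 of the 8 hidden units in this sketch, while the Linear layer combines all of them into one logit per time step.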

@quanguet

Thank you for your explanation. Regarding your comment about out[:,:,-1]: if the full out is fed to the linear layer and logits[:,:,-1] is then passed to the loss function, will this still not be enough to learn anything?

def forward(self, x):
    out, (hn, cn) = self.lstm(x)
    # out shape: [bs, seq, hidden_size]

    logits = self.linear(out)
    return logits[:,:,-1]

If so, what would be the best way to tackle this, given that my label has only one dimension? Would a larger dataset solve the issue?

Thank you!

Since your model predicts a single binary class, logits already has the shape [batch_size, seq_len, 1]. Indexing logits[:, :, -1] returns the same values, just with the shape [batch_size, seq_len].

To pass these logits to a loss like nn.BCEWithLogitsLoss(), they must have the same shape as the target. You may need to flatten logits to [batch_size * seq_len], with the target flattened the same way, which can be done with

return logits.view(-1)
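Here is a minimal sketch of that last step, using random tensors in place of the real model output. It assumes one 0/1 label per time step and reuses the pos_weight of 120 from your setup; the sizes are hypothetical. It also thresholds the logits at 0 instead of 0.5, since the model outputs raw logits and logit > 0 corresponds to sigmoid(logit) > 0.5.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, seq_len = 3, 5                            # hypothetical sizes
logits = torch.randn(batch_size, seq_len, 1)          # stand-in for the model output
target = torch.randint(0, 2, (batch_size, seq_len))   # assumed: one 0/1 label per step

criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(120.0))

# Flatten both tensors so their shapes match: [batch_size * seq_len].
flat_logits = logits.reshape(-1)
flat_target = target.reshape(-1).float()
loss = criterion(flat_logits, flat_target)

# The sigmoid lives inside the loss, so threshold raw logits at 0, not 0.5.
pred = (flat_logits > 0).long()
accuracy = (pred == flat_target.long()).float().mean().item()

print(flat_logits.shape, loss.item(), accuracy)
```

reshape is used here instead of view because it also works on non-contiguous tensors. Note that accuracy in this sketch counts all correct predictions, positive and negative.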