Loss doesn't decrease, model analysis help!

ykukkim · March 18, 2020, 6:53am

Hi all,

I have built an LSTM model for binary class detection, and I am training with different hyperparameters but the loss just does not decrease.

I feed several time series through an LSTM and try to predict the positive entries in a target such as [0,0,0,…1,0,0].

I am not sure what I am missing. If anyone could give me some advice, it would be very helpful.

Thanks,

Here is my model,
self.lstm = nn.LSTM(self.input_size, self.hidden_size, self.num_layers, dropout=self.drop_out, batch_first=True)

def forward(self, x):
    hidden, cell = self.init_hidden()
    out, (hn, cn) = self.lstm(x, (hidden, cell))
    return out[:,:,-1]

def init_hidden(self):
    weight = next((self.parameters())).data
    
    hidden, cell = (weight.new(self.num_layers, self.batch_size, self.hidden_size).zero_().to(self.device),
                    weight.new(self.num_layers, self.batch_size, self.hidden_size).zero_().to(self.device))
    return hidden, cell

And here is the training loop:

def train(self,input_size, batch_size,seq_length,hidden_size,num_layers,lr, epoch):

  self.model.train()

  for batch_idx, (data, target) in enumerate(self.train_loader):
      self.optimizer.zero_grad()
      data, target = data.to(self.device), target.to(self.device)
      predictions = self.model(data.float())
      self.globaliter += 1
      loss = self.criterion(predictions.float(), target.float())
      loss.backward()

      self.optimizer.step()

      pred = predictions.detach()
      pred[pred >0.5] = 1
      pred[pred<=0.5] = 0

      correct_indx = pred[target == 1]
      out_size = correct_indx[correct_indx == 1]
      accuracy = out_size.shape[0] / target.shape[0] 

      if batch_idx % self.log_interval == 0:
          print('\nTrain Epoch: {}\tLoss: {:.6f}\tAccuracy: {} '.format(epoch, loss.item(), accuracy * 100))
            
          with self.train_summary_writer.as_default():
              summary.scalar('loss', loss.item(), step=self.globaliter)
              summary.scalar('accuracy', accuracy, step=self.globaliter)

quanguet · March 18, 2020, 7:06am

Could you share the full implementation of your model? One possibility is a mismatch between the ground-truth labels and the predictions.
More likely, though, you are missing a linear layer that converts the LSTM's hidden output into a class prediction.

Try the following:


class Model(nn.Module):

    def __init__(self, input_size, output_size, hidden_size):
        super().__init__()

        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        out, (hn, cn) = self.lstm(x)
        # out shape: [bs, seq, hidden_size]

        logits = self.linear(out)
        return logits

Hidden states are initialized to zeros by default when none are passed to the LSTM.
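A quick way to check this default is to compare an LSTM call with explicit zero states against a call with no states at all. This is only a sketch: the sizes below are made up for illustration and do not come from the original post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only (assumptions, not taken from the thread).
batch_size, seq_len, input_size, hidden_size, num_layers = 4, 10, 3, 8, 2

lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
x = torch.randn(batch_size, seq_len, input_size)

# Explicit zero states. Even with batch_first=True the states are
# shaped [num_layers, batch, hidden_size].
h0 = torch.zeros(num_layers, batch_size, hidden_size)
c0 = torch.zeros(num_layers, batch_size, hidden_size)

out_explicit, _ = lstm(x, (h0, c0))
out_default, _ = lstm(x)  # no states passed, so zeros are used

same = torch.allclose(out_explicit, out_default)
print(same)
```

If the two outputs agree, a hand-written init_hidden that only returns zeros is likely unnecessary.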

ykukkim · March 18, 2020, 7:59am

@quanguet

Thank you for your reply.

I am not too sure what you mean by a detailed implementation. However, I left out my loss function and optimizer:

        self.model = Network(config).float()
        self.model = self.model.to(self.device)
        print(self.model)
        # Optimizer and Loss
        self.optimizer = optim.Adam(self.model.parameters(), lr=lr)
        self.pos_weight_factor = torch.tensor(120)
        # self.criterion = nn.BCELoss(reduction='none').to(self.device)
        self.criterion = nn.BCEWithLogitsLoss(pos_weight= self.pos_weight_factor)

What is the purpose of the linear layer on the hidden output? With BCEWithLogitsLoss, I thought I would pass out directly as the input [predictions] to loss = self.criterion(predictions.float(), target.float()), because BCEWithLogitsLoss has an internal sigmoid function.
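For reference, the internal sigmoid can be checked with a small sketch. The logits and targets below are made-up values, not data from this thread. The sketch also shows that, when the model outputs logits, a probability cut-off of 0.5 corresponds to a logit cut-off of 0.

```python
import torch
import torch.nn as nn

# Made-up raw outputs and labels, for illustration only.
logits = torch.tensor([-2.0, -0.5, 0.3, 0.4, 3.0, -1.0])
target = torch.tensor([0.0, 0.0, 1.0, 0.0, 1.0, 0.0])

# BCEWithLogitsLoss applies the sigmoid internally ...
loss_with_logits = nn.BCEWithLogitsLoss()(logits, target)
# ... so it matches BCELoss on explicitly squashed probabilities.
loss_on_probs = nn.BCELoss()(torch.sigmoid(logits), target)
losses_match = torch.allclose(loss_with_logits, loss_on_probs, atol=1e-6)

# Hard predictions: sigmoid(z) > 0.5 is the same condition as z > 0.
pred_from_probs = torch.sigmoid(logits) > 0.5
pred_from_logits = logits > 0
preds_match = torch.equal(pred_from_probs, pred_from_logits)

print(losses_match, preds_match)
```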

If you need any more information, please let me know

quanguet · March 18, 2020, 8:27am

In fact, out is the hidden state from self.lstm; we use this hidden state to make a prediction by forwarding it through the Linear layer. Even if you only have one label, the Linear layer should be nn.Linear(hidden_size, 1). On the other hand, when you extract out[:, :, -1], you return only one value of the hidden state at each time step. That may not be enough to learn anything.
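To make the shapes concrete, here is a small sketch with made-up sizes (none of them are from the original post). It compares out[:, :, -1] with the output of a linear layer. Note also that LSTM hidden values are the product of a sigmoid gate and a tanh, so they always lie strictly between -1 and 1, which limits how confident a prediction read directly from out can be.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only (assumptions).
batch_size, seq_len, input_size, hidden_size = 4, 10, 3, 8

lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
linear = nn.Linear(hidden_size, 1)

x = torch.randn(batch_size, seq_len, input_size)
out, _ = lstm(x)              # [batch, seq, hidden]

last_unit = out[:, :, -1]     # [batch, seq]: only the last hidden unit
last_step = out[:, -1, :]     # [batch, hidden]: all units at the final step
logits = linear(out)          # [batch, seq, 1]: unbounded, uses all units

print(last_unit.shape, last_step.shape, logits.shape)
print(out.abs().max().item())  # always below 1
```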

ykukkim · March 18, 2020, 9:01am

@quanguet

Thank you for your explanation. Regarding your comment about out[:,:,-1]: the full out is now fed to the linear layer, and then logits[:,:,-1] is extracted and passed to the loss function. Will this still not be enough to learn anything?

def forward(self, x):
    out, (hn, cn) = self.lstm(x)
    # out shape: [bs, seq, hidden_size]

    logits = self.linear(out)
    return logits[:,:,-1]

If so, what would be the best way to tackle this, given that my label has only one dimension? Would a larger dataset solve the issue?

Thank you!

quanguet · March 18, 2020, 9:46am

Since your model learns to predict a binary class, logits will already have the shape [batch_size, seq_len, 1]. Indexing logits[:, :, -1] will return the same values, just with a different shape: [batch_size, seq_len].
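A tiny check of that claim, using a random tensor in place of real model output (the sizes are arbitrary): indexing the last position of a size-1 dimension gives the same result as squeezing that dimension.

```python
import torch

torch.manual_seed(0)

# Stand-in for model output with shape [batch_size, seq_len, 1].
logits = torch.randn(4, 10, 1)

indexed = logits[:, :, -1]      # [batch_size, seq_len]
squeezed = logits.squeeze(-1)   # [batch_size, seq_len]

print(indexed.shape, torch.equal(indexed, squeezed))
```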

To pass these logits to a loss such as nn.BCEWithLogitsLoss(), you may need to flatten them to shape [batch_size * seq_len] (and flatten the targets the same way), which can be done with

return logits.reshape(-1)
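One way to put the pieces together is sketched below: an LSTM followed by nn.Linear(hidden_size, 1), with logits and targets both flattened to [batch_size * seq_len] before the loss. The sizes, the random input, and the position of the positive label are assumptions for illustration; only pos_weight=120 and the Adam optimizer come from the posts above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class Model(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)         # [batch, seq, hidden]
        logits = self.linear(out)     # [batch, seq, 1]
        return logits.reshape(-1)     # [batch * seq]


# Illustrative sizes only (assumptions).
batch_size, seq_len, input_size, hidden_size = 4, 10, 3, 8

model = Model(input_size, hidden_size)
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([120.0]))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake batch: one positive time step per sequence.
x = torch.randn(batch_size, seq_len, input_size)
target = torch.zeros(batch_size, seq_len)
target[:, 6] = 1.0

# Single training step.
optimizer.zero_grad()
flat_logits = model(x)                             # [batch * seq]
loss = criterion(flat_logits, target.reshape(-1))  # shapes now match
loss.backward()
optimizer.step()

print(flat_logits.shape, loss.item())
```

Flattening both tensors the same way keeps each logit aligned with its label. Keeping both at [batch_size, seq_len] would work equally well, since BCEWithLogitsLoss only requires the two shapes to match.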