Loss is not decreasing, it always stays around a constant value

I am using a multilayer perceptron classifier with 3 features (30,000 samples, so the input is 30000×3), one 64-unit hidden layer, and one output layer on a custom dataset. It's a binary classification problem [0/1], but after each batch the loss stays around 0.68 (0.683, 0.684, 0.685, 0.683 again) and does not decrease at all. I tried different batch sizes and different learning rates, and also the Adam optimizer, but unfortunately nothing works. [Note: I am a newbie in PyTorch.]

Here is the code:

    class Dataset(torch.utils.data.Dataset):
        'Characterizes a dataset for PyTorch'
        def __init__(self, features, labels):
            'Initialization'
            self.features = features
            self.labels = labels

        def __len__(self):
            'Denotes the total number of samples'
            return len(self.labels)

        def __getitem__(self, index):
            'Generates one sample of data'
            # Load one sample and its label
            X = self.features[index]
            y = self.labels[index]
            return X, y
        
    my_dataset = Dataset(mlp_X, mlp_Y)
    
    # mlp_X is a [30000, 3] float tensor holding the 3 features per sample
    # e.g.
    # tensor([[ 5.3165,  3.1576,  1.3895],
    #         [ 9.0631,  4.6192,  2.6483],
    #         [ 7.3324,  5.0629,  2.8914],
    #         ...,
    #         [14.9732,  8.8509,  1.8414],
    #         [ 8.3197,  8.0620,  7.6988],
    #         [11.6138,  4.8711,  0.3390]], requires_grad=True)

    # mlp_Y is a [30000, 1] float tensor containing the labels
    # e.g.
    # tensor([[1.],
    #         [1.],
    #         [1.],
    #         ...,
    #         [0.],
    #         [0.],
    #         [0.]])
    
    Dtrainloader = torch.utils.data.DataLoader(my_dataset, batch_size=4,
                                          shuffle=True)
    
  
    class Feedforward(torch.nn.Module):
        def __init__(self, input_size, hidden_size):
            super(Feedforward, self).__init__()
            self.input_size = input_size
            self.hidden_size  = hidden_size
            self.fc1 = torch.nn.Linear(self.input_size, self.hidden_size)
            self.relu = torch.nn.ReLU()
            self.fc2 = torch.nn.Linear(self.hidden_size, 1)
            self.sigmoid = torch.nn.Sigmoid()
            # self.softmax = torch.nn.Softmax()
            
        def forward(self, x):
            hidden = self.fc1(x)
            relu = self.relu(hidden)
            output = self.fc2(relu)
            output = self.sigmoid(output)
            return output
        
        
        
    model = Feedforward(3, 64)
    criterion = torch.nn.BCELoss()   # 'nn' alone is not imported above, so use torch.nn
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    
    for epoch in range(50):
        running_loss = 0.0
        for i, data in enumerate(Dtrainloader, 0):
            inputs, labels = data
            
            optimizer.zero_grad()
            # Forward pass
            outputs = model(inputs)
            # Compute Loss
            loss = criterion(outputs, labels)
           
            # print('Epoch {}: train loss: {:.4f}'.format(epoch, loss.item()))
            # Backward pass
            loss.backward()
            optimizer.step()
            
            # print statistics
            running_loss += loss.item()
            if i % 1500 == 1499:    # print every 1500 mini-batches
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 1500))
                running_loss = 0.0
            
            
    # Check accuracy on the training data. Very poor, only 56%.
    
    correct = 0
    total = 0
    with torch.no_grad():
        for data in Dtrainloader:
            X, y = data
            # Round the sigmoid outputs to 0/1 predictions
            outputs = model(X).round()
            # _, predicted = torch.max(outputs.data, 1)
            total += y.size(0)   # count the actual batch size; the last batch may be smaller than 4
            correct += (outputs == y).sum().item()

    print(total, correct)
    print('Accuracy: %d %%' % (100 * correct / total))

Across all epochs and mini-batches the loss always looks like this:
[1, 1500] loss: 0.773
[1, 3000] loss: 0.691
[1, 4500] loss: 0.686
[1, 6000] loss: 0.687
[1, 7500] loss: 0.686
[2, 1500] loss: 0.686


[6, 4500] loss: 0.686
[6, 6000] loss: 0.684
[6, 7500] loss: 0.686
[7, 1500] loss: 0.687
[7, 3000] loss: 0.683
[7, 4500] loss: 0.682
[7, 6000] loss: 0.682
[7, 7500] loss: 0.686
It never decreases below 0.68 and keeps showing the same loss. I would highly appreciate your help.

You could try to normalize your dataset, which should help the training.
If that doesn’t help, try to scale down the problem and overfit a small dataset of e.g. just 10 samples.
Once your model is able to learn these samples perfectly, you could try to scale up the use case again.
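
A loss that stays around 0.69 (≈ ln 2) is roughly what you get when the model predicts ~0.5 for every sample, which often points to unscaled inputs. A minimal sketch of standardizing the features and building a tiny overfitting subset, assuming the `mlp_X`, `mlp_Y` and `Dataset` from your post:

    # Standardize each feature to zero mean / unit variance (hypothetical preprocessing step)
    mlp_X = mlp_X.detach()                     # inputs don't need requires_grad=True
    mean = mlp_X.mean(dim=0, keepdim=True)     # per-feature mean, shape [1, 3]
    std = mlp_X.std(dim=0, keepdim=True)       # per-feature std, shape [1, 3]
    mlp_X_norm = (mlp_X - mean) / (std + 1e-8)

    my_dataset = Dataset(mlp_X_norm, mlp_Y)

    # Sanity check: overfit a tiny subset of e.g. 10 samples.
    # The training loss should drop towards zero; if it doesn't, something else is broken.
    small_dataset = torch.utils.data.Subset(my_dataset, list(range(10)))
    small_loader = torch.utils.data.DataLoader(small_dataset, batch_size=10, shuffle=True)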

Also, I would generally recommend removing the sigmoid activation and using nn.BCEWithLogitsLoss instead of nn.BCELoss, as this yields additional numerical stability. :wink:
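
A minimal sketch of that change (the model now returns raw logits, so the rest of your training loop stays the same; for the accuracy check you would apply `torch.sigmoid` yourself before rounding):

    class Feedforward(torch.nn.Module):
        def __init__(self, input_size, hidden_size):
            super().__init__()
            self.fc1 = torch.nn.Linear(input_size, hidden_size)
            self.relu = torch.nn.ReLU()
            self.fc2 = torch.nn.Linear(hidden_size, 1)

        def forward(self, x):
            # Return raw logits; nn.BCEWithLogitsLoss applies the sigmoid internally
            # in a numerically more stable way.
            return self.fc2(self.relu(self.fc1(x)))

    model = Feedforward(3, 64)
    criterion = torch.nn.BCEWithLogitsLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # For the accuracy check, convert logits to 0/1 predictions explicitly:
    #   preds = (torch.sigmoid(model(X)) > 0.5).float()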