Accuracy and Loss not changing regardless of model

Hey everyone! I am working on a binary classifier with simulated data. The dataset tracks COVID-related symptoms. Originally the whole dataset was simulated, but then I found real-world data. I still needed to generate the test statuses (the labels), though, as these are not readily available to the public. Regardless, none of my models seems to learn from either dataset. The loss varies slightly from one printout to the next but not in any significant way, and the accuracy stays exactly the same the entire time. I am wondering whether the problem is how I simulate the data, how I implemented the Dataset class, or how I train the models. I don’t think it’s how I define my models, because they learn and the loss changes appropriately when I run them on the MNIST data provided by PyTorch.

I’ve included all the relevant code below:

#Loading the data
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.utils.data import Dataset, DataLoader
class dataSet(Dataset):
    
    def __init__(self, data):
        self.X = torch.from_numpy(data[:, 0:-1]).float()
        self.Y = torch.from_numpy(data[:, -1]).float()
        self.samples = data.shape[0]
        
    def __getitem__(self, index):
        return self.X[index], self.Y[index]
        
    def __len__ (self):
        return self.samples

#data is a Pandas dataframe
#Note: both datasets wrap the same rows, so the "test" accuracy is measured on the training data
train_dataset = dataSet(data.to_numpy())
test_dataset = dataSet(data.to_numpy())
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

#Function for training and evaluating
def train_model(model, n_epochs):    
    iteration = 0
    
    learning_rate = 0.001 
    # Optimize the parameters of the model passed in, not the global `baseline`
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    criterion = nn.BCEWithLogitsLoss()
    for epochs in range(n_epochs):
        
        model.train()
        
        for inputs, targets in train_loader:

            # Inputs need no requires_grad; gradients are tracked for the model parameters

            # Clear gradients w.r.t. parameters
            optimizer.zero_grad()

            # Forward pass to get the output/logits
            output = model(inputs)

            # Calculate Loss
            loss = criterion(output, targets.unsqueeze(1))

            # Get gradients w.r.t. the parameters
            loss.backward()

            # Update the parameters
            optimizer.step()

            iteration += 1

            # Every 200 iterations, check up on how the model is doing
            # by printing the loss and the accuracy on the test data.
            # Accuracy = (number of test statuses correctly identified) / (total number of statuses)
            if iteration % 200 == 0:
                # Calculate Accuracy         
                correct = 0
                total = 0
                
                model.eval()
                
                # Iterate through test dataset
                for inputs, targets in test_loader:
                    # Evaluation inputs do not need requires_grad

                    # Forward pass only to get logits/output
                    outputs = model(inputs)

                    # Get predictions from the maximum value
                    _, predictions = torch.max(outputs.data, 1)

                    # Get the total number of labels
                    total += targets.size(0)

                    # Calculate the total correct predictions
                    overlap = predictions.eq(targets)
                    correct += (overlap == True).sum().item()

                accuracy = 100 * (correct/total)

                # Print Loss
                print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iteration, loss.item(), accuracy))

Here is the resulting output when I run either of these models:

#Running the models
baseline = binaryClassifier(input_dim)
baseline

binaryClassifier(
  (linear): Linear(in_features=26, out_features=1, bias=True)
)

list(baseline.parameters())
[Parameter containing:
 tensor([[ 3.9646e-04, -4.4786e-04, -4.0808e-04,  3.5642e-04,  3.9394e-04,
          -3.8129e-04, -4.5752e-04,  2.2144e-04, -1.7427e-04,  7.4252e-05,
           2.8504e-04,  4.6275e-05,  7.3203e-06, -1.5663e-04, -5.1841e-05,
           1.5425e-04, -3.3701e-04,  2.2288e-04, -4.6276e-04,  1.0501e-04,
           1.7254e-04, -5.5671e-05, -1.9452e-04,  4.5088e-04, -1.3678e-04,
           4.0610e-04]], requires_grad=True), Parameter containing:
 tensor([-0.0816], requires_grad=True)]

train_model(baseline, epochs)
Iteration: 200. Loss: 0.08446517586708069. Accuracy: 95.11
Iteration: 400. Loss: 0.15068431198596954. Accuracy: 95.11
Iteration: 600. Loss: 0.1580568253993988. Accuracy: 95.11
Iteration: 800. Loss: 0.14487606287002563. Accuracy: 95.11
Iteration: 1000. Loss: 0.18062788248062134. Accuracy: 95.11
Iteration: 1200. Loss: 0.13961248099803925. Accuracy: 95.11
Iteration: 1400. Loss: 0.1084781289100647. Accuracy: 95.11
Iteration: 1600. Loss: 0.1737363040447235. Accuracy: 95.11
Iteration: 1800. Loss: 0.1317535638809204. Accuracy: 95.11
Iteration: 2000. Loss: 0.10618090629577637. Accuracy: 95.11

list(baseline.parameters())
[Parameter containing:
 tensor([[ 0.3363,  0.1596,  0.0758,  0.0787, -0.5132, -0.7288, -0.1051, -0.0859,
          -0.1446, -0.3361, -0.6885, -0.5501, -0.5627, -0.5177, -0.3476, -0.3683,
          -0.5116, -0.4581, -0.5300, -0.5626, -0.3942, -0.6555, -0.3099, -0.5286,
          -0.6720, -0.2889]], requires_grad=True), Parameter containing:
 tensor([-0.6178], requires_grad=True)]

model2 = FFNN(input_dim, 500, act_fn="relu", apply_dropout=True)
learning_rate = 0.001
optimizer = optim.Adam(model2.parameters(), lr=learning_rate)
criterion = nn.BCEWithLogitsLoss() 
train_model(model2, epochs)
Iteration: 200. Loss: 0.0984780415892601. Accuracy: 95.11
Iteration: 400. Loss: 0.24589982628822327. Accuracy: 95.11
Iteration: 600. Loss: 0.10863172262907028. Accuracy: 95.11
Iteration: 800. Loss: 0.05246538296341896. Accuracy: 95.11
Iteration: 1000. Loss: 0.130008727312088. Accuracy: 95.11
Iteration: 1200. Loss: 0.13363279402256012. Accuracy: 95.11
Iteration: 1400. Loss: 0.16624166071414948. Accuracy: 95.11
Iteration: 1600. Loss: 0.07791421562433243. Accuracy: 95.11
Iteration: 1800. Loss: 0.15000399947166443. Accuracy: 95.11
Iteration: 2000. Loss: 0.14199721813201904. Accuracy: 95.11

Are you dealing with an imbalanced dataset?
If so, could you check whether your model simply outputs the majority class?
If that’s the case, you could use a weighted loss or a WeightedRandomSampler to create balanced batches.

Have a look at the Wikipedia article on the accuracy paradox for more information on this effect.
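
To make the effect concrete, here is a minimal sketch in plain Python. The label counts are made up for illustration and are not taken from the dataset in this thread. It shows how a predictor that always outputs the majority class reaches a high accuracy while finding no positives at all.

```python
# Hypothetical label counts, chosen only to illustrate the effect.
n_negative, n_positive = 951, 49
labels = [0] * n_negative + [1] * n_positive

# A "model" that ignores its input and always predicts the majority class.
predictions = [0] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = 100 * correct / len(labels)

# Recall on the positive class exposes the problem that accuracy hides.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / n_positive

print(f"accuracy: {accuracy:.1f}%, positive recall: {recall:.1f}")
```

An accuracy that never moves and matches the share of the majority class is a hint that something like this may be happening, so printing the predictions or a per-class metric is a quick check.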

Okay, so I implemented the code you referenced here for balancing datasets with WeightedRandomSampler, and it seems like it worked?

target train: 6361/339
batch index 0, 0/1: 25/25
batch index 1, 0/1: 24/26
batch index 2, 0/1: 28/22
batch index 3, 0/1: 23/27
batch index 4, 0/1: 22/28
batch index 5, 0/1: 29/21
batch index 6, 0/1: 21/29
batch index 7, 0/1: 17/33
batch index 8, 0/1: 20/30
batch index 9, 0/1: 19/31
batch index 10, 0/1: 25/25
batch index 11, 0/1: 29/21
batch index 12, 0/1: 23/27
batch index 13, 0/1: 27/23
batch index 14, 0/1: 26/24
batch index 15, 0/1: 28/22
batch index 16, 0/1: 24/26
batch index 17, 0/1: 24/26
batch index 18, 0/1: 22/28
batch index 19, 0/1: 26/24
batch index 20, 0/1: 22/28
batch index 21, 0/1: 23/27
batch index 22, 0/1: 28/22
batch index 23, 0/1: 26/24
batch index 24, 0/1: 27/23
batch index 25, 0/1: 28/22
batch index 26, 0/1: 22/28
batch index 27, 0/1: 22/28
batch index 28, 0/1: 21/29
batch index 29, 0/1: 23/27
batch index 30, 0/1: 27/23
batch index 31, 0/1: 19/31
batch index 32, 0/1: 26/24
batch index 33, 0/1: 30/20
batch index 34, 0/1: 22/28
batch index 35, 0/1: 21/29
batch index 36, 0/1: 23/27
batch index 37, 0/1: 28/22
batch index 38, 0/1: 29/21
batch index 39, 0/1: 27/23
batch index 40, 0/1: 25/25
batch index 41, 0/1: 28/22
batch index 42, 0/1: 22/28
batch index 43, 0/1: 25/25
batch index 44, 0/1: 29/21
batch index 45, 0/1: 20/30
batch index 46, 0/1: 24/26
batch index 47, 0/1: 29/21
batch index 48, 0/1: 27/23
batch index 49, 0/1: 28/22
batch index 50, 0/1: 24/26
batch index 51, 0/1: 26/24
batch index 52, 0/1: 24/26
batch index 53, 0/1: 24/26
batch index 54, 0/1: 25/25
batch index 55, 0/1: 26/24
batch index 56, 0/1: 25/25
batch index 57, 0/1: 25/25
batch index 58, 0/1: 30/20
batch index 59, 0/1: 26/24
batch index 60, 0/1: 24/26
batch index 61, 0/1: 22/28
batch index 62, 0/1: 24/26
batch index 63, 0/1: 25/25
batch index 64, 0/1: 30/20
batch index 65, 0/1: 31/19
batch index 66, 0/1: 20/30
batch index 67, 0/1: 26/24
batch index 68, 0/1: 27/23
batch index 69, 0/1: 20/30
batch index 70, 0/1: 28/22
batch index 71, 0/1: 25/25
batch index 72, 0/1: 28/22
batch index 73, 0/1: 28/22
batch index 74, 0/1: 26/24
batch index 75, 0/1: 25/25
batch index 76, 0/1: 23/27
batch index 77, 0/1: 27/23
batch index 78, 0/1: 27/23
batch index 79, 0/1: 29/21
batch index 80, 0/1: 19/31
batch index 81, 0/1: 28/22
batch index 82, 0/1: 26/24
batch index 83, 0/1: 22/28
batch index 84, 0/1: 26/24
batch index 85, 0/1: 23/27
batch index 86, 0/1: 25/25
batch index 87, 0/1: 29/21
batch index 88, 0/1: 19/31
batch index 89, 0/1: 26/24
batch index 90, 0/1: 25/25
batch index 91, 0/1: 27/23
batch index 92, 0/1: 26/24
batch index 93, 0/1: 30/20
batch index 94, 0/1: 24/26
batch index 95, 0/1: 22/28
batch index 96, 0/1: 24/26
batch index 97, 0/1: 21/29
batch index 98, 0/1: 30/20
batch index 99, 0/1: 25/25
batch index 100, 0/1: 28/22
batch index 101, 0/1: 24/26
batch index 102, 0/1: 24/26
batch index 103, 0/1: 28/22
batch index 104, 0/1: 25/25
batch index 105, 0/1: 22/28
batch index 106, 0/1: 23/27
batch index 107, 0/1: 25/25
batch index 108, 0/1: 28/22
batch index 109, 0/1: 26/24
batch index 110, 0/1: 27/23
batch index 111, 0/1: 27/23
batch index 112, 0/1: 33/17
batch index 113, 0/1: 26/24
batch index 114, 0/1: 27/23
batch index 115, 0/1: 29/21
batch index 116, 0/1: 25/25
batch index 117, 0/1: 28/22
batch index 118, 0/1: 22/28
batch index 119, 0/1: 23/27
batch index 120, 0/1: 25/25
batch index 121, 0/1: 17/33
batch index 122, 0/1: 28/22
batch index 123, 0/1: 24/26
batch index 124, 0/1: 22/28
batch index 125, 0/1: 23/27
batch index 126, 0/1: 26/24
batch index 127, 0/1: 26/24
batch index 128, 0/1: 24/26
batch index 129, 0/1: 27/23
batch index 130, 0/1: 22/28
batch index 131, 0/1: 23/27
batch index 132, 0/1: 27/23
batch index 133, 0/1: 25/25
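
For readers who have not seen the referenced code, balanced batches like the ones above can be produced roughly as follows. This is only a sketch under assumptions: the data is a synthetic stand-in with about 5% positives, and the names features, targets and balanced_loader are hypothetical, not the poster's.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Synthetic stand-in data: 26 features, 5% positives (assumed values).
features = torch.randn(2000, 26)
targets = torch.zeros(2000)
targets[:100] = 1.0

# Weight every sample by the inverse frequency of its class.
class_counts = torch.bincount(targets.long(), minlength=2).float()
sample_weights = (1.0 / class_counts)[targets.long()]

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True,  # lets minority samples be drawn more than once
)

# shuffle must stay at its default (False) when a sampler is supplied.
balanced_loader = DataLoader(
    TensorDataset(features, targets), batch_size=50, sampler=sampler
)

# Count how many positives were drawn over one pass through the loader.
positives_seen = sum(int(batch_targets.sum()) for _, batch_targets in balanced_loader)
positive_fraction = positives_seen / len(sample_weights)
print(f"fraction of positives drawn: {positive_fraction:.2f}")
```

The sampler only changes what the training loader yields. A test loader built without it still has the original class ratio, so a majority-class predictor would still score the same accuracy there.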

However, when I run my models, they still simply output the majority class. Do you have any further thoughts?

train_model(baseline, epochs, learning_rate = 0.1)
Iteration: 200. Loss: 0.352510005235672. Accuracy: 95.12121212121212
Sum of Predictions: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 400. Loss: 0.29750311374664307. Accuracy: 95.12121212121212
Sum of Predictions: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 600. Loss: 0.4701906442642212. Accuracy: 95.12121212121212
Sum of Predictions: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 800. Loss: 0.3363111913204193. Accuracy: 95.12121212121212
Sum of Predictions: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 1000. Loss: 0.37919408082962036. Accuracy: 95.12121212121212
Sum of Predictions: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 1200. Loss: 0.3066049814224243. Accuracy: 95.12121212121212
Sum of Predictions: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 1400. Loss: 0.3903108239173889. Accuracy: 95.12121212121212
Sum of Predictions: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 1600. Loss: 0.36794665455818176. Accuracy: 95.12121212121212
Sum of Predictions: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 1800. Loss: 0.44550082087516785. Accuracy: 95.12121212121212
Sum of Predictions: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 2000. Loss: 0.3176323175430298. Accuracy: 95.12121212121212
Sum of Predictions: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 2200. Loss: 0.3094715476036072. Accuracy: 95.12121212121212
Sum of Predictions: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 2400. Loss: 0.3805839419364929. Accuracy: 95.12121212121212
Sum of Predictions: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 2600. Loss: 0.39165371656417847. Accuracy: 95.12121212121212
Sum of Predictions: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])

I also tried using a weighted loss, but I got too confused and made some mistakes in constructing that, so I went with WeightedRandomSampler instead.
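
For comparison, a weighted loss for a single-logit model can be sketched with the pos_weight argument of nn.BCEWithLogitsLoss. The class counts below are the ones printed in the "target train: 6361/339" line above; the logits and targets are made up, and this is one possible setup rather than what was actually tried in the thread.

```python
import torch
import torch.nn as nn

# Class counts taken from the "target train: 6361/339" line.
n_negative, n_positive = 6361, 339

# pos_weight > 1 makes each positive example count more in the loss.
pos_weight = torch.tensor([n_negative / n_positive])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)  # drop-in for training

# Made-up logits and targets with the [batch, 1] shape the model produces.
logits = torch.tensor([[-2.0], [-2.0]])
targets = torch.tensor([[0.0], [1.0]])

# Per-sample losses, to show which term the weight affects.
unweighted = nn.BCEWithLogitsLoss(reduction="none")(logits, targets)
weighted = nn.BCEWithLogitsLoss(pos_weight=pos_weight, reduction="none")(logits, targets)
print(unweighted.squeeze(1), weighted.squeeze(1))
```

Only the loss of the positive example is scaled, by n_negative / n_positive; the negative example's loss is unchanged. Using both a weighted loss and a balancing sampler at once would compensate for the imbalance twice, so one of the two is usually enough.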

My best advice would be to try out some “reversely imbalanced” training runs, i.e. create more samples from the minority class in each batch, and try to force the model to overfit the minority class just for the sake of debugging.

So I flipped all the class values to favor 1 rather than 0, but this happened:

train_model(baseline, epochs, learning_rate = 0.1)
Iteration: 200. Loss: 0.19546955823898315. Accuracy: 4.878787878787879
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 400. Loss: 0.3281453251838684. Accuracy: 4.878787878787879
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 600. Loss: 0.26239436864852905. Accuracy: 4.878787878787879
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 800. Loss: 0.4930112361907959. Accuracy: 4.878787878787879
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 1000. Loss: 0.29870182275772095. Accuracy: 4.878787878787879
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 1200. Loss: 0.43778884410858154. Accuracy: 4.878787878787879
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 1400. Loss: 0.3238120973110199. Accuracy: 4.878787878787879
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 1600. Loss: 0.29609280824661255. Accuracy: 4.878787878787879
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 1800. Loss: 0.425942987203598. Accuracy: 4.878787878787879
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 2000. Loss: 0.426102876663208. Accuracy: 4.878787878787879
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 2200. Loss: 0.37764716148376465. Accuracy: 4.878787878787879
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 2400. Loss: 0.41230279207229614. Accuracy: 4.878787878787879
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])
Iteration: 2600. Loss: 0.29629307985305786. Accuracy: 4.878787878787879
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0])

It’s still overfit to the negative class rather than the positive one. I’m not sure what this means.

Sorry for not seeing it earlier, but _, predictions = torch.max(outputs.data, 1) won’t work if your model has only a single output unit: the maximum over a dimension of size 1 always sits at index 0, so every prediction comes out as class 0.
If your model is defined as:

binaryClassifier(
  (linear): Linear(in_features=26, out_features=1, bias=True)
)

you should get the predictions from the logits via preds = output > 0.0 in the default use case (a logit of 0.0 corresponds to a sigmoid probability of 0.5).
You could of course change the threshold later.
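
The difference between the two ways of getting predictions can be seen in a small sketch. The logits and targets here are made-up values, shaped like the output of a Linear layer with out_features=1.

```python
import torch

# Made-up logits with the [batch, 1] shape a single-output layer returns.
outputs = torch.tensor([[-3.2], [0.7], [2.5], [-0.1]])
targets = torch.tensor([0.0, 1.0, 1.0, 0.0])

# Arg-max over a dimension of size 1 can only ever return index 0.
_, argmax_predictions = torch.max(outputs, 1)

# Thresholding the logit at 0.0 corresponds to a sigmoid probability of 0.5.
threshold_predictions = (outputs > 0.0).squeeze(1).float()

argmax_correct = argmax_predictions.eq(targets).sum().item()
threshold_correct = threshold_predictions.eq(targets).sum().item()
print(argmax_predictions.tolist(), threshold_predictions.tolist())
print(argmax_correct, threshold_correct)
```

The squeeze(1) matters: comparing a [batch, 1] prediction tensor with [batch] targets would broadcast to a [batch, batch] matrix and distort the count of correct predictions.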

I think that was (at least) the main issue! Accuracy is finally changing. These are not the numbers I was hoping for, but it’s a start. Now on to figuring out why it’s overfitting. Thank you!

Iteration: 200. Loss: 0.6329779624938965. Accuracy: 16.87818181818182
Iteration: 400. Loss: 0.6490057110786438. Accuracy: 17.78060606060606
Iteration: 600. Loss: 0.4974486827850342. Accuracy: 18.683030303030304
Iteration: 800. Loss: 0.48051244020462036. Accuracy: 22.292727272727273
Iteration: 1000. Loss: 0.44062167406082153. Accuracy: 18.683030303030304
Iteration: 1200. Loss: 0.5293588638305664. Accuracy: 20.48787878787879
Iteration: 1400. Loss: 0.4660516381263733. Accuracy: 20.48787878787879
Iteration: 1600. Loss: 0.3942776024341583. Accuracy: 23.195151515151515
Iteration: 1800. Loss: 0.49306976795196533. Accuracy: 30.414545454545454
Iteration: 2000. Loss: 0.4473477602005005. Accuracy: 24.097575757575758
Iteration: 2200. Loss: 0.4367488920688629. Accuracy: 19.585454545454546
Iteration: 2400. Loss: 0.4424794316291809. Accuracy: 28.60969696969697
Iteration: 2600. Loss: 0.49501246213912964. Accuracy: 24.097575757575758

I ran into the same problem. On my laptop without a GPU the training was fine, but on a GPU the accuracy and loss stopped changing after the first few epochs. I was using nn.CrossEntropyLoss() with Adam.
Replacing Adam with SGD worked for me.
I am sharing this in case anyone else runs into the same issue.