Neural Network not training, stuck at 50% accuracy

Training loss remains roughly flat. On the test set, the model produces an accuracy of 50%, which is no better than guessing since there are only 2 classes. I have already tried increasing/decreasing model complexity, adjusting hyperparameters, and data augmentation, basically anything to get the model to underfit/overfit the data. I can’t tell if there is something wrong with the neural network or with the dataset itself.

import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5, padding=0)
        self.bn1 = nn.BatchNorm2d(32)  # note: bn1/bn2/bn3 are defined but never applied in forward()
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5, padding=0)
        self.bn2 = nn.BatchNorm2d(64)  
        self.conv3 = nn.Conv2d(64, 128, kernel_size=5, padding=0)
        self.bn3 = nn.BatchNorm2d(128)   
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(128 * 12 * 8, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 2) 

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = self.pool(torch.relu(self.conv3(x)))
        x = torch.flatten(x, 1)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct_train = 0
    total_train = 0
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    
    _, predicted = torch.max(outputs.data, 1)  # note: this and the next two lines sit outside the batch loop, so only the last batch is counted
    total_train += labels.size(0)
    correct_train += (predicted == labels).sum().item()

    train_accuracy = 100 * correct_train / total_train
    print(f"Epoch {epoch+1}, Training Loss: {running_loss/len(train_loader)}, Training Accuracy: {train_accuracy}%")
Epoch 1, Training Loss: 0.6989821430408594, Training Accuracy: 52.94117647058823%
Epoch 2, Training Loss: 0.6789375381036238, Training Accuracy: 58.8235294117647%
Epoch 3, Training Loss: 0.6709084140531945, Training Accuracy: 88.23529411764706%
Epoch 4, Training Loss: 0.6927016901247429, Training Accuracy: 52.94117647058823%
Epoch 5, Training Loss: 0.6819337732864149, Training Accuracy: 64.70588235294117%
Epoch 6, Training Loss: 0.6968633731206259, Training Accuracy: 47.05882352941177%
Epoch 7, Training Loss: 0.6873575990850275, Training Accuracy: 52.94117647058823%
Epoch 8, Training Loss: 0.6847923795382181, Training Accuracy: 58.8235294117647%
Epoch 9, Training Loss: 0.683509703838464, Training Accuracy: 64.70588235294117%
Epoch 10, Training Loss: 0.6756617174004064, Training Accuracy: 52.94117647058823%

Accuracy on test set: 50.0%

Where are you defining the loss function and optimizer? Please copy that part of your code.

I define them just before the training loop.

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Print your labels to see what you’re working with.

You also might be better off using BCEWithLogitsLoss. That’s specifically designed for binary (two-class) classification. In that case, you’ll need to frame your labels so that class 0 corresponds to a value of 0 and class 1 corresponds to a value of 1.
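Something along these lines is what I mean. This is only a sketch: it assumes you change the final layer to a single output logit and that your loader yields integer 0/1 class labels.

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

for images, labels in train_loader:
    outputs = model(images)                # shape [batch, 1] if the last layer outputs a single logit
    targets = labels.float().unsqueeze(1)  # integer 0/1 labels -> float targets of shape [batch, 1]
    loss = criterion(outputs, targets)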


Here are my labels:

target_to_class = {v: k for k, v in ImageFolder(data_dir).class_to_idx.items()}
print(target_to_class)

{0: 'HCM_None', 1: 'HCM_Present'}

I tried using BCEWithLogitsLoss, but it raised this error:

ValueError: Target size (torch.Size([32])) must be the same as input size (torch.Size([32, 2]))

I added an unsqueeze to try and get rid of the ValueError:

# Training the model
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        labels = labels.unsqueeze(1).float()  # unsqueeze labels to shape [batch, 1] and convert to float
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    
        predictions = torch.round(outputs)  
        correct += (predictions == labels).sum().item()
        total += labels.size(0)

    train_accuracy = correct / total
    print(f"Epoch {epoch+1}, Training Loss: {running_loss/len(train_loader)}, Training Accuracy: {train_accuracy}%")

The training loss and accuracy still remain the same though.

Epoch 1, Training Loss: 0.670424714232936, Training Accuracy: 0.5110470701248799%
Epoch 2, Training Loss: 0.6699145544659008, Training Accuracy: 0.457252641690682%
Epoch 3, Training Loss: 0.6695302724838257, Training Accuracy: 0.4217098943323727%
Epoch 4, Training Loss: 0.6671686551787637, Training Accuracy: 0.4505283381364073%
Epoch 5, Training Loss: 0.6690011150909193, Training Accuracy: 0.46301633045148893%
Epoch 6, Training Loss: 0.6681144020774148, Training Accuracy: 0.4783861671469741%
Epoch 7, Training Loss: 0.6691685246698784, Training Accuracy: 0.4303554274735831%
Epoch 8, Training Loss: 0.664765780622309, Training Accuracy: 0.5341018251681076%
Epoch 9, Training Loss: 0.6677677992618445, Training Accuracy: 0.42363112391930835%
Epoch 10, Training Loss: 0.6644004493048696, Training Accuracy: 0.4111431316042267%

Maybe post the dataset you’re using as well? It could be that your model is too simple for the complexity of the data it needs to learn.

To frame it for BCE, you need the model output size to be 1.

However, it seems there may be some other issue, since it should still work with CrossEntropyLoss and two classes.
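Roughly, the change I mean would look like this (just a sketch based on your posted CNN class and training loop):

# in __init__: a single output logit instead of two
self.fc3 = nn.Linear(256, 1)

# in the training loop: threshold the sigmoid of the logit
# instead of rounding the raw outputs
probs = torch.sigmoid(outputs)                    # outputs: logits of shape [batch, 1]
predictions = (probs > 0.5).float()
correct += (predictions == labels).sum().item()   # labels: floats of shape [batch, 1]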

How much data do you have? What’s the percentage split between training & validation sets?

Have you tried training for more than 10 epochs?


What is your input size? What is the real size of your training samples? Does the 12*8 going into fc1 actually match it? Are you doing data augmentation?
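One quick way to check is to push a dummy tensor with your real input size through the conv/pool stack and print the shape before the flatten. Sketch below; the 1x128x96 size is just a placeholder (it happens to come out to 128x12x8 with your layer settings), so swap in your actual image dimensions.

import torch

dummy = torch.zeros(1, 1, 128, 96)  # (batch, channels, H, W): placeholder size, use your real one
with torch.no_grad():
    x = model.pool(torch.relu(model.conv1(dummy)))
    x = model.pool(torch.relu(model.conv2(x)))
    x = model.pool(torch.relu(model.conv3(x)))
print(x.shape)  # needs to be [1, 128, 12, 8] for fc1 = nn.Linear(128 * 12 * 8, 512) to work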