Fluctuating Loss and Accuracy in CNN Classification Model

Hi all,

I am attempting to learn how to classify participants from the ABIDE dataset using a CNN in PyTorch and fMRI data.

I have been playing around with a model that I found online. After a couple of weeks of troubleshooting I still can’t get it to work properly. I have tried changing all the hyper-parameters, different data, a different CNN model, and more (at one stage I re-coded everything except the data loader).

However, every time I run the model it doesn’t appear to train, and the loss and accuracy fluctuate:

See the following examples:

Epoch [1/50], Loss: 54.1667, Accuracy: 50.0000
Epoch [2/50], Loss: 53.1250, Accuracy: 66.6667
Epoch [3/50], Loss: 52.7778, Accuracy: 50.0000
Epoch [4/50], Loss: 52.6042, Accuracy: 33.3333
Epoch [5/50], Loss: 51.6667, Accuracy: 50.0000
Epoch [6/50], Loss: 51.3889, Accuracy: 50.0000
Epoch [7/50], Loss: 51.1905, Accuracy: 16.6667
Epoch [8/50], Loss: 51.3021, Accuracy: 33.3333
Epoch [9/50], Loss: 51.3889, Accuracy: 66.6667
Epoch [10/50], Loss: 51.4583, Accuracy: 50.0000

If anyone has some insight on what I could do I would really appreciate it!
I am also a bit of an amateur (I have only been working in this area for a year), so I really appreciate any input you can give.

Best,
Sam

Here is the code for the dataloader, model, and the training/testing procedure.

I feel like the problem might be with the dataloader or the way the data is fed through the model. I have tried rewriting the training/testing procedure but had no luck. I have not yet been able to rewrite or troubleshoot the dataloader, as loading/transforming NIfTI files is complex and I am still learning.

Maybe this could be an overfitting problem, as the dataset only contains ~500 participants?

Okay, I’m now thinking it is because the data-loader selects data randomly, so the model cannot train properly.

This is the original __getitem__ code for the random data-loader:

def __getitem__(self, index: int) -> Tensor:
        # just return a random patch -- note that `index` is ignored entirely
        global array_1
        path = np.random.choice(self.img_paths)
        img = nib.load(str(path))
        # going larger than max_idx would put us past the end of the array
        max_idx = np.array(img.shape) - np.array(self.patch_shape) + 1

        # Python has a `slice` object which you can use to index into things with the `[]`
        # operator; we build the slices we need to index into our niis via the `.dataobj` trick
        slices = []
        for length, maximum in zip(self.patch_shape, max_idx):
            start = np.random.randint(0, maximum)
            slices.append(slice(start, start + length))
        array = img.dataobj[slices[0], slices[1], slices[2], slices[3]]

        if self.standardize:
            array_1 = np.copy(array)
            array_1 -= np.mean(array_1)
            array_1 /= np.std(array_1, ddof=1)
        return torch.Tensor(array_1)  # bug: array_1 is stale or undefined when standardize is False

I have tried altering it by changing path = np.random.choice(self.img_paths) to path = self.img_paths[index], and start = np.random.randint(0, maximum) to start = np.array(0, maximum) (which I now realize is not valid NumPy, since np.array’s second argument is the dtype).
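
For reference, this is roughly what I think a fully deterministic version should look like (just a sketch, assuming the same attributes as above and always taking the same corner patch):

def __getitem__(self, index: int) -> Tensor:
        # deterministic: one file per index, and always the same (corner) patch
        path = self.img_paths[index]
        img = nib.load(str(path))
        slices = tuple(slice(0, length) for length in self.patch_shape)
        array = np.asarray(img.dataobj[slices], dtype=np.float32)
        if self.standardize:
            array = (array - array.mean()) / array.std(ddof=1)
        return torch.from_numpy(array)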

However, I am still having the same problems.
[image: training output]

Any suggestions for stopping the data-loader from being random are appreciated.

Using np.random.choice instead of the passed index sounds odd, but might still work, assuming the target still corresponds to the created image.
However, in your code snippet only the input is returned without a target tensor, so could you explain your classification use case a bit more, as it doesn’t seem to depend on any ground truth information?

Your current loss is also negative, which is usually wrong, so check which loss function is used and make sure the values of the model output (and potentially the target) are in the valid range.

Hi @ptrblck thank you so much for the reply!

Firstly, thanks for pointing out the problem with the negative loss. I found that my labels were coded as 1 and 2 instead of 0 and 1 (I found the issue discussed here). After changing my labels, the loss values are now positive (see image below).
[image: training output with positive loss]
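
(For anyone hitting the same thing: BCE assumes targets in [0, 1], so a stray label of 2 can flip the loss negative. A quick sketch that shows it; newer PyTorch versions may refuse out-of-range targets outright:

import torch
import torch.nn.functional as F

p = torch.tensor([0.9])                                # model output (a probability)
print(F.binary_cross_entropy(p, torch.tensor([1.0])))  # ~0.105, valid target
print(F.binary_cross_entropy(p, torch.tensor([2.0])))  # ~-2.09, out-of-range target
)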

As for the ground truth and randomness: there should be a ground truth, as I am trying to classify participants (diagnosed control vs. autism) using preexisting diagnostic labels and imaging data. So the fact that the data-loader doesn’t output the target data is indeed something I missed. I have now added single_label = self.labels[index] and return (img_tensor, single_label) to the data-loader. The results now look a bit better, but the accuracy and loss are still rather random (see image below).
[image: training output]

Correct me if I’m wrong, but I think I may need to rewrite parts of the training/testing file now that the model returns two values?

If so, I will work on that tomorrow.
Once again thank you so much for your help!

I don’t know why the model should return two values, or do you mean the Dataset and DataLoader?
If so, then yes, you should use the corresponding target tensors returned by your DataLoader for training, which will need some code changes.
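Something along these lines (sketch):

    for data, target in train_loader:              # the Dataset now yields (image, label) pairs
        data, target = data.to(device), target.float().to(device)
        output = model(data)
        loss = criterion(output.squeeze(1), target)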

Yes my mistake, I meant to say data-loader.

I have rewritten my data-loader to output the image tensors and labels. I have also simplified it a lot. For reference, it now reads as follows:

import pandas as pd
import numpy as np
from torch.utils.data import Dataset
import torch
import nibabel as nib
from pathlib import Path
from typing import List

class SamDataloader(Dataset):
    def __init__(self):
        # loading the labels from a csv file where the labels are the second column (with a header)
        self.labels_df = pd.read_csv("ABIDE_FMRI_cleaned.csv")
        self.labels = np.asarray(self.labels_df)[:, 1]

        # loading the nii paths from the script's directory (the working directory is set in the Spyder IDE)
        self.image_path: List[Path] = sorted(Path(__file__).resolve().parent.rglob("*.nii"))
        # storing the dataset length, which __len__ below returns
        self.data_len = len(self.labels)
    
    def __getitem__(self, index):
        # loading a single image by index with nibabel
        single_image_path = self.image_path[index]
        single_image_nii = nib.load(single_image_path)
        single_image_array = single_image_nii.get_fdata().astype(np.float32)
        # reordering the axes, as torch expects them in a different order
        single_image = single_image_array.transpose((3, 2, 1, 0))  # not sure if this is the right order, but we will see
        # converting the image to a tensor
        single_image_tensor = torch.FloatTensor(single_image)

        # loading the matching label via the same index and keeping it as an array
        single_label = self.labels[index]
        label_temp = np.array(single_label)
        # label_as_tensor = torch.from_numpy(label_temp)

        # returning the (image, label) pair for one participant (the DataLoader batches these)
        return (single_image_tensor, label_temp)
    
    def __len__(self):
        # returning the length defined in __init__
        return self.data_len
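
One thing I still want to sanity-check here, since the labels come from a CSV and the paths from a sorted rglob: that the two actually line up. Something like the following, where the file_name column is hypothetical and would need to match whatever identifier the CSV really contains:

dataset = SamDataloader()
assert len(dataset.image_path) == len(dataset.labels), "path/label count mismatch"
# hypothetical: assumes the csv has a 'file_name' column identifying each nii file
for path, file_name in zip(dataset.image_path, dataset.labels_df["file_name"]):
    assert file_name in path.name, f"label row does not match {path.name}"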

I also changed parts of the training/testing file to take two inputs from the data-loader as we discussed. This file now appears as follows:

import torch
import torchvision.transforms as transforms
import numpy as np

from torch.autograd import Variable  # note: Variable is a legacy no-op in modern PyTorch
from numpy import vstack
from sklearn.metrics import accuracy_score
from matplotlib import pyplot as plt
from torch.utils.data.sampler import SubsetRandomSampler
from datetime import datetime

from SamDataloader import SamDataloader
from cnn_model_V2 import CNN_model

"""
Use "torch.cuda.empty_cache()" if you have the CUDA out of memory RuntimeError
"""

if __name__ == '__main__':
    learning_rate = 1e-3
    batch_size = 32
    num_epochs = 200
    validation_split = .2
    shuffle_dataset = True
    random_seed= 42
    num_classes = 2
    compose = transforms.Compose([
        transforms.ToTensor(),
        transforms.RandomHorizontalFlip(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])  # note: this transform is defined but never applied to the data
    
    dataset = SamDataloader()
    

    device = torch.device("cuda")
    model = CNN_model().cuda()
    
    #I removed the part that added labels and image tensors together because my data loader already outputs them.

    # Creating data indices for training and validation splits:
    dataset_size = len(dataset)
    indices = list(range(dataset_size))
    split = int(np.floor(validation_split * dataset_size))
    if shuffle_dataset:
        np.random.seed(random_seed)
        np.random.shuffle(indices)
    train_indices, val_indices = indices[split:], indices[:split]
     
    # Creating PT data samplers and loaders:
    train_sampler = SubsetRandomSampler(train_indices)
    validation_sampler = SubsetRandomSampler(val_indices)
     
    train_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, 
                                                sampler=train_sampler)
    validation_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                                     sampler=validation_sampler)
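    # (Aside: torch.utils.data.random_split could do this split with less code;
    #  a sketch, commented out so it does not override the loaders defined above:)
    # generator = torch.Generator().manual_seed(random_seed)
    # n_val = int(np.floor(validation_split * dataset_size))
    # train_set, val_set = torch.utils.data.random_split(
    #     dataset, [dataset_size - n_val, n_val], generator=generator)
    # train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
    # validation_loader = torch.utils.data.DataLoader(val_set, batch_size=batch_size)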

    def train_model(train_dl, model):
        criterion = torch.nn.BCELoss(reduction='mean').to(device)
        optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
        loss_array = []
        loss_values = []
        acc_values = []
        for epoch in range(num_epochs):
            for ind, (data, img_label) in enumerate(train_dl):
                # reordered from the original because my dataloader orders the axes differently
                inputs = data.permute(0, 1, 4, 2, 3)

                inputsV, labelsV = Variable(inputs), Variable(img_label)
                inputsV = inputsV.float()
                labelsV = labelsV.float()
                inputsV = inputsV.to(device)
                labelsV = labelsV.to(device)
                y_pred = model(inputsV)

                loss = criterion(y_pred.squeeze(), labelsV)
                running_loss = abs(loss.item())  # note: abs() masks sign problems; a correct BCE loss is never negative
                loss_array.append(running_loss)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            avg_loss = np.average(loss_array)  # note: loss_array is never reset, so this averages over all epochs so far
            loss_values.append(avg_loss)
            acc = evaluate_model(validation_loader, model)
            acc = acc * 100
            acc_values.append(acc)
            now = datetime.now()
            current_time = now.strftime("%H:%M:%S")
            print("Time:", current_time, '\n' f'Epoch [{epoch + 1}/{num_epochs}], \n Loss: {abs(avg_loss):.4f}, Accuracy: {acc:.4f}')
        plot(acc_values, loss_values)

    def evaluate_model(test_dl, model):
        predictions, actuals = list(), list()
        accuracy_array = []
        for ind, (data, img_label) in enumerate(test_dl):
            # reordered from the original because my dataloader orders the axes differently
            inputs = data.permute(0, 1, 4, 2, 3)

            inputsV, labelsV = Variable(inputs), Variable(img_label)
            inputsV = inputsV.float() 
            labelsV = labelsV.float() 
            inputsV = inputsV.to(device)
            labelsV = labelsV.to(device) 
            
            yhat = model(inputsV)

            actual = labelsV.cpu().numpy()  # cpu conversion needed before calling numpy
            
            actual = actual.reshape((len(actual), 1))
            
            yhat = yhat.cpu()
            yhat = yhat.detach().numpy()
            yhat = yhat.round()

            predictions.append(yhat)
            actuals.append(actual)
            predictions, actuals = vstack(predictions), vstack(actuals)

            acc = accuracy_score(actuals, predictions)
            accuracy_array.append(acc)
            print('Accuracy:', acc * 100)

            return acc


    def plot(x, y):
        plt.figure(figsize=(16, 5))
        plt.xlabel('EPOCHS')
        plt.ylabel('LOSS/ACC')

        plt.plot(x, 'r', label='ACCURACY')
        plt.plot(y, 'b', label='LOSS')
        plt.legend()
        plt.show()

    train_model(train_loader, model)
    evaluate_model(validation_loader, model)

However, I am still having the same problems with the loss and accuracy, as seen in the output below.

[image: training output]
I’m thinking that maybe I need to try things like giving the CNN more layers or rewriting the training/testing file, but I’m really not sure. Do you have any suggestions on how to proceed?

I would recommend trying to overfit a small dataset (e.g. just 10 samples) by playing around with some hyperparameters of your model. Once this works, you could try to scale up the use case again. If you cannot overfit this tiny dataset, your training script might have more issues.
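E.g. something like this (just a sketch):

tiny_set = torch.utils.data.Subset(dataset, list(range(10)))   # a fixed 10-sample subset
tiny_loader = torch.utils.data.DataLoader(tiny_set, batch_size=10, shuffle=True)
# run the existing training loop on tiny_loader; the model should reach ~100% accuracy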

Hi @ptrblck,

Thank you so much for your assistance!

I followed your recommendation and found that the model was broken and only outputting 0s. After some review, it seems that I fundamentally misunderstood the difference between creating a binary classification model and a multi-class classification model. Specifically, I was trying to use one-hot encoded data with a softmax activation layer and BCEWithLogitsLoss. I was able to run the model after switching to a single output unit with a sigmoid activation layer and BCELoss.
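
(For anyone else who mixes these up, the two consistent setups are roughly:

# Binary: one output unit, no sigmoid in the model, BCEWithLogitsLoss
criterion = torch.nn.BCEWithLogitsLoss()
loss = criterion(model(x).squeeze(1), labels.float())   # labels are 0./1. floats

# Multi-class: one unit per class, no softmax in the model, CrossEntropyLoss
criterion = torch.nn.CrossEntropyLoss()
loss = criterion(model(x), labels.long())               # labels are class indices, not one-hot
)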

At the moment the training looks as follows:

[image: training curves]
I just wanted to ask about the accuracy. It is very blocky and gives poor, chance-level results. I have had success getting the loss to decline smoothly; however, I haven’t been able to get the accuracy to increase gradually and smoothly, as I’ve seen in others’ results. Do you have any suggestions for how to approach this problem? Or is it more a matter of tweaking hyper-parameters?

At the moment I find that I can stop the accuracy from being so jumpy by increasing the batch size to 128 (see the accuracy figure above), and it gradually increases with a learning rate of 0.001. If I change the learning rate, the accuracy either decreases (0.01) or remains constant (0.0001). Maybe I need a more complex model, momentum, or even data augmentation (I currently have 480 samples)? I am unsure.
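
(If momentum ends up being the next thing to try, it is a one-line change, e.g.:

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)
# or Adam, which is often less sensitive to the exact learning rate:
# optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
)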

I will reattach the code for my dataloader, model, and training/testing file below for those following this post, so they can see the changes I’ve made.

Dataloader: Dataloader_V2 - Pastebin.com

Model:

import torch
from torch import nn

import torchmetrics


class CNN_model(nn.Module):
    def __init__(self):
        super(CNN_model, self).__init__()

        self.convolutional_layer = nn.Sequential(
            nn.Conv3d(in_channels=8, out_channels=16, kernel_size=(8, 8, 8), dilation=(3, 3, 3), stride=(2, 2, 2),
                      padding=(0, 0, 0)),
            nn.BatchNorm3d(16),
            nn.ReLU(),
            nn.Conv3d(in_channels=16, out_channels=24, kernel_size=(8, 8, 8), dilation=(2, 2, 2), stride=(2, 2, 2),
                      padding=(0, 0, 0)),
            nn.BatchNorm3d(24),
            nn.ReLU(),
            nn.Conv3d(in_channels=24, out_channels=32, kernel_size=(3, 3, 6), dilation=(1, 1, 1), stride=(1, 1, 1),
                      padding=(0, 0, 0)),
            nn.BatchNorm3d(32),
            nn.ReLU()
        )

        self.linear_layer = nn.Sequential(
            nn.Linear(in_features=32, out_features=24),
            nn.BatchNorm1d(24),
            nn.ReLU(),
            nn.Linear(in_features=24, out_features=16),
            nn.BatchNorm1d(16),
            nn.ReLU(),
            nn.Linear(in_features=16, out_features=1))
             
        #self.accuracy = torchmetrics.Accuracy()

    def forward(self, x):
        x = self.convolutional_layer(x)
        x = torch.flatten(x, 1)
        x = self.linear_layer(x)

        x = torch.sigmoid(x)  # paired with BCELoss; see https://discuss.pytorch.org/t/bceloss-vs-bcewithlogitsloss/33586
        return x
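
(Side note: the linked thread suggests dropping the sigmoid from the model and using the numerically more stable fused loss instead; that change would look roughly like this, as a sketch:

# in forward(): return x without torch.sigmoid(x), then train with:
criterion = torch.nn.BCEWithLogitsLoss()
loss = criterion(model(inputs).squeeze(1), labels.float())
# and threshold explicitly wherever hard predictions are needed:
preds = (torch.sigmoid(model(inputs)) > 0.5).float()
)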

Training/Testing file:

import torch
import torchvision.transforms as transforms
import numpy as np

from torch.autograd import Variable
from numpy import vstack
from sklearn.metrics import accuracy_score
from matplotlib import pyplot as plt
from torch.utils.data.sampler import SubsetRandomSampler

from dataloader import RandomFmriDataset
from model import CNN_model

"""
Use "torch.cuda.empty_cache()" if you have the CUDA out of memory RuntimeError
"""

if __name__ == '__main__':
    #Defining hyperparameters
    learning_rate = 1e-3  # 1e-2 results in decreasing accuracy and 1e-4 in constant accuracy
    batch_size = 128 #higher batch size makes accuracy the same or smoothes it out. https://medium.com/mini-distill/effect-of-batch-size-on-training-dynamics-21c14f7a716e
    num_epochs = 200
    validation_split = .2
    shuffle_dataset = True
    random_seed= 42
    num_classes = 1
    compose = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])  # took out transforms.RandomHorizontalFlip(); note: this Compose is still never applied to the data
    
    #Defining key dependencies
    dataset = RandomFmriDataset()
    device = torch.device("cuda")
    model = CNN_model().cuda()
    

    #Creating data indices for training and validation splits:
    dataset_size = len(dataset)
    indices = list(range(dataset_size))
    split = int(np.floor(validation_split * dataset_size))
    if shuffle_dataset:
        np.random.seed(random_seed)
        np.random.shuffle(indices)
    train_indices, val_indices = indices[split:], indices[:split]
     
    #Creating the data samplers and loaders:
    train_sampler = SubsetRandomSampler(train_indices)
    validation_sampler = SubsetRandomSampler(val_indices)
     
    train_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, 
                                                sampler=train_sampler)
    validation_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                                     sampler=validation_sampler)
    #Model training
    def train_model(train_dl, model):
        criterion = torch.nn.BCELoss(reduction='mean').to(device)
        optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
        loss_array = []
        loss_values = []
        acc_values = []
        for epoch in range(num_epochs):
            for ind, (data, img_label) in enumerate(train_dl):
                inputs = data.permute(0, 1, 4, 2, 3)

                inputsV, labelsV = Variable(inputs), Variable(img_label)
                inputsV = inputsV.float() 
                labelsV = labelsV.float() 
                inputsV = inputsV.to(device) 
                labelsV = labelsV.to(device) 
                y_pred = model(inputsV)

                loss = criterion(y_pred, labelsV)
                running_loss = abs(loss.item())
                loss_array.append(running_loss)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            avg_loss = np.average(loss_array)
            loss_values.append(avg_loss)
            acc = evaluate_model(validation_loader, model)
            acc = acc * 100
            acc_values.append(acc)
            print(f'Epoch [{epoch + 1}/{num_epochs}], \n Loss: {abs(avg_loss):.4f}, Accuracy: {acc:.4f}')
        acc_plot(acc_values) 
        loss_plot(loss_values)

    #Model testing 
    def evaluate_model(test_dl, model):
        predictions, actuals = list(), list()
        accuracy_array = []
        for ind, (data, img_label) in enumerate(test_dl):
            inputs = data.permute(0, 1, 4, 2, 3)

            inputsV, labelsV = Variable(inputs), Variable(img_label)
            inputsV = inputsV.float() 
            labelsV = labelsV.float() 
            inputsV = inputsV.to(device)
            labelsV = labelsV.to(device) 
            
            yhat = model(inputsV)

            actual = labelsV.cpu().numpy()
            
            actual = actual.reshape((len(actual), 1))
            
            yhat = yhat.cpu()
            yhat = yhat.detach().numpy()
            yhat = yhat.round()

            predictions.append(yhat)
            actuals.append(actual)
            predictions, actuals = vstack(predictions), vstack(actuals)
            acc = accuracy_score(actuals, predictions)
            accuracy_array.append(acc)
            
            #troubleshooting print commands for seeing model predictions and labels
            '''
            print(predictions)
            print(labelsV)
            '''
            
            print(acc)
            return acc  # note: returning inside the loop means only the first batch gets scored

    #Accuracy plot over epochs
    def acc_plot(x):
        plt.figure(figsize=(16, 5))
        plt.xlabel('EPOCHS')
        plt.ylabel('ACCURACY')

        plt.plot(x, 'r', label='ACCURACY')
        plt.legend()
        plt.show()
    #Loss plot over epochs
    def loss_plot(x):
        plt.figure(figsize=(16, 5))
        plt.xlabel('EPOCHS')
        plt.ylabel('LOSS')

        plt.plot(x, 'b', label='LOSS')
        plt.legend()
        plt.show()
    
    #The command for training the model
    train_model(train_loader, model)
    
    #The command for testing the model
    evaluate_model(validation_loader, model)
    

The steps in the accuracy curve indicate that your sample size is small (a change in the classification of a single sample is already visible), which is expected since you are trying to overfit the small dataset.
I would recommend not increasing the complexity of the model yet, but first making sure your model is able to overfit the tiny dataset by playing around with some hyperparameters.

Thanks @ptrblck,

I followed your advice and overfit the model (first on 10 samples and then on the whole dataset). During this stage I found that the accuracy was not being computed correctly, so I replaced the relevant code with the torchmetrics Accuracy calculation. I also cleaned up the validation code and visualized it. Now the model works better and can easily be overfit (see the images further below).
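
(For anyone making the same swap, the torchmetrics version looks roughly like this; a sketch, and the exact API depends on the torchmetrics version, with newer releases taking task="binary":

import torchmetrics

metric = torchmetrics.Accuracy(task="binary").to(device)
for data, label in validation_loader:
    probs = model(data.to(device)).squeeze(1)    # probabilities from the sigmoid output
    metric.update(probs, label.int().to(device))
acc = metric.compute()   # accuracy accumulated over the whole validation set
metric.reset()
)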


Should I address the small sample size now? I was thinking of using data augmentation to expand the sample size.

Do you mean the small subset for the initial overfitting test, or your general dataset?
I couldn’t find the dataset size in this topic, so I’m unsure whether you are still training with only 10 samples.

Sorry for the confusion.

I meant that I should address the small sample size of my general dataset (N = 480).

All results and graphs in this thread were produced with the general, 480-participant dataset.
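
(For anyone curious, the kind of augmentation I have in mind is simple on-the-fly flips/shifts applied in __getitem__. A sketch only; whether a left/right flip is anatomically acceptable for this task is something I still need to verify:

import torch

def augment(volume: torch.Tensor) -> torch.Tensor:
    # volume is one 4D fMRI tensor; apply a random flip along one spatial axis...
    if torch.rand(1).item() < 0.5:
        volume = torch.flip(volume, dims=[-1])
    # ...and a small random translation along another
    shift = int(torch.randint(-2, 3, (1,)).item())
    return torch.roll(volume, shifts=shift, dims=-2)
)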