Validation accuracy won't change while validation loss decreases

I am working on a semantic segmentation task. I have 30 images: 24 in the training set, 4 in the validation set, and 2 as test images. I use a batch_size of 4.
The issue I am facing is that I get strange values for validation accuracy: it increases from the first to the second epoch and then stays the same, even though validation loss and training loss keep decreasing and training accuracy keeps increasing.

This is the piece of code that calculates these values:

def train_model(model, dataloaders, criterion, optimizer, num_epochs, has_aux = True):
since = time.time()

val_acc_history = [] 
#https://docs.python.org/3/library/copy.html
best_model_wts = copy.deepcopy(model.state_dict())# this variable contains a dictionary object of model's learnable parameters
best_acc = 0.0

for epoch in range(num_epochs):
    print('Epoch {}/{}'.format(epoch, num_epochs - 1))
    print('-' * 10)

    # Each epoch has a training and validation phase
    for phase in ['train', 'val']:
        if phase == 'train':
            model.train()  # Set model to training mode
            img_all = len(imgarray)
        else:
            model.eval()   # Set model to evaluate mode
            img_all = len(imgarray_val)

        running_loss = 0.0 # Training loss
        running_corrects = 0 # Number of pixels that have been correctly classified

        # Iterate over data
        for inputs, labels in dataloaders[phase]: # Read data batch by batch. We have 24 training images and batches of size 4, so 6 iterations to go through one epoch
            # Variable is deprecated since PyTorch 0.4; plain tensors are enough,
            # and .to(device) is the standard way to move data to the GPU
            inputs = inputs.to(device)
            labels = labels.to(device)
            
            #print('image shape: {}'.format(inputs.shape))
            #print('label shape: {}'.format(labels.shape))
            #labels = labels.squeeze(1)
            #print('squeezed label shape:{}'.format(labels.shape))

            # zero the parameter gradients
            optimizer.zero_grad()

            # Forward
            # track history if only in train
            with torch.set_grad_enabled(phase == 'train'): # the with block is always entered; the flag only toggles gradient tracking on or off
                # Get model outputs and calculate loss
                # Special case for "inception" and "FCN_ResNet_101" because in training it has an auxiliary output. 
                # In train mode we calculate the loss by summing the final output and the auxiliary output
                # but in testing we only consider the final output.
                
                if has_aux and phase == 'train':
                    # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                    outputs = model(inputs) #  model output shape:torch.Size([4, 4, 224, 224]) -> dim=1 is the number of classes
                    #print('model output shape:{}'.format(outputs['out'].shape))
                    loss1 = criterion(outputs['out'], labels.long()) # CrossEntropyLoss with the default reduction returns a scalar tensor (mean loss over all pixels in the batch)
                    loss2 = criterion(outputs['aux'], labels.long()) # loss.item() returns that scalar as a plain Python float
                    loss = loss1 + 0.4*loss2
                
                else:
                    outputs = model(inputs)
                    loss = criterion(outputs['out'], labels.long())
                
                
                
                 
                #  model output shape: torch.Size([4, 4, 224, 224]) -> dim=1 is the number of classes
                # -- torch.max(input, dim, keepdim=False, out=None) -> (Tensor, LongTensor)
                # :Returns a namedtuple (values, indices) where values is the maximum value of each row of 
                # the input tensor in the given dimension dim. And indices (preds in our case) is the index location of each 
                # maximum value found (argmax).
                # If keepdim is True, the output tensors are of the same size as input except in the 
                # dimension dim where they are of size 1. 
                _, preds = torch.max(outputs['out'], 1)#preds is a long tensor

                # backward + optimize only if in training phase
                if phase == 'train':
                    loss.backward()
                    optimizer.step()

            # statistics
            running_loss += loss.item() * inputs.size(0) # loss.item() is the mean loss of the mini-batch, so multiply by the batch size to accumulate it over the epoch
            running_corrects += torch.sum(preds == labels.data.long()) # number of pixels in the mini-batch that have been correctly classified
            
        #https://discuss.pytorch.org/t/how-to-calculate-running-loss-training-loss-while-training-a-cnn-model/49301/2    
        #https://discuss.pytorch.org/t/calculate-train-accuracy-of-the-model-in-segmentation-task/33581
        #https://stackoverflow.com/questions/51503851/calculate-the-accuracy-every-epoch-in-pytorch
     
        epoch_loss = running_loss / len(dataloaders[phase].dataset) # at the end of the epoch, divide the accumulated loss by the number of images in the dataset
        #epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset) #what does this calculate?
        epoch_acc = 100*(running_corrects.double() / (img_all*224*224))
        #epoch_acc = (running_corrects.double() /(labels.nelement()*(size of training dataset)))*100
        print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

        # deep copy the model
        if phase == 'val' and epoch_acc > best_acc:
            best_acc = epoch_acc
            # https://pytorch.org/tutorials/beginner/saving_loading_models.html
            best_model_wts = copy.deepcopy(model.state_dict())
        if phase == 'val':
            val_acc_history.append(epoch_acc)

    print()

time_elapsed = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
print('Best val Acc: {:.4f}'.format(best_acc))

# load best model weights
model.load_state_dict(best_model_wts)
return model, val_acc_history

I have not been able to find an issue in this code, so I can't explain the strange behavior of my model.
Any tips will be appreciated :slight_smile:

Hi,

My guess is that the model has overfitted and outputs the same predictions after epoch 1.

If this is the case, the question is why and how to get around it. Perhaps your dataset contains 90% of one of the classes, which would bias the model to predict only that class during evaluation; the sketch below is one way to check that.
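As a rough sketch (assuming your labels are per-pixel class indices in the range 0..3 and that there are 4 classes, which the [4, 4, 224, 224] output shape suggests), you could count how many label pixels belong to each class, reusing your dataloaders dict:

import torch

def class_pixel_frequencies(dataloader, num_classes=4):
    counts = torch.zeros(num_classes, dtype=torch.long)
    for _, labels in dataloader:
        # bincount over the flattened label map gives the number of pixels per class
        counts += torch.bincount(labels.long().flatten(), minlength=num_classes)
    return counts.double() / counts.sum()  # fraction of all pixels per class

print(class_pixel_frequencies(dataloaders['train']))

If one class ends up with a very large fraction, a plain pixel accuracy like the one in your training loop will look flat even when the other classes are completely ignored.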

You can verify this by printing/saving some of the model outputs after each epoch. If I were you, I'd try to save the predictions as images, with each class getting its own color. That should help with understanding what the model does.
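Something along these lines (just a sketch: it reuses the model, device and dataloaders from your code, assumes 4 classes, and the palette colors, function name and file names are arbitrary):

import numpy as np
import torch
from PIL import Image

# one RGB color per class (hypothetical palette for 4 classes)
PALETTE = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.uint8)

def save_val_predictions(model, dataloader, epoch):
    model.eval()
    with torch.no_grad():
        for batch_idx, (inputs, _) in enumerate(dataloader):
            outputs = model(inputs.to(device))
            preds = torch.argmax(outputs['out'], dim=1).cpu().numpy()  # [N, H, W] class indices
            for i, pred in enumerate(preds):
                # map each class index to its color and save the mask as a PNG
                Image.fromarray(PALETTE[pred]).save(
                    'pred_epoch{}_batch{}_img{}.png'.format(epoch, batch_idx, i))

# e.g. call save_val_predictions(model, dataloaders['val'], epoch) once per epoch

If the saved masks all show a single color from epoch 2 onwards, that would confirm the model has collapsed to predicting one class.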