Measuring Accuracy/Loss when fine-tuning the fcn_resnet101 segmentation model

I want to fine-tune the fcn_resnet101 segmentation model, and I am following this beginner’s tutorial and this intermediate tutorial, which has some parts more pertinent to segmentation.

In the beginner’s tutorial, the problem it’s trying to solve is classifying images as either bees or ants, so accuracy is measured by counting how many of the predicted labels match the actual labels.

In segmentation I imagine accuracy is measured at the pixel level (i.e., for each pixel in the image, check whether the predicted class matches the actual class in the annotated image). So my question is: how can I adapt the code below, which is taken from the classification example, to measure the accuracy and loss for segmentation?

Are there any built-in functions that can help track accuracy at the pixel level? If not, could you share some sample code that does this so I can adapt it to the code below?

def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False):
    since = time.time()

    val_acc_history = []

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    #   mode we calculate the loss by summing the final output and the auxiliary output
                    #   but in testing we only consider the final output.
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4*loss2
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history

In the intermediate tutorial it seems that the accuracy is measured when calling the train_one_epoch function in the lines below, but I don’t know whether that function is only relevant for the mask_rcnn model or whether it can be used with the fcn_resnet101 segmentation model as well.

    for epoch in range(num_epochs):
        # train for one epoch, printing every 10 iterations
        train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
        # update the learning rate
        lr_scheduler.step()
        # evaluate on the test dataset
        evaluate(model, data_loader_test, device=device)

The code should work for a segmentation model output of shape [N, nb_classes, H, W] and the corresponding target of shape [N, H, W], as this code snippet shows:

N, H, W = 2, 224, 224
nb_classes = 10
output = torch.randn(N, nb_classes, H, W)
preds = torch.argmax(output, 1)

labels = torch.randint(0, nb_classes, (N, H, W))

print(torch.sum(preds == labels).float() / labels.nelement())
> tensor(0.0989)
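
The loss should work with the same shapes; here is a minimal sketch assuming the criterion is nn.CrossEntropyLoss, which accepts a [N, nb_classes, H, W] output and a [N, H, W] target directly:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

N, H, W = 2, 224, 224
nb_classes = 10
output = torch.randn(N, nb_classes, H, W)         # raw logits from the model
labels = torch.randint(0, nb_classes, (N, H, W))  # per-pixel class indices (long)

loss = criterion(output, labels)                  # no reshaping needed
print(loss)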

Thanks!

N here is the batch size, right?

Yes, I used N to specify the batch size.

Thanks!

I have another question to make sure I understand this correctly.

In the # statistics section I currently have the following, which is giving me an accuracy value of 2.3, which doesn’t make sense:

            total_train += labels.nelement()
            correct_train += preds.eq(labels.data).sum().item()
            train_accuracy = 100 * correct_train / total_train

So then, should I replace the three lines above with the line below?

train_accuracy = torch.sum(preds == labels).float() / labels.nelement()

The full code of the train_model function is below, with the old accuracy calculation commented out in the statistics section. Would this be the correct way, or am I still missing something?

def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False):
    since = time.time()

    val_acc_history = []

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0
            total_train = 0
            correct_train = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)  # OriginalImage
                labels = labels.to(device)  # Masks

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    #   mode we calculate the loss by summing the final output and the auxiliary output
                    #   but in testing we only consider the final output.
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4*loss2
                    else:
                        outputs = model(inputs)['out']
                        labels = labels.long()

                        outputs = outputs.squeeze(1)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() #* inputs.size(0)
                ##total_train += labels.nelement()
                ##correct_train += preds.eq(labels.data).sum().item()
                ##train_accuracy = 100 * correct_train / total_train
                train_accuracy = torch.sum(preds == labels).float() / labels.nelement()

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            ##epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)
            epoch_acc = train_accuracy / len(dataloaders[phase].dataset)

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history

The result is a bit weird. Could you run some sanity checks and see what values total_train and correct_train have after a couple of iterations?
The second approach would only calculate the accuracy of the last batch, if I’m not mistaken.
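
If it helps, here is a minimal, untested sketch of the statistics section that accumulates the pixel counts over all batches and divides only once per epoch, reusing the total_train / correct_train variables from your snippet:

                # statistics, accumulated per batch
                running_loss += loss.item() * inputs.size(0)
                total_train += labels.nelement()                 # number of pixels in this batch
                correct_train += (preds == labels).sum().item()  # correctly classified pixels

            # once per epoch, after the batch loop
            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = correct_train / total_train              # pixel accuracy over the whole epoch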

After doing some debugging I think I know what I did wrong. Like you said, I was only getting the accuracy of the last batch, let’s say 97, and dividing that number by the number of items in my validation dataset (in my case around 40), which is why I was getting 2.33 as the accuracy. So now what I’m going to do is sum all the batch accuracies and divide that sum by the length of my dataset, and that should give me the correct accuracy for the epoch.

But I do have one question: how can I get the number 0.8875 out of the tensor tensor(0.8875, device='cuda:0')?

You could call tensor.item() to get the Python scalar value.
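
For example, with the accuracy tensor from the snippet above:

acc = torch.sum(preds == labels).float() / labels.nelement()
print(acc.item())  # prints the plain Python float, e.g. 0.8875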

I have a question: is my code correct for calculating the accuracy of a single label?

N, H, W = 2, 224, 224
nb_classes = 10
output = torch.randn(N, nb_classes, H, W)
preds = torch.argmax(output, 1)
labels = torch.randint(0, nb_classes, (N, H, W))
labels1 = labels.eq(5).float()
preds1 = preds.eq(5).float()
print(torch.sum(preds1 == labels1).float() / labels1.nelement())
>  tensor(0.8196)

Your code snippet will calculate the accuracy for an imbalanced binary classification use case for class5 only.
Note that by selecting class5, you will get a lot of “non-class5” outputs, which will create a high True Negative score.
E.g. if your current segmentation output contains only a single pixel with class5, which is also wrongly classified, you’ll still get a very high accuracy as shown in this example:

N, H, W = 2, 20, 20
nb_classes = 6
output = torch.randn(N, nb_classes, H, W)
# Predict only a single pixel as class5
output[:, 5] = -1000000
output[0, 5, 0, 0] = 100000

preds = torch.argmax(output, 1)

# Create only single target as class5
labels = torch.randint(0, nb_classes-1, (N, H, W))
labels[0, 1, 1] = 5

labels1 = labels.eq(5).float()
preds1 = preds.eq(5).float()
print(torch.sum(preds1 == labels1).float() / labels1.nelement())
> tensor(0.9975)

@ptrblck Thanks a lot.